Auditing Pentaho Setup using PDI

Problem statement:

You cannot simply upgrade Pentaho to a higher version in one go. Each phase of the upgrade takes careful effort, and a simple mistake can cause serious losses for your company. Even when we know the general upgrade steps, we might miss a few key points. And once data or configuration is lost through a small oversight, we cannot downgrade the version and get it back.

Apart from that, we also need to audit the setup before the upgrade. This can be a tiresome process for people who are not familiar with the architecture. Which criteria should be checked? How can your Pentaho setup pass the audit? Is it easy for a beginner to do this? You can find answers to all these questions here.

Solution :

To address the above problem, we came up with “Chimney”, a detailed report on what changes need to be made during the upgrade, along with effort estimates and suggestions. It helps you gauge how smoothly your version upgrade is likely to go and answers your What? Why? How? questions.

What Does Chimney Do?

A dynamic report is generated for each customer, taking the following categories into consideration.

  • Infrastructure

  • Authentication

  • Pentaho Repository and Security

  • Pentaho Data Integration (PDI)

You can expect Chimney to answer the following questions.

  • How much effort is needed to make this change?

  • What are the skills required to proceed?

  • What will happen if I do not take this into consideration?

  • What challenges will I face when I proceed with this idea?

How Does Chimney Do It?

Now let’s dive into some technical details of how we get this information in the form of a report. Do not worry, we will cover only a few topics here. For more details, please visit our website.

1: Deprecated Steps

What do we mean by the term deprecated? With every Pentaho upgrade, a few steps are enhanced, and some are removed and replaced by an alternative step. These are called deprecated steps. The natural next question is: ‘How do we find them?’

Here is the list of steps you need to look at closely when you upgrade to the latest version.

Deprecated/Changed steps

Avro input                 SAP input                 XML Input
Text file input            OpenERP object delete     Example step
Greenplum Bulk Loader      OpenERP object input      Text file output
Aggregate Rows Step        OpenERP object output     IBM Websphere MQ Producer
Get Previous Row Fields    Palo cell input           JMS Consumer
Google Analytics Input     Palo cell output          JMS producer
LucidDB Bulk Loader        Palo dim input            IBM Websphere MQ Consumer
Streaming XML Input        Palo dim output           Script

Comparing these against the steps present in each transformation file (.ktr) tells us which deprecated steps, if any, are in use.

Solution :

The KTR below helps you find deprecated steps if you have used any. Provide the directory path in the XML input step; it reads all the KTR files and lists the name of each KTR along with the steps that need to be looked into.

img 1

This is the base transformation for all the auditing we do while upgrading Pentaho with respect to the PDI section.
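The same comparison can also be sketched outside PDI. The minimal Python example below assumes that a .ktr file is XML whose root element holds `step` children, each with a `name` and a `type` child. The type IDs in `DEPRECATED_TYPES` are hypothetical placeholders, not the real internal IDs of the steps listed above, and the function names are made up for illustration. It is not the Chimney transformation itself, only a rough equivalent of the idea.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical step type IDs used only for illustration.
# Replace them with the IDs that match your own deprecated-step list.
DEPRECATED_TYPES = {"XMLInput", "AggregateRows"}


def deprecated_steps_in(root, deprecated_types=DEPRECATED_TYPES):
    """Return (step name, step type) pairs for deprecated steps in one KTR tree."""
    hits = []
    # Assumption: steps are direct children of the root <transformation> element.
    for step in root.findall("step"):
        name = step.findtext("name", default="")
        step_type = step.findtext("type", default="")
        if step_type in deprecated_types:
            hits.append((name, step_type))
    return hits


def audit_directory(directory, deprecated_types=DEPRECATED_TYPES):
    """Scan every .ktr file under a directory and map file name to its hits."""
    report = {}
    for path in sorted(Path(directory).rglob("*.ktr")):
        root = ET.parse(path).getroot()
        hits = deprecated_steps_in(root, deprecated_types)
        if hits:
            report[path.name] = hits
    return report
```

Calling `audit_directory("/path/to/ktr/files")` would return a dictionary keyed by KTR file name, which mirrors the "KTR name plus step" listing described above.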


Challenges :

  • We cannot use the alternative step every time as the logic differs.

  • The old configuration details cannot be completely used in the new one.

  • We might need to change the flow of the transformation in some cases, such as the MQ Producer and MQ Consumer steps.

2: Metadata Injection Support

One of the key features of Pentaho is the Metadata Injector step. Initially, only a few transformation steps supported it, which made designing a dynamic architecture difficult. With each later upgrade, Pentaho has extended this support to more steps.

But which steps are those? And how can we use them to make our existing solution/logic dynamic?

Solution :

The KTR from the previous section answers these questions too, since we can filter for the steps we need. In addition, it helps you find steps that did not support Metadata Injection before but gained the feature in the latest version.

For example, the ‘String Cut’ step did not support Metadata Injection in version 7.1, but it does in the latest version, 8.3. You can use this feature to dynamically cut strings with different logic under different scenarios.
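The idea of "steps that gained Metadata Injection support" amounts to a set difference between two versions, filtered by the steps your KTRs actually use. Here is a minimal sketch in Python. The function name and the support lists are hypothetical placeholders; only the ‘String Cut’ entry (unsupported in 7.1, supported in 8.3) comes from the example above.

```python
def newly_injectable(old_support, new_support, used_steps):
    """Return the used steps that support injection only in the newer version."""
    gained = set(new_support) - set(old_support)
    return sorted(step for step in set(used_steps) if step in gained)


# Placeholder support lists; fill these from the documentation of each version.
support_7_1 = {"Some other step"}
support_8_3 = {"Some other step", "String Cut"}

# Steps found in the audited KTR files (illustrative values).
used = ["String Cut", "Some other step", "Sort rows"]

candidates = newly_injectable(support_7_1, support_8_3, used)
```

With these placeholder inputs, `candidates` contains only the steps that are both in use and newly injectable, which are the ones worth reviewing for a more dynamic design.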

Challenges :

  • Even after using Metadata Injection, we can see some performance issues.

  • Not every Metadata Injection option can be used to make your KTR dynamic.

  • There are a few steps where Metadata Injection is allowed, but not for all of their fields.

3: Authentication Provider

In this final part of the blog, we will look at the role of the authentication provider in a Pentaho upgrade. Every upgrade starts with the default configuration below.

  • Default security provider – Jackrabbit

  • applicationContext-spring-security.xml – Authentication provider – “anonymousAuthenticationProvider” (or) “daoAuthenticationProvider”

This is the most basic level of security that Pentaho provides, but most use cases do not keep it as is. We usually customize it to get enhanced authentication.

We need to identify those customizations and suggest changes, along with their impact on the existing setup.

Solution :

The KTR below finds the existing configuration in your system, compares it against the default one, and produces the following results.

  • Impacts – (Example – Less privacy on the repository as it is folder/file-based)

  • Suggestions – (Example – Use Multi Authentication for enhanced security)

  • Efforts – (Example – Total time required for the upgrade is 3-5 weeks)

  • Benefits – (Example – Isolated user/role management as per multi-authentication provider)
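The "compare against the default" step can be sketched in a few lines of Python. This is an illustration only, not the actual KTR: the setting keys (`security_provider`, `authentication_provider`) and the function name are hypothetical, and it assumes the current values have already been extracted from the configuration files. The default values are the ones named earlier in this section.

```python
# Defaults as described above; each setting maps to its accepted default values.
DEFAULTS = {
    "security_provider": {"jackrabbit"},
    "authentication_provider": {
        "anonymousAuthenticationProvider",
        "daoAuthenticationProvider",
    },
}


def diff_against_default(current, defaults=DEFAULTS):
    """List every setting whose current value is not one of the defaults."""
    findings = []
    for key in sorted(defaults):
        actual = current.get(key)
        if actual not in defaults[key]:
            findings.append({"setting": key, "actual": actual})
    return findings


# Illustrative current configuration with one customization.
current_config = {
    "security_provider": "ldap",
    "authentication_provider": "daoAuthenticationProvider",
}

customizations = diff_against_default(current_config)
```

Each entry in `customizations` marks a place where the setup departs from the default, which is where the impacts, suggestions, efforts, and benefits in the report would need to be worked out.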


sample report

For more details, do visit our website.