Connecting Pentaho+ With SAP Systems
Pentaho+ with SAP Connectivity using RFC and BAPI
Business Need
When it comes to ERP, many great companies have committed to SAP systems.
An SAP system consists of a number of fully integrated modules that cover virtually every aspect of business management.
As a result, the complete enterprise data is stored in SAP, and companies now need to turn that data into information to plan future business strategies and decisions.
Combining SAP with a business intelligence tool like Pentaho+ can create real added value for the client's business.
So, is it possible to connect Pentaho+ to an SAP system and extract data using an ETL process?
Solution
We can use the Pentaho+ Data Integration (PDI) ETL tool to connect to SAP systems and extract data using RFC calls or BAPI functions.
A user can thereby integrate a BAPI into Pentaho's extract-transform-load (ETL) process to get to the SAP data.
ETL is a method of extracting data in various formats, transforming it to fit operational needs, and loading it into a target database or data warehouse.
Steps
SAP Input Step
The SAP Input step in Pentaho+ Data Integration calls functions on SAP systems. It can load tables via the function RFC_READ_TABLE or through other functions, such as BAPIs or customized function modules.
Prerequisites
We need to obtain sapjco3.jar and sapjco3.dll (the SAP Java Connector library and its native Windows library) from the SAP marketplace and copy these files into the lib folder of data-integration.
The fields of the SAP Input step are described below.
Step name: The name of this step as it appears in the transformation workspace.
Connection: The connection to the SAP system. The SAP system properties need to be defined in this connection.
Function: The name of the function to call on the SAP system.
Input: The input parameters of the SAP function.
Output: The output parameters of the SAP function.
Sample RFC/ZRFC Function Usage In SAP Input Step
Remote Function Call (RFC) is the standard SAP interface for communication between SAP systems. Data transactions are not limited to getting data from the server, but can insert data into server records as well. SAP can act as the Client or Server in an RFC call.
Here, the ZRFC_READ_TABLE function is used to return data from a table whose name we provide as input.
In the step, Field denotes the input parameter name, while SAP Parameter Name is the name of the parameter used by default in the ZRFC SAP function.
In the input section, we need to provide the following inputs:
DELIMITER - The output data is separated by the delimiter we provide as input. E.g. ";"
NO_DATA - If the output data has null values, they are replaced with the value provided in NO_DATA. E.g. "NA"
QUERY_TABLE - The name of the SAP table. E.g. "BSEG"
ROWCOUNT - Limits the number of rows returned in the output.
ROWSKIPS - The number of rows to skip before data is returned. Set to 0 by default.
FIELDNAME - The list of columns whose data should be returned from the SAP table.
FILTER - The filter conditions that limit the data retrieved. E.g. PERIOD EQ '2013'
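To make the input parameters concrete, the sketch below collects them in a plain Python dictionary and shows how ROWSKIPS and ROWCOUNT could be combined to read a large table in chunks. The helper names build_inputs and page_windows are hypothetical and are not part of Pentaho+ or SAP, and the BSEG column names are only sample values. Inside PDI, the same values would simply be entered in the Input grid of the SAP Input step.

```python
from typing import Dict, Iterator, List, Tuple


def build_inputs(table: str, fields: List[str], where: str = "",
                 delimiter: str = ";", no_data: str = "NA",
                 rowcount: int = 0, rowskips: int = 0) -> Dict[str, object]:
    """Collect the input parameters described above in one mapping."""
    return {
        "DELIMITER": delimiter,      # separator used inside each output row
        "NO_DATA": no_data,          # placeholder written for null values
        "QUERY_TABLE": table,        # SAP table to read
        "ROWCOUNT": rowcount,        # maximum number of rows to return
        "ROWSKIPS": rowskips,        # rows to skip before returning data
        "FIELDNAME": list(fields),   # columns to return
        "FILTER": where,             # condition limiting the rows
    }


def page_windows(total_rows: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (ROWSKIPS, ROWCOUNT) pairs that cover total_rows in chunks."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for skip in range(0, total_rows, page_size):
        yield skip, min(page_size, total_rows - skip)


# Illustrative values only: three BSEG columns, read 100 rows at a time.
inputs = build_inputs("BSEG", ["BUKRS", "BELNR", "GJAHR"],
                      where="PERIOD EQ '2013'")
windows = list(page_windows(250, 100))
```

Reading in windows like this keeps each call small, which can help when a single request for the whole table would be too large.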
WA is the output field. It returns the table data in a single column, with the values separated by the delimiter provided in the input parameters.
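Because every row arrives as one delimited string in WA, it has to be split back into columns downstream. The following is a minimal sketch of that split in Python, assuming the delimiter ";" and the NO_DATA value "NA" from the examples above. The function name split_wa and the sample column names are hypothetical. Within a PDI transformation, a field-splitting step would typically play this role instead.

```python
from typing import Dict, List, Optional


def split_wa(wa: str, fields: List[str], delimiter: str = ";",
             no_data: str = "NA") -> Dict[str, Optional[str]]:
    """Split one WA value into a column-name -> value mapping.

    Values equal to the NO_DATA placeholder are mapped back to None.
    """
    parts = [part.strip() for part in wa.split(delimiter)]
    if len(parts) != len(fields):
        raise ValueError(
            f"expected {len(fields)} values, got {len(parts)}")
    return {name: (None if value == no_data else value)
            for name, value in zip(fields, parts)}


# Illustrative row: the second column was null in the source table.
row = split_wa("1000;NA; 2013 ", ["BUKRS", "BELNR", "GJAHR"])
```

The length check matters: if a data value itself contains the delimiter, the row splits into too many parts, so it is safer to pick a delimiter that cannot occur in the data.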
Sample BAPI Function Usage In SAP Input Step
BAPI (Business Application Programming Interface) is a set of interfaces to object-oriented programming methods that enable a programmer to integrate third-party software into the proprietary R/3 product from SAP.
In the input section, we need to provide input conditions so that the output is restricted to the data matching those conditions.
Based on the conditions provided in the input, the output section retrieves the list of columns and their respective data.
Common Problems
The most frequent problem when ETL runs against an SAP system is a communication link failure. Make sure the connectivity between SAP and the Pentaho+ ETL system is reliable.
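Where link failures are only temporary, retrying the call after a short wait can help. The sketch below shows a generic retry with a doubling delay in Python. It is an illustration, not a PDI feature: the name call_with_retry is hypothetical, and it assumes the failing call signals a dropped link by raising ConnectionError.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on ConnectionError wait and retry, doubling the delay.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep function is passed in as a parameter so that the waiting behaviour can be replaced, for example in a test that should not actually pause.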
Sometimes, due to high data volume in the SAP system, the Pentaho+ ETL process takes a long time to fetch data. Make sure to increase the Java heap space (the JVM -Xmx setting) of Pentaho+ PDI and to use better hardware infrastructure.
The user password often expires at regular intervals. Make sure there is provision for external configuration of user credentials so that we do not have to open the ETL code every time the credentials change.
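One way to keep credentials outside the ETL code is to read them from the environment at run time. The sketch below illustrates the idea in Python. The function name load_sap_credentials and the variable names SAP_HOST, SAP_CLIENT, SAP_USER and SAP_PASSWORD are assumptions made for this example. In PDI itself, a comparable effect is commonly achieved by defining variables in kettle.properties and referring to them in the connection settings.

```python
import os
from typing import Dict, Mapping


def load_sap_credentials(env: Mapping[str, str] = os.environ,
                         prefix: str = "SAP_") -> Dict[str, str]:
    """Read connection settings from environment-style variables.

    Raises KeyError naming every setting that is missing or empty, so a
    misconfigured job fails before it tries to connect.
    """
    required = ("HOST", "CLIENT", "USER", "PASSWORD")
    missing = [prefix + key for key in required if not env.get(prefix + key)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {key.lower(): env[prefix + key] for key in required}
```

When the password expires, only the externally stored value has to be updated; the transformation itself stays untouched.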