How to use R-Language in Pentaho+ Data Integration (PDI)


R is a programming language for statistical computing and graphics. It provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification and clustering) and is considered highly extensible. Because R is open source, it offers a wide choice of research tools, backed by a large collection of libraries. R is also one of the languages preferred by data scientists worldwide. With R's machine learning libraries, you can create a subset of a larger data set in Pentaho+ Data Integration.

Pentaho+ Data Integration and R-Language

Pentaho+ is Business Intelligence (BI) software that provides Pentaho+ Data Integration (PDI) as one of its services. Using PDI, one can access, prepare and blend data faster. It also provides seamless orchestration for building data pipeline services.

PDI also provides capabilities to operationalize R-Language, so that advanced statistical computing and machine learning models can be integrated on the fly. R-Language can be integrated into Pentaho+ Data Integration in two ways:

  • R-Script Executor (Pentaho+ Professional Edition)
  • Execute R-Script (Plugin for Pentaho+ Community Edition)

R-Script Executor accepts both code written directly in the editor and a script loaded from a file, whereas the Execute R-Script plugin can only execute an R script from the file path provided.

Integrating R in Pentaho+ Data Integration (PDI)

To integrate R-Language in Pentaho+ Data Integration (PDI), the following steps need to be followed:

  • Install the R-Language
  • Set the ‘JAVA_HOME’ path (the path to your Java installation, if using Linux or Mac systems)
  • Set the R environment variables:
    • R_HOME (Path to the root directory of your R installation)
    • R_LIBS_USER (Path to the directory where R installs your packages)
    • PATH (Append the PATH variable with the directory that contains the R executable)
  • Restart DI Server and Spoon so that the environment changes take effect
  • Configure Spoon with rJava (jri.dll)

Once the above steps are done, restart the Pentaho+ Data Integration Server and Spoon once more, then start a new transformation to check whether the changes have taken effect.
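
Before opening Spoon, the R environment variables from the list above can be sanity-checked with a small script. The following is a minimal Python sketch, not part of Pentaho+ or R. The function name check_r_environment and its parameters are hypothetical, chosen only for illustration. It assumes that R_HOME and R_LIBS_USER should point to existing directories and that the R executable should be discoverable through PATH; it does not check JAVA_HOME or the rJava (JRI) configuration.

```python
import os
import shutil

# Directory-valued variables named in the setup steps.
REQUIRED_DIR_VARS = ("R_HOME", "R_LIBS_USER")


def check_r_environment(env=None, find_executable=shutil.which):
    """Return a list of problems found; an empty list means every check passed.

    env and find_executable are injectable (hypothetical parameters) so the
    checks can be exercised without changing the real environment.
    """
    env = os.environ if env is None else env
    problems = []

    # Each variable must be set and must point to an existing directory.
    for name in REQUIRED_DIR_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")

    # The R executable must be reachable through the directories in PATH.
    if find_executable("R", path=env.get("PATH", "")) is None:
        problems.append("R executable not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_r_environment()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("R environment variables look consistent.")
```

Run the script from the same shell or user account that starts Spoon, because environment variables set elsewhere may not be visible to it. An empty result only shows that the variables are consistent; it does not prove that PDI can load R.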