Pentaho Data Services Made Easy


Data Services in Pentaho

1. INTRODUCTION

A data service is a service that gives users on-demand access to data regardless of their geographic location. The data is stored on a server or in the cloud and is accessible from a wide range of systems and devices. By housing critical data in one location, data services can eliminate redundancy and reduce costs: multiple users can access and update the data while there is still a single point for updates. In IT, the term also covers third-party services that manage data for clients. Many uses of the term refer to what is also called “data as a service” (DaaS): web-delivered services offered by cloud vendors that perform various functions on data.

2. HOW DO DATA SERVICES WORK?


When combined with data virtualization, data services provide an abstraction layer over the details of stored data. Data virtualization provides the storage platform, while data services do the programmatic work of retrieving data from that platform. Data services automate the work of locating heterogeneously stored data and give developers and data analysts simple programmatic tools to find and extract the data they need with little effort. In an application, data services act as middleware, independently finding and delivering data in response to application requests. Data services are essentially web services for data.
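
The middleware idea can be sketched in a few lines of Python. This is only an illustration of the abstraction-layer concept, not Pentaho code: the `DataServiceRegistry` class, the two source functions, and the sample data are all hypothetical. The caller asks for data by service name and never sees how or where it is stored.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class DataServiceRegistry:
    """Toy middleware (hypothetical): maps a service name to a fetch function."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._sources[name] = fetch

    def query(
        self, name: str, where: Callable[[Row], bool] = lambda row: True
    ) -> List[Row]:
        # The caller only knows the service name, not the storage details.
        return [row for row in self._sources[name]() if where(row)]


def customers_from_csv() -> List[Row]:
    # Stand-in for a CSV file; the data is made up for the example.
    text = "id,city\n1,Chennai\n2,Pune"
    header, *lines = text.splitlines()
    keys = header.split(",")
    return [dict(zip(keys, line.split(","))) for line in lines]


def orders_in_memory() -> List[Row]:
    # Stand-in for a second, differently stored source.
    return [
        {"order_id": 10, "customer_id": "1", "amount": 250.0},
        {"order_id": 11, "customer_id": "2", "amount": 75.5},
    ]


registry = DataServiceRegistry()
registry.register("customers", customers_from_csv)
registry.register("orders", orders_in_memory)

# Same call shape for both sources, whatever sits behind them.
pune = registry.query("customers", lambda row: row["city"] == "Pune")
print(pune)
```

Swapping the CSV stand-in for a database or a web service would change only the registered function, not the code that calls `query`.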

3. WHAT ARE THE BENEFITS OF DATA SERVICES?

Because data services make it easy to find and deliver data from anywhere, we can choose storage that is cost-effective and convenient to maintain. Once created, data services are reusable, so the organization can save a great deal of time on future development.

4. PENTAHO – DATA SERVICE IMPLEMENTATION

A Pentaho Data Service is a virtual table that contains the output of a PDI transformation. You can connect to and query a regular Pentaho Data Service from any Pentaho tool, such as Report Designer, the PDI client (Spoon), Analyzer, and CTools, as well as from other compatible tools like RStudio, DbVisualizer, or SQuirreL.

  • We must have a Pentaho Server and repository to publish the data service.

4.1 Creating a Regular Pentaho Data Service

  • We can create a Pentaho Data Service on any step in a transformation.

In the example transformation, the Data Service is created on the output of the Final Output step.

  • To create a Pentaho Data Service on a step, do the following:

Right-click the step, click “Data Services”, then click “New...”.

The Data Service window then appears, with the following fields.

  • Service Name: Give the Data Service a unique name. No other data service published to the Pentaho Server may have the same name.

  • Output Step: Select the step whose output the Data Service should provide.

  • Data Service Type: Select “Regular” or “Streaming”. A streaming data service is commonly used when creating a streaming data dashboard with CTools.

  • Service Cache: Adjusts how long query results are cached. Consider using this technique if either of the following applies: your result set contains modest data sizes, or you query Big Data sources. Increasing the cache duration can help follow-on queries run more quickly.

  • Query Pushdown: Consider using this technique if both of the following apply: your transformation contains the Table Input or MongoDB Input steps, and your query uses simple or complex WHERE clauses that include AND, IN, or other specific operators. Limits on WHERE clause construction are described in “Pentaho Data Service SQL Support Reference and Other Development Considerations”.

  • Parameter Pushdown: Consider using this technique if both of the following apply: your transformation contains any step that should be optimized, including input steps like REST where a parameter in the URL could limit the results returned by a web service, and your query does not use more complex WHERE clauses that might contain the IN or OR keywords.

  • Driver Details: When connecting to the Data Service from a non-Pentaho tool, download this driver and install it in that tool.

Click “OK” to create the Data Service.
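
To make the Query Pushdown option more concrete, here is a minimal Python sketch of the general idea: the filter from the caller's query is applied by the source database instead of after every row has been read. It is not how Pentaho implements the feature. The `push_down` function, the `sales` table, and the supported operator list are assumptions made for illustration, and SQLite stands in for the source behind a Table Input step.

```python
import sqlite3
from typing import List, Sequence, Tuple

Condition = Tuple[str, str, object]  # (column, operator, value)

# Hypothetical whitelist; the real limits are in the Pentaho SQL reference.
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def push_down(
    base_sql: str, columns: Sequence[str], conditions: Sequence[Condition]
) -> Tuple[str, List[object]]:
    """Wrap the source query and append the caller's filters as a WHERE clause."""
    clauses: List[str] = []
    params: List[object] = []
    for column, operator, value in conditions:
        # Only known columns and operators are allowed into the SQL text;
        # values always travel as bound parameters.
        if column not in columns or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"cannot push down: {column} {operator}")
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    if not clauses:
        return base_sql, params
    return f"SELECT * FROM ({base_sql}) AS src WHERE " + " AND ".join(clauses), params


# SQLite stand-in for the source database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("South", 120.0), ("North", 80.0), ("South", 40.0)],
)

sql, params = push_down(
    "SELECT region, amount FROM sales",
    ["region", "amount"],
    [("region", "=", "South"), ("amount", ">", 100)],
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Without pushdown, all three rows would leave the database and be filtered afterwards; with it, only the matching row is returned by the source.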

4.2 Test a Data Service

  • Follow the steps below to open the test window.

Click “Test…”. The test window opens.

Use Execute SQL to run the query, then apply a filter and check the results. “Data Seurity_Virtualization” is the name of the Data Service created.

Max Rows: Select 100, 500, or 1000 as the maximum number of rows to display.
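
The kind of check done in the test window (run a SQL query with a filter, cap the number of rows shown) can be imitated outside Pentaho. The sketch below is a stand-in only: SQLite plays the role of the virtual table, and the `sales_service` table name, its columns, and the `run_test_query` helper are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

MAX_ROWS_CHOICES = (100, 500, 1000)  # the values offered by Max Rows


def run_test_query(
    conn: sqlite3.Connection,
    sql: str,
    params: Sequence[object] = (),
    max_rows: int = 100,
) -> List[Tuple]:
    """Run a query and return at most max_rows rows, like a preview cap."""
    if max_rows not in MAX_ROWS_CHOICES:
        raise ValueError(f"max_rows must be one of {MAX_ROWS_CHOICES}")
    cursor = conn.execute(sql, params)
    return cursor.fetchmany(max_rows)


# SQLite stand-in for the data service, filled with 250 made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_service (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales_service VALUES (?, ?)",
    [(i, i * 10.0) for i in range(1, 251)],
)

query = "SELECT id, amount FROM sales_service WHERE amount > ? ORDER BY id"
preview = run_test_query(conn, query, (500,), max_rows=100)
print(len(preview), preview[0])
```

Here 200 rows match the filter, but the preview stops at the selected maximum of 100.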

4.3 Publish Data Service to the Pentaho Server

  • Connect PDI to the Pentaho Repository or Pentaho Server.

  • Save the transformation that contains the Data Service to the Pentaho Server.

  • The saved .ktr files then appear on the server.

  • Create a Pentaho Data Service connection in PDI with the following settings:

  • Connection Name: Give a name for the connection

  • Connection Type: Pentaho Data Services

  • Access: Native (JDBC)

  • Settings:

    • Host Name: IP address of the Pentaho Server

    • Port Number: 8080 by default, or check with your admin

    • Web App Name: pentaho or pentaho-DI

    • User Name: check with your admin

    • Password: check with your admin

  • Test the PDI connection in the Pentaho Server as follows:

    • Click “Manage Data Sources”, select the PDI connection you created, click the Edit button, and then click Test.

  • A message reports that the connection to the database succeeded.
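
When the data service is reached from a non-Pentaho tool through the JDBC driver, the same settings are usually combined into a single connection URL. The small helper below shows one way to assemble it. The `jdbc:pdi://host:port/kettle?webappname=...` pattern and the driver class name in the comment are given as recalled from the Pentaho documentation and should be verified against your Pentaho version; the function name and the example hosts are hypothetical.

```python
from urllib.parse import urlencode


def pentaho_data_service_url(
    host: str, port: int = 8080, webapp_name: str = "pentaho"
) -> str:
    """Build a JDBC URL for a Pentaho Data Service (pattern assumed, verify it)."""
    # Assumed driver class for the JDBC client:
    #   org.pentaho.di.trans.dataservice.jdbc.ThinDriver
    query = urlencode({"webappname": webapp_name})
    return f"jdbc:pdi://{host}:{port}/kettle?{query}"


# Host Name, Port Number, and Web App Name from the settings list.
default_url = pentaho_data_service_url("192.168.1.20")
di_url = pentaho_data_service_url("pdi.example.com", 9080, "pentaho-DI")
print(default_url)
print(di_url)
```

The user name and password are deliberately left out of the URL; most JDBC tools ask for them in separate fields.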

4.4 Access the Data Service in the Pentaho Tool

  • Drag a Table Input step onto the canvas and select the Pentaho Data Service connection you created.

  • Click “Get SQL Select Statement” and select the required Data Service.

  • Preview the data. Apply a filter condition and check the results.

5. DISADVANTAGES OF DATA SERVICES

Data services make their consumers dependent on the provider. For instance, companies worry about what happens if the provider’s service goes down, and about the speed of the server.