Querying Google BigQuery from Pentaho Report Designer

tenthplanet blog pentaho Querying Google BigQuery from Pentaho Report Designer

Connecting Pentaho Report Designer to Google Big Query

1. Pre-requisites

  • The following Pre-Requisites are in place to enable connectivity of Pentaho Report Designer 5.4 to Google Big Query.
  • Pentaho Report Designer 5.4 or higher and optional Pentaho BA Server deployed and running
  • BigQuery JDBC Drivers & Dependency JARs
  • Google Account
  • API Access to Google Big Query
  • Google Chrome Browser

2. Enable Google BigQuery API

Pre-requisites

To enable Google BigQuery for a prototype the following pre-requisites are applied:

✓A Google Account is available

✓Credit Card details can be used and entered for billing of BigQuery Usage (linked to the Google Account)

Enable Google BigQuery API

1. Enabling the Google BigQuery API can be achieved via the following steps:

2. Connect with the browser to https://code.google.com/apis/console or https://console.developers.google.com/apis/library

3. Login with the Google Account that will be used for the connectivity

4. The Google API Project main page will be shown

5. Click on “BigQuery API”

6. There are limitations in consuming the BigQuery API, as it costs resources of Google Cloud Storage. Review the Pricing for BigQuery as per your needs, in

7. Sometimes, it will take few minutes or an hour to get activated (as it involves pricings and approvals of financial transactions)

Billing for Google BigQuery

Google BigQuery are subject to be charged based on consumption. Pricing or any related doubts you can verify in below URLs.

✓Free Trial: https://cloud.google.com/free-trial/?hl=en_US

✓Pricing Details: https://cloud.google.com/pricing/

✓FAQs: https://cloud.google.com/free-trial/?hl=en_US#faq

✓Contact Sales: https://cloud.google.com/contact/

Billing can be enabled for a Google Account by visiting URL

https://console.cloud.google.com/billing

This can be enabled by another way of Google BigQuery ‘Try it Free’ option, a 3 step process.

3. Validating Google BigQuery

To ensure the Google BigQuery API is successfully activated, a simple test can be executed via the Google BigQuery Web Interface. Google provides a set of samples that can be used for the validation of the BigQuery connectivity. To validate the activation of the BigQuery API for the account defined in the previous section, navigate to https://bigquery.cloud.google.com/ Check between different browsers, if you face any inconsistency in Web UI interface.

Sample steps to validate the activation of the Google BigQuery API:

1. Using Chrome Browser, navigate to https://bigquery.cloud.google.com/

2. Login with the Google Account registered for the BigQuery API

3. Click or Expand item “Publicdata:samples” and select a sample database

4. Choose table ‘natality’. You can see the list of fields in the table in the right pane of the window.

5. You can see results in the results pane at the bottom. Note, these query running will result in charges in your account. All is fine to connect Google Big Query from external client apps.
Installation

4. Installation of Google BigQuery JDBC Drivers

Visit URL https://code.google.com/p/starschema-bigquery-jdbc/downloads/list . You can see the list of downloads available for different tools.

Please select the ‘bqjdbc-1.4.jar’ in https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/starschema-bigquery-jdbc/bqjdbc-1.4.jar

Place the downloaded jar file in report-designer tool at path, …/report-designer/lib (use slash accordingly based on Windows or Linux systems)

Start Pentaho Report Designer tool now using report-designer.bat or report-designer.sh accordingly.
Installation

5. Datasource Configuration

✓In Pentaho Report Designer tool, go to File > New Report or choose option ‘New Report’ in the Report Designer default wizard.

✓Go to menu, Data > Add Data Source > JDBC

✓Choose Database type as ‘Generic Database’ and Connection Type as ‘JDBC’, in the data source creation dialog box.

✓Fill in the details as below (currently we are using authentication approach using Google Service Account credentials; Other option available is by accessing via OAuth 2.0 Web Client credentials). Depends on the choice of authentication we follow, respective access parameters to be used. For our case,

Name: GBQ_Connection

JDBC URL: jdbc:BQDriver:?withServiceAccount=true?transformQuery=true

Driver: net.starschema.clouddb.jdbc.BQDriver

User: <Client ID of the service Account, created under your Google account>

Password: <Location of the .p12 key file (for the service account), stored in your local drive>

For Reference, Creating Service Accounts and an associated key file can be done by using options in URL

https://console.cloud.google.com/iam-admin/serviceaccounts/

Click on Test option, to ensure that you could able to establish connection with Google Account successfully.

6. Validate Client Connectivity with Google BigQuery (from PRD)

You can validate the client connectivity Pentaho Report Designer(PRD to able to connect with Google Big Query Database tables), by running below sample query in the data source creation wizard of Pentaho Report Designer as usual.

SELECT
DATE(pickup_datetime) as pickup_date,
SUM(passenger_count) as passenger_count,
SUM(trip_distance) as trip_distance,
SUM(fare_amount) as fare_amount,
SUM(total_amount) as total_amount
FROM
[nyc-tlc:green.trips_2014]
GROUP BY
pickup_date
ORDER BY
pickup_date;

You can see the result of above query, as you preview results.