Clustering and Loadbalancing PBA with standalone postgres and Apache web service

tenthplanet blog pentaho Clustering and Loadbalancing PBA with standalone postgres

Clustering and Load balancing in Pentaho BA server

Prerequisite:

Multiple pentaho server
Stanalone postgres
Apache web server

Initialize Postgres Database For Both the Pentaho Servers

Change Default Passwords:
To change the passwords, go to the pentaho/server/pentaho-server/data/postgresql directory and use any text editor to change the passwords in these SQL scripts:
create_quartz_postgresql.sql
create_jcr_postgresql.sql
create_repository_postgresql.sql
pentaho_mart_postgresql.sql
Run SQL Scripts:
Run these scripts from the PSQL Console window in the pgAdminIII tool:
\i <your filepath>/create_quartz_postgresql.sql
\i <your filepath>/create_repository_postgresql.sql
\i <your filepath>/create_jcr_postgresql.sql
\i <your filepath>/pentaho_mart_postgresql.sql
Check the postgres schema whether the components are present. (Quartz, Hibernate, JCR, Pentaho Operations Mart)

Specify Data Connection Between Pentaho BA server and Tomcat web server
Set Up JNDI Connections for the Pentaho Server
Stop the Tomcat and Pentaho servers.
Consult your database documentation to determine the class name and connection string for your database.
Edit the /tomcat/webapps/pentaho/WEB-INF/web.xml file.
At the end of the <web-app> element, in the same part of the file where you see <!– insert additional resource-refs –>, add the following XML snippet:
<resource-ref>
<description>postgres</description>
<res-ref-name>jdbc/postgres</res-ref-name>
<res-type>javax.sql.DataSource</res-type>
<res-auth>Container</res-auth>
</resource-ref>
Save and close the web.xml file.
Edit the /tomcat/conf/context.xml with a text editor. Alternatively, you can modify the /tomcat/webapps/pentaho/META-INF/context.xml file if you want this data connection to be available only to the Pentaho Server. Adding JNDI connections to the context.xml makes them available to all of the webapps deployed to this Tomcat instance.
Anywhere inside of the Context element, add the following XML snippet:
<Resource name=”jdbc/postgres”
auth=”Container” type=”javax.sql.DataSource”
factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory”
maxActive=”20″
maxIdle=”5″
maxWait=”10000″
username=”postgres”
password=”password”
driverClassName=”org.postgresql.Driver”
url=”jdbc:postgresql://ip:port/myDataSource”
/>
The example above shows a simple PostgreSQL configuration. Replace the Resource name, username, password, driverClassName, and url parameters, or any relevant connection settings, to match your database connection information and the details you supplied in the web.xml file earlier.
Save and close the context.xml file.
Delete the pentaho.xml filed located in the /tomcat/conf/catalina/directory. The pentaho.xml is a cached copy of the context.xml file you modified. Since the cache is not usually configured to update frequently, you have to delete the pentaho.xml file and let Tomcat recreate it when it starts up. (If you could not find pentaho.xml in specified path, skip to next step. It is because of pentaho repository have not cached.)
Start the Tomcat and Pentaho servers. Tomcat can now properly connect to your data.

Configure PostgreSQL Pentaho Repository Database
1. Set Up Quartz on PostgreSQL Pentaho Repository Database:
Open the pentaho/server/pentaho-server/pentaho-solutions/system/quartz/quartz.properties file in any text editor.
Locate the #_replace_jobstore_properties section and set the org.quartz.jobStore.driverDelegateClass as shown here.
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
Locate the # Configure Datasources section and set the org.quartz.dataSource.myDS.jndiURL equal to Quartz. For example, on Tomcat, this setting should be:
org.quartz.dataSource.myDS.jndiURL = Quartz
Save the file and close the text editor.
2. Set Hibernate Settings for PostgreSQL:
Open the hibernate-settings.xml file in a text editor. Find the <config-file> tags and confirm that it is configured for PostgreSQL.
<config-file>system/hibernate/postgresql.hibernate.cfg.xml</config-file>
Save the file if you made changes, then close the file.
Open the postgresql.hibernate.cfg.xml file in a text editor.
Make sure that the password and port number match the ones you specified in your configuration. Make changes if necessary, then save and close the file.
3. Modify Jackrabbit Repository Information for PostgreSQL:
Navigate to the pentaho/server/pentaho-server/pentaho-solutions/system/jackrabbit and open the repository.xml file with any text editor.
Following the table below, locate and verify or change the code so that the PostgreSQL lines are not commented out, but the MySQL, Oracle, and MS SQL Server lines are commented out.

ItemCode Section
Repository<FileSystem class=”org.apache.jackrabbit.core.fs.db.DbFileSystem”>

<param name=”driver” value=”org.postgresql.Driver”/>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</FileSystem>

DataStore<DataStore class=”org.apache.jackrabbit.core.data.db.DbDataStore”>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</DataStore>

Workspaces<FileSystem class=”org.apache.jackrabbit.core.fs.db.DbFileSystem”>

<param name=”driver” value=”org.postgresql.Driver”/>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</FileSystem>

Persistence Manager (1)<PersistenceManager class=”org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager”>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</PersistenceManager>

Versioning<FileSystem class=”org.apache.jackrabbit.core.fs.db.DbFileSystem”>

<param name=”driver” value=”org.postgresql.Driver”/>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</FileSystem>

Persistence Manager (2)<PersistenceManager class=”org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager”>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

</PersistenceManager>

Database Journal<Journal class=”org.apache.jackrabbit.core.journal.DatabaseJournal”>

<param name=”revision” value=”${rep.home}/revision.log”/>

<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>

<param name=”driver” value=”org.postgresql.Driver”/>

<param name=”user” value=”jcr_user”/>

<param name=”password” value=”password”/>

<param name=”schema” value=”postgresql”/>

<param name=”schemaObjectPrefix” value=”cl_j_”/>

<param name=”janitorEnabled” value=”true”/>

<param name=”janitorSleep” value=”86400″/>

<param name=”janitorFirstRunHourOfDay” value=”3″/>

</Journal>

Note: If you have a different port or different password, make sure that you change the password and port number in these examples to match the ones in your configuration.

Perform Tomcat-Specific Connection Tasks
1. Download Driver and Apply to the Pentaho Server:
Download a JDBC driver JAR from your database vendor or a third-party driver developer.
Copy the JDBC driver JAR you just downloaded to the pentaho/server/pentaho-server/tomcat/lib folder.
Copy the hsqldb-2.3.2.jar file to pentaho-server/tomcat/lib if you want to retain the sample provided by Pentaho.
2. Modify JDBC Connection Information in the Tomcat Context XML File:
Go to the pentaho-server/tomcat/webapps/pentaho/META-INF directory and open the context.xml file with any file editor.
Comment out the resource references that refer to databases other than PostgreSQL, such as MySQL, MS SQL Server, and Oracle. Then, add the following code to the file if it does not already exist. Be sure to adjust the port numbers and passwords to reflect your environment, if necessary.
<Resource name=”jdbc/Hibernate” auth=”Container” type=”javax.sql.DataSource” factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory” initialSize=”0″ maxActive=”20″ maxIdle=”10″ maxWait=”10000″ username=”hibuser” password=”password” driverClassName=”org.postgresql.Driver” url=”jdbc:postgresql://ip:port/hibernate” validationQuery=”select 1″/>
<Resource name=”jdbc/Audit” auth=”Container” type=”javax.sql.DataSource” factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory” initialSize=”0″ maxActive=”20″ maxIdle=”10″ maxWait=”10000″ username=”hibuser” password=”password” driverClassName=”org.postgresql.Driver” url=”jdbc:postgresql://ip:port/hibernate” validationQuery=”select 1″/>
<Resource name=”jdbc/Quartz” auth=”Container” type=”javax.sql.DataSource” factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory” initialSize=”0″ maxActive=”20″ maxIdle=”10″ maxWait=”10000″ username=”pentaho_user” password=”password” driverClassName=”org.postgresql.Driver” url=”jdbc:postgresql://ip:port/quartz” validationQuery=”select 1″/>
<Resource name=”jdbc/pentaho_operations_mart” auth=”Container” type=”javax.sql.DataSource” factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory” initialSize=”0″ maxActive=”20″ maxIdle=”10″ maxWait=”10000″ username=”hibuser” password=”password” driverClassName=”org.postgresql.Driver” url=”jdbc:postgresql://ip:port/pentaho_operations_mart” validationQuery=”select 1″/>
<Resource name=”jdbc/PDI_Operations_Mart” auth=”Container” type=”javax.sql.DataSource” factory=”org.apache.tomcat.jdbc.pool.DataSourceFactory” initialSize=”0″ maxActive=”20″ maxIdle=”10″ maxWait=”10000″ username=”hibuser” password=”password” driverClassName=”org.postgresql.Driver” url=”jdbc:postgresql://ip:port/hibernate” validationQuery=”select 1″/>
Make sure that the validationQuery variable for your database is set as follows: validationQuery=”select 1″
Save the context.xml file, then close it.
Clear the tomcat/work and tomcat/temp directories.
Start the tomcat and pentaho servers.

Initialize and Configure Pentaho Repository
1)Clean Up Pentaho Repository:
After you have initialized and configured your repository, you should clean up these files by following these steps.
Locate the …pentaho-server/tomcat directory and remove all files and folders from the temp folder.
Locate the …pentaho-server/tomcat directory and remove all files and folders from the work folder.
Locate the …pentaho-server/pentaho-solutions/system/jackrabbit/repository directory and remove all files and folders from the final repository folder.
Locate the …pentaho-server/pentaho-solutions/system/jackrabbit/repository directory and remove all files and folders from the workspaces folder.
You now have a configured repository and are ready to move to the next step for clustering.
Configure Jackrabbit Journal
1. Locate the repository.xml file in the …/pentaho-server/pentaho-solutions/system/jackrabbit directory and open it with any text editor.
2. Scroll to the bottom of the file and replace the section that begins with <!– Run with a cluster journal –> with the correct code for your type of database repository.
Change in the 1st pentaho server – Node1 /Server 1
<!–
Run with a cluster journal
–>
<Cluster id=”node1″>
<Journal class=”org.apache.jackrabbit.core.journal.DatabaseJournal”>
<param name=”revision” value=”${rep.home}/revision.log”/>
<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>
<param name=”driver” value=”org.postgresql.Driver”/>
<param name=”user” value=”jcr_user”/>
<param name=”password” value=”password”/>
<param name=”databaseType” value=”postgresql”/>
<param name=”janitorEnabled” value=”true”/>
<param name=”janitorSleep” value=”86400″/>
<param name=”janitorFirstRunHourOfDay” value=”3″/>
</Journal>
</Cluster>
Change in the 2nd pentaho server – Node 2 /Server 2
<!–
Run with a cluster journal
–>
<Cluster id=”node2″>
<Journal class=”org.apache.jackrabbit.core.journal.DatabaseJournal”>
<param name=”revision” value=”${rep.home}/revision.log”/>
<param name=”url” value=”jdbc:postgresql://ip:port/jackrabbit”/>
<param name=”driver” value=”org.postgresql.Driver”/>
<param name=”user” value=”jcr_user”/>
<param name=”password” value=”password”/>
<param name=”databaseType” value=”postgresql”/>
<param name=”janitorEnabled” value=”true”/>
<param name=”janitorSleep” value=”86400″/>
<param name=”janitorFirstRunHourOfDay” value=”3″/>
</Journal>
</Cluster>

3. Save and close the file. Jackrabbit journaling is now set up for your cluster. Next, you need to cluster the quartz tables to avoid duplicate scheduling on each node.

Configure Quartz
1. Locate the quartz.properties file in the …/pentaho-server/pentaho-solutions/system/quartz directory and open it with any text editor.
2. Find the org.quartz.scheduler.instanceId = INSTANCE_ID line and change INSTANCE_ID to AUTO.
org.quartz.scheduler.instanceId = AUTO
3. Find the #_replace_jobstore_properties section and change the default value of org.quartz.jobStore.isClustered to true as shown.
#_replace_jobstore_properties
org.quartz.jobStore.misfireThreshold = 60000
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.useProperties = false
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = QRTZ5_
org.quartz.jobStore.isClustered = true
4. Add this line just after the org.quartz.jobStore.isClustered = true line.
org.quartz.jobStore.clusterCheckinInterval = 20000
5. Save the Editor. Quartz is now configured for your cluster.
6. Restart the postgresql, tomcat and pentaho server. And the clustering of two servers is working properly.

Tomcat Configuration for Load Balancer in High Availability
1. Shut down the Tomcat server on each node in the cluster.
2. Open the tomcat/conf/server.xml file in a text editor.
3. Locate the following line <Engine name=”Catalina” defaultHost=”localhost”>.
4. The jvmRoute value will need to be unique for each node. This value will map to the
BalancerMember setup in the following section when configuring Apache. Add the
jvmRoute attribute like this:
Change in Pentaho Server 1
<Engine name=”Catalina” defaultHost=”localhost” jvmRoute=”server1″>

Change in Pentaho Server 2
<Engine name=”Catalina” defaultHost=”localhost” jvmRoute=”server2″>
5. To close connections so they do not become orphaned and cause errors, locate the
following line:
<Connector URIEncoding=”UTF-8″ port=”8009″ protocol=”AJP/1.3″
redirectPort=”8443″ />
Change it to:
<Connector port=”8009″ protocol=”AJP/1.3″ redirectPort=”8443″
connectionTimeout=”10000″ keepAliveTimeout=”10000″ />
6. Edit the pentaho-solutions/system/server.properties and change the fully-qualified-server-url param-value to the load balancer’s URL.
fully-qualified-server-url=http://ip:port/pentaho/
7. Start the Tomcat service back up after configuring this file.
8. Repeat this process throughout all nodes within the cluster.
Apache Configuration
Locate and edit the Apache configuration file. For Debian-based distributions, this is located by default in /etc/apache2/sites-available/000-default.conf.
Within this configuration file, you will need to edit the existing (or add a new) VirtualHost directive. Make sure to reflect the BalancerMember hostname with your Pentaho cluster nodes:
<VirtualHost *:80>
# The ServerName directive sets the request scheme, hostname and port that
# the server uses to identify itself. This is used when creating
# redirection URLs. In the context of virtual hosts, the ServerName
# specifies what hostname must appear in the request’s Host: header to
# match this virtual host. For the default virtual host (this file) this
# value is not decisive as it is used as a last resort host regardless.
# However, you must set it for any further virtual host explicitly.
ServerName master-server
ServerAdmin webmaster@localhost
DocumentRoot /var/www/html
# Available loglevels: trace8, …, trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the loglevel for particular
# modules, e.g.
#LogLevel info ssl:warn
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
# For most configuration files from conf-available/, which are
# enabled or disabled at a global level, it is possible to
# include a line for only one particular virtual host. For example the
# following line enables the CGI configuration for this host only
# after it has been globally disabled with “a2disconf”.
#Include conf-available/serve-cgi-bin.conf
<Location “/pentaho”>
ProxyPass balancer://reportingcluster
ProxyPassReverseCookiePath / /pentaho
</Location>
<Location “/pentaho-style”>
ProxyPass balancer://reportingcluster-style
ProxyPassReverseCookiePath / /pentaho
</Location>
</VirtualHost>
ProxyPreserveHost On
ProxyPass /pentaho balancer://reportingcluster
ProxyPassReverseCookiePath / /pentaho
ProxyPass /pentaho-style balancer://reportingcluster-style
ProxyPassReverseCookiePath / /pentaho
Timeout 600
ProxyTimeout 600
<Proxy balancer://reportingcluster>
ProxySet stickysession=JSESSIONID timeout=360 scolonpathdelim=On failonstatus=502,503
BalancerMember http://ip:port/pentaho connectiontimeout=10 loadfactor=10 route=server1 max=20 ttl=120 retry=1
BalancerMember http://ip:port/pentaho connectiontimeout=10 loadfactor=10 route=server2 max=20 ttl=120 retry=1
BalancerMember http://ip:port/pentaho route=server3 retry=1 connectiontimeout=10 loadfactor=10 max=20 ttl=120 status=+H
ProxySet lbmethod=bytraffic
</Proxy>
<Proxy balancer://reportingcluster-style>
ProxySet stickysession=JSESSIONID timeout=360 scolonpathdelim=On failonstatus=502,503
BalancerMember http://ip:port/pentaho-sytle connectiontimeout=10 loadfactor=10 route=server1 max=20 ttl=120 retry=1
BalancerMember http://ip:port/pentaho-sytle connectiontimeout=10 loadfactor=10 route=server2 max=20 ttl=120 retry=1
BalancerMember http://ip:port/pentaho-sytle route=server3 retry=1 connectiontimeout=10 loadfactor=10 max=20 ttl=120 status=+H
ProxySet lbmethod=bytraffic
</Proxy>
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
After adding the above code, save the file and restart the Apache service.
You should now be able to browse to your load balancer and see the Pentaho User
Console, for example:

Screenshot of Load Balancing: