Pentaho Blog


A comparison of Pentaho, AWS, Azure, and the open-source stack across data platform, data lake, data visualization, data science, and general criteria.


Which data storage platform would you like to use? Which data ingestion (ETL) tools would you like to use? Which data processing tools would you like to use?


A measurable value that can be evaluated over a specific time period to determine the gap between actual and targeted performance and to gauge organizational effectiveness and operational ...


Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model.
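To make the "each model corrects the previous one" idea concrete, here is a minimal sketch of boosting for regression. It assumes one particular flavor (each round fits a one-split decision stump to the current residuals and adds a shrunken copy of it to the ensemble); the function names, the toy data, and the `learning_rate` of 0.5 are all illustrative assumptions rather than anything taken from the post.

```python
def fit_stump(xs, residuals):
    """Find the one-split stump (threshold, left mean, right mean) that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Skip the largest x as a threshold: it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds, learning_rate=0.5):
    """Build the ensemble sequentially: every stump is fitted to what is still unexplained."""
    base = sum(ys) / len(ys)              # round 0: predict the mean everywhere
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (made up for illustration): a roughly step-shaped signal.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
for rounds in (0, 1, 5, 20):
    print(rounds, round(mean_squared_error(boost(xs, ys, rounds), xs, ys), 4))
```

The training error printed for each round count should shrink as rounds are added, which is the dependency between successive models described above.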


Ensemble methods are often used to win machine learning competitions, where they underpin sophisticated algorithms that produce highly accurate results.


The K-Nearest Neighbors (K-NN) method of classification is one of the simplest methods in machine learning; it is essentially ...


Artificial neural networks are computational models inspired by the human brain.


Random Forest is a supervised learning algorithm. As the name suggests, it builds a forest of decision trees and introduces randomness into how they are built.


Support vector machines are a type of supervised machine learning algorithm used for classification and regression tasks.


K-Means is an introductory clustering algorithm and one of the simplest. It is easy to implement and handy in practice.
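As a rough illustration of how little code K-Means needs, the sketch below assumes the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. The `kmeans` function, the choice of the first `k` points as starting centroids, and the sample points are assumptions made for this example only.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points with the standard assign-then-update loop."""
    centroids = list(points[:k])  # naive, deterministic initialisation (an assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Two visually obvious groups (made-up data).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Real implementations usually pick starting centroids more carefully, since a poor start can leave the algorithm in a poor local optimum.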


Principal component analysis (PCA) is a technique used to identify a smaller number of uncorrelated variables, known as principal components, from a larger set of data.
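The "uncorrelated variables" claim can be checked on a tiny case. The sketch below assumes two-dimensional data, so the eigenvectors of the 2x2 covariance matrix can be written in closed form without a linear algebra library; `pca_2d` and the sample data are illustrative names and values, not part of the original post.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, components, scores) for 2-D data via the covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    half = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + half, mid - half

    # Eigenvector for the larger eigenvalue; fall back to an axis if already uncorrelated.
    if abs(b) > 1e-12:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # the second component is perpendicular to the first

    # Project every centered point onto the two components.
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1]) for x, y in centered]
    return (lam1, lam2), (v1, v2), scores


# Made-up, strongly correlated measurements.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigenvalues, components, scores = pca_2d(data)
print(eigenvalues)
```

Keeping only the first coordinate of each score is the "smaller number of variables" step: it retains the direction with the largest variance and discards the rest.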


Logistic regression classifies observations by estimating the probability that an observation belongs to a particular category.
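A small sketch of that idea, assuming a single input feature, two categories, and plain gradient descent on the log loss. The function names, the learning rate, the epoch count, and the hours-studied data are all invented for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_probability(w, b, x):
    """Estimated probability that x belongs to category 1."""
    return sigmoid(w * x + b)


# Made-up data: hours studied versus pass (1) or fail (0), with some overlap.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
for h in (0.5, 2.25, 4.0):
    print(h, round(predict_probability(w, b, h), 3))
```

Classification then amounts to comparing the estimated probability against a cutoff, commonly 0.5.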