# Time-Series Forecasting Using the ARIMA Model - Pentaho

### Introduction

A time-series is a sequence of data points measured at regular time intervals, such as hourly, daily, weekly, monthly, quarterly or yearly. Successive points in a time-series are usually correlated with the points that precede them, which is what makes it possible to forecast from past values. Time-series patterns may be classified as:

• Trend – Exists when there is a long-term increase or decrease in the data
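• Seasonal – Occurs when the time-series is affected by seasonal factors, such as the time of year or the day of the week, so the pattern repeats with a fixed and known frequency
• Cyclic – Occurs when the data rise and fall in a pattern that does not have a fixed frequency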

Forecasting is the process of making predictions about the future based on past and present data, typically by analysing trends. Time-series forecasting uses a model to predict future values from previously observed values. ARIMA is one of the most widely used techniques for time-series modelling.

ARIMA (Auto-Regressive Integrated Moving Average) can be viewed as a linear regression in which the predictors are the series' own past values and past forecast errors. The model is specified by three parameters:

• p (lag order) – number of lag observations included in the model (the auto-regressive order)
• d (degree of differencing) – number of times the raw observations are differenced
• q (order of moving average) – number of lagged forecast errors included in the model (the size of the moving-average window)
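Written out, an ARIMA(p, d, q) model regresses the differenced series on its own lags and on lagged errors. In the standard notation below (the symbols are conventional and not taken from the original), $y'_t$ is the series after differencing $d$ times, $c$ is a constant, $\phi_i$ are the auto-regressive coefficients, $\theta_j$ are the moving-average coefficients and $\varepsilon_t$ are white-noise errors:

$$
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t
$$

For example, with $d = 1$ the model works on $y'_t = y_t - y_{t-1}$, and with $d = 0$ it works on the raw series.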

ARIMA assumes that the data is uni-variate, free of anomalies and, once differenced d times, stationary (its mean and variance do not change over time). Non-stationarity in the mean can usually be removed by differencing once or twice, which is what the d parameter controls.
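Differencing and its inverse are simple enough to sketch directly. The helpers below are hypothetical (`difference` and `undifference` are illustrative names, not library functions). They show that differencing twice turns a quadratic trend into a constant series, and how forecasts made on the differenced scale are added back onto the last observed values to return to the original scale.

```python
def difference(y, d=1):
    """Return the series differenced d times (each pass shortens it by one)."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def undifference(history, diff_forecasts, d=1):
    """Map forecasts of the d-times-differenced series back to the original scale.

    `history` is the original (undifferenced) series and needs at least d values.
    """
    # Last observed value at each differencing level 0 .. d-1.
    levels = [history]
    for _ in range(d - 1):
        levels.append(difference(levels[-1]))
    last = [level[-1] for level in levels]

    result = []
    for f in diff_forecasts:
        # Integrate one step from the deepest level back up to the original series.
        value = f
        for k in range(d - 1, -1, -1):
            value = last[k] + value
            last[k] = value
        result.append(value)
    return result


quadratic = [t * t for t in range(6)]             # 0, 1, 4, 9, 16, 25
print(difference(quadratic, d=2))                 # → [2, 2, 2, 2]
print(undifference(quadratic, [2, 2, 2], d=2))    # → [36, 49, 64]
```

The constant second difference is why a series with a quadratic trend needs d = 2, while a linear trend is removed with d = 1.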

### Model Implementation

• Plot, examine and prepare the time-series for modelling.
• Extract the seasonality component from the time-series.
• Test for stationarity and apply appropriate transformations.
• Choose the best-fitting ARIMA model.
• Forecast the time-series.
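For the final step, a fitted model produces multi-step forecasts recursively: each prediction is fed back in as a lagged value for the next one. The sketch below covers only the auto-regressive part and ignores moving-average terms; `forecast_ar` and its `[c, phi_1, ..., phi_p]` coefficient layout are hypothetical, not a library API.

```python
def forecast_ar(history, coef, steps):
    """Recursive multi-step forecast from an AR(p) model.

    coef = [c, phi_1, ..., phi_p]; history must contain at least p values.
    Moving-average terms are ignored in this sketch.
    """
    c, phis = coef[0], coef[1:]
    values = list(history)
    forecasts = []
    for _ in range(steps):
        # y_hat_t = c + phi_1 * y_{t-1} + ... + phi_p * y_{t-p}
        y_hat = c + sum(phi * values[-i] for i, phi in enumerate(phis, start=1))
        values.append(y_hat)  # feed the forecast back in as a lag
        forecasts.append(y_hat)
    return forecasts


print(forecast_ar([8.0], [0.0, 0.5], steps=3))  # → [4.0, 2.0, 1.0]
```

If the model was fitted on a differenced series, these forecasts are on the differenced scale and still have to be integrated back to the original scale.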

The process of separating a time-series into its trend (including any cycle), seasonal and residual components is referred to as decomposition. Decomposing the data helps to understand its behaviour and lays a foundation for building the forecast model. In Python, decomposition can be done with `seasonal_decompose()` from the statsmodels library.
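Classical additive decomposition, the default behaviour of `seasonal_decompose()`, estimates the trend with a centred moving average, takes the seasonal component as the average detrended value at each position in the cycle, and leaves the remainder as the residual. The pure-Python sketch below follows that recipe so each step is visible; `decompose_additive` is a hypothetical name, and it returns `None` at the edges where statsmodels returns NaN.

```python
def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Assumes the series covers at least two full seasonal cycles.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x period moving average: the two end points get half weight.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values at each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal_index = [m - centre for m in means]  # make the indices sum to zero

    seasonal = [seasonal_index[t % period] for t in range(n)]
    resid = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, resid


pattern = [3.0, -1.0, -4.0, 2.0]                    # seasonal effect, sums to zero
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, resid = decompose_additive(y, period=4)
print([round(s, 6) for s in seasonal[:4]])          # → [3.0, -1.0, -4.0, 2.0]
```

On this noise-free synthetic series the moving average recovers the linear trend exactly, so the seasonal pattern is recovered and the residuals are zero; on real data the residual carries whatever the trend and seasonal terms do not explain.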

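ACF (Auto-correlation) plots help in determining the order of the moving average (q) of the model: for a pure MA(q) process, the autocorrelations cut off after lag q.

PACF (Partial Auto-correlation) plots help in determining the lag order (p) of the model: for a pure AR(p) process, the partial autocorrelations cut off after lag p.

Figure: Sample ACF & PACF plots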
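The quantities behind these plots can be computed directly. The sketch below is a plain-Python stand-in for library routines such as statsmodels' `acf()` and `pacf()`: `sample_acf` computes the usual sample autocorrelation, and `pacf_from_acf` derives the partial autocorrelations from it with the Durbin-Levinson recursion (statsmodels' `pacf()` supports this among several estimation methods). Both function names are hypothetical.

```python
def sample_acf(y, nlags):
    """Sample autocorrelations r_0 .. r_nlags (r_0 is always 1)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(nlags + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations from autocorrelations via Durbin-Levinson."""
    pac = [1.0]
    phi = []  # AR coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pac.append(phi_kk)
    return pac


print(sample_acf([1, 2, 3, 4, 5], nlags=1))        # → [1.0, 0.4]
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))      # → [1.0, 0.5, 0.0, 0.0]
```

The second call feeds in the theoretical ACF of an AR(1) process with coefficient 0.5, which decays as 0.5^k; its PACF is zero beyond lag 1, which is exactly the cut-off pattern used to read p from a PACF plot.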

An auto-ARIMA function (for example, `auto_arima()` in the pmdarima library) identifies the best-fitting ARIMA model, so the model can be fitted without selecting the p, d and q values by hand. It compares candidate models using information criteria such as AIC and BIC and keeps the combination of parameters with the lowest value.

Figure: Graphical representation of the ARIMA model
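The idea of selecting orders by an information criterion can be sketched without a library. This is a deliberately simplified stand-in for auto-ARIMA, not how pmdarima works: it takes d as already fixed, considers only auto-regressive orders (no MA terms), fits each AR(p) by ordinary least squares on the same sample, and keeps the p with the lowest Gaussian AIC, computed as n·ln(RSS/n) + 2k up to an additive constant. `solve`, `fit_ar` and `select_ar_order` are hypothetical helpers; pmdarima's search is considerably more involved (for example, a stepwise search over p and q, and unit-root tests to choose d).

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(y, p, start):
    """OLS fit of y[t] on [1, y[t-1], ..., y[t-p]] for t >= start; returns (coef, rss)."""
    rows = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(start, len(y))]
    targets = y[start:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((v - sum(c * x for c, x in zip(coef, r))) ** 2 for r, v in zip(rows, targets))
    return coef, rss


def select_ar_order(y, max_p):
    """Return (p, aic, coef) for the AR order with the lowest AIC."""
    n = len(y) - max_p  # same effective sample for every candidate order
    best = None
    for p in range(max_p + 1):
        coef, rss = fit_ar(y, p, start=max_p)
        aic = n * math.log(rss / n) + 2 * (p + 2)  # p lags + intercept + noise variance
        if best is None or aic < best[1]:
            best = (p, aic, coef)
    return best


# Usage: simulate an AR(2) series and let AIC pick the order.
random.seed(0)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0, 1))
p, aic, coef = select_ar_order(y, max_p=5)
print(p, [round(c, 2) for c in coef])
```

Fitting every candidate on the same observations matters: if each order used a different number of rows, their AIC values would not be directly comparable.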

### Conclusion

ARIMA is one of the simplest and most widely used techniques for time-series forecasting. ARIMA models can be more flexible than other statistical models such as exponential smoothing or linear regression, and choosing the orders by AIC or BIC helps guard against over-fitting.