Guide to ARIMA and Auto_Arima
The auto_arima function can be a valuable tool for extracting meaningful insights from time series data.
In this blog, we take an in-depth look at ARIMA, auto_arima, and the model-selection cases that auto_arima supports.
What Is ARIMA?
The autoregressive integrated moving average model, or ARIMA(p,d,q) model, is an extension of the autoregressive moving average model, ARMA(p,q). ARMA(p,q) combines an autoregressive model, AR(p), with a moving average model, MA(q).
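The ARIMA approach was popularized by Box and Jenkins in their book Time Series Analysis: Forecasting and Control (first published in 1970; revised edition 1976). An ARIMA model has three parameters: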
- p - the number of autoregressive terms (the AR order)
- d - the number of differencing passes
- q - the number of moving average terms (the MA order)
For example, an ARIMA(0, 1, 2) model contains zero autoregressive terms and two moving average terms. Both are fitted to the series after it has been differenced once.
The model is called “autoregressive” because future values are modeled as linear functions of past values of the series itself. Here “auto” means “self,” not “automatic.” In contrast, an MA(q) (moving average) model expresses the current value as a linear combination of the current and q past white noise error terms. An autoregression is a linear difference equation in past values. Driving it with a moving average noise process yields the autoregressive moving average model, ARMA(p,q), which is widely used for general forecasting.
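For reference, the models just described can be written in standard textbook notation. In the equations below, $\varepsilon_t$ is white noise, $\phi_i$ are the AR coefficients, $\theta_j$ are the MA coefficients, and $c$ and $\mu$ are constants. These symbols are conventional and are not taken from IMSL's documentation. Sign conventions for $\theta_j$ also differ between texts and software packages.

```latex
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARMA}(p,q):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
\end{aligned}
```

An ARIMA(p,d,q) model applies the same ARMA(p,q) equation to the differenced series $w_t = (1 - B)^d y_t$. Here $B$ is the backshift operator, defined by $B y_t = y_{t-1}$.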
An ARIMA model extends an ARMA model by adding the ability to handle non-stationary time series through differencing. An ARMA representation requires a stationary time series. ARIMA therefore first converts a non-stationary series into a stationary one by differencing it d times.
The number of differences, d, is an input to the fitting process. The model is fitted to the differenced series, and its forecasts are produced on that differenced scale. An integration step is therefore needed to undo the differencing, so that the forecasted values are on the same scale as the original data. This integration step accounts for the “I” in “ARIMA.”
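The differencing and integration steps can be sketched in a few lines of plain Python. This is an illustration, not IMSL's implementation, and the helper names `difference` and `integrate` are hypothetical. `difference` applies d passes of first differencing and records the last value at each level. `integrate` needs those values to map forecasts of the differenced series back to the original scale.

```python
def difference(series, d=1):
    """Apply first differencing d times.

    Returns the differenced series plus the last value seen at each
    differencing level, which is needed later to undo the differencing."""
    x = list(series)
    lasts = []
    for _ in range(d):
        lasts.append(x[-1])
        x = [b - a for a, b in zip(x, x[1:])]
    return x, lasts


def integrate(forecasts, lasts):
    """Undo d rounds of differencing on forecasts of the differenced series.

    Works from the innermost level outward, cumulatively summing each
    level's forecasts starting from that level's last observed value."""
    y = list(forecasts)
    for last in reversed(lasts):
        level, out = last, []
        for step in y:
            level += step
            out.append(level)
        y = out
    return y


y = [1, 4, 9, 16, 25]                # a quadratic trend: non-stationary
diffed, lasts = difference(y, d=2)   # diffed == [2, 2, 2]: stationary
forecast = [diffed[-1]] * 2          # naive forecast on the differenced scale
print(integrate(forecast, lasts))    # [36, 49]
```

Because the forecasts are produced on the differenced scale, the integration step is what turns the constant second difference of 2 into the next two squares.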
Note that for each of these models, the user normally must specify the number of autoregressive and moving average terms, p and q respectively. Choosing p and q typically requires expert analysis of the series’ autocorrelation function (ACF) and partial autocorrelation function (PACF).
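The sample ACF and PACF are the starting point for that analysis, and both can be computed in plain Python. The PACF below uses the Durbin-Levinson recursion. The function names are hypothetical, and the code is an illustrative sketch only. As a rule of thumb, the PACF of an AR(p) process cuts off after lag p, while the ACF of an MA(q) process cuts off after lag q.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion.

    The lag-k partial autocorrelation is the last coefficient of the
    order-k autoregression implied by the sample autocorrelations."""
    r = acf(x, max_lag)
    phi = []      # coefficients of the current-order autoregression
    out = [1.0]   # lag 0 by convention
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        a = num / den
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - a * phi[k - 2 - j] for j in range(k - 1)] + [a]
        out.append(a)
    return out
```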
Even then, finding a suitable model is often an iterative process. Candidate values are chosen, the model is fitted to only part of the series, and its forecasts are compared with the held-out values that follow. The parameter values that give the best results are kept. The fitted model is then used to forecast values beyond the end of the time series.
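The hold-out procedure can be illustrated with two of the simplest ARIMA models, neither of which needs coefficient estimation. The first is ARIMA(0,0,0) with a constant, which forecasts the training mean. The second is ARIMA(0,1,0), a random walk that forecasts the last observed value. The function names and the data are hypothetical. A real search would fit many (p, d, q) candidates in place of these two.

```python
def mean_forecast(train, h):
    """ARIMA(0,0,0) with a constant: every forecast is the training mean."""
    m = sum(train) / len(train)
    return [m] * h


def random_walk_forecast(train, h):
    """ARIMA(0,1,0) without drift: every forecast is the last observed value."""
    return [train[-1]] * h


def holdout_mse(series, forecaster, h):
    """Fit on all but the last h points, then score forecasts against them."""
    train, test = series[:-h], series[-h:]
    preds = forecaster(train, h)
    return sum((p - a) ** 2 for p, a in zip(preds, test)) / h


def choose_best(series, candidates, h):
    """Return the name of the lowest hold-out MSE candidate and all scores."""
    scores = {name: holdout_mse(series, f, h) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


sales = [float(t) for t in range(20)]   # hypothetical, steadily trending data
best, scores = choose_best(
    sales, {"mean": mean_forecast, "random_walk": random_walk_forecast}, h=4)
print(best)   # random_walk
```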
What Is Auto_Arima?
Auto_arima, a function in the IMSL library, automates the configuration of the autoregressive integrated moving average (ARIMA) model.
The auto_arima function automatically estimates missing values, selects the best values for p and q, performs seasonal differencing, detects outliers and produces forecasts.
A diligent user may be interested in the underlying outlier-free series, as well as in forecasts of that outlier-free series. For that reason, both are available as output.
How Is Auto_Arima Used?
The auto_arima function is used for many time series problems, such as sales forecasting, commodity pricing, stock market predictions, and more.
Auto_Arima Model Selection
The auto_arima function can be invoked in one of six cases. The user selects the case based on their objectives and on analysis of the input time series, including its autocorrelations, seasonality, and non-stationary behavior. The six cases are summarized in the table below. How to select a particular case is documented in the routine’s API description.
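|Case|Model|Parameter Selection|Non-Stationary Adjustment|
|---|---|---|---|
|1|AR(p)|p automatic|None|
|2|AR(p)|p automatic|s, d automatic|
|3|ARMA(p,q)|p, q automatic|None|
|4|ARIMA(p,d,q)|p, q automatic|s, d automatic|
|5|ARIMA(p,d,q)|p, q specified|s, d specified|
|6|ARIMA(p,d,q)|p, q specified|s, d automatic|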
The model choice is up to the user. At runtime, auto_arima determines which case to run from the input parameters the user supplies. Each case is summarized briefly below.
Case 1: AR(p) With Automatic Parameter Selection
In Case 1, the method searches for the best AR(p) model by minimizing AIC (Akaike's information criterion). The search covers values of p up to and including the input parameter max_lag, which is the largest lag used in the autoregression. The approach relies on the fact that, given a sufficiently large p, any stationary time series can be well approximated by an AR(p) model. It also reflects the widespread use of AIC as a yardstick for the quality of a time series fit.
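The Case 1 idea can be sketched in plain Python under a few assumptions. The sketch uses Yule-Walker estimation through the Durbin-Levinson recursion. It also uses the common form AIC = n ln(σ²_p) + 2p, where σ²_p is the estimated innovation variance of the order-p fit. IMSL's estimator and exact AIC formula may differ. `select_ar_order` is a hypothetical name, and `max_lag` mirrors the parameter described above.

```python
import math


def autocovariances(x, max_lag):
    """Sample autocovariances c_0..c_max_lag (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def select_ar_order(x, max_lag):
    """Fit Yule-Walker AR models for p = 0..max_lag and pick the minimum AIC.

    Returns (best_p, coefficients_of_best_fit, aic_for_each_order).
    Assumes a non-constant series, so that every variance is positive."""
    n = len(x)
    c = autocovariances(x, max_lag)
    phi, v = [], c[0]          # order 0: white noise around the sample mean
    fits = [([], v)]
    for k in range(1, max_lag + 1):
        num = c[k] - sum(phi[j] * c[k - 1 - j] for j in range(k - 1))
        a = num / v            # new lag-k coefficient (partial autocorrelation)
        phi = [phi[j] - a * phi[k - 2 - j] for j in range(k - 1)] + [a]
        v *= 1.0 - a * a       # innovation variance of the order-k model
        fits.append((phi, v))
    aics = [n * math.log(var) + 2 * p for p, (_, var) in enumerate(fits)]
    best = min(range(len(aics)), key=aics.__getitem__)
    return best, fits[best][0], aics
```

Durbin-Levinson produces every order from 1 to max_lag in a single pass. Scoring all candidate orders therefore costs little more than fitting the largest one.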
Case 2: AR(p,s,d) With Automatic Parameter Selection and Non-Stationary Adjustment
Case 2 is similar, except that it also determines optimal values for s and d. These are used to remove seasonal and non-stationary trends by differencing. Case 2 is therefore an AR model that also accounts for seasonality and non-stationary time series.
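The transformation searched over in Case 2 can be sketched as repeated seasonal differencing. This assumes, as is conventional, that s is the seasonal period and d is the number of differencing passes, so the series is transformed by (1 - B^s)^d. The sketch only applies the transform. The criterion IMSL uses to choose s and d is not shown, and `seasonal_difference` is a hypothetical name.

```python
def seasonal_difference(x, s=1, d=1):
    """Apply (1 - B^s)^d: d passes of y_t - y_{t-s}.

    Each pass shortens the series by s points. With s = 1 this reduces
    to ordinary first differencing."""
    y = list(x)
    for _ in range(d):
        y = [y[t] - y[t - s] for t in range(s, len(y))]
    return y


# Hypothetical quarterly data: a period-4 pattern plus a trend of 2 per step.
pattern = [10, 20, 15, 5]
x = [pattern[t % 4] + 2 * t for t in range(16)]
print(seasonal_difference(x, s=4, d=1))  # twelve 8s: pattern and trend removed
```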
Case 3: ARMA(p,q) With Automatic Parameter Selection
Case 3 fits an ARMA(p,q) model. It selects optimal values for p and q by searching all combinations of p (up to max_lag) and the candidate values of q supplied by the user.
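The search in Case 3 is an exhaustive grid over p and the user's candidate q values. The skeleton below shows only that search loop. `score` is a hypothetical callback that stands in for fitting an ARMA(p,q) model and returning its AIC, and this sketch does not implement that fitting step.

```python
import math
from itertools import product


def grid_search(max_lag, q_values, score):
    """Evaluate every (p, q) with 0 <= p <= max_lag and q taken from q_values.

    `score(p, q)` is a hypothetical callback (e.g. fit ARMA(p, q) and return
    its AIC). Returns the lowest-scoring (p, q) pair and its score."""
    best_pq, best_score = None, math.inf
    for p, q in product(range(max_lag + 1), q_values):
        s = score(p, q)
        if s < best_score:
            best_pq, best_score = (p, q), s
    return best_pq, best_score


# Toy score with a known minimum at (2, 1), standing in for a real AIC.
print(grid_search(4, [0, 1, 2], lambda p, q: (p - 2) ** 2 + (q - 1) ** 2))
```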
Case 4: ARIMA(p,d,q) With Automatic Parameter Selection and Non-Stationary Adjustment
Case 4 extends Case 3 to account for non-stationarity and seasonality. It effectively computes an ARIMA(p,d,q) model with automatic parameter determination.
Case 5: ARIMA(p,d,q) With Specified Parameter Selection
Case 5 accepts user-specified values for the input parameters in an ARIMA(p,d,q) model, allowing no searching by the function.
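When every parameter is fixed, producing forecasts needs no search at all. The sketch below forecasts from an ARIMA(p, d, 0) model with user-supplied AR coefficients in three steps: difference d times, run the AR recursion on the differenced scale, then integrate back. MA terms are omitted to keep it short, because forecasting them requires the model's past residuals. The function name and the `mean` argument are hypothetical and are not part of the IMSL API.

```python
def forecast_arima_p_d_0(series, phi, d, h, mean=0.0):
    """h-step forecasts from an ARIMA(p, d, 0) model with fixed AR
    coefficients phi (phi[0] multiplies the most recent value).

    `mean` is the assumed mean of the differenced series. After
    differencing, at least len(phi) values must remain."""
    # 1. Difference d times, remembering the last value at each level.
    x, lasts = list(series), []
    for _ in range(d):
        lasts.append(x[-1])
        x = [b - a for a, b in zip(x, x[1:])]
    # 2. AR recursion on the differenced (assumed stationary) scale.
    hist = [v - mean for v in x]
    preds = []
    for _ in range(h):
        nxt = sum(c * hist[-1 - i] for i, c in enumerate(phi))
        hist.append(nxt)
        preds.append(nxt + mean)
    # 3. Integrate back to the original scale, innermost level first.
    for last in reversed(lasts):
        level, out = last, []
        for step in preds:
            level += step
            out.append(level)
        preds = out
    return preds
```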
Case 6: ARIMA(p,d,q) With Automatic and Specified Parameter Selection and Non-Stationary Adjustment
Finally, Case 6 accepts user values for p and q while conducting a grid search over all combinations of the candidate values of s and d. For users familiar with the ARMA(p,q) model, this case offers the easiest extension to an ARIMA(p,d,q) model.
Today we looked at the auto_arima function from IMSL, including background on the ARIMA model, as well as a handful of auto_arima modeling options. In upcoming articles, we’ll talk about how to account for missing values, time series seasonality, and outliers and illustrate with an applied example.
Looking for additional reading on time series analysis? The resources below are a great place to start.
- Blog: Guide to Time Series Analysis
- Article: Time Series Analysis in Python
- Book: Time Series Analysis: Forecasting and Control
By using the auto_arima function from the IMSL Numerical Library, an organization can build a custom forecasting solution suited to its situation.