The ARMA process predicts the future by considering both past values and errors. When making predictions, we often look to the past to identify patterns that might repeat in the future. These patterns can be rooted in seasons, days (such as business days vs. weekends), or time of day (day vs. night). However, identical patterns rarely occur multiple times. Unexpected events related to politics, the economy, and daily life in general disrupt any pre-existing templates. Therefore, we need models like ARMA that simultaneously use past data as a template for estimates and account for unpredictable events that distort this template.
ARMA stands for Autoregressive Moving Average. It combines two simpler models: the Autoregressive (AR) and Moving Average (MA) models. ARMA predicts the future using a linear combination of past values and errors. ARMA is only suitable for univariate time series without trend or seasonal components. If your data is more complex but still has a linear dependency between past values, you must preprocess the data before feeding it to the model or use a more advanced process like ARIMA.
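To see what "a linear combination of past values and errors" means in practice, here is a minimal, self-contained sketch (my own illustration, not code from this post's pipeline) that simulates an ARMA(1,1) process with known parameters and checks that its one-step-ahead predictor beats a naive predict-the-mean baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an ARMA(1,1) process with known parameters:
# x_t = phi * x_{t-1} + eps_t + theta * eps_{t-1}
phi, theta = 0.6, 0.3
n = 10_000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# One-step-ahead prediction using the true parameters:
# x_hat_t = phi * x_{t-1} + theta * eps_{t-1}
pred = phi * x[:-1] + theta * eps[:-1]
mse_model = np.mean((x[1:] - pred) ** 2)
mse_naive = np.mean(x[1:] ** 2)  # always predict the mean (zero)
print(mse_model < mse_naive)  # True: the ARMA predictor has lower error
```

With the true parameters, the one-step prediction error shrinks to the white-noise variance, which is exactly the irreducible part of the series; everything above that is structure ARMA can exploit.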
The AR (autoregressive) model explains a variable's future value using its past (or "lagged") values. The AR model treats the next step in the time series as a linear function of observations at prior time steps.
The AR model has a single parameter p that specifies the number of lags included.
The mathematical formulation of an AR(p) model is as follows:

X_t = c + φ_1·X_{t-1} + φ_2·X_{t-2} + … + φ_p·X_{t-p} + ε_t

where X_t is the value at time t, c is a constant, φ_1, …, φ_p are the model parameters, and ε_t is white noise.
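To make the autoregressive idea concrete, here is a small numpy sketch (my own, not from the post) that simulates an AR(1) series and recovers its coefficient by regressing each value on its predecessor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process: x_t = 0.7 * x_{t-1} + eps_t
phi_true = 0.7
n = 5_000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + eps[t]

# Estimate phi by ordinary least squares of x_t on x_{t-1}
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(round(phi_hat, 2))
```

With a few thousand observations the least-squares estimate lands very close to the true coefficient, which is why lagged regression is the workhorse behind AR fitting.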
The Moving Average (MA) model works similarly to the AR model, but it uses past prediction errors to forecast the current value of the variable. A moving average process of order q is a linear combination of the q most recent past white noise terms, defined by:

X_t = μ + ε_t + θ_1·ε_{t-1} + θ_2·ε_{t-2} + … + θ_q·ε_{t-q}

where μ is the mean of the series, θ_1, …, θ_q are the model parameters, and ε_t, ε_{t-1}, … are white noise error terms.
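Unlike an AR process, an MA(q) process has memory of only q steps: its autocorrelation is exactly zero beyond lag q. The following sketch (my own illustration, not from the post) simulates an MA(1) series and checks the lag-1 autocorrelation against the theoretical value θ / (1 + θ²):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an MA(1) process: x_t = eps_t + 0.5 * eps_{t-1}
theta = 0.5
n = 20_000
eps = rng.normal(size=n + 1)
x = eps[1:] + theta * eps[:-1]

# For MA(1), the lag-1 autocorrelation is theta / (1 + theta^2) = 0.4
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
# Beyond lag q = 1 the autocorrelation should be (near) zero
acf2 = np.corrcoef(x[:-2], x[2:])[0, 1]
print(round(acf1, 2), round(acf2, 2))
```

The sample lag-1 autocorrelation comes out near 0.4 while the lag-2 value hovers near zero, matching the short-memory signature of an MA(1) process.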
First, we obtain the data. I selected a small-cap index from an Eastern European market. Why not use more popular data? Because widely followed data tends to have fewer anomalies, making it harder to identify inefficiencies.
My dataset is a data frame with several columns. I indexed it, filtered it, and converted it to weekly frequency to reveal clearer autocorrelation patterns at a coarser time scale. Finally, I computed the percentage change of the index to obtain a stationary time series. I then divided the data into two sets, x_train and x_test, which serve as inputs to the forecasting algorithms.
import pandas as pd

# File with price data
EQ = r'swig80.txt'
# Column names
COL_DATE = ''
COL_OPEN = ''
COL_CLOSE = ''
COL_LOW = ''
COL_HIGH = ''
# Learning parameters
TEST_RATIO = 0.95
TRAIN_SIZE = 80
def get_data():
    """
    Load the price file, resample it to weekly frequency,
    and return the weekly percentage changes.
    """
    df = pd.read_csv(EQ)
    # Convert date strings to datetime objects
    df[COL_DATE] = pd.to_datetime(df[COL_DATE], format="%Y%m%d")
    # We only need data from 2017 onwards to predict next week
    df = df[df[COL_DATE] >= '2017-01-01']
    # Set date as index
    df.set_index(COL_DATE, inplace=True)
    # Resample to weekly frequency
    df = df.resample('1W').agg({COL_OPEN: 'first', COL_HIGH: 'max',
                                COL_LOW: 'min', COL_CLOSE: 'last'})
    # Compute percentage changes
    df['change'] = df[COL_CLOSE].pct_change() * 100
    # plot_eq is a plotting helper defined elsewhere
    plot_eq(df[COL_CLOSE].to_numpy(), df['change'][1:].to_numpy())
    return df['change'][1:].to_numpy()
Finally, get_data() produces the time series that serves as input to my prediction algorithms.
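The post does not show the split itself, so here is a plausible sketch; the cut-off constant mirrors TRAIN_SIZE from the configuration above, but how the author combined TRAIN_SIZE and TEST_RATIO is an assumption on my part:

```python
import numpy as np

TRAIN_SIZE = 80  # mirrors the configuration constant above (an assumption)

def split_series(series, train_size=TRAIN_SIZE):
    """Chronological split: the first train_size points train, the rest test."""
    return series[:train_size], series[train_size:]

# Stand-in for the weekly change series returned by get_data()
series = np.arange(100, dtype=float)
x_train, x_test = split_series(series)
print(len(x_train), len(x_test))  # → 80 20
```

The split must be chronological, not random: shuffling a time series would leak future information into the training set.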
I trained the model on a two-year time series and predicted the next three months with weekly frequency.
The first algorithm uses the update function. After each predicted value, this function updates the model parameters with the new observation. First, I created the auto_arima model. Then, in a loop, I predicted the next value and refreshed the model with the actual value new_ob from the x_test set. Finally, I computed the prediction errors from the differences between x_test and the collected forecasts.
import pmdarima as pm
from sklearn.metrics import mean_squared_error

def forecast_with_update(x_train, x_test):
    auto_arima = pm.auto_arima(
        x_train, seasonal=False, stepwise=False,
        approximation=False, n_jobs=-1)
    print(auto_arima)
    forecasts = []
    for new_ob in x_test:
        fc, _ = auto_arima.predict(n_periods=1, return_conf_int=True)
        forecasts.append(fc[0])
        # Update the existing model with a small number of MLE steps
        auto_arima.update(new_ob)
    print(f"Mean squared error: {mean_squared_error(x_test, forecasts)}")
    print(f"SMAPE: {pm.metrics.smape(x_test, forecasts)}")
    return forecasts
In the second approach, instead of updating the existing model, I recreated the model entirely for each iteration. This replaces the old model with a completely new one using updated data. This approach is much slower than simply updating the model. For more details on the differences between updating and recreating a model for each iteration, see the auto_arima documentation.
import numpy as np

def forecast_with_new(x_train, x_test):
    forecasts = []
    print(f"Iterations: {x_test.size}")
    for new_ob in x_test:
        # Recompute the entire model
        auto_arima = pm.auto_arima(
            x_train, seasonal=False, stepwise=False,
            approximation=False, n_jobs=-1)
        fc, _ = auto_arima.predict(n_periods=1, return_conf_int=True)
        forecasts.append(fc[0])
        # Slide the training window forward by one observation
        x_train = np.append(x_train[1:], new_ob)
    print(f"Mean squared error: {mean_squared_error(x_test, forecasts)}")
    print(f"SMAPE: {pm.metrics.smape(x_test, forecasts)}")
    return forecasts
In both cases, the prediction results were not satisfactory. The forecasts did not closely follow the actual values. It appears the data is more complex and contains non-linear dependencies, which ARMA cannot model effectively. Rebuilding the model for each prediction did not improve performance. To enhance results, you could consider models capable of capturing non-linear dependencies.
Input data: swig80.txt (place it in the same directory as the Python script).
ARMA predicts the future by considering both past values and errors. This post demonstrated how to use ARMA for stock price prediction.