Estimate the unobserved — Moving-Average Model Estimation with Maximum Likelihood in Python
How unobserved covariates' coefficients can be estimated with MLE

Photo by Connor Naasz on Unsplash

For those experienced with time series data and forecasting, terms like regression, AR, MA, and ARMA should be familiar. Linear regression is a straightforward model with a closed-form parametric solution obtained through OLS. AR models can also be estimated using OLS. However, things become more complex with MA models, which form the second component of the more advanced ARMA and ARIMA models.

The plan for this story:

- Introducing Moving Average models
- Discussing why there is no closed-form solution for MA models
- Introducing the Maximum Likelihood Estimation method
- MA(1) ML estimation — theory and Python code

Moving Average Model

MA models can be described by the following formula:

Eq 1: MA(q) formula
x_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}

Here, the thetas represent the model parameters, while the epsilons are the error terms, assumed to be mutually independent and normally distributed with constant variance. The intuition behind this formula is that our time series, X, can always be described by the last q shocks that occurred in the series. It is evident from the formula that each shock impacts only the subsequent q values of X, in contrast to AR models, where the effect of a shock persists indefinitely, although it gradually diminishes over time.

Closed estimation of a simple model — Linear Regression

As a reminder, the general form of a linear regression equation looks like this:

Eq 2: General linear regression formula
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \epsilon_i

For forecasting tasks, we typically aim to estimate all the model's parameters (the betas) using a sample set of x's and y's. Given a few assumptions about our model, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimators of the betas have the lowest sampling variance among the class of linear unbiased estimators. In simpler terms, OLS provides the best possible estimation of the betas.

So, what is OLS? It is a closed-form solution to a loss function minimization problem:

Eq 3: OLS minimization equation
\hat{\beta} = \arg\min_{\beta} S(\beta)

where the loss function, S, is defined as follows:

Eq 4: OLS loss function
S(\beta) = (y - X\beta)^T (y - X\beta)

In this context, y and X are our sample data: observable vectors of numbers (as in time series). Therefore, it is straightforward to calculate the function S, determine its derivative, and find the beta that solves the minimization problem.
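To make the closed-form nature of OLS concrete, here is a minimal sketch (not from the original article) that computes the betas directly from the normal equations on synthetic data; the variable names and "true" values are illustrative.

import numpy as np

# Illustrative data: y = 2 + 3*x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2 + 3 * x + rng.normal(scale=0.5, size=500)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # expected to be close to [2, 3]

No iterative optimization is needed: because y and X are fully observed, minimizing S has a direct algebraic solution. This is exactly what breaks down for MA models, as the next section explains.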
Closed estimation of MA(q)

It should now be clear why applying a method like OLS to estimate MA(q) models is problematic: the dependent variable, the time series values, is described by unobservable variables, the epsilons. This raises the question: how can these models be estimated at all?

Maximum Likelihood (MLE)

Likelihood Function

Statistical distributions typically depend on one or more parameters. For instance, the normal distribution is characterized by its mean and variance, which define its "height" and "center of mass":

Normal distribution (figure from Wikipedia)

Suppose we have a dataset, X = {x_1, ..., x_n}, comprising samples drawn from a normal distribution whose parameters are unknown. Our objective is to determine the mean and variance values that would characterize the specific normal distribution from which our dataset X is most likely to have been sampled.

MLE provides a framework that tackles precisely this question. It introduces a likelihood function, which is a function that yields another function: it takes a vector of parameters, often denoted theta, and produces a probability density function (PDF) that depends on theta.

The likelihood function general definition

The probability density function (PDF) of a distribution is a function that takes a value, x, and returns its probability density under that distribution. Therefore, likelihood functions are typically expressed as follows:

Likelihood as a function of theta given x
L(\theta \mid x) = f(x \mid \theta)

The value of this function indicates how likely it is to observe x under the distribution whose PDF is parameterized by theta.

The goal

When constructing a forecast model, we have data samples and a parameterized model, and our goal is to estimate the model's parameters. In our examples, such as regression and MA models, these parameters are the coefficients in the respective model formulas.

Statistical model estimation process (figure)

The equivalent in MLE is that we have observations and a PDF defined over a set of parameters, theta, which are unknown and not directly observable. Our goal is to estimate theta.

The MLE approach is to find the set of parameters, theta, that maximizes the likelihood function given the observed data, x:

Maximization of the likelihood function
\hat{\theta} = \arg\max_{\theta} L(\theta \mid x)

We assume our samples, x, were drawn from a distribution with a known PDF that depends on a set of parameters, theta. Intuitively, the true parameters should make the data we actually observed look highly plausible, so the theta that maximizes the likelihood of our samples is a natural estimate of the true parameter values.

Conditional likelihood

Notice that we haven't made any assumptions about the distribution (PDF) on which the likelihood function is based. Now, let's assume our observation X is a vector (x_1, x_2, ..., x_n). Consider the probability of observing x_n conditional on having already observed (x_1, x_2, ..., x_{n-1}):

f(x_n \mid x_{n-1}, \dots, x_1, \theta)

This represents the likelihood of observing just x_n given the previous values (and theta, the set of parameters). We now define the conditional likelihood function as follows:

Conditional likelihood function
L(\theta \mid x) = \prod_{t=1}^{n} f(x_t \mid x_{t-1}, \dots, x_1, \theta)

Later, we will see why it is useful to employ the conditional likelihood function rather than the exact likelihood function.

Log-Likelihood

In practice, it is often convenient to work with the natural logarithm of the likelihood function, referred to as the log-likelihood function:

Maximize the log-likelihood function
\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid x)

This is more convenient because we often work with a likelihood that is a joint probability of independent variables, which translates into a product of each variable's probability. Taking the logarithm converts that product into a sum.
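To ground the idea before applying it to MA(1), here is a small sketch (not part of the original article) that recovers the mean and standard deviation of a normal distribution by numerically maximizing the log-likelihood; the "true" values and variable names are illustrative.

import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)  # pretend mean 5 and std 2 are unknown

def neg_log_likelihood(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf  # keep the scale parameter in a valid region
    # Log of the joint density of independent samples = sum of log-PDF values
    return -stats.norm.logpdf(sample, loc=mu, scale=sigma).sum()

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print(result.x)  # expected to be close to [5.0, 2.0]

Maximizing the log-likelihood is implemented here as minimizing its negative, which is the convention scipy's optimizers expect. The same pattern, a log-likelihood plus a numerical optimizer, is what statsmodels automates for us below.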
Estimating MA(1) with MLE

For simplicity, I'll demonstrate how to estimate the most basic moving average model, MA(1):

MA(1) model
x_t = \alpha + \beta \epsilon_{t-1} + \epsilon_t

Here, x_t represents the time series observations, alpha and beta are the model parameters to be estimated, and the epsilons are random noise drawn from a normal distribution with zero mean and some standard deviation, sigma, which will also be estimated. Therefore, our "theta" is (alpha, beta, sigma), which we aim to estimate.

Let's define our parameters and generate some synthetic data using Python:

import pandas as pd
import numpy as np

STD = 3.3
MEAN = 0
ALPHA = 18
BETA = 0.7
N = 1000

# Simulate the MA(1) data-generating process: x_t = ALPHA + BETA * e_{t-1} + e_t
df = pd.DataFrame({"et": np.random.normal(loc=MEAN, scale=STD, size=N)})
df["et-1"] = df["et"].shift(1, fill_value=0)
df["xt"] = ALPHA + (BETA * df["et-1"]) + df["et"]

Note that we have set the standard deviation of the error distribution to 3.3, with alpha at 18 and beta at 0.7. The data looks like this:

Simulation of MA(1) DGP (figure)

Likelihood function for MA(1)

Our objective is to construct a likelihood function that answers the question: how likely is it to observe our time series X = (x_1, ..., x_n), assuming it was generated by the MA(1) process described above?

Likelihood for observing X
L(\alpha, \beta, \sigma \mid x) = P(x_1, \dots, x_n \mid \alpha, \beta, \sigma)

The challenge in computing this probability lies in the mutual dependence among our samples: both x_t and x_{t-1} depend on \epsilon_{t-1}, which makes it non-trivial to determine the joint probability of observing all samples (referred to as the exact likelihood).

Therefore, as discussed previously, instead of computing the exact likelihood, we'll work with a conditional likelihood. Let's begin with the likelihood of observing a single sample given all previous samples:

Conditional likelihood for observing x_n given the rest
f(x_n \mid x_{n-1}, \dots, x_1) = f_{\mathcal{N}(0, \sigma^2)}(\epsilon_n), \quad \epsilon_n = x_n - \alpha - \beta \epsilon_{n-1}

This is much simpler to calculate, because once the previous values (and therefore the previous error terms) are known, the only remaining randomness in x_n is \epsilon_n, which is normally distributed:

PDF of normal distribution
f_{\mathcal{N}(0, \sigma^2)}(\epsilon) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\epsilon^2}{2\sigma^2}\right)

All that remains is to calculate the conditional likelihood of observing all samples:

L(\alpha, \beta, \sigma \mid x) = \prod_{t=1}^{n} f_{\mathcal{N}(0, \sigma^2)}(\epsilon_t), \quad \epsilon_t = x_t - \alpha - \beta \epsilon_{t-1}, \; \epsilon_0 = 0

Applying a natural logarithm gives:

Final likelihood function to maximize
\log L(\alpha, \beta, \sigma \mid x) = \sum_{t=1}^{n} \log f_{\mathcal{N}(0, \sigma^2)}(\epsilon_t)

which is the function we should maximize.

Code

We'll utilize the GenericLikelihoodModel class from statsmodels for our MLE implementation. As outlined in the tutorial on statsmodels' website, we simply need to subclass it and implement the likelihood calculation:

from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel
import statsmodels.api as sm


class MovingAverageMLE(GenericLikelihoodModel):

    def initialize(self):
        super().initialize()
        # Register beta and std as parameters in addition to the intercept (alpha)
        extra_params_names = ['beta', 'std']
        self._set_extra_params_names(extra_params_names)
        self.start_params = np.array([0.1, 0.1, 0.1])

    def calc_conditional_et(self, intercept, beta):
        # Recursively recover the error terms, with e_0 = 0 (conditional likelihood)
        df = pd.DataFrame({"xt": self.endog})
        ets = [0.0]
        for i in range(1, len(df)):
            ets.append(df.iloc[i]["xt"] - intercept - (beta * ets[i - 1]))
        return ets

    def loglike(self, params):
        # params = (alpha, beta, std); sum the log-densities of the recovered error terms
        ets = self.calc_conditional_et(params[0], params[1])
        return stats.norm.logpdf(
            ets,
            scale=params[2],
        ).sum()

The method loglike is the essential one to implement. Given the iterated parameter values params and the dependent variable (in this case, the time series samples), which is stored as the class member self.endog, it calculates the conditional log-likelihood value, as discussed earlier.

Now let's create the model and fit it on our simulated data:

df = sm.add_constant(df)  # add an intercept column for estimating alpha
model = MovingAverageMLE(df["xt"], df["const"])
r = model.fit()
r.summary()

and the output is:

MLE results from Python (summary table)

And that's it!
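As a quick sanity check (not part of the original article), the same MA(1) model can also be fitted with statsmodels' built-in ARIMA class on the simulated df["xt"] series. ARIMA uses its own (state-space) likelihood machinery, so its estimates will not match our conditional-likelihood results exactly, but they should land in the same neighborhood.

from statsmodels.tsa.arima.model import ARIMA

# order = (p, d, q) = (0, 0, 1): a pure MA(1) with a constant term
builtin = ARIMA(df["xt"], order=(0, 0, 1)).fit()
print(builtin.params)  # roughly: const ~ 18, ma.L1 ~ 0.7, sigma2 ~ 3.3**2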
As demonstrated, MLE successfully estimated the parameters we selected for the simulation.

Wrapping Up

Estimating even a simple MA(1) model with maximum likelihood demonstrates the power of this method. It not only allows us to make efficient use of our data, but also provides a solid statistical foundation for understanding and interpreting the dynamics of time series data.

Hope you liked it!

References

[1] Andrew Lesniewski, Time Series Analysis, 2019, Baruch College, New York
[2] Eric Zivot, Estimation of ARMA Models, 2005

Unless otherwise noted, all images are by the author.