Generalized Method of Moments (GMM)

From MM to GMM

In the linear regression, $k+1$ moment conditions yield $k+1$ equations and thus $k+1$ parameter estimates. If there are more moment conditions than parameters to be estimated, the moment equations cannot, in general, be solved exactly. This overidentified case is handled by GMM (the generalized method of moments).

In GMM, the moment conditions are solved only approximately. To this end, the individual condition equations are weighted.
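As a standard textbook illustration (not from the original text): suppose $X_t$ is Poisson with parameter $\lambda$, so that both the mean and the variance equal $\lambda$. This gives two moment conditions for a single parameter,

$$E[X_t - \lambda] = 0 \qquad\text{and}\qquad E[(X_t - \lambda)^2 - \lambda] = 0.$$

Replacing the expectations by sample averages yields two equations that will generally not be satisfied by the same value of $\lambda$ in a finite sample, which is exactly the overidentified situation GMM is designed for.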

Population Moment Conditions

Definition: Let $\theta_0$ be the true, unknown parameter vector to be estimated, $v_t$ a vector of random variables, and $f(\cdot)$ a vector of functions. Then a population moment condition takes the form

$$E\{f(v_t, \theta_0)\} = 0, \quad t \in T.$$

Often $f(\cdot)$ contains only linear functions, in which case the problem essentially becomes one of linear regression. In other cases $f(\cdot)$ may be a product of errors and functions of observed variables, and the problem becomes one of non-linear regression. The definition is more general still.
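To make the definition concrete, here is a standard example (added for illustration): in the linear instrumental-variables model $y_t = x_t'\theta_0 + u_t$ with a $q$-dimensional instrument vector $z_t$ satisfying $E[z_t u_t] = 0$, set $v_t = (y_t, x_t', z_t')'$ and

$$f(v_t, \theta) = z_t\,(y_t - x_t'\theta).$$

Then $E\{f(v_t, \theta_0)\} = 0$ is the population moment condition, and with $q$ larger than the dimension of $\theta$ there are more conditions than parameters.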

GMM Estimator

The basic idea behind GMM is to replace the theoretical expected value $E[\cdot]$ with its empirical analogue, the sample average:

$$\hat{m}(\theta) \equiv n^{-1} \sum_{t=1}^{n} f(v_t, \theta).$$

By the law of large numbers, $\hat{m}(\theta_0) \approx 0$ in large samples, but with more moment conditions than parameters the system $\hat{m}(\theta) = 0$ generally has no exact solution. The GMM estimator therefore minimizes a certain norm of $\hat{m}(\theta)$:

$$\hat{\theta} = \arg\min_{\theta} \|\hat{m}(\theta)\|_W^2 = \arg\min_{\theta}\, \hat{m}(\theta)' W \hat{m}(\theta),$$

where $W$ is a positive definite weighting matrix. The GMM estimator is the value of $\theta$ that minimizes this expression.

Definition: The generalized method of moments estimator based on these population moment conditions is the value of $\theta$ that minimizes

$$Q_n(\theta) = \left\{n^{-1} \sum_{t=1}^n f(v_t, \theta)\right\}' W_n \left\{n^{-1} \sum_{t=1}^n f(v_t, \theta)\right\},$$

where $W_n$ is a non-negative definite matrix that usually depends on the data but converges to a constant positive definite matrix as $n \to \infty$.
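As a minimal numerical sketch of this definition (assuming the instrumental-variables moment function from the example above; all function and variable names are illustrative, not from the original text), the objective $Q_n(\theta)$ can be coded and minimized directly:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, y, X, Z, W):
    """Q_n(theta) = m_hat(theta)' W m_hat(theta) for the IV moments f(v_t, theta) = z_t (y_t - x_t' theta)."""
    m = Z.T @ (y - X @ theta) / len(y)   # sample moment vector m_hat(theta), shape (q,)
    return m @ W @ m

def gmm_estimate(y, X, Z, W, theta0):
    """Numerically minimize Q_n(theta) starting from theta0."""
    return minimize(gmm_objective, theta0, args=(y, X, Z, W), method="BFGS").x

# Illustrative use with simulated overidentified data (q = 3 instruments, k = 1 parameter).
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
X = Z @ np.array([[1.0], [0.5], [0.2]]) + rng.normal(size=(n, 1))
y = X[:, 0] * 2.0 + rng.normal(size=n)
theta_hat = gmm_estimate(y, X, Z, np.eye(3), theta0=np.zeros(1))
```

With $W_n = I$ this is a consistent but generally inefficient estimator; the choice of weighting matrix is discussed below.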

Asymptotic Properties

Define

$$G = \mathrm{E}\left[\frac{\partial f(v_t, \theta_0)}{\partial \theta'}\right] \quad\text{and}\quad \Omega = \mathrm{E}\left[f(v_t, \theta_0)\, f(v_t, \theta_0)'\right].$$

Under standard regularity conditions, the GMM estimator is consistent and asymptotically normal provided $G$ has full column rank and $\Omega$ is positive definite. It can be shown that taking $W \propto \Omega^{-1}$ yields the most efficient estimator, and in that case the asymptotic distribution of the GMM estimator is

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N\!\left(0,\; \left(G'\Omega^{-1}G\right)^{-1}\right).$$
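For completeness, a standard result that motivates this choice of $W$ (not stated explicitly above): for a general limiting weight matrix $W$, the asymptotic variance takes the sandwich form

$$(G'WG)^{-1}\, G'W\,\Omega\, W G\, (G'WG)^{-1},$$

which reduces to $(G'\Omega^{-1}G)^{-1}$ exactly when $W = \Omega^{-1}$. In practice the variance is estimated by substituting sample analogues $\hat{G}$ and $\hat{\Omega}$ evaluated at $\hat{\theta}$.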

Implementation

One difficulty with implementing the outlined method is that we cannot simply take $W = \Omega^{-1}$: by its definition, the matrix $\Omega$ depends on $\theta_0$, and $\theta_0$ is precisely the quantity we do not know and are trying to estimate in the first place. In the case of $v_t$ being iid, $W$ can be estimated from a preliminary estimate $\hat{\theta}$ as

$$\hat{W}_n(\hat{\theta}) = \left(n^{-1} \sum_{t=1}^{n} f(v_t, \hat{\theta})\, f(v_t, \hat{\theta})'\right)^{-1}.$$

Several approaches exist to deal with this issue, the first one being the most popular:

Two-step GMM: first minimize $Q_n(\theta)$ with a preliminary weight matrix (for example $W_n = I$) to obtain a consistent estimate $\hat{\theta}_{(1)}$, then compute $\hat{W}_n(\hat{\theta}_{(1)})$ as above and minimize $Q_n(\theta)$ again with this weight matrix.

Iterated GMM: repeat the weight-update and re-estimation steps of the two-step procedure until the estimates converge.

Continuously updating GMM (CUE): estimate $\theta$ and $W$ simultaneously, minimizing $Q_n(\theta)$ with $W_n = \hat{W}_n(\theta)$ recomputed at every candidate $\theta$.

In Monte-Carlo experiments the continuously updating estimator demonstrated better performance than the traditional two-step GMM: the estimator has smaller median bias (although fatter tails), and the J-test for overidentifying restrictions was in many cases more reliable.
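As a compact, self-contained sketch of the two-step procedure under the iid formula above (again using the illustrative IV moments; the function name two_step_gmm and the BFGS optimizer choice are assumptions, not from the original text):

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(y, X, Z, theta0):
    """Two-step GMM for the linear IV moments f(v_t, theta) = z_t (y_t - x_t' theta)."""
    n, q = Z.shape

    def Q(theta, W):
        m = Z.T @ (y - X @ theta) / n        # sample moment vector m_hat(theta)
        return m @ W @ m                      # Q_n(theta) = m_hat' W m_hat

    # Step 1: preliminary consistent estimate with identity weighting.
    theta1 = minimize(Q, theta0, args=(np.eye(q),), method="BFGS").x
    # Estimate the efficient weight W = Omega^{-1} at theta1 (iid formula above).
    f = Z * (y - X @ theta1)[:, None]         # rows are f(v_t, theta1)'
    W_hat = np.linalg.inv(f.T @ f / n)
    # Step 2: re-minimize with the estimated weight matrix.
    return minimize(Q, theta1, args=(W_hat,), method="BFGS").x
```

Standard errors would then follow from the plug-in variance $(\hat{G}'\hat{\Omega}^{-1}\hat{G})^{-1}/n$ noted in the previous section.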
