In linear regression, k+1 moment conditions yield k+1 equations and hence k+1 parameter estimates. If there are more moment conditions than parameters to be estimated, the moment equations cannot be solved exactly. This overidentified case is handled by the generalized method of moments (GMM).
In GMM, the moment conditions are solved only approximately; to this end, the individual condition equations are weighted.
Population Moment Conditions
Definition: Let $\theta_0$ be the true but unknown parameter vector to be estimated, $v_t$ a vector of random variables, and $f(\cdot)$ a vector of functions. Then a population moment condition takes the form
$$E\{f(v_t,\theta_0)\} = 0, \qquad t \in T.$$
Often, $f(\cdot)$ contains only linear functions, in which case the problem essentially reduces to linear regression. In other cases, $f(\cdot)$ may consist of products of errors and functions of observed variables, in which case the problem becomes one of non-linear regression. The definition is, however, even more general.
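As a concrete illustration (the standard linear-regression case, stated here as an assumed example rather than taken from the text above), consider $y_t = x_t'\theta_0 + u_t$ with $E[u_t \mid x_t] = 0$ and $v_t = (y_t, x_t')'$. The orthogonality of regressors and errors then supplies one population moment condition per regressor:
$$E\{f(v_t,\theta_0)\} = E\{x_t\,(y_t - x_t'\theta_0)\} = 0,$$
which gives exactly k+1 conditions for the k+1 parameters, so the system can be solved exactly and GMM reduces to ordinary least squares.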
GMM Estimator
The basic idea behind GMM is to replace the theoretical expected value $E[\cdot]$ with its empirical analogue, the sample average:
$$\hat m(\theta) \equiv \frac{1}{n}\sum_{t=1}^{n} f(v_t,\theta).$$
The population condition $E\{f(v_t,\theta_0)\}=0$ suggests choosing $\hat\theta$ so that $\hat m(\hat\theta)$ is as close to zero as possible. With more moment conditions than parameters, $\hat m(\theta)=0$ has in general no exact solution, so this amounts to minimizing a certain norm of $\hat m(\theta)$:
$$\hat\theta = \arg\min_{\theta}\, \|\hat m(\theta)\|_W^2 = \arg\min_{\theta}\, \hat m(\theta)'\, W\, \hat m(\theta),$$
where W is a positive definite matrix. The GMM estimator is the value of θ that minimizes the above expression.
Definition: The Generalized Method of Moments estimator based on these population moment conditions is the value of $\theta$ that minimizes
$$\hat m(\theta)'\, W_n\, \hat m(\theta),$$
where $W_n$ is a non-negative definite matrix that usually depends on the data but converges to a constant positive definite matrix as $n \to \infty$.
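A minimal numerical sketch of this definition, assuming a user-supplied moment function that returns one row of moment values $f(v_t,\theta)$ per observation; the names gmm_objective, gmm_estimate, and iv_moments below are illustrative, not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, data, moment_fn, W):
    """Sample GMM criterion m_hat(theta)' W m_hat(theta)."""
    m = moment_fn(data, theta)      # shape (n, q): f(v_t, theta) for each observation
    m_bar = m.mean(axis=0)          # empirical analogue of E[f(v_t, theta)]
    return m_bar @ W @ m_bar

def gmm_estimate(data, moment_fn, theta0, W):
    """Minimise the GMM criterion over theta, starting from theta0."""
    res = minimize(gmm_objective, theta0, args=(data, moment_fn, W), method="BFGS")
    return res.x

# Illustrative moment function: instrumental-variables regression with
# q >= k instruments z_t, where f(v_t, theta) = z_t (y_t - x_t' theta).
def iv_moments(data, theta):
    y, X, Z = data
    return Z * (y - X @ theta)[:, None]
```

With W equal to the identity matrix this is the preliminary estimator used in step 1 of the two-step procedure below.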
Asymptotic Properties
Define
$$G = E\!\left[\frac{\partial f(v_t,\theta_0)}{\partial \theta'}\right] \qquad \text{and} \qquad \Omega = E\!\left[\,f(v_t,\theta_0)\, f(v_t,\theta_0)'\,\right].$$
Then the GMM estimator is consistent provided G has full rank and Ω is positive definite. It can be shown that taking $W \propto \Omega^{-1}$ yields the most efficient estimator; in that case the asymptotic distribution of the GMM estimator is
$$\sqrt{n}\,\bigl(\hat\theta - \theta_0\bigr) \;\xrightarrow{d}\; N\!\left(0,\ \bigl(G'\,\Omega^{-1}\,G\bigr)^{-1}\right).$$
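A short sketch, assuming iid observations, of how this asymptotic covariance might be estimated from sample quantities; the inputs m (moment values evaluated at $\hat\theta$) and G_hat (a numerical estimate of $G$) are illustrative, not objects defined in the text:

```python
import numpy as np

def gmm_covariance(m, G_hat):
    """Estimate Var(theta_hat) ~ (G' Omega^{-1} G)^{-1} / n.

    m     : (n, q) array of f(v_t, theta_hat), one row per observation
    G_hat : (q, k) estimate of the Jacobian E[df/dtheta']
    """
    n = m.shape[0]
    Omega_hat = m.T @ m / n                                        # sample analogue of E[f f']
    avar = np.linalg.inv(G_hat.T @ np.linalg.solve(Omega_hat, G_hat))
    return avar / n                                                # covariance of theta_hat itself
```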
Implementation
One difficulty with implementing the outlined method is that we cannot take $W = \Omega^{-1}$ directly: by its definition, the matrix $\Omega$ depends on $\theta_0$, and $\theta_0$ is precisely the quantity we do not know and are trying to estimate in the first place. In the case of iid observations $v_t$, we can estimate $W$ by
$$\hat W_n(\hat\theta) = \left(\frac{1}{n}\sum_{t=1}^{n} f(v_t,\hat\theta)\, f(v_t,\hat\theta)'\right)^{-1}.$$
Several approaches exist to deal with this issue, the first one being the most popular:
Two-step feasible GMM:
Step 1: Take $W = I$ (the identity matrix) or some other positive definite matrix, and compute a preliminary GMM estimate $\hat\theta^{(1)}$. This estimator is consistent for $\theta_0$, although not efficient.
Step 2: $\hat W_n(\hat\theta^{(1)})$ converges in probability to $\Omega^{-1}$, so if we recompute $\hat\theta$ with this weighting matrix, the resulting estimator is asymptotically efficient.
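A sketch of the two-step procedure, reusing the hypothetical gmm_estimate and moment-function convention from the earlier snippet:

```python
import numpy as np

def two_step_gmm(data, moment_fn, theta0):
    """Step 1: identity weighting; step 2: reweight with the estimated Omega inverse."""
    q = moment_fn(data, theta0).shape[1]
    theta_1 = gmm_estimate(data, moment_fn, theta0, np.eye(q))   # preliminary, consistent
    m = moment_fn(data, theta_1)
    W_hat = np.linalg.inv(m.T @ m / m.shape[0])                  # estimate of Omega^{-1}
    return gmm_estimate(data, moment_fn, theta_1, W_hat)         # asymptotically efficient
```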
Iterated GMM. Essentially the same procedure as two-step GMM, except that the matrix $\hat W_n$ is recalculated several times: the estimate obtained in step 2 is used to compute the weighting matrix for step 3, and so on until some convergence criterion is met (a code sketch follows the next paragraph),
$$\hat\theta^{(i+1)} = \arg\min_{\theta \in \Theta}\, \left(\frac{1}{n}\sum_{t=1}^{n} f(v_t,\theta)\right)' \hat W_n\bigl(\hat\theta^{(i)}\bigr) \left(\frac{1}{n}\sum_{t=1}^{n} f(v_t,\theta)\right).$$
Asymptotically no improvement can be achieved through such iterations, although certain Monte-Carlo experiments suggest that finite-sample properties of this estimator are slightly better.
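A sketch of the iteration, again built on the hypothetical helpers above; the tolerance and iteration cap are arbitrary illustrative choices:

```python
import numpy as np

def iterated_gmm(data, moment_fn, theta0, tol=1e-8, max_iter=50):
    """Repeat the reweight-and-reestimate step until theta stops changing."""
    q = moment_fn(data, theta0).shape[1]
    theta = gmm_estimate(data, moment_fn, theta0, np.eye(q))
    for _ in range(max_iter):
        m = moment_fn(data, theta)
        W_hat = np.linalg.inv(m.T @ m / m.shape[0])              # recompute weighting matrix
        theta_new = gmm_estimate(data, moment_fn, theta, W_hat)
        if np.max(np.abs(theta_new - theta)) < tol:              # convergence criterion
            return theta_new
        theta = theta_new
    return theta
```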
Continuously updating GMM (CUGMM, or CUE). Estimates $\hat\theta$ simultaneously with the weighting matrix:
$$\hat\theta = \arg\min_{\theta \in \Theta}\, \left(\frac{1}{n}\sum_{t=1}^{n} f(v_t,\theta)\right)' \hat W_n(\theta) \left(\frac{1}{n}\sum_{t=1}^{n} f(v_t,\theta)\right).$$
In Monte-Carlo experiments this method has demonstrated better performance than the traditional two-step GMM: the estimator has smaller median bias (although fatter tails), and the J-test of overidentifying restrictions was in many cases more reliable.
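A sketch of the continuously updating criterion, in which the weighting matrix is recomputed at every candidate $\theta$ inside the objective (same hypothetical naming conventions as above; the choice of Nelder-Mead is only one plausible option):

```python
import numpy as np
from scipy.optimize import minimize

def cue_objective(theta, data, moment_fn):
    """CUE criterion: the weighting matrix is itself a function of theta."""
    m = moment_fn(data, theta)
    m_bar = m.mean(axis=0)
    W_theta = np.linalg.inv(m.T @ m / m.shape[0])    # Omega(theta)^{-1}
    return m_bar @ W_theta @ m_bar

def cue_estimate(data, moment_fn, theta0):
    res = minimize(cue_objective, theta0, args=(data, moment_fn), method="Nelder-Mead")
    return res.x
```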