An Introduction to the Linear Mixed-Effects Model

From OLM to LMM

The OLM (Ordinary Linear Model) assumes that the observations are independent and identically distributed (i.i.d.). It can be written as
\begin{equation} \mathbf{y} = \mathbf{X}\mathbf{\gamma} + \mathbf{e}, \end{equation}
where $\mathbf{y} \in \mathbb{R}^n$ is the vector of response variables, $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the matrix of $p$ independent variables, $\mathbf{\gamma} \in \mathbb{R}^p$ is the vector of coefficients, and $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2_e \mathbf{I}_n)$ is the independent noise term. To simplify the notation, we assume that $\mathbf{X}$ and $\mathbf{y}$ are centered, i.e., the mean of $\mathbf{y}$ is $0$ and the column means of $\mathbf{X}$ are $\mathbf{0}$. Thus, we do not need to estimate an intercept term.

In the OLM, the coefficients $\mathbf{\gamma}$ can be estimated by least squares, i.e., by minimizing the squared-error loss. However, when $n < p$, the matrix $\mathbf{X}$ is not of full column rank, so the least-squares solution is not unique and the fit overfits: we have $p+1$ parameters to estimate (including $\sigma_e^2$) but only $n$ observations.
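As a quick illustration (a hypothetical numpy sketch; the sizes and seed are illustrative, not from the text), when $n < p$ the minimum-norm least-squares fit interpolates the centered data exactly, i.e., the training residual is essentially zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # fewer observations than predictors
X = rng.standard_normal((n, p))
gamma = rng.standard_normal(p)
y = X @ gamma + 0.5 * rng.standard_normal(n)

# Center X (column-wise) and y so no intercept is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

# Minimum-norm least-squares solution; with n < p it interpolates the data.
gamma_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ gamma_hat
print(np.linalg.norm(residual))  # essentially zero: the fit has memorized the data
```

A zero training residual here is not a sign of a good model but of overfitting: any noise in $\mathbf{y}$ has been absorbed into $\hat{\mathbf{\gamma}}$.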

Let's assume the design matrix $\mathbf{X}$ can be decomposed into two parts, i.e., $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2]$, where $\mathbf{X}_1 \in \mathbb{R}^{n \times p_1}$ with $p_1 < n \ll p$, and $\mathbf{X}_2 \in \mathbb{R}^{n \times p_2}$. Accordingly, $\mathbf{\gamma}$ can be broken down into a fixed-effect part $\mathbf{\omega} \in \mathbb{R}^{p_1}$ and a random-effect part $\mathbf{\beta} \in \mathbb{R}^{p_2}$ whose distribution is $\mathcal{N}(\mathbf{0}, \sigma^2_\beta \mathbf{I}_{p_2})$, i.e., $\mathbf{\gamma}^T = [\mathbf{\omega}^T, \mathbf{\beta}^T]$. Then, we have
\begin{equation} \mathbf{y} = \mathbf{X}_1\mathbf{\omega} + \mathbf{X}_2\mathbf{\beta} + \mathbf{e} \end{equation}
Thus, we obtain the LMM (Linear Mixed-Effect Model). In this case, the number of parameters to estimate is $p_1 + 1 + 1$ (including $\sigma_\beta^2$ and $\sigma_e^2$), which is much smaller than the $p+1$ in the OLM.
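The parameter reduction comes from marginalizing over $\mathbf{\beta}$: integrating it out gives $\mathbf{y} \sim \mathcal{N}(\mathbf{X}_1\mathbf{\omega},\; \sigma_\beta^2 \mathbf{X}_2\mathbf{X}_2^T + \sigma_e^2 \mathbf{I}_n)$, a distribution governed by only $p_1 + 2$ numbers. A small numpy sketch of this generative model (all sizes and true values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 200, 3, 1000               # few fixed effects, many random effects
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
omega = np.array([1.0, -0.5, 2.0])     # fixed effects (p1 of them)
sigma_beta, sigma_e = 0.3, 0.5         # illustrative standard deviations

beta = sigma_beta * rng.standard_normal(p2)   # beta ~ N(0, sigma_beta^2 I)
e = sigma_e * rng.standard_normal(n)          # e ~ N(0, sigma_e^2 I)
y = X1 @ omega + X2 @ beta + e

# Marginalizing over beta: y ~ N(X1 @ omega, V) with
V = sigma_beta**2 * (X2 @ X2.T) + sigma_e**2 * np.eye(n)
# Only omega (p1 values), sigma_beta^2 and sigma_e^2 parameterize this
# distribution: p1 + 2 parameters instead of p + 1.
```

Note that although $p_2$ may be huge, $\mathbf{X}_2$ enters the marginal likelihood only through the $n \times n$ matrix $\mathbf{X}_2\mathbf{X}_2^T$.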

Model Description

LMM is a statistical model that accounts for both fixed effects and random effects in a linear regression model. It is used for modeling data whose observations are not independent and identically distributed.

Consider a dataset $\{\mathbf{y}, \mathbf{X}, \mathbf{Z}\}$ with $n$ samples, where $\mathbf{y} \in \mathbb{R}^n$ is the vector of response variables, $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the matrix of $p$ independent variables, and $\mathbf{Z} \in \mathbb{R}^{n \times c}$ is another matrix of $c$ variables. The linear mixed model posits a linear relationship between $\mathbf{y}$ and $(\mathbf{Z}, \mathbf{X})$:
\begin{equation} \mathbf{y} = \underbrace{\mathbf{Z}\mathbf{\omega}}_{\text{fixed}} + \underbrace{\mathbf{X}\mathbf{\beta}}_{\text{random}} + \underbrace{\mathbf{e}}_{\text{error}}, \end{equation}
where $\mathbf{\omega} \in \mathbb{R}^c$ is the vector of fixed effects, $\mathbf{\beta} \in \mathbb{R}^p$ is the vector of random effects with $\mathbf{\beta} \sim \mathcal{N}(\mathbf{0}, \sigma^2_\beta \mathbf{I}_p)$, and $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2_e \mathbf{I}_n)$ is the independent noise term.

The LMM can be fitted by various methods, such as restricted maximum likelihood (REML) and maximum likelihood (ML). REML does not base its estimates on a maximum-likelihood fit of all the information; instead, it uses a likelihood function derived from a transformed set of data, so that nuisance parameters (here, the fixed effects) have no effect on the variance-component estimates. ML estimates the parameters of a statistical model by finding the parameter values that maximize the likelihood of the observations. ML can be viewed as a special case of maximum a posteriori (MAP) estimation in which the prior over the parameters is uniform or non-informative.
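A minimal ML sketch for the model above (hypothetical sizes and true values; scipy's generic optimizer stands in for the specialized solvers used in real LMM software): profile out $\mathbf{\omega}$ by generalized least squares (GLS), then minimize the negative marginal log-likelihood over the two variance components.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, c, p = 200, 2, 50
Z = rng.standard_normal((n, c))        # fixed-effect design
X = rng.standard_normal((n, p))        # random-effect design
omega_true = np.array([1.0, -2.0])
s2b_true, s2e_true = 0.09, 0.25        # sigma_beta^2, sigma_e^2 (illustrative)
y = (Z @ omega_true
     + X @ (np.sqrt(s2b_true) * rng.standard_normal(p))
     + np.sqrt(s2e_true) * rng.standard_normal(n))

def neg_loglik(log_s2):
    """Negative marginal log-likelihood with omega profiled out by GLS."""
    s2b, s2e = np.exp(log_s2)                 # optimize on log scale: variances stay positive
    V = s2b * (X @ X.T) + s2e * np.eye(n)     # marginal covariance of y
    Vinv_Z = np.linalg.solve(V, Z)
    Vinv_y = np.linalg.solve(V, y)
    omega = np.linalg.solve(Z.T @ Vinv_Z, Z.T @ Vinv_y)   # GLS estimate
    r = y - Z @ omega
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + r @ np.linalg.solve(V, r))

res = minimize(neg_loglik, x0=np.log([0.05, 0.5]), method="Nelder-Mead")
s2b_hat, s2e_hat = np.exp(res.x)

# Recover the fixed effects at the estimated variance components.
V = s2b_hat * (X @ X.T) + s2e_hat * np.eye(n)
omega_hat = np.linalg.solve(Z.T @ np.linalg.solve(V, Z),
                            Z.T @ np.linalg.solve(V, y))
```

The REML variant would add the $-\tfrac{1}{2}\log\det(\mathbf{Z}^T\mathbf{V}^{-1}\mathbf{Z})$ correction to this objective; in practice one would use a dedicated package rather than this sketch.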

Formal Description of Bayesian Inference and Variational Inference

The following content is adapted from Wikipedia-BI and Wikipedia-VBM. The purpose of this section is to provide a formal description of Bayesian inference and variational inference, which will be used in the following sections.

Definitions

Bayesian Inference

Bayesian Prediction

Variational Inference