Notes on U-Statistics

Notes for *A Class of Statistics with Asymptotically Normal Distribution* by Wassily Hoeffding [1].

You can view the lecture notes on U-statistics.

Basic Concepts and Motivating Examples

Functional, Kernel and Order

Suppose that $\Phi(x_1,\ldots,x_m)$ is a function of $m$ arguments and $X_1,\ldots,X_n$ are i.i.d. observations from a c.d.f. $F(x)$. Assume $n\geq m$.

We want to make inference about the parameter
$$\theta = \theta(F)=E_{F}[\Phi(X_1,\ldots,X_m)].$$

Note: $\theta$ is a functional of the distribution $F$. A functional is a map whose argument is itself a function; here it maps the d.f. $F$ to a real (or complex) number.

To use all the samples, we can use the U-statistic
$$U = \binom{n}{m}^{-1}\Sigma^\prime\, \Phi(X_{i_1},\ldots,X_{i_m})
= \binom{n}{m}^{-1}\sum_{1\leq i_1<\cdots<i_m\leq n}\Phi(X_{i_1},\ldots,X_{i_m}),$$
where $\Sigma^\prime$ stands for summation over all $\binom{n}{m}$ combinations of $m$ indices $i_1,\ldots,i_m$ with $1\leq i_1<\cdots<i_m\leq n$. (Equivalently, when $\Phi$ is symmetric, $U=\frac{1}{n(n-1)\cdots(n-m+1)}\sum\Phi(X_{i_1},\ldots,X_{i_m})$ with the sum over all ordered $m$-tuples of distinct indices; see (4.1) below.)

Here, $U$ is a function of the sample $X_1,…,X_n$ called a U-statistic. The $\Phi$ function is called the kernel and $m$ is its order.
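As a concrete illustration (my own sketch, not part of Hoeffding's paper), the following Python snippet evaluates a U-statistic by averaging a kernel over all size-$m$ subsets of the sample; the kernel `phi` and the simulated data are hypothetical choices.

```python
from itertools import combinations
from math import comb

import numpy as np


def u_statistic(x, phi, m):
    """Average the symmetric kernel phi over all size-m subsets of the sample x."""
    x = np.asarray(x)
    n = len(x)
    total = sum(phi(*x[list(idx)]) for idx in combinations(range(n), m))
    return total / comb(n, m)


# Hypothetical example: kernel phi(a, b) = a * b of order m = 2,
# whose U-statistic is an unbiased estimator of (E X)^2.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=30)
print(u_statistic(x, lambda a, b: a * b, m=2))
```

The cost grows like $\binom{n}{m}$ kernel evaluations, so this brute-force form is only practical for small $m$ and moderate $n$.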

Symmetry of Kernel

For convenience, we assume that $\Phi$ is symmetric in its arguments, which means
$$\Phi(…,x_{i},…,x_{j},…)=\Phi(…,x_{j},…,x_{i},…)$$

This assumption involves no loss of generality, because a non-symmetric kernel can always be converted into a symmetric one.

If $\Phi(x_1,\ldots,x_m)$ is not symmetric, there exists a symmetric kernel $\Phi_0(x_1,\ldots,x_m)$ with the same expectation $\theta$, given by
$$\Phi_0(x_1,…,x_m)=\frac{1}{m!}\sum \Phi(x_{\alpha_1},…,x_{\alpha_m})$$
where the sum is over all permutations of $m$ integers $1,…,m$.
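The symmetrization above can be mechanized; here is a minimal Python sketch (my own) that averages a non-symmetric kernel over all $m!$ argument orderings. The kernel used for the check is the one from Example 2 below.

```python
from itertools import permutations
from math import factorial


def symmetrize(phi, m):
    """Return the symmetric kernel phi_0 obtained by averaging phi over all m! orderings."""
    def phi0(*args):
        return sum(phi(*(args[i] for i in perm))
                   for perm in permutations(range(m))) / factorial(m)
    return phi0


# Non-symmetric kernel x1^2 - x1*x2; its symmetrization is (x1 - x2)^2 / 2.
phi = lambda x1, x2: x1 ** 2 - x1 * x2
phi0 = symmetrize(phi, 2)
print(phi0(1.0, 3.0), 0.5 * (1.0 - 3.0) ** 2)  # both print 2.0
```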

Motivating Examples

Example 1: Let $m=1$ and $\Phi(x)=x$. Then, $U=\sum_{i=1}^nX_i/n$ is the sample mean.

Example 2: Let $m=2$ and $\Phi(x_1,x_2)=x_1^2-x_1x_2$. Then $U=\frac{1}{n(n-1)}\sum_{i\neq j}(X_i^2-X_iX_j)=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$ is the sample variance. The corresponding symmetric kernel is $\Phi_0(x_1,x_2)=(x_1-x_2)^2/2$.

Example 3: Let $m=2$ and $\Phi(x_1, x_2)=I(x_1+x_2>0)$. Then $U=\binom{n}{2}^{-1}\sum_{1\leq i<j\leq n} I(X_i+X_j>0)$ is related to the one-sample Wilcoxon signed rank statistic.
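A quick numerical check of Examples 2 and 3 (my own sketch, with simulated data; it enumerates pairs directly, so it is only meant for small $n$):

```python
from itertools import combinations
from math import comb

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=15)
n = len(x)

# Example 2: the U-statistic with symmetric kernel (x1 - x2)^2 / 2 is the sample variance.
u_var = sum(0.5 * (x[i] - x[j]) ** 2 for i, j in combinations(range(n), 2)) / comb(n, 2)
print(u_var, np.var(x, ddof=1))  # agree up to rounding

# Example 3: one-sample Wilcoxon-type U-statistic, unbiased for P(X1 + X2 > 0).
u_wil = sum((x[i] + x[j] > 0) for i, j in combinations(range(n), 2)) / comb(n, 2)
print(u_wil)
```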

Asymptotic Normality of U-statistic: A Simple Version

Define $\Phi_k(x_1,\ldots,x_k)=E\lbrace\Phi(x_1,\ldots,x_k,X_{k+1},\ldots,X_m)\rbrace$ and $\zeta_k=\mathbb{V}(\Phi_k(X_1,\ldots,X_k))$ for $k=1,\ldots,m$. Suppose that the kernel $\Phi$ satisfies $\mathbb{E}\Phi^2(X_1,\ldots,X_m)<\infty$ and that $0<\zeta_1<\infty$. Then,
$$\frac{U-\theta}{\sqrt{\mathbb{V}(U)}}\xrightarrow{d}N(0,1)$$
where $\mathbb{V}(U)=\frac{1}{n}m^2\zeta_1+O(n^{-2})$.
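A Monte Carlo sketch of this statement (my own illustration, under assumed choices): with the variance kernel $\Phi(x_1,x_2)=(x_1-x_2)^2/2$ and N(0,1) data, $\theta=1$ and $\zeta_1=1/2$, so $(U-\theta)/\sqrt{m^2\zeta_1/n}$ should be approximately standard normal for moderately large $n$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, reps = 2, 50, 2000
theta, zeta1 = 1.0, 0.5  # exact values for the variance kernel under N(0, 1)

# For this kernel the U-statistic coincides with the unbiased sample variance (Example 2).
x = rng.normal(size=(reps, n))
u = x.var(axis=1, ddof=1)
z = (u - theta) / np.sqrt(m ** 2 * zeta1 / n)

print(z.mean(), z.std())  # approximately 0 and 1
```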

U-statistic

Notations

  • $X_1,…,X_n$: $n$ independent random vectors with the same d.f. $F(x)=F(x^{(1)},…,x^{(r)})$.
  • $X_\nu=(X_\nu^{(1)},…,X_\nu^{(r)})$: $r$-dimensional random vector.
  • $x_1,…,x_n$: sample of $n$ $r$-dimensional vectors.
  • $\Phi(x_1,…,x_m)$: a symmetric function of $m(\leq n)$ vector arguments.
  • $\theta=\theta(F)$: functional of $F$.

Definition: U-statistic

Consider the function of the sample,

$$U\left(x_{1}, \cdots, x_{n}\right)=\binom{n}{m}^{-1} \Sigma^{\prime}\, \Phi\left(x_{\alpha_{1}}, \cdots, x_{\alpha_{m}}\right) \tag{4.4}$$

where the kernel $\Phi$ is symmetric in its $m$ vector arguments and the sum $\Sigma^{\prime}$ is extended over all subscripts $\alpha$ such that

$$1 \leq \alpha_{1}<\alpha_{2}<\cdots<\alpha_{m} \leq n$$

Eq. (4.4) can be derived from
$$\begin{equation*}
U=U\left(x_{1}, \cdots, x_{n}\right)=\frac{1}{n(n-1) \cdots(n-m+1)}\, \Sigma^{\prime \prime}\, \Phi_0\left(x_{\alpha_{1}}, \cdots, x_{\alpha_{m}}\right), \tag{4.1}
\end{equation*}$$

where $\Sigma^{\prime \prime}$ stands for summation over all permutations $\left(\alpha_{1}, \cdots, \alpha_{m}\right)$ of $m$ integers such that

$$\begin{equation*}
1 \leq \alpha_{i} \leq n, \quad \alpha_{i} \neq \alpha_{j} \text { if } i \neq j, \quad(i, j=1, \cdots, m) \tag{4.2}
\end{equation*}$$

Asymptotic Normality of U-statistic

The Variance of a U-statistic

The Unbiased Estimator and Its Variance

Since $\theta=\theta(F)=E\left\lbrace\Phi\left(X_{1}, \cdots, X_{m}\right)\right\rbrace$, the U-statistic is an unbiased estimator of $\theta$:
$$E\lbrace U\rbrace=E\left\lbrace\Phi\left(X_{1}, \cdots, X_{m}\right)\right\rbrace=\theta$$

Let
$$\begin{equation*}
\Phi_{c}\left(x_{1}, \cdots, x_{c}\right)=E\left\lbrace\Phi\left(x_{1}, \cdots, x_{c}, X_{c+1}, \cdots, X_{m}\right)\right\rbrace, \quad(c=1, \cdots, m), \tag{5.2}
\end{equation*}$$

where $x_{1}, \cdots, x_{c}$ are arbitrary fixed vectors and the expected value is taken with respect to the random vectors $X_{c+1}, \cdots, X_{m}$. Then

$$\begin{equation*}
\Phi_{c-1}\left(x_{1}, \cdots, x_{c-1}\right)=E\left\lbrace\Phi_{c}\left(x_{1}, \cdots, x_{c-1}, X_{c}\right)\right\rbrace \tag{5.3}
\end{equation*}$$

and

$$\begin{equation*}
E\left\lbrace\Phi_{c}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace=\theta, \quad(c=1, \cdots, m) . \tag{5.4}
\end{equation*}$$

Define

$$\begin{align}
& \Psi\left(x_{1}, \cdots, x_{m}\right)=\Phi\left(x_{1}, \cdots, x_{m}\right)-\theta \tag{5.5}\\
& \Psi_{c}\left(x_{1}, \cdots, x_{c}\right)=\Phi_{c}\left(x_{1}, \cdots, x_{c}\right)-\theta, \quad(c=1, \cdots, m) . \tag{5.6}
\end{align}$$

We have

$$\begin{align}
\Psi_{c-1}\left(x_{1}, \cdots, x_{c-1}\right)=E\left\lbrace\Psi_{c}\left(x_{1}, \cdots, x_{c-1}, X_{c}\right)\right\rbrace \tag{5.7}\\
E\left\lbrace\Psi_{c}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace=E\left\lbrace\Psi\left(X_{1}, \cdots, X_{m}\right)\right\rbrace=0, \quad(c=1, \cdots, m) . \tag{5.8}
\end{align}$$

Suppose that the variance of $\Psi_{c}\left(X_{1}, \cdots, X_{c}\right)$ exists, and let

$$\begin{equation*}
\zeta_{0}=0, \quad \zeta_{c}=E\left\lbrace\Psi_{c}^{2}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace, \quad(c=1, \cdots, m) \tag{5.9}
\end{equation*}$$

We have

$$\begin{equation*}
\zeta_{c}=E\left\lbrace\Phi_{c}^{2}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace-\theta^{2} \tag{5.10}
\end{equation*}$$
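To make (5.9)–(5.10) concrete, here is a hedged Monte Carlo sketch (my own, with assumed kernel and distribution): for $\Phi(x_1,x_2)=(x_1-x_2)^2/2$ and N(0,1) data, $\theta=1$, $\Phi_1(x)=(x^2+1)/2$, and the exact values are $\zeta_1=1/2$ and $\zeta_2=2$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 1.0  # E{Phi(X1, X2)} for the variance kernel under N(0, 1)

x1 = rng.normal(size=200_000)
x2 = rng.normal(size=200_000)

phi = 0.5 * (x1 - x2) ** 2           # Phi(X1, X2)
phi1 = 0.5 * (x1 ** 2 + 1.0)         # Phi_1(X1) = E{Phi(X1, X2) | X1}

zeta1_hat = np.mean(phi1 ** 2) - theta ** 2  # estimates zeta_1 via (5.10); exact value 1/2
zeta2_hat = np.mean(phi ** 2) - theta ** 2   # estimates zeta_2 via (5.10); exact value 2
print(zeta1_hat, zeta2_hat)
```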

Stationary Order of a Functional

If, for some parent distribution $F=F_{0}$ and some integer $d$, we have $\zeta_{d}\left(F_{0}\right)=0$, this means that $\Psi_{d}\left(X_{1}, \cdots, X_{d}\right)=0$ with probability 1. By (5.7) and (5.9), $\zeta_{d}=0$ implies $\zeta_{1}=\cdots=\zeta_{d-1}=0$.

If $\zeta_{1}\left(F_{0}\right)=0$, we shall say that the regular functional $\theta(F)$ is stationary for $F=F_{0}$. If

$$\begin{equation*}
\zeta_{1}\left(F_{0}\right)=\cdots=\zeta_{d}\left(F_{0}\right)=0, \quad \zeta_{d+1}\left(F_{0}\right)>0, \quad(1 \leq d \leq m) \tag{5.11}
\end{equation*}$$

$\theta(F)$ will be called stationary of order $d$ for $F=F_{0}$.

The Variance of a U-statistic: i.i.d. Case

If $\left(\alpha_{1}, \cdots, \alpha_{m}\right)$ and $\left(\beta_{1}, \cdots, \beta_{m}\right)$ are two sets of $m$ different integers, $1 \leq \alpha_{i}$, $\beta_{i} \leq n$, and $c$ is the number of integers common to the two sets, we have, by the symmetry of $\Psi$,

$$\begin{equation*}
E\left\lbrace\Psi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right) \Psi\left(X_{\beta_{1}}, \cdots, X_{\beta_{m}}\right)\right\rbrace=\zeta_{c} \tag{5.12}
\end{equation*}$$

If the variance of $U$ exists, it is equal to

$$\begin{align}
\sigma^{2}(U) & =\binom{n}{m}^{-2} E\left\lbrace\Sigma^{\prime} \Psi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right)\right\rbrace^{2} \\
& =\binom{n}{m}^{-2} \sum_{c=0}^{m} \Sigma^{(c)} E\left\lbrace\Psi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right) \Psi\left(X_{\beta_{1}}, \cdots, X_{\beta_{m}}\right)\right\rbrace
\end{align}$$

where $\Sigma^{(c)}$ stands for summation over all subscripts such that

$$1 \leq \alpha_{1}<\alpha_{2}<\cdots<\alpha_{m} \leq n, \quad 1 \leq \beta_{1}<\beta_{2}<\cdots<\beta_{m} \leq n$$

and exactly $c$ equations

$$\alpha_{i}=\beta_{j}$$

are satisfied. By (5.12), each term in $\Sigma^{(c)}$ is equal to $\zeta_{c}$. The number of terms in $\Sigma^{(c)}$ is easily seen to be

$$\frac{n(n-1) \cdots(n-2 m+c+1)}{c !(m-c) !(m-c) !}=\binom{m}{c}\binom{n-m}{m-c}\binom{n}{m}$$

and hence, since $\zeta_{0}=0$,

$$\sigma^{2}(U)=\binom{n}{m}^{-1} \sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c} \zeta_{c} \tag{5.13}$$
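Formula (5.13) can be checked numerically. The sketch below (my own, reusing the hypothetical $\zeta_1=1/2$, $\zeta_2=2$ of the variance kernel under N(0,1)) should reproduce the classical $\operatorname{Var}(s^2)=2/(n-1)$.

```python
from math import comb


def var_u(n, m, zetas):
    """Exact variance of a U-statistic from (5.13); zetas = [zeta_1, ..., zeta_m]."""
    return sum(comb(m, c) * comb(n - m, m - c) * zetas[c - 1]
               for c in range(1, m + 1)) / comb(n, m)


# Variance kernel Phi(x1, x2) = (x1 - x2)^2 / 2 under N(0, 1): zeta_1 = 1/2, zeta_2 = 2.
for n in (5, 20, 100):
    print(n, var_u(n, 2, [0.5, 2.0]), 2.0 / (n - 1))  # the two values should match
```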

The Variance of a U-statistic: General Case

When the distributions of $X_{1}, \cdots, X_{n}$ are different, $F_{\nu}(x)$ being the d.f. of $X_{\nu}$, let

$$\begin{equation*}
\theta_{\alpha_{1}, \cdots, \alpha_{m}}=E\left\lbrace\Phi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right)\right\rbrace \tag{5.14}
\end{equation*}$$

$$\begin{align}
\Psi_{c\left(\alpha_{1}, \cdots, \alpha_{c}\right) \beta_{1}, \cdots, \beta_{m-c}}\left(x_{1}, \cdots, x_{c}\right)
&=E\left\lbrace\Phi\left(x_{1}, \cdots, x_{c}, X_{\beta_{1}}, \cdots, X_{\beta_{m-c}}\right)\right\rbrace-\theta_{\alpha_{1}, \cdots, \alpha_{c}, \beta_{1}, \cdots, \beta_{m-c}}, \tag{5.15}\\
&\qquad(c=1, \cdots, m),
\end{align}$$

$$\begin{align}
\zeta_{c\left(\alpha_{1}, \cdots, \alpha_{c}\right) \beta_{1}, \cdots, \beta_{m-c} ; \gamma_{1}, \cdots, \gamma_{m-c}}
&=E\left\lbrace\Psi_{c\left(\alpha_{1}, \cdots, \alpha_{c}\right) \beta_{1}, \cdots, \beta_{m-c}}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{c}}\right)\, \Psi_{c\left(\alpha_{1}, \cdots, \alpha_{c}\right) \gamma_{1}, \cdots, \gamma_{m-c}}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{c}}\right)\right\rbrace, \tag{5.16}\\
\zeta_{c, n}&=\frac{c !(m-c) !(m-c) !}{n(n-1) \cdots(n-2 m+c+1)} \sum \zeta_{c\left(\alpha_{1}, \cdots, \alpha_{c}\right) \beta_{1}, \cdots, \beta_{m-c} ; \gamma_{1}, \cdots, \gamma_{m-c}}, \tag{5.17}
\end{align}$$

where the sum is extended over all subscripts $\alpha, \beta, \gamma$ such that

$$\begin{align}
1 &\leq \alpha_{1}<\cdots<\alpha_{c} \leq n, \quad 1 \leq \beta_{1}<\cdots<\beta_{m-c} \leq n, \quad 1 \leq \gamma_{1}<\cdots \gamma_{m-c} \leq n \\
\alpha_{i}& \neq \beta_{j}, \quad \alpha_{i} \neq \gamma_{j}, \quad \beta_{i} \neq \gamma_{j}
\end{align}$$

Then the variance of $U$ is equal to

$$\sigma^{2}(U)=\binom{n}{m}^{-1} \sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c} \zeta_{c, n} \tag{5.18}$$

Properties of the Moments and the Variance

Returning to the case of identically distributed $X$'s, we shall now prove some inequalities satisfied by $\zeta_{1}, \cdots, \zeta_{m}$ and $\sigma^{2}(U)$, which are contained in the following theorems.

Theorem 5.1 The quantities $\zeta_{1}, \cdots, \zeta_{m}$ as defined by (5.9) satisfy the inequalities

$$\begin{equation*}
0 \leq \frac{\zeta_{c}}{c} \leq \frac{\zeta_{d}}{d} \quad \text { if } 1 \leq c<d \leq m \tag{5.19}
\end{equation*}$$

Theorem 5.2 The variance $\sigma^{2}\left(U_{n}\right)$ of a U-statistic $U_{n}=U\left(X_{1}, \cdots, X_{n}\right)$, where $X_{1}, \cdots, X_{n}$ are independent and identically distributed, satisfies the inequalities

$$\begin{equation*}
\frac{m^{2}}{n} \zeta_{1} \leq \sigma^{2}\left(U_{n}\right) \leq \frac{m}{n} \zeta_{m} \tag{5.20}
\end{equation*}$$

$n \sigma^{2}\left(U_{n}\right)$ is a decreasing function of $n$,

$$\begin{equation*}
(n+1) \sigma^{2}\left(U_{n+1}\right) \leq n \sigma^{2}\left(U_{n}\right), \tag{5.21}
\end{equation*}$$

which takes on its upper bound $m \zeta_{m}$ for $n=m$ and tends to its lower bound $m^{2} \zeta_{1}$ as $n$ increases:

$$\begin{align}
&\sigma^{2}\left(U_{m}\right)=\zeta_{m} \tag{5.22} \\
&\lim_{n \rightarrow \infty} n \sigma^{2}\left(U_{n}\right)=m^{2} \zeta_{1} \tag{5.23}
\end{align}$$

If $E\left\lbrace U_{n}\right\rbrace=\theta(F)$ is stationary of order $\geq d-1$ for the d.f. of $X_{\alpha}$, then (5.20) may be replaced by

$$\begin{equation*}
\frac{m}{d} K_{n}(m, d) \zeta_{d} \leq \sigma^{2}\left(U_{n}\right) \leq K_{n}(m, d) \zeta_{m} \tag{5.24}
\end{equation*}$$

where

$$K_{n}(m, d)=\binom{n}{m}^{-1} \sum_{c=d}^{m}\binom{m-1}{c-1}\binom{n-m}{m-c} \tag{5.25}$$
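The bounds (5.20) and the monotonicity (5.21) are easy to verify directly from (5.13); a small sketch (my own, again with the hypothetical $\zeta_1=1/2$, $\zeta_2=2$):

```python
from math import comb


def var_u(n, m, zetas):
    """Variance of a U-statistic from (5.13)."""
    return sum(comb(m, c) * comb(n - m, m - c) * zetas[c - 1]
               for c in range(1, m + 1)) / comb(n, m)


m, zetas = 2, [0.5, 2.0]
prev = None
for n in range(m, 30):
    s2 = var_u(n, m, zetas)
    assert m ** 2 * zetas[0] / n <= s2 + 1e-12      # lower bound in (5.20)
    assert s2 <= m * zetas[-1] / n + 1e-12           # upper bound in (5.20)
    if prev is not None:
        assert n * s2 <= prev + 1e-12                # monotonicity (5.21)
    prev = n * s2
print("(5.20) and (5.21) hold for n = 2, ..., 29")
```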

A Necessary and Sufficient Condition for the Existence of the Variance

(5.13) and (5.19) imply that a necessary and sufficient condition for the existence of $\sigma^{2}(U)$ is the existence of

$$\begin{equation*}
\zeta_{m}=E\left\lbrace\Phi^{2}\left(X_{1}, \cdots, X_{m}\right)\right\rbrace-\theta^{2} \tag{5.26}
\end{equation*}$$

or that of $E\left\lbrace\Phi^{2}\left(X_{1}, \cdots, X_{m}\right)\right\rbrace$.

If $\zeta_{1}>0$, then $\sigma^{2}(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order $d$ for $F=F_{0}$, that is, if (5.11) is satisfied, then $\sigma^{2}(U)$ is of order $n^{-d-1}$. Only when $\theta(F)$ is stationary of order $m$ for some $F=F_{0}$, where $m$ is the degree of $\theta(F)$, do we have $\sigma^{2}(U)=0$, in which case $U$ is equal to a constant with probability 1.

Lemma 5.1

For proving Theorem 5.1 we shall require the following:

Lemma 5.1. If

$$\delta_{d}=\zeta_{d}-\binom{d}{1} \zeta_{d-1}+\binom{d}{2} \zeta_{d-2}-\cdots+(-1)^{d-1}\binom{d}{d-1} \zeta_{1} \tag{5.27}$$

we have

$$\begin{equation*}
\delta_{d} \geq 0, \quad(d=1, \cdots, m) \tag{5.28}
\end{equation*}$$

and

$$\zeta_{d}=\delta_{d}+\binom{d}{1} \delta_{d-1}+\cdots+\binom{d}{d-1} \delta_{1} \tag{5.29}$$

Proof. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$$\eta_{0}=\theta^{2}, \quad \eta_{c}=E\left\lbrace\Phi_{c}^{2}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace, \quad(c=1, \cdots, m)$$

Then, by (5.10),

$$\zeta_{c}=\eta_{c}-\eta_{0}$$

and on substituting this in (5.27) we have

$$\delta_{d}=\sum_{c=0}^{d}(-1)^{d-c}\left(\begin{array}{l} d \\ c \end{array}\right) \eta_{c}$$

From (5.9) it is seen that (5.28) is true for $d=1$. Suppose that (5.28) holds for $1, \cdots, d-1$. Then (5.28) will be shown to hold for $d$.

Let

$$\overline{\Phi_{0}} (x_{1})=\Phi_{1}(x_{1})-\theta,$$

$$\overline{\Phi_{c}}\left(x_{1}, x_{2}, \cdots, x_{c+1}\right)=\Phi_{c+1}\left(x_{1}, \cdots, x_{c+1}\right)-\Phi_{c}\left(x_{2}, \cdots, x_{c+1}\right), \quad(c=1, \cdots, d-1).$$

For an arbitrary fixed $x_{1}$, let

$$\overline{\eta_{c}}\left(x_{1}\right)=E\lbrace\overline{\Phi_{c}}^{2}\left(x_{1}, X_{2}, \cdots, X_{c+1}\right)\rbrace, \quad(c=0, \cdots, d-1)$$

Then, by induction hypothesis,

$$\begin{align}\overline{\delta_{d-1}}\left(x_{1}\right)=\sum_{c=0}^{d-1}(-1)^{d-1-c}\left(\begin{array}{c}
d-1 \\ c
\end{array}\right) \overline{\eta_{c}}\left(x_{1}\right) \geq 0\end{align}$$

for any fixed $x_{1}$.

Now,

$$E\left\lbrace\overline{\eta_{c}}\left(X_{1}\right)\right\rbrace=\eta_{c+1}-\eta_{c}$$

and hence

$$\begin{align}
E\left\lbrace\overline{\delta_{d-1}}\left(X_{1}\right)\right\rbrace&=\sum_{c=0}^{d-1}(-1)^{d-1-c}\left(\begin{array}{c}
d-1 \\ c
\end{array}\right)\left(\eta_{c+1}-\eta_{c}\right)\\
&=\sum_{c=0}^{d}(-1)^{d-c}\left(\begin{array}{l}
d \\ c
\end{array}\right) \eta_{c}=\delta_{d}
\end{align}$$

The proof of Lemma 5.1 is complete.

Proof of Theorem 5.1

By (5.29) we have for $c<d$

$$\begin{align}
c \zeta_{d}-d \zeta_{c} & =c \sum_{a=1}^{d}\left(\begin{array}{l}d \\ a
\end{array}\right) \delta_{a}-d \sum_{a=1}^{c}\left(\begin{array}{l}c \\ a
\end{array}\right) \delta_{a} \\
& =\sum_{a=1}^{c}\left[c\left(\begin{array}{l}d \\ a
\end{array}\right)-d\left(\begin{array}{l}c \\ a
\end{array}\right)\right] \delta_{a}+c \sum_{a=c+1}^{d}\left(\begin{array}{l}d \\ a
\end{array}\right) \delta_{a} \tag{5.30}
\end{align}$$

From (5.28), and since $c\binom{d}{a}-d\binom{c}{a} \geq 0$ if $1 \leq a \leq c \leq d$, it follows that each term in the two sums of (5.30) is non-negative. This, together with (5.9), proves Theorem 5.1.

Proof of Theorem 5.2.

From (5.19) we have

$$c \zeta_{1} \leq \zeta_{c} \leq \frac{c}{m} \zeta_{m}, \quad(c=1, \cdots, m)$$

Applying these inequalities to each term in (5.13) and using the identity

$$\binom{n}{m}^{-1} \sum_{c=1}^{m} c\binom{m}{c}\binom{n-m}{m-c}=\frac{m^{2}}{n} \tag{5.31}$$

we obtain (5.20).
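The combinatorial identity (5.31) can be verified by brute force (a throwaway check, not part of the proof):

```python
from fractions import Fraction
from math import comb

# Check (5.31): C(n, m)^{-1} * sum_c c * C(m, c) * C(n - m, m - c) == m^2 / n.
for m in range(1, 6):
    for n in range(m, 15):
        lhs = sum(c * comb(m, c) * comb(n - m, m - c)
                  for c in range(1, m + 1)) * Fraction(1, comb(n, m))
        assert lhs == Fraction(m * m, n)
print("identity (5.31) verified for 1 <= m <= 5, m <= n <= 14")
```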

(5.22) and (5.23) follow immediately from (5.13).

For (5.21) we may write

$$\begin{equation*}
D_{n} \geq 0 \tag{5.32}
\end{equation*}$$

where

$$D_{n}=n \sigma^{2}\left(U_{n}\right)-(n+1) \sigma^{2}\left(U_{n+1}\right)$$

Let

$$D_{n}=\sum_{c=1}^{m} d_{n, c} \zeta_{c}$$

Then we have from (5.13)

$$d_{n, c}=n\binom{m}{c}\binom{n-m}{m-c}\binom{n}{m}^{-1}-(n+1)\binom{m}{c}\binom{n+1-m}{m-c}\binom{n+1}{m}^{-1}, \tag{5.33}$$

or

$$d_{n, c}=\binom{m}{c}\binom{n-m+1}{m-c}(n-m+1)^{-1}\binom{n}{m}^{-1}\left\lbrace(c-1) n-(m-1)^{2}\right\rbrace, \quad(1 \leq c \leq m \leq n).$$

Putting

$$c_{0}=1+\left[\frac{(m-1)^{2}}{n}\right]$$

where $[u]$ denotes the largest integer $\leq u$, we have

$$\begin{array}{ll}
d_{n, c} \leq 0 & \text { if } c \leq c_{0} \\
d_{n, c}>0 & \text { if } c>c_{0} .
\end{array}$$

Hence, by (5.19),

$$d_{n, c} \zeta_{c} \geq \frac{1}{c_{0}} \zeta_{c_{0}} c d_{n, c}, \quad(c=1, \cdots, m)$$

and

$$D_{n} \geq \frac{1}{c_{0}} \zeta_{c_{0}} \sum_{c=1}^{m} c d_{n, c}$$

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).

For the stationary case $\zeta_{1}=\cdots=\zeta_{d-1}=0$, (5.24) is a direct consequence of (5.13) and (5.19). The proof of Theorem 5.2 is complete.

Properties of the Covariance

Let us now turn to the covariance of two U-statistics. Consider a set of $g$ U-statistics,

$$U^{(\gamma)}=\binom{n}{m(\gamma)}^{-1} \Sigma^{\prime}\, \Phi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right), \quad(\gamma=1, \cdots, g)$$

each $U^{(\gamma)}$ being a function of the same $n$ independent, identically distributed random vectors $X_{1}, \cdots, X_{n}$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric in its $m(\gamma)$ arguments $(\gamma=1, \cdots, g)$.

Let

$$\begin{align}
E\left\lbrace U^{(\gamma)}\right\rbrace=E\left\lbrace\Phi^{(\gamma)}\left(X_{1}, \cdots, X_{m(\gamma)}\right)\right\rbrace=\theta^{(\gamma)}, \quad(\gamma=1, \cdots, g) ; \\
\Psi^{(\gamma)}\left(x_{1}, \cdots, x_{m(\gamma)}\right)=\Phi^{(\gamma)}\left(x_{1}, \cdots, x_{m(\gamma)}\right)-\theta^{(\gamma)}, \quad(\gamma=1, \cdots, g) ; \tag{6.1}\\
\Psi_{c}^{(\gamma)}\left(x_{1}, \cdots, x_{c}\right)=E\left\lbrace\Psi^{(\gamma)}\left(x_{1}, \cdots, x_{c}, X_{c+1}, \cdots, X_{m(\gamma)}\right)\right\rbrace, \tag{6.2}\\
(c=1, \cdots, m(\gamma) ; \gamma=1, \cdots, g) ; \\
\zeta_{c}^{(\gamma, \delta)}=E\left\lbrace\Psi_{c}^{(\gamma)}\left(X_{1}, \cdots, X_{c}\right) \Psi_{c}^{(\delta)}\left(X_{1}, \cdots, X_{c}\right)\right\rbrace, \tag{6.3}
\end{align}$$

$$(\gamma, \delta=1, \cdots, g)$$

If, in particular, $\gamma=\delta$, we shall write

$$\begin{equation*}
\zeta_{c}^{(\gamma)}=\zeta_{c}^{(\gamma, \gamma)}=E\left\lbrace\left[\Psi_{c}^{(\gamma)}\left(X_{1}, \cdots, X_{c}\right)\right]^{2}\right\rbrace \tag{6.4}
\end{equation*}$$

Let

$$\sigma\left(U^{(\gamma)}, U^{(\delta)}\right)=E\left\lbrace\left(U^{(\gamma)}-\theta^{(\gamma)}\right)\left(U^{(\delta)}-\theta^{(\delta)}\right)\right\rbrace$$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma) \leq m(\delta)$,

$$\sigma\left(U^{(\gamma)}, U^{(\delta)}\right)=\binom{n}{m(\gamma)}^{-1} \sum_{c=1}^{m(\gamma)}\binom{m(\delta)}{c}\binom{n-m(\delta)}{m(\gamma)-c} \zeta_{c}^{(\gamma, \delta)} . \tag{6.5}$$

The right hand side is easily seen to be symmetric in $\gamma, \delta$.

For $\gamma=\delta$, (6.5) is the variance of $U^{(\gamma)}$ (cf. (5.13)).

We have from (5.23) and (6.5)

$$\begin{align}
\lim_{n \rightarrow \infty} n \sigma^{2}\left(U^{(\gamma)}\right) & =m^{2}(\gamma) \zeta_{1}^{(\gamma)} \\
\lim_{n \rightarrow \infty} n \sigma\left(U^{(\gamma)}, U^{(\delta)}\right) & =m(\gamma) m(\delta) \zeta_{1}^{(\gamma, \delta)} .
\end{align}$$

Hence, if $\zeta_{1}^{(\gamma)} \neq 0$ and $\zeta_{1}^{(\delta)} \neq 0$, the product moment correlation $\rho\left(U^{(\gamma)}, U^{(\delta)}\right)$ between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

$$\begin{equation*}
\lim_{n \rightarrow \infty} \rho\left(U^{(\gamma)}, U^{(\delta)}\right)=\frac{\zeta_{1}^{(\gamma, \delta)}}{\sqrt{\zeta_{1}^{(\gamma)} \zeta_{1}^{(\delta)}}} \tag{6.6}
\end{equation*}$$
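A Monte Carlo illustration of (6.6) (my own sketch, with assumed choices): take Exp(1) data, $U^{(1)}$ the sample mean (kernel $x$, $m(1)=1$) and $U^{(2)}$ the sample variance (kernel $(x_1-x_2)^2/2$, $m(2)=2$). Direct calculation gives $\zeta_1^{(1)}=1$, $\zeta_1^{(2)}=2$, $\zeta_1^{(1,2)}=1$, so the limiting correlation is $1/\sqrt{2}\approx 0.707$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 4000

x = rng.exponential(scale=1.0, size=(reps, n))
u_mean = x.mean(axis=1)          # U-statistic with kernel x (m = 1)
u_var = x.var(axis=1, ddof=1)    # U-statistic with kernel (x1 - x2)^2 / 2 (m = 2)

print(np.corrcoef(u_mean, u_var)[0, 1], 1 / np.sqrt(2))  # should be close
```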

The Limit Theorems: i.i.d. Case

In this section the vectors $X_{\alpha}$ will be assumed to be identically distributed.

Notes:
Convergence of the Distribution Function:
A sequence of d.f.'s $F_{1}(x)$, $F_{2}(x), \cdots$ converges to a d.f. $F(x)$ if $\lim F_{n}(x)=F(x)$ at every point at which the one-dimensional marginals of the limiting d.f. are continuous.

Singularity of the Distribution:
A $g$-variate normal distribution is called non-singular if the rank $r$ of its covariance matrix is equal to $g$, and singular if $r<g$.

LEMMA 7.1. Let $V_{1}, V_{2}, \cdots$ be an infinite sequence of random vectors $V_{n}=$ $\left(V_{n}^{(1)}, \cdots, V_{n}^{(g)}\right)$, and suppose that the d.f. $F_{n}(v)$ of $V_{n}$ tends to a d.f. $F(v)$ as $n \rightarrow \infty$. Let $V_{n}^{(\gamma)^{\prime}}=V_{n}^{(\gamma)}+d_{n}^{(\gamma)}$, where

$$\begin{align}
\lim_{n \rightarrow \infty} E\left\lbrace\left[d_{n}^{(\gamma)}\right]^{2}\right\rbrace=0, \quad(\gamma=1, \cdots, g) \tag{7.1}
\end{align}$$

Then the d.f. of $V_{n}^{\prime}=\left(V_{n}^{(1)\prime}, \cdots, V_{n}^{(g)\prime}\right)$ tends to $F(v)$.

The Limit Theorem 7.1 and 7.2

Theorem 7.1. Let $X_{1}, \cdots, X_{n}$ be $n$ independent, identically distributed random vectors,

$$X_{\alpha}=\left(X_{\alpha}^{(1)}, \cdots, X_{\alpha}^{(r)}\right), \quad(\alpha=1, \cdots, n)$$

Let

$$\Phi^{(\gamma)}\left(x_{1}, \cdots, x_{m(\gamma)}\right), \quad(\gamma=1, \cdots, g),$$

be $g$ real-valued functions not involving $n$, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)(\leq n)$ vector arguments $x_{\alpha}=\left(x_{\alpha}^{(1)}, \cdots, x_{\alpha}^{(r)}\right),(\alpha=1, \cdots, m(\gamma) ; \gamma=1, \cdots, g)$. Define

$$U^{(\gamma)}=\binom{n}{m(\gamma)}^{-1} \Sigma^{\prime}\, \Phi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right), \quad(\gamma=1, \cdots, g) \tag{7.2}$$

where the summation is over all subscripts such that $1 \leq \alpha_{1}<\cdots<\alpha_{m(\gamma)} \leq n$. Then, if the expected values

$$\begin{equation*}
\theta^{(\gamma)}=E\left\lbrace\Phi^{(\gamma)}\left(X_{1}, \cdots, X_{m(\gamma)}\right)\right\rbrace, \quad(\gamma=1, \cdots, g) \tag{7.3}
\end{equation*}$$

and

$$\begin{equation*}
E\left\lbrace\left[\Phi^{(\gamma)}\left(X_{1}, \cdots, X_{m(\gamma)}\right)\right]^{2}\right\rbrace, \quad(\gamma=1, \cdots, g) \tag{7.4}
\end{equation*}$$

exist, the joint d.f. of

$$\sqrt{n}\left(U^{(1)}-\theta^{(1)}\right), \cdots, \sqrt{n}\left(U^{(g)}-\theta^{(g)}\right)$$

tends, as $n \rightarrow \infty$, to the g-variate normal d.f. with zero means and covariance matrix $\left(m(\gamma) m(\delta) \zeta_{1}^{(\gamma, \delta)}\right)$, where $\zeta_{1}^{(\gamma, \delta)}$ is defined by (6.3). The limiting distribution is non-singular if the determinant $\left|\zeta_{1}^{(\gamma, \delta)}\right|$ is positive.

According to Theorem 5.2, $\sigma^{2}(U)$ exceeds its asymptotic value $m^{2} \zeta_{1} / n$ for any finite $n$. Hence, when $n$ is large but finite, the normal approximation of Theorem 7.1 underestimates the variance of $U$. For such cases the following theorem, which is an immediate consequence of Theorem 7.1, will be more useful.

Theorem 7.2. Under the conditions of Theorem 7.1, and if

$$\zeta_{1}^{(\gamma)}>0, \quad(\gamma=1, \cdots, g)$$

the joint d.f. of

$$\left(U^{(1)}-\theta^{(1)}\right) / \sigma\left(U^{(1)}\right), \cdots,\left(U^{(g)}-\theta^{(g)}\right) / \sigma\left(U^{(g)}\right)$$

tends, as $n \rightarrow \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix $\left(\rho^{(\gamma, \delta)}\right)$, where

$$\rho^{(\gamma, \delta)}=\lim_{n \rightarrow \infty} \frac{\sigma\left(U^{(\gamma)}, U^{(\delta)}\right)}{\sigma\left(U^{(\gamma)}\right) \sigma\left(U^{(\delta)}\right)}=\frac{\zeta_{1}^{(\gamma, \delta)}}{\sqrt{\zeta_{1}^{(\gamma)} \zeta_{1}^{(\delta)}}}, \quad(\gamma, \delta=1, \cdots, g).$$

Proof of Theorem 7.1. The existence of (7.4) entails that of

$$\zeta_{m}^{(\gamma)}=E\left\lbrace\left[\Phi^{(\gamma)}\left(X_{1}, \cdots, X_{m(\gamma)}\right)\right]^{2}\right\rbrace-\left(\theta^{(\gamma)}\right)^{2}$$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of

$$\zeta_{1}^{(\gamma)}, \cdots, \zeta_{m-1}^{(\gamma)} \text {, of } \sigma^{2}\left(U^{(\gamma)}\right) \text {, and of } \zeta_{1}^{(\gamma, \delta)} \leq \sqrt{\zeta_{1}^{(\gamma)} \zeta_{1}^{(\delta)}}$$

Now, consider the $g$ quantities

$$Y^{(\gamma)}=\frac{m(\gamma)}{\sqrt{n}} \sum_{\alpha=1}^{n} \Psi_{1}^{(\gamma)}\left(X_{\alpha}\right), \quad(\gamma=1, \cdots, g)$$

where $\Psi_{1}^{(\gamma)}(x)$ is defined by (6.2). $Y^{(1)}, \cdots, Y^{(g)}$ are sums of $n$ independent random variables with zero means, whose covariance matrix, by virtue of (6.3), is

$$\begin{equation*}
\left\lbrace\sigma\left(Y^{(\gamma)}, Y^{(\delta)}\right)\right\rbrace=\left\lbrace m(\gamma) m(\delta) \zeta_{1}^{(\gamma, \delta)}\right\rbrace \tag{7.5}
\end{equation*}$$

By the Central Limit Theorem for vectors (cf. Cramér, *Mathematical Methods of Statistics*, p. 112), the joint d.f. of $\left(Y^{(1)}, \cdots, Y^{(g)}\right)$ tends to the normal $g$-variate d.f. with the same means and covariances.

Theorem 7.1 will be proved by showing that the $g$ random variables

$$\begin{equation*}
Z^{(\gamma)}=\sqrt{n}\left(U^{(\gamma)}-\theta^{(\gamma)}\right), \quad(\gamma=1, \cdots, g), \tag{7.6}
\end{equation*}$$

have the same joint limiting distribution as $Y^{(1)}, \cdots, Y^{(g)}$.

According to Lemma 7.1 it is sufficient to show that

$$\begin{equation*}
\lim _{n \rightarrow \infty} E\left(Z^{(\gamma)}-Y^{(\gamma)}\right)^{2}=0, \quad(\gamma=1, \cdots, g) \tag{7.7}
\end{equation*}$$

For proving (7.7), write

$$\begin{equation*}
E\left\lbrace Z^{(\gamma)}-Y^{(\gamma)}\right\rbrace^{2}=E\left\lbrace Z^{(\gamma)}\right\rbrace^{2}+E\left\lbrace Y^{(\gamma)}\right\rbrace^{2}-2 E\left\lbrace Z^{(\gamma)} Y^{(\gamma)}\right\rbrace \tag{7.8}
\end{equation*}$$

By (5.13) we have

$$\begin{equation*}
E\left\lbrace Z^{(\gamma)}\right\rbrace^{2}=n \sigma^{2}\left(U^{(\gamma)}\right)=m^{2}(\gamma) \zeta_{1}^{(\gamma)}+O\left(n^{-1}\right) \tag{7.9}
\end{equation*}$$

and from (7.5),

$$\begin{equation*}
E\left\lbrace Y^{(\gamma)}\right\rbrace^{2}=m^{2}(\gamma) \zeta_{1}^{(\gamma)} \tag{7.10}
\end{equation*}$$

By (7.2) and (6.1) we may write for (7.6)

$$Z^{(\gamma)}=\sqrt{n}\binom{n}{m(\gamma)}^{-1} \Sigma^{\prime}\, \Psi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right)$$

and hence

$$E\left\lbrace Z^{(\gamma)} Y^{(\gamma)}\right\rbrace=m(\gamma)\binom{n}{m(\gamma)}^{-1} \sum_{\alpha=1}^{n} \Sigma^{\prime}\, E\left\lbrace\Psi_{1}^{(\gamma)}\left(X_{\alpha}\right) \Psi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right)\right\rbrace.$$

The term

$$E\left\lbrace\Psi_{1}^{(\gamma)}\left(X_{\alpha}\right) \Psi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right)\right\rbrace$$

equals $\zeta_{1}^{(\gamma)}$ if

$$\begin{equation*}
\alpha_{1}=\alpha \quad \text { or } \quad \alpha_{2}=\alpha \cdots \quad \text { or } \quad \alpha_{m(\gamma)}=\alpha \tag{7.11}
\end{equation*}$$

and 0 otherwise. For a fixed $\alpha$, the number of sets $\left\lbrace\alpha_{1}, \cdots, \alpha_{m(\gamma)}\right\rbrace$ such that $1 \leq \alpha_{1}<\cdots<\alpha_{m(\gamma)} \leq n$ and (7.11) is satisfied is $\binom{n-1}{m(\gamma)-1}$. Thus,

$$E\left\lbrace Z^{(\gamma)} Y^{(\gamma)}\right\rbrace=m(\gamma)\binom{n}{m(\gamma)}^{-1} n\binom{n-1}{m(\gamma)-1} \zeta_{1}^{(\gamma)}=m^{2}(\gamma) \zeta_{1}^{(\gamma)} \tag{7.12}$$

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.

The proof of Theorem 7.1 is complete.
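The core of the proof is the projection idea: $Z^{(\gamma)}=\sqrt{n}(U^{(\gamma)}-\theta^{(\gamma)})$ differs from the sum $Y^{(\gamma)}$ of conditional expectations only by a term whose mean square vanishes. A hedged numerical sketch (my own, for the variance kernel with N(0,1) data, where $\theta=1$ and $\Psi_1(x)=(x^2-1)/2$; the exact mean squared gap works out to $2/(n-1)$):

```python
import numpy as np

rng = np.random.default_rng(5)
m, theta = 2, 1.0  # variance kernel Phi(x1, x2) = (x1 - x2)^2 / 2, N(0, 1) data

for n in (10, 40, 160):
    x = rng.normal(size=(4000, n))
    u = x.var(axis=1, ddof=1)                                   # U-statistic (sample variance)
    z = np.sqrt(n) * (u - theta)                                # Z = sqrt(n) (U - theta)
    y = (m / np.sqrt(n)) * np.sum((x ** 2 - 1.0) / 2, axis=1)   # projection Y onto sums of Psi_1
    print(n, np.mean((z - y) ** 2), 2.0 / (n - 1))              # E(Z - Y)^2 shrinks like 1/n
```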

The Limit Theorem 7.3: Extension Theorem 7.1 to a Larger Class of Statistics

The application of Lemma 7.1 leads immediately to the following extension of Theorem 7.1 to a larger class of statistics.

Theorem 7.3. Let

$$\begin{equation*}
U^{(\gamma)\prime}=U^{(\gamma)}+\frac{b_{n}^{(\gamma)}}{\sqrt{n}}, \quad(\gamma=1, \cdots, g) \tag{7.13}
\end{equation*}$$

where $U^{(\gamma)}$ is defined by (7.2) and $b_{n}^{(\gamma)}$ is a random variable. If the conditions of Theorem 7.1 are satisfied, and $\lim E\left\lbrace\left[b_{n}^{(\gamma)}\right]^{2}\right\rbrace=0$, $(\gamma=1, \cdots, g)$, then the joint distribution of

$$\sqrt{n}\left(U^{(1) \prime}-\theta^{(1)}\right), \cdots, \sqrt{n}\left(U^{(g) \prime}-\theta^{(g)}\right)$$

tends to the normal distribution with zero means and covariance matrix

$$\left\lbrace m(\gamma) m(\delta) \zeta_{1}^{(\gamma, \delta)}\right\rbrace$$

The Limit Theorem 7.4: Application to Sample Functionals

Theorem 7.3 applies, in particular, to the regular functionals $\theta(S)$ of the sample d.f. $S$,

$$\theta(S)=\frac{1}{n^{m}} \sum_{\alpha_{1}=1}^{n} \ldots \sum_{\alpha_{m}=1}^{n} \Phi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right)$$

in the case that the variance of $\theta(S)$ exists. For we may write

$$n^{m} \theta(S)=n(n-1) \cdots(n-m+1)\, U+\Sigma^\star\, \Phi\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m}}\right)$$

where the sum $\Sigma^\star$ is extended over all $m$-tuples $\left(\alpha_{1}, \cdots, \alpha_{m}\right)$ in which at least one equality $\alpha_i=\alpha_{j}$ $(i \neq j)$ is satisfied. The number of terms in $\Sigma^\star$ is of order $n^{m-1}$. Hence

$$\theta(S)-U=\frac{1}{n} D$$

where the expected value $E\left\lbrace D^{2}\right\rbrace$, whose existence follows from that of $\sigma^{2}\lbrace\theta(S)\rbrace$, is bounded as $n \rightarrow \infty$. Thus, if we put $U^{(\gamma)\prime}=\theta^{(\gamma)}(S)$, the conditions of Theorem 7.3 are fulfilled. We may summarize this result as follows:

Theorem 7.4. Let $X_{1}, \cdots, X_{n}$ be a random sample from an r-variate population with d.f. $F(x)=F\left(x^{(1)}, \cdots, x^{(r)}\right)$, and let

$$\theta^{(\gamma)}(F)=\int \cdots \int \Phi^{(\gamma)}\left(x_{1}, \cdots, x_{m(\gamma)}\right) d F\left(x_{1}\right) \cdots d F\left(x_{m(\gamma)}\right), \quad(\gamma=1, \cdots, g),$$

be $g$ regular functionals of $F$, where $\Phi^{(\gamma)}\left(x_{1}, \cdots, x_{m(\gamma)}\right)$ is symmetric in the vectors $x_{1}, \cdots, x_{m(\gamma)}$ and does not involve $n$. If $S(x)$ is the d.f. of the random sample, and if the variance of

$$\theta^{(\gamma)}(S)=\frac{1}{n^{m(\gamma)}} \sum_{\alpha_{1}=1}^{n} \cdots \sum_{\alpha_{m(\gamma)}=1}^{n} \Phi^{(\gamma)}\left(X_{\alpha_{1}}, \cdots, X_{\alpha_{m(\gamma)}}\right)$$

exists, the joint d.f. of

$$\sqrt{n}\left\lbrace\theta^{(1)}(S)-\theta^{(1)}(F)\right\rbrace, \cdots, \sqrt{n}\left\lbrace\theta^{(g)}(S)-\theta^{(g)}(F)\right\rbrace$$

tends to the g-variate normal d.f. with zero means and covariance matrix

$$\left\lbrace m(\gamma) m(\delta) \zeta_{1}^{(\gamma, \delta)}\right\rbrace$$
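The relation $n^m\theta(S)=n(n-1)\cdots(n-m+1)\,U+\Sigma^{\star}\Phi$ behind Theorem 7.4 is easy to see numerically: the plug-in functional $\theta(S)$ (a V-statistic, averaging over all $m$-tuples with repetitions allowed) differs from $U$ by a term of order $1/n$. A small sketch (my own, with a hypothetical kernel and data):

```python
from itertools import combinations, product
from math import comb

import numpy as np

rng = np.random.default_rng(6)
phi = lambda a, b: 0.5 * (a - b) ** 2  # symmetric kernel of order m = 2

for n in (10, 40, 160):
    x = rng.normal(size=n)
    u = sum(phi(x[i], x[j]) for i, j in combinations(range(n), 2)) / comb(n, 2)
    v = sum(phi(x[i], x[j]) for i, j in product(range(n), repeat=2)) / n ** 2
    print(n, v - u, n * (v - u))  # theta(S) - U is of order 1/n
```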

The Limit Theorem 7.5: Application to Functions of Statistics

The following theorem is concerned with the asymptotic distribution of a function of statistics of the form $U$ or $U^{\prime}$.

Theorem 7.5. Let $U^{\prime}=\left(U^{(1)\prime}, \cdots, U^{(g)\prime}\right)$ be a random vector, where $U^{(\gamma)\prime}$ is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied. If the function $h(y)=h\left(y^{(1)}, \cdots, y^{(g)}\right)$ does not involve $n$ and is continuous together with its second order partial derivatives in some neighborhood of the point $(y)=(\theta)=\left(\theta^{(1)}, \cdots, \theta^{(g)}\right)$, then the distribution of the random variable $\sqrt{n}\left\lbrace h\left(U^{\prime}\right)-h(\theta)\right\rbrace$ tends to the normal distribution with mean zero and variance

$$\begin{equation*}
\sum_{\gamma=1}^{g} \sum_{\delta=1}^{g} m(\gamma) m(\delta) \left(\frac{\partial h(y)}{\partial y^{(\gamma)}} \frac{\partial h(y)}{\partial y^{(\delta)}}\right)_{y=\theta} \zeta_1^{(\gamma, \delta)}
\end{equation*}$$
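Theorem 7.5 is the delta method for U-statistics. A minimal Monte Carlo sketch (my own, with assumed choices): let $U$ be the sample mean of Exp(1) data ($m=1$, $\theta=1$, $\zeta_1=\operatorname{Var}(X)=1$) and $h(y)=y^2$; the theorem predicts that $\sqrt{n}\lbrace h(U)-h(\theta)\rbrace$ has limiting variance $(h'(\theta))^2\zeta_1=4$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 400, 20_000
theta, zeta1 = 1.0, 1.0  # Exp(1): mean 1, variance 1

u = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)  # U = sample mean (m = 1)
t = np.sqrt(n) * (u ** 2 - theta ** 2)                       # sqrt(n) * (h(U) - h(theta)), h(y) = y^2

print(t.var(), (2 * theta) ** 2 * zeta1)  # empirical variance vs. the predicted limit 4
```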

Applications to particular statistics

  • Moments and functions of moments
  • Mean difference and coefficient of concentration
  • Functions of ranks and of the signs of variate differences
  • Difference sign correlation
  • Rank correlation and grade correlation
  • Non-parametric tests of independence
  • Mann’s test against trend
  • The coefficient of partial difference sign correlation

References

[1] W. Hoeffding, ‘A Class of Statistics with Asymptotically Normal Distribution’, Ann. Math. Statist., vol. 19, no. 3, pp. 293–325, Sep. 1948, doi: 10.1214/aoms/1177730196.

[2] J. Shao, Mathematical Statistics. Springer Science & Business Media, 2003.