Notes on U-Statistics

Notes for "A Class of Statistics with Asymptotically Normal Distribution" by Wassily Hoeffding.

You can view the lecture notes on U-statistics.

Basic Concepts and Motivating Examples

Functional, Kernel and Order

Suppose that $\Phi(x_1,\dots,x_m)$ is a function of $m$ random variables and that $X_1,\dots,X_n$ are i.i.d. observations from a c.d.f. $F(x)$, with $n\ge m$.

We want to make inference about
$$\theta=\theta(F)=E_F[\Phi(X_1,\dots,X_m)]$$

Note: $\theta$ is a functional of the distribution $F$. A functional is a map defined on a space of functions: it takes a function (here the d.f. $F$) to a real (or complex) number.

To use all the samples, we can form the U-statistic
$$U=\binom{n}{m}^{-1}\sum_{1\le i_1<\cdots<i_m\le n}\Phi(X_{i_1},\dots,X_{i_m})$$
where the sum runs over all $\binom{n}{m}$ combinations of $m$ integers $i_1,\dots,i_m$ with $1\le i_1<\cdots<i_m\le n$.

Here, $U$ is a function of the sample $X_1,\dots,X_n$ called a U-statistic. The function $\Phi$ is called the kernel, and $m$ is its order.
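As a quick sanity check, the definition can be computed by brute force: average the kernel over all $\binom{n}{m}$ index subsets. A minimal sketch (the function and variable names are mine, not from the paper):

```python
from itertools import combinations
from math import comb

def u_statistic(sample, kernel, m):
    """Average a symmetric kernel of order m over all C(n, m) index subsets."""
    n = len(sample)
    return sum(kernel(*(sample[i] for i in idx))
               for idx in combinations(range(n), m)) / comb(n, m)

# Order-1 kernel Phi(x) = x recovers the sample mean; the order-2 kernel
# (x1 - x2)^2 / 2 recovers the unbiased sample variance (see the examples below).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(u_statistic(data, lambda x: x, 1))                    # 5.0
print(u_statistic(data, lambda a, b: (a - b) ** 2 / 2, 2))  # 32/7 ~= 4.5714
```

This enumerates all subsets, so it is only practical for small $n$ and $m$; it is meant to make the definition concrete, not to be efficient.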

Symmetry of Kernel

For convenience, we assume that $\Phi$ is symmetric in its arguments, which means
$$\Phi(\dots,x_i,\dots,x_j,\dots)=\Phi(\dots,x_j,\dots,x_i,\dots)$$

This assumption is not restrictive, because we can always convert a non-symmetric kernel into a symmetric one.

If $\Phi(x_1,\dots,x_m)$ is not symmetric, there exists a symmetric kernel $\Phi_0(x_1,\dots,x_m)$ defining the same U-statistic, namely
$$\Phi_0(x_1,\dots,x_m)=\frac{1}{m!}\sum_{(\alpha_1,\dots,\alpha_m)}\Phi(x_{\alpha_1},\dots,x_{\alpha_m})$$
where the sum is over all $m!$ permutations $(\alpha_1,\dots,\alpha_m)$ of the integers $1,\dots,m$.

Motivating Examples

Example 1: Let $m=1$ and $\Phi(x)=x$. Then $U=\frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.

Example 2: Let $m=2$ and $\Phi(x_1,x_2)=x_1^2-x_1x_2$. A symmetric version of this kernel is $\Phi_0(x_1,x_2)=(x_1-x_2)^2/2$, and the resulting U-statistic
$$U=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}\frac{(X_i-X_j)^2}{2}=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$$
is the (unbiased) sample variance.

Example 3: Let $m=2$ and $\Phi(x_1,x_2)=I(x_1+x_2>0)$. Then
$$U=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}I(X_i+X_j>0)$$
is related to the one-sample Wilcoxon signed-rank statistic.
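The claim in Example 2, that a non-symmetric kernel and its symmetrization define the same U-statistic, can be checked numerically by averaging over all ordered tuples of distinct indices (a sketch; the function name and data are mine):

```python
from itertools import permutations

def u_from_ordered(sample, kernel, m):
    """U-statistic via averaging the kernel over all ordered m-tuples of
    distinct indices; this works for non-symmetric kernels as well."""
    n = len(sample)
    idx_tuples = list(permutations(range(n), m))
    return sum(kernel(*(sample[i] for i in idx))
               for idx in idx_tuples) / len(idx_tuples)

data = [1.5, -0.3, 2.2, 0.7, -1.1]

u_nonsym = u_from_ordered(data, lambda a, b: a * a - a * b, 2)  # Example 2 kernel
u_sym = u_from_ordered(data, lambda a, b: (a - b) ** 2 / 2, 2)  # its symmetrization
xbar = sum(data) / len(data)
s2 = sum((x - xbar) ** 2 for x in data) / (len(data) - 1)       # sample variance
print(u_nonsym, u_sym, s2)  # all three coincide (1.77 for this data)
```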

Asymptotic Normality of U-statistic: A Simple Version

Let $\Phi_k(x_1,\dots,x_k)=E\{\Phi(x_1,\dots,x_k,X_{k+1},\dots,X_m)\}$ and define $\zeta_k=\mathrm{Var}(\Phi_k(X_1,\dots,X_k))$, $k=1,\dots,m$. Suppose that the kernel satisfies $E\,\Phi^2(X_1,\dots,X_m)<\infty$ and that $0<\zeta_1<\infty$. Then
$$\frac{U-\theta}{\sqrt{\mathrm{Var}(U)}}\xrightarrow{d}N(0,1)$$
where $\mathrm{Var}(U)=\frac{m^2}{n}\zeta_1+O(n^{-2})$.
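A small Monte Carlo experiment illustrates the leading variance term $m^2\zeta_1/n$. The kernel $\Phi(x_1,x_2)=x_1x_2$ with $X\sim N(\mu,1)$ is my choice of example, not one from the paper; for it, $\theta=\mu^2$, $\Phi_1(x)=\mu x$, and $\zeta_1=\mu^2$:

```python
import random
import statistics

random.seed(0)

# Kernel Phi(x1, x2) = x1 * x2 (m = 2) with X ~ N(mu, 1): theta = mu^2,
# Phi_1(x) = E[Phi(x, X2)] = mu * x, and zeta_1 = Var(mu * X) = mu^2.
mu, m, n, reps = 1.0, 2, 200, 2000

def u_pairs(xs):
    n = len(xs)
    s = sum(xs)
    # sum_{i<j} x_i x_j = (s^2 - sum x_i^2) / 2, then divide by C(n, 2)
    return (s * s - sum(x * x for x in xs)) / (n * (n - 1))

us = [u_pairs([random.gauss(mu, 1) for _ in range(n)]) for _ in range(reps)]
emp_var = statistics.variance(us)
asym_var = m ** 2 * mu ** 2 / n  # leading term m^2 * zeta_1 / n = 0.02
print(emp_var, asym_var)         # close for this n
```

The empirical variance slightly exceeds the leading term, consistent with the $O(n^{-2})$ remainder (and with Theorem 5.2 below).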

U-statistic

Notations

  • $X_1,\dots,X_n$: $n$ independent random vectors with the same d.f. $F(x)=F(x^{(1)},\dots,x^{(r)})$.
  • $X_\nu=(X_\nu^{(1)},\dots,X_\nu^{(r)})$: an $r$-dimensional random vector.
  • $x_1,\dots,x_n$: a sample of $n$ $r$-dimensional vectors.
  • $\Phi(x_1,\dots,x_m)$: a symmetric function of $m\ (\le n)$ vector arguments.
  • $\theta=\theta(F)$: a functional of $F$.

Definition: U-statistic

Consider the function of the sample,

$$U(x_1,\dots,x_n)=\binom{n}{m}^{-1}\sum\Phi(x_{\alpha_1},\dots,x_{\alpha_m})\tag{4.4}$$

where the kernel $\Phi$ is symmetric in its $m$ vector arguments and the sum $\sum$ is extended over all subscripts $\alpha$ such that

$$1\le\alpha_1<\alpha_2<\cdots<\alpha_m\le n$$

Eq. (4.4) can be derived from

$$U=U(x_1,\dots,x_n)=\frac{1}{n(n-1)\cdots(n-m+1)}{\sum}'\,\Phi_0(x_{\alpha_1},\dots,x_{\alpha_m}),\tag{4.1}$$

where $\Phi_0$ is the symmetrized kernel (equal to $\Phi$ when $\Phi$ is already symmetric) and ${\sum}'$ stands for summation over all permutations $(\alpha_1,\dots,\alpha_m)$ of $m$ integers such that

$$1\le\alpha_i\le n,\qquad\alpha_i\ne\alpha_j\ \text{ if }\ i\ne j,\quad(i,j=1,\dots,m)\tag{4.2}$$

Asymptotic Normality of U-statistic

The Variance of a U-statistic

The Unbiased Estimator and Its Variance

Since the $X_i$ are i.i.d. with d.f. $F$ and $\theta=\theta(F)$, the U-statistic is an unbiased estimator:
$$E\{U\}=E\{\Phi(X_1,\dots,X_m)\}=\theta$$

Let

$$\Phi_c(x_1,\dots,x_c)=E\{\Phi(x_1,\dots,x_c,X_{c+1},\dots,X_m)\},\quad(c=1,\dots,m),\tag{5.2}$$

where $x_1,\dots,x_c$ are arbitrary fixed vectors and the expected value is taken with respect to the random vectors $X_{c+1},\dots,X_m$. Then

$$\Phi_{c-1}(x_1,\dots,x_{c-1})=E\{\Phi_c(x_1,\dots,x_{c-1},X_c)\}\tag{5.3}$$

and

$$E\{\Phi_c(X_1,\dots,X_c)\}=\theta,\quad(c=1,\dots,m).\tag{5.4}$$

Define

$$\Psi(x_1,\dots,x_m)=\Phi(x_1,\dots,x_m)-\theta\tag{5.5}$$
$$\Psi_c(x_1,\dots,x_c)=\Phi_c(x_1,\dots,x_c)-\theta,\quad(c=1,\dots,m).\tag{5.6}$$

We have

$$\Psi_{c-1}(x_1,\dots,x_{c-1})=E\{\Psi_c(x_1,\dots,x_{c-1},X_c)\}\tag{5.7}$$
$$E\{\Psi_c(X_1,\dots,X_c)\}=E\{\Psi(X_1,\dots,X_m)\}=0,\quad(c=1,\dots,m).\tag{5.8}$$

Suppose that the variance of $\Psi_c(X_1,\dots,X_c)$ exists, and let

$$\zeta_0=0,\qquad\zeta_c=E\{\Psi_c^2(X_1,\dots,X_c)\},\quad(c=1,\dots,m)\tag{5.9}$$

We have

$$\zeta_c=E\{\Phi_c^2(X_1,\dots,X_c)\}-\theta^2\tag{5.10}$$

Stationary Order of a Functional

If, for some parent distribution $F=F_0$ and some integer $d$, we have $\zeta_d(F_0)=0$, this means that $\Psi_d(X_1,\dots,X_d)=0$ with probability 1. By (5.7) and (5.9), $\zeta_d=0$ implies $\zeta_1=\cdots=\zeta_{d-1}=0$.

If $\zeta_1(F_0)=0$, we shall say that the regular functional $\theta(F)$ is stationary for $F=F_0$. If

$$\zeta_1(F_0)=\cdots=\zeta_d(F_0)=0,\qquad\zeta_{d+1}(F_0)>0,\quad(1\le d\le m)\tag{5.11}$$

$\theta(F)$ will be called stationary of order $d$ for $F=F_0$.

The Variance of a U-statistic: i.i.d. Case

If $(\alpha_1,\dots,\alpha_m)$ and $(\beta_1,\dots,\beta_m)$ are two sets of $m$ different integers, $1\le\alpha_i,\beta_i\le n$, and $c$ is the number of integers common to the two sets, we have, by the symmetry of $\Psi$,

$$E\{\Psi(X_{\alpha_1},\dots,X_{\alpha_m})\,\Psi(X_{\beta_1},\dots,X_{\beta_m})\}=\zeta_c\tag{5.12}$$

If the variance of $U$ exists, it is equal to

$$\sigma^2(U)=\binom{n}{m}^{-2}E\Big\{\sum\Psi(X_{\alpha_1},\dots,X_{\alpha_m})\Big\}^2=\binom{n}{m}^{-2}\sum_{c=0}^m{\sum}^{(c)}E\{\Psi(X_{\alpha_1},\dots,X_{\alpha_m})\,\Psi(X_{\beta_1},\dots,X_{\beta_m})\}$$

where ${\sum}^{(c)}$ stands for summation over all subscripts such that

$$1\le\alpha_1<\alpha_2<\cdots<\alpha_m\le n,\qquad 1\le\beta_1<\beta_2<\cdots<\beta_m\le n$$

and exactly $c$ equations

$$\alpha_i=\beta_j$$

are satisfied. By (5.12), each term in ${\sum}^{(c)}$ is equal to $\zeta_c$. The number of terms in ${\sum}^{(c)}$ is easily seen to be

$$\frac{n(n-1)\cdots(n-2m+c+1)}{c!\,(m-c)!\,(m-c)!}=\binom{m}{c}\binom{n-m}{m-c}\binom{n}{m}$$

and hence, since $\zeta_0=0$,

$$\sigma^2(U)=\binom{n}{m}^{-1}\sum_{c=1}^m\binom{m}{c}\binom{n-m}{m-c}\zeta_c\tag{5.13}$$
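Formula (5.13) can be verified end-to-end on a toy distribution, where the exact variance of $U$ is computable by enumeration. The Bernoulli example below is my choice, not one from the paper:

```python
from itertools import combinations, product
from math import comb

def sigma2_u(n, m, zetas):
    """Hoeffding's formula (5.13) for the variance of a U-statistic."""
    return sum(comb(m, c) * comb(n - m, m - c) * zetas[c]
               for c in range(1, m + 1)) / comb(n, m)

# Check against brute force: X ~ Bernoulli(1/2), kernel Phi(x1, x2) = x1 * x2,
# n = 5.  Here theta = 1/4, Phi_1(x) = x/2, so zeta_1 = Var(X)/4 = 1/16, and
# zeta_2 = E[(X1*X2)^2] - theta^2 = 1/4 - 1/16 = 3/16.
formula = sigma2_u(5, 2, {1: 1 / 16, 2: 3 / 16})

us = []
for xs in product([0, 1], repeat=5):  # all 2^5 equally likely samples
    us.append(sum(xs[i] * xs[j] for i, j in combinations(range(5), 2)) / comb(5, 2))
mean_u = sum(us) / len(us)
brute = sum((u - mean_u) ** 2 for u in us) / len(us)
print(formula, brute)  # both equal 0.05625
```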

The Variance of a U-statistic: General Case

When the distributions of $X_1,\dots,X_n$ are different, $F_\nu(x)$ being the d.f. of $X_\nu$, let

$$\theta_{\alpha_1,\dots,\alpha_m}=E\{\Phi(X_{\alpha_1},\dots,X_{\alpha_m})\}\tag{5.14}$$

$$\Psi_c^{(\alpha_1,\dots,\alpha_c)\,\beta_1,\dots,\beta_{m-c}}(x_1,\dots,x_c)=E\{\Phi(x_1,\dots,x_c,X_{\beta_1},\dots,X_{\beta_{m-c}})\}-\theta_{\alpha_1,\dots,\alpha_c,\beta_1,\dots,\beta_{m-c}},\quad(c=1,\dots,m),\tag{5.15}$$

$$\zeta_c^{(\alpha_1,\dots,\alpha_c)\,\beta_1,\dots,\beta_{m-c};\,\gamma_1,\dots,\gamma_{m-c}}=E\{\Psi_c^{(\alpha_1,\dots,\alpha_c)\,\beta_1,\dots,\beta_{m-c}}(X_{\alpha_1},\dots,X_{\alpha_c})\,\Psi_c^{(\alpha_1,\dots,\alpha_c)\,\gamma_1,\dots,\gamma_{m-c}}(X_{\alpha_1},\dots,X_{\alpha_c})\}\tag{5.16}$$

$$\zeta_{c,n}=\frac{c!\,(m-c)!\,(m-c)!}{n(n-1)\cdots(n-2m+c+1)}\sum\zeta_c^{(\alpha_1,\dots,\alpha_c)\,\beta_1,\dots,\beta_{m-c};\,\gamma_1,\dots,\gamma_{m-c}}\tag{5.17}$$

where the sum is extended over all subscripts $\alpha,\beta,\gamma$ such that

$$1\le\alpha_1<\cdots<\alpha_c\le n,\quad 1\le\beta_1<\cdots<\beta_{m-c}\le n,\quad 1\le\gamma_1<\cdots<\gamma_{m-c}\le n,$$
$$\alpha_i\ne\beta_j,\qquad\alpha_i\ne\gamma_j,\qquad\beta_i\ne\gamma_j$$

Then the variance of $U$ is equal to

$$\sigma^2(U)=\binom{n}{m}^{-1}\sum_{c=1}^m\binom{m}{c}\binom{n-m}{m-c}\zeta_{c,n}\tag{5.18}$$

Properties of the Moments and the Variance

Returning to the case of identically distributed $X$'s, we shall now prove some inequalities satisfied by $\zeta_1,\dots,\zeta_m$ and $\sigma^2(U)$, which are contained in the following theorems:

Theorem 5.1. The quantities $\zeta_1,\dots,\zeta_m$ as defined by (5.9) satisfy the inequalities

$$0\le\frac{\zeta_c}{c}\le\frac{\zeta_d}{d}\quad\text{if}\quad 1\le c<d\le m\tag{5.19}$$

Theorem 5.2. The variance $\sigma^2(U_n)$ of a U-statistic $U_n=U(X_1,\dots,X_n)$, where $X_1,\dots,X_n$ are independent and identically distributed, satisfies the inequalities

$$\frac{m^2}{n}\,\zeta_1\le\sigma^2(U_n)\le\frac{m}{n}\,\zeta_m\tag{5.20}$$

$n\,\sigma^2(U_n)$ is a decreasing function of $n$,

$$(n+1)\,\sigma^2(U_{n+1})\le n\,\sigma^2(U_n),\tag{5.21}$$

which takes on its upper bound $m\zeta_m$ for $n=m$ and tends to its lower bound $m^2\zeta_1$ as $n$ increases:

$$\sigma^2(U_m)=\zeta_m\tag{5.22}$$
$$\lim_{n\to\infty}n\,\sigma^2(U_n)=m^2\zeta_1\tag{5.23}$$

If $E\{U_n\}=\theta(F)$ is stationary of order $d-1$ for the d.f. of $X_\alpha$, (5.20) may be replaced by

$$\frac{m}{d}\,K_n(m,d)\,\zeta_d\le\sigma^2(U_n)\le K_n(m,d)\,\zeta_m\tag{5.24}$$

where

$$K_n(m,d)=\binom{n}{m}^{-1}\sum_{c=d}^m\binom{m-1}{c-1}\binom{n-m}{m-c}\tag{5.25}$$

A Necessary and Sufficient Condition for the Existence of the Variance

(5.13) and (5.19) imply that a necessary and sufficient condition for the existence of $\sigma^2(U)$ is the existence of

$$\zeta_m=E\{\Phi^2(X_1,\dots,X_m)\}-\theta^2\tag{5.26}$$

or, equivalently, that of $E\{\Phi^2(X_1,\dots,X_m)\}$.

If $\zeta_1>0$, $\sigma^2(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order $d$ for $F=F_0$, that is, if (5.11) is satisfied, $\sigma^2(U)$ is of order $n^{-d-1}$. Only if, for some $F=F_0$, $\theta(F)$ is stationary of order $m$, where $m$ is the degree of $\theta(F)$, do we have $\sigma^2(U)=0$, in which case $U$ equals a constant with probability 1.
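The bounds (5.20) and the monotonicity (5.21) can be checked numerically for a concrete set of $\zeta_c$ values. A sketch; the values below come from the Bernoulli(1/2) product-kernel example (my choice) and satisfy (5.19):

```python
from math import comb

def sigma2_u(n, m, zetas):
    # formula (5.13)
    return sum(comb(m, c) * comb(n - m, m - c) * zetas[c]
               for c in range(1, m + 1)) / comb(n, m)

# zeta_1 = 1/16, zeta_2 = 3/16 for kernel x1*x2 under Bernoulli(1/2);
# note zeta_1 / 1 <= zeta_2 / 2, as (5.19) requires.
m, zetas = 2, {1: 1 / 16, 2: 3 / 16}
prev = None
for n in range(m, 30):
    s2 = sigma2_u(n, m, zetas)
    assert m ** 2 / n * zetas[1] <= s2 + 1e-12  # lower bound of (5.20)
    assert s2 <= m / n * zetas[m] + 1e-12       # upper bound of (5.20)
    if prev is not None:
        assert n * s2 <= prev + 1e-12           # monotonicity (5.21)
    prev = n * s2
print("(5.20) and (5.21) hold in this example for n = 2..29")
```

At $n=m$ the upper bound is attained with equality, matching (5.22).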

Lemma 5.1

For proving Theorem 5.1 we shall require the following:

Lemma 5.1. If

$$\delta_d=\zeta_d-\binom{d}{1}\zeta_{d-1}+\binom{d}{2}\zeta_{d-2}-\cdots+(-1)^{d-1}\binom{d}{d-1}\zeta_1\tag{5.27}$$

we have

$$\delta_d\ge 0,\quad(d=1,\dots,m)\tag{5.28}$$

and

$$\zeta_d=\delta_d+\binom{d}{1}\delta_{d-1}+\cdots+\binom{d}{d-1}\delta_1\tag{5.29}$$

Proof. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$$\eta_0=\theta^2,\qquad\eta_c=E\{\Phi_c^2(X_1,\dots,X_c)\},\quad(c=1,\dots,m)$$

Then, by (5.10),

$$\zeta_c=\eta_c-\eta_0$$

and on substituting this in (5.27) we have

$$\delta_d=\sum_{c=0}^d(-1)^{d-c}\binom{d}{c}\eta_c$$

From (5.9) it is seen that (5.28) is true for d=1. Suppose that (5.28) holds for 1,,d1. Then (5.28) will be shown to hold for d.

Let

$$\Phi'_0(x_1)=\Phi_1(x_1)-\theta,$$
$$\Phi'_c(x_1,x_2,\dots,x_{c+1})=\Phi_{c+1}(x_1,\dots,x_{c+1})-\Phi_c(x_2,\dots,x_{c+1}),\quad(c=1,\dots,d-1),$$

where the primes distinguish these differences from the conditional expectations $\Phi_c$ of (5.2). For an arbitrary fixed $x_1$, let

$$\eta'_c(x_1)=E\{\Phi_c'^2(x_1,X_2,\dots,X_{c+1})\},\quad(c=0,\dots,d-1)$$

Then, by the induction hypothesis,

$$\delta'_{d-1}(x_1)=\sum_{c=0}^{d-1}(-1)^{d-1-c}\binom{d-1}{c}\eta'_c(x_1)\ge 0$$

for any fixed $x_1$.

for any fixed x1.

Now,

$$E\{\eta'_c(X_1)\}=\eta_{c+1}-\eta_c$$

and hence

$$E\{\delta'_{d-1}(X_1)\}=\sum_{c=0}^{d-1}(-1)^{d-1-c}\binom{d-1}{c}(\eta_{c+1}-\eta_c)=\sum_{c=0}^d(-1)^{d-c}\binom{d}{c}\eta_c=\delta_d$$

Since $\delta'_{d-1}(x_1)\ge 0$ for every fixed $x_1$, it follows that $\delta_d\ge 0$.

The proof of Lemma 5.1 is complete.
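The pair (5.27)/(5.29) is a binomial inversion, which can be checked numerically. The test values below are arbitrary numbers, not the variances of any particular kernel (the inversion itself holds for any sequence):

```python
import random
from math import comb

random.seed(1)
m = 6
zeta = {a: random.random() for a in range(1, m + 1)}  # arbitrary test values

# (5.27): delta_d = sum_{a=1}^{d} (-1)^(d-a) * C(d, a) * zeta_a
delta = {d: sum((-1) ** (d - a) * comb(d, a) * zeta[a] for a in range(1, d + 1))
         for d in range(1, m + 1)}

# (5.29): zeta_d = sum_{a=1}^{d} C(d, a) * delta_a  -- the inverse transform
for d in range(1, m + 1):
    recovered = sum(comb(d, a) * delta[a] for a in range(1, d + 1))
    assert abs(recovered - zeta[d]) < 1e-9
print("(5.27) and (5.29) are inverse transforms")
```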

Proof of Theorem 5.1

By (5.29) we have for $c<d$

$$c\,\zeta_d-d\,\zeta_c=c\sum_{a=1}^d\binom{d}{a}\delta_a-d\sum_{a=1}^c\binom{c}{a}\delta_a=\sum_{a=1}^c\left[c\binom{d}{a}-d\binom{c}{a}\right]\delta_a+c\sum_{a=c+1}^d\binom{d}{a}\delta_a\tag{5.30}$$

From (5.28), and since $c\binom{d}{a}-d\binom{c}{a}\ge 0$ if $1\le a\le c<d$, it follows that each term in the two sums of (5.30) is non-negative. This, in connection with (5.9), proves Theorem 5.1.

Proof of Theorem 5.2.

From (5.19) we have

$$c\,\zeta_1\le\zeta_c\le\frac{c}{m}\,\zeta_m,\quad(c=1,\dots,m)$$

Applying these inequalities to each term in (5.13) and using the identity

$$\binom{n}{m}^{-1}\sum_{c=1}^m c\binom{m}{c}\binom{n-m}{m-c}=\frac{m^2}{n}\tag{5.31}$$

we obtain (5.20).
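The combinatorial identity (5.31) is a Vandermonde-type identity and can be verified exactly in integer arithmetic:

```python
from math import comb

# (5.31), cleared of denominators:
#   sum_{c=1}^{m} c * C(m, c) * C(n-m, m-c) * n == m^2 * C(n, m)
for n in range(1, 40):
    for m in range(1, n + 1):
        lhs = sum(c * comb(m, c) * comb(n - m, m - c) for c in range(1, m + 1))
        assert lhs * n == m * m * comb(n, m)  # exact integer check
print("identity (5.31) verified for all 1 <= m <= n < 40")
```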

(5.22) and (5.23) follow immediately from (5.13).

For (5.21) we may write

$$D_n\ge 0\tag{5.32}$$

where

$$D_n=n\,\sigma^2(U_n)-(n+1)\,\sigma^2(U_{n+1})$$

Let

$$D_n=\sum_{c=1}^m d_{n,c}\,\zeta_c$$

Then we have from (5.13)

$$d_{n,c}=n\binom{m}{c}\binom{n-m}{m-c}\binom{n}{m}^{-1}-(n+1)\binom{m}{c}\binom{n+1-m}{m-c}\binom{n+1}{m}^{-1},\tag{5.33}$$

or

$$d_{n,c}=\binom{m}{c}\binom{n-m+1}{m-c}(n-m+1)^{-1}\binom{n}{m}^{-1}\left\{(c-1)n-(m-1)^2\right\},\quad(1\le c\le m\le n).$$

Putting

$$c_0=1+\left[\frac{(m-1)^2}{n}\right]$$

where $[u]$ denotes the largest integer $\le u$, we have

$$d_{n,c}\le 0\ \text{ if }\ c\le c_0,\qquad d_{n,c}>0\ \text{ if }\ c>c_0.$$

Hence, by (5.19),

$$d_{n,c}\,\zeta_c\ge\frac{\zeta_{c_0}}{c_0}\,c\,d_{n,c},\quad(c=1,\dots,m)$$

and

$$D_n\ge\frac{\zeta_{c_0}}{c_0}\sum_{c=1}^m c\,d_{n,c}$$

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).

For the stationary case $\zeta_1=\cdots=\zeta_{d-1}=0$, (5.24) is a direct consequence of (5.13) and (5.19). The proof of Theorem 5.2 is complete.

Properties of the Covariance

Let us now consider the covariance of two U-statistics. Consider a set of $g$ U-statistics,

$$U^{(\gamma)}=\binom{n}{m(\gamma)}^{-1}\sum\Phi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}}),\quad(\gamma=1,\dots,g)$$

each $U^{(\gamma)}$ being a function of the same $n$ independent, identically distributed random vectors $X_1,\dots,X_n$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric in its $m(\gamma)$ arguments, $(\gamma=1,\dots,g)$.

Let

$$E\{U^{(\gamma)}\}=E\{\Phi^{(\gamma)}(X_1,\dots,X_{m(\gamma)})\}=\theta^{(\gamma)},\quad(\gamma=1,\dots,g);$$
$$\Psi^{(\gamma)}(x_1,\dots,x_{m(\gamma)})=\Phi^{(\gamma)}(x_1,\dots,x_{m(\gamma)})-\theta^{(\gamma)},\quad(\gamma=1,\dots,g);\tag{6.1}$$
$$\Psi_c^{(\gamma)}(x_1,\dots,x_c)=E\{\Psi^{(\gamma)}(x_1,\dots,x_c,X_{c+1},\dots,X_{m(\gamma)})\},\quad(c=1,\dots,m(\gamma);\ \gamma=1,\dots,g);\tag{6.2}$$
$$\zeta_c(\gamma,\delta)=E\{\Psi_c^{(\gamma)}(X_1,\dots,X_c)\,\Psi_c^{(\delta)}(X_1,\dots,X_c)\},\quad(\gamma,\delta=1,\dots,g)\tag{6.3}$$

If, in particular, $\gamma=\delta$, we shall write

$$\zeta_c(\gamma)=\zeta_c(\gamma,\gamma)=E\{[\Psi_c^{(\gamma)}(X_1,\dots,X_c)]^2\}\tag{6.4}$$

Let

$$\sigma(U^{(\gamma)},U^{(\delta)})=E\{(U^{(\gamma)}-\theta^{(\gamma)})(U^{(\delta)}-\theta^{(\delta)})\}$$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma)\le m(\delta)$,

$$\sigma(U^{(\gamma)},U^{(\delta)})=\binom{n}{m(\gamma)}^{-1}\sum_{c=1}^{m(\gamma)}\binom{m(\delta)}{c}\binom{n-m(\delta)}{m(\gamma)-c}\zeta_c(\gamma,\delta).\tag{6.5}$$

The right hand side is easily seen to be symmetric in γ,δ.

For $\gamma=\delta$, (6.5) reduces to the variance (5.13) of $U^{(\gamma)}$.

We have from (5.23) and (6.5)

$$\lim_{n\to\infty}n\,\sigma^2(U^{(\gamma)})=m^2(\gamma)\,\zeta_1(\gamma),\qquad\lim_{n\to\infty}n\,\sigma(U^{(\gamma)},U^{(\delta)})=m(\gamma)\,m(\delta)\,\zeta_1(\gamma,\delta).$$

Hence, if $\zeta_1(\gamma)\ne 0$ and $\zeta_1(\delta)\ne 0$, the product moment correlation $\rho(U^{(\gamma)},U^{(\delta)})$ between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

$$\lim_{n\to\infty}\rho(U^{(\gamma)},U^{(\delta)})=\frac{\zeta_1(\gamma,\delta)}{\sqrt{\zeta_1(\gamma)\,\zeta_1(\delta)}}\tag{6.6}$$

The Limit Theorems: i.i.d. Case

In this section the vectors Xα will be assumed to be identically distributed.

Notes:
Convergence of the distribution function:
A sequence of d.f.'s $F_1(x),F_2(x),\dots$ converges to a d.f. $F(x)$ if $\lim F_n(x)=F(x)$ at every point at which the one-dimensional marginal limiting d.f.'s are continuous.

Singularity of the Distribution:
A $g$-variate normal distribution is called non-singular if the rank $r$ of its covariance matrix is equal to $g$, and singular if $r<g$.

Lemma 7.1. Let $V_1,V_2,\dots$ be an infinite sequence of random vectors $V_n=(V_n^{(1)},\dots,V_n^{(g)})$, and suppose that the d.f. $F_n(v)$ of $V_n$ tends to a d.f. $F(v)$ as $n\to\infty$. Let $V_n'^{(\gamma)}=V_n^{(\gamma)}+d_n^{(\gamma)}$, where

$$\lim_{n\to\infty}E\{d_n^{(\gamma)}\}^2=0,\quad(\gamma=1,\dots,g)\tag{7.1}$$

Then the d.f. of $V_n'=(V_n'^{(1)},\dots,V_n'^{(g)})$ tends to $F(v)$.

The Limit Theorems 7.1 and 7.2

Theorem 7.1. Let $X_1,\dots,X_n$ be $n$ independent, identically distributed random vectors,

$$X_\alpha=(X_\alpha^{(1)},\dots,X_\alpha^{(r)}),\quad(\alpha=1,\dots,n)$$

Let

$$\Phi^{(\gamma)}(x_1,\dots,x_{m(\gamma)}),\quad(\gamma=1,\dots,g),$$

be $g$ real-valued functions not involving $n$, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)\ (\le n)$ vector arguments $x_\alpha=(x_\alpha^{(1)},\dots,x_\alpha^{(r)})$, $(\alpha=1,\dots,m(\gamma);\ \gamma=1,\dots,g)$. Define

$$U^{(\gamma)}=\binom{n}{m(\gamma)}^{-1}\sum\Phi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}}),\quad(\gamma=1,\dots,g)\tag{7.2}$$

where the summation is over all subscripts such that $1\le\alpha_1<\cdots<\alpha_{m(\gamma)}\le n$. Then, if the expected values

$$\theta^{(\gamma)}=E\{\Phi^{(\gamma)}(X_1,\dots,X_{m(\gamma)})\},\quad(\gamma=1,\dots,g)\tag{7.3}$$

and

$$E\{[\Phi^{(\gamma)}(X_1,\dots,X_{m(\gamma)})]^2\},\quad(\gamma=1,\dots,g)\tag{7.4}$$

exist, the joint d.f. of

$$\sqrt n\,(U^{(1)}-\theta^{(1)}),\dots,\sqrt n\,(U^{(g)}-\theta^{(g)})$$

tends, as $n\to\infty$, to the $g$-variate normal d.f. with zero means and covariance matrix $(m(\gamma)\,m(\delta)\,\zeta_1(\gamma,\delta))$, where $\zeta_1(\gamma,\delta)$ is defined by (6.3). The limiting distribution is non-singular if the determinant $|\zeta_1(\gamma,\delta)|$ is positive.

According to Theorem 5.2, $\sigma^2(U)$ exceeds its asymptotic value $m^2\zeta_1/n$ for any finite $n$. Hence, when $n$ is large but finite, the limiting variance in Theorem 7.1 underestimates the true variance of $U$. For such cases the following theorem, which is an immediate consequence of Theorem 7.1, will be more useful.

Theorem 7.2. Under the conditions of Theorem 7.1, and if

$$\zeta_1(\gamma)>0,\quad(\gamma=1,\dots,g)$$

the joint d.f. of

$$\frac{U^{(1)}-\theta^{(1)}}{\sigma(U^{(1)})},\dots,\frac{U^{(g)}-\theta^{(g)}}{\sigma(U^{(g)})}$$

tends, as $n\to\infty$, to the $g$-variate normal d.f. with zero means and covariance matrix $(\rho(\gamma,\delta))$, where

$$\rho(\gamma,\delta)=\lim_{n\to\infty}\frac{\sigma(U^{(\gamma)},U^{(\delta)})}{\sigma(U^{(\gamma)})\,\sigma(U^{(\delta)})}=\frac{\zeta_1(\gamma,\delta)}{\sqrt{\zeta_1(\gamma)\,\zeta_1(\delta)}},\quad(\gamma,\delta=1,\dots,g).$$

Proof of Theorem 7.1. The existence of (7.4) entails that of

$$\zeta_{m(\gamma)}(\gamma)=E\{[\Phi^{(\gamma)}(X_1,\dots,X_{m(\gamma)})]^2\}-(\theta^{(\gamma)})^2$$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of

$$\zeta_1(\gamma),\dots,\zeta_{m(\gamma)-1}(\gamma),\ \text{of }\ \sigma^2(U^{(\gamma)}),\ \text{and of }\ \zeta_1(\gamma,\delta)\le\sqrt{\zeta_1(\gamma)\,\zeta_1(\delta)}$$

Now, consider the $g$ quantities

$$Y^{(\gamma)}=\frac{m(\gamma)}{\sqrt n}\sum_{\alpha=1}^n\Psi_1^{(\gamma)}(X_\alpha),\quad(\gamma=1,\dots,g)$$

where $\Psi_1^{(\gamma)}(x)$ is defined by (6.2). $Y^{(1)},\dots,Y^{(g)}$ are sums of $n$ independent random variables with zero means, whose covariance matrix, by virtue of (6.3), is

$$\{\sigma(Y^{(\gamma)},Y^{(\delta)})\}=\{m(\gamma)\,m(\delta)\,\zeta_1(\gamma,\delta)\}\tag{7.5}$$

By the Central Limit Theorem for vectors (cf. Cramér [1, p. 112]), the joint d.f. of $(Y^{(1)},\dots,Y^{(g)})$ tends to the normal $g$-variate d.f. with the same means and covariances.

Theorem 7.1 will be proved by showing that the $g$ random variables

$$Z^{(\gamma)}=\sqrt n\,(U^{(\gamma)}-\theta^{(\gamma)}),\quad(\gamma=1,\dots,g),\tag{7.6}$$

have the same joint limiting distribution as $Y^{(1)},\dots,Y^{(g)}$.

According to Lemma 7.1 it is sufficient to show that

$$\lim_{n\to\infty}E(Z^{(\gamma)}-Y^{(\gamma)})^2=0,\quad(\gamma=1,\dots,g)\tag{7.7}$$

For proving (7.7), write

$$E\{Z^{(\gamma)}-Y^{(\gamma)}\}^2=E\{Z^{(\gamma)}\}^2+E\{Y^{(\gamma)}\}^2-2E\{Z^{(\gamma)}Y^{(\gamma)}\}\tag{7.8}$$

By (5.13) we have

$$E\{Z^{(\gamma)}\}^2=n\,\sigma^2(U^{(\gamma)})=m^2(\gamma)\,\zeta_1(\gamma)+O(n^{-1})\tag{7.9}$$

and from (7.5),

$$E\{Y^{(\gamma)}\}^2=m^2(\gamma)\,\zeta_1(\gamma)\tag{7.10}$$

By (7.2) and (6.1) we may write for (7.6)

$$Z^{(\gamma)}=\sqrt n\,\binom{n}{m(\gamma)}^{-1}\sum\Psi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}})$$

and hence

$$E\{Z^{(\gamma)}Y^{(\gamma)}\}=m(\gamma)\binom{n}{m(\gamma)}^{-1}\sum_{\alpha=1}^n\sum_{1\le\alpha_1<\cdots<\alpha_{m(\gamma)}\le n}E\{\Psi_1^{(\gamma)}(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}})\}.$$

The term

$$E\{\Psi_1^{(\gamma)}(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}})\}$$

is equal to $\zeta_1(\gamma)$ if

$$\alpha_1=\alpha\ \text{ or }\ \alpha_2=\alpha\ \text{ or }\ \cdots\ \text{ or }\ \alpha_{m(\gamma)}=\alpha\tag{7.11}$$

and $0$ otherwise. For a fixed $\alpha$, the number of sets $\{\alpha_1,\dots,\alpha_{m(\gamma)}\}$ such that $1\le\alpha_1<\cdots<\alpha_{m(\gamma)}\le n$ and (7.11) is satisfied is $\binom{n-1}{m(\gamma)-1}$. Thus,

$$E\{Z^{(\gamma)}Y^{(\gamma)}\}=m(\gamma)\binom{n}{m(\gamma)}^{-1}n\binom{n-1}{m(\gamma)-1}\zeta_1(\gamma)=m^2(\gamma)\,\zeta_1(\gamma)\tag{7.12}$$

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.

The proof of Theorem 7.1 is complete.
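The heart of the proof, $E(Z^{(\gamma)}-Y^{(\gamma)})^2\to 0$, can be illustrated by simulation for a single U-statistic. The kernel $\Phi(x_1,x_2)=x_1x_2$ with $X\sim N(1,1)$ is my choice of example, for which $\theta=\mu^2$ and $\Psi_1(x)=\mu(x-\mu)$:

```python
import random

random.seed(2)
mu, n, reps = 1.0, 400, 1000

# Kernel Phi(x1, x2) = x1 * x2: theta = mu^2, Psi_1(x) = mu * (x - mu).
mse = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, 1) for _ in range(n)]
    s = sum(xs)
    u = (s * s - sum(x * x for x in xs)) / (n * (n - 1))  # the U-statistic
    z = n ** 0.5 * (u - mu * mu)                          # Z of (7.6)
    y = (2 / n ** 0.5) * sum(mu * (x - mu) for x in xs)   # projection Y, m = 2
    mse += (z - y) ** 2 / reps
print(mse)  # estimates E(Z - Y)^2 = n*sigma^2(U) - m^2*zeta_1 = O(1/n): tiny
```

The sample mean-squared gap between $Z$ and its linear projection $Y$ is of order $1/n$, exactly as (7.8)-(7.12) predict.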

The Limit Theorem 7.3: Extension of Theorem 7.1 to a Larger Class of Statistics

The application of Lemma 7.1 leads immediately to the following extension of Theorem 7.1 to a larger class of statistics.

Theorem 7.3. Let

$$U'^{(\gamma)}=U^{(\gamma)}+\frac{b_n^{(\gamma)}}{\sqrt n},\quad(\gamma=1,\dots,g)\tag{7.13}$$

where $U^{(\gamma)}$ is defined by (7.2) and $b_n^{(\gamma)}$ is a random variable. If the conditions of Theorem 7.1 are satisfied, and $\lim E\{b_n^{(\gamma)}\}^2=0$, $(\gamma=1,\dots,g)$, then the joint distribution of

$$\sqrt n\,(U'^{(1)}-\theta^{(1)}),\dots,\sqrt n\,(U'^{(g)}-\theta^{(g)})$$

tends to the normal distribution with zero means and covariance matrix

$$\{m(\gamma)\,m(\delta)\,\zeta_1(\gamma,\delta)\}$$

The Limit Theorem 7.4: Application to Sample Functionals

Theorem 7.3 applies, in particular, to the regular functionals $\theta(S)$ of the sample d.f. $S$,

$$\theta(S)=\frac{1}{n^m}\sum_{\alpha_1=1}^n\cdots\sum_{\alpha_m=1}^n\Phi(X_{\alpha_1},\dots,X_{\alpha_m})$$

in the case that the variance of $\theta(S)$ exists. For we may write

$$n^m\,\theta(S)=n(n-1)\cdots(n-m+1)\,U+{\sum}'\,\Phi(X_{\alpha_1},\dots,X_{\alpha_m})$$

where the sum ${\sum}'$ is extended over all $m$-tuples $(\alpha_1,\dots,\alpha_m)$ in which at least one equality $\alpha_i=\alpha_j\ (i\ne j)$ holds. The number of terms in ${\sum}'$ is of order $n^{m-1}$. Hence

$$\theta(S)-U=\frac{1}{n}\,D_n$$

where the expected value $E\{D_n^2\}$, whose existence follows from that of $\sigma^2(\theta(S))$, is bounded as $n\to\infty$. Writing $\theta(S)=U+n^{-1/2}b_n$ with $b_n=n^{-1/2}D_n$, we have $E\{b_n^2\}\to 0$, so if we put $U'^{(\gamma)}=\theta^{(\gamma)}(S)$, the conditions of Theorem 7.3 are fulfilled. We may summarize this result as follows:

Theorem 7.4. Let $X_1,\dots,X_n$ be a random sample from an $r$-variate population with d.f. $F(x)=F(x^{(1)},\dots,x^{(r)})$, and let

$$\theta^{(\gamma)}(F)=\int\cdots\int\Phi^{(\gamma)}(x_1,\dots,x_{m(\gamma)})\,dF(x_1)\cdots dF(x_{m(\gamma)}),\quad(\gamma=1,\dots,g),$$

be $g$ regular functionals of $F$, where $\Phi^{(\gamma)}(x_1,\dots,x_{m(\gamma)})$ is symmetric in the vectors $x_1,\dots,x_{m(\gamma)}$ and does not involve $n$. If $S(x)$ is the d.f. of the random sample, and if the variance of

$$\theta^{(\gamma)}(S)=\frac{1}{n^{m(\gamma)}}\sum_{\alpha_1=1}^n\cdots\sum_{\alpha_{m(\gamma)}=1}^n\Phi^{(\gamma)}(X_{\alpha_1},\dots,X_{\alpha_{m(\gamma)}})$$

exists, the joint d.f. of

$$\sqrt n\,\{\theta^{(1)}(S)-\theta^{(1)}(F)\},\dots,\sqrt n\,\{\theta^{(g)}(S)-\theta^{(g)}(F)\}$$

tends to the $g$-variate normal d.f. with zero means and covariance matrix

$$\{m(\gamma)\,m(\delta)\,\zeta_1(\gamma,\delta)\}$$
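The gap between the sample functional $\theta(S)$ (a V-statistic) and the U-statistic $U$, which the argument above shows is $O(1/n)$, is easy to observe numerically (a sketch; the function names and data are mine):

```python
from itertools import combinations, product
from math import comb

def v_stat(xs, kernel, m):
    """theta(S): the kernel averaged over all n^m index tuples (repeats allowed)."""
    n = len(xs)
    return sum(kernel(*(xs[i] for i in idx))
               for idx in product(range(n), repeat=m)) / n ** m

def u_stat(xs, kernel, m):
    n = len(xs)
    return sum(kernel(*(xs[i] for i in idx))
               for idx in combinations(range(n), m)) / comb(n, m)

ker = lambda a, b: (a - b) ** 2 / 2  # kernel of the sample variance

# For this kernel the diagonal terms vanish, so theta(S) = (1 - 1/n) * U exactly,
# and the gap theta(S) - U = -U/n is O(1/n).
for n in (5, 10, 20, 40):
    xs = [((i * 7919) % 101) / 101 for i in range(n)]  # arbitrary fixed data
    v, u = v_stat(xs, ker, 2), u_stat(xs, ker, 2)
    print(n, v - u)  # shrinks roughly like 1/n
```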

The Limit Theorem 7.5: Application to Functions of Statistics

The following theorem is concerned with the asymptotic distribution of a function of statistics of the form $U$ or $U'$.

Theorem 7.5. Let $U'=(U'^{(1)},\dots,U'^{(g)})$ be a random vector, where $U'^{(\gamma)}$ is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied. If the function $h(y)=h(y^{(1)},\dots,y^{(g)})$ does not involve $n$ and is continuous together with its second-order partial derivatives in some neighborhood of the point $(y)=(\theta)=(\theta^{(1)},\dots,\theta^{(g)})$, then the distribution of the random variable $\sqrt n\,\{h(U')-h(\theta)\}$ tends to the normal distribution with mean zero and variance

$$\sum_{\gamma=1}^g\sum_{\delta=1}^g m(\gamma)\,m(\delta)\left(\frac{\partial h(y)}{\partial y^{(\gamma)}}\,\frac{\partial h(y)}{\partial y^{(\delta)}}\right)_{y=\theta}\zeta_1(\gamma,\delta)$$
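Theorem 7.5 is a delta-method statement. A quick simulation with $h(y)=y^2$ applied to the sample mean (my choice of example: $g=1$, $m=1$, $\Phi(x)=x$, so $\theta=\mu$ and $\zeta_1=\sigma^2$) matches the predicted variance $(h'(\theta))^2\zeta_1=4\mu^2\sigma^2$:

```python
import random
import statistics

random.seed(3)
mu, sigma, n, reps = 2.0, 1.0, 300, 2000

# h(y) = y^2 applied to the order-1 U-statistic U = sample mean.
# Theorem 7.5 predicts Var[sqrt(n)(h(U) - h(theta))] -> (2*mu)^2 * sigma^2.
vals = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    vals.append(n ** 0.5 * (xbar ** 2 - mu ** 2))
emp = statistics.variance(vals)
pred = (2 * mu) ** 2 * sigma ** 2  # = 16
print(emp, pred)  # close
```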

Applications to particular statistics

  • Moments and functions of moments
  • Mean difference and coefficient of concentration
  • Functions of ranks and of the signs of variate differences
  • Difference sign correlation
  • Rank correlation and grade correlation
  • Non-parametric tests of independence
  • Mann’s test against trend
  • The coefficient of partial difference sign correlation

References

[1] W. Hoeffding, ‘A Class of Statistics with Asymptotically Normal Distribution’, Ann. Math. Statist., vol. 19, no. 3, pp. 293–325, Sep. 1948, doi: 10.1214/aoms/1177730196.

[2] J. Shao, Mathematical Statistics. Springer Science & Business Media, 2003.
