# On empirical likelihood for linear models with missing responses


Journal of Statistical Planning and Inference 140 (2010) 3399–3408


Yongsong Qin and Qingzhu Lei, School of Mathematical Sciences, Guangxi Normal University, Guilin, Guangxi 541004, China

Article history: Received 25 September 2009; received in revised form 2 May 2010; accepted 6 May 2010; available online 15 May 2010.

MSC: primary 62G05; secondary 62E20.

Keywords: Linear model; Empirical likelihood; Missing at random; Confidence interval.

## Abstract

Suppose that we have a linear regression model $Y = X'\beta + \nu_0(X)\varepsilon$ with random error $\varepsilon$, where $X$ is a random design variable and is observed completely, and $Y$ is the response variable and some $Y$-values are missing at random (MAR). In this paper, based on the 'complete' data set for $Y$ after inverse probability weighted imputation, we construct empirical likelihood statistics on $EY$ and $\beta$ which have $\chi^2$-type limiting distributions under conditions that are new compared with those of Xue (2009). Combined with Xue (2009), our results broaden the applicable scope of the approach.

## 1. Introduction

Consider the following linear regression model:

$$Y = X'\beta + \nu_0(X)\varepsilon, \qquad (1.1)$$

where $Y$ is a scalar response variable, $X$ is a $p \times 1$ vector of random design variables, $\beta$ is a $p \times 1$ vector of regression parameters, $\nu_0(\cdot)$ is a strictly positive known function, and $\varepsilon$ is a random error with $E(\varepsilon \mid X) = 0$. Suppose that we have incomplete i.i.d. observations $\{(X_i, Y_i, \delta_i),\ i = 1, 2, \ldots, n\}$ from this model, where all the $X_i$'s are observed, and $\delta_i = 0$ if $Y_i$ is missing, $\delta_i = 1$ otherwise. Throughout this paper, we assume that $Y$ is missing at random (MAR); that is, $P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X)$ (see Little and Rubin, 1987).

Wang and Rao (2002a) developed an imputed empirical likelihood (EL) method to construct confidence intervals for the mean $EY$. The main idea is to impute the missing $Y$-values by their predicted values; a complete-data EL method is then applied to the imputed data set as if it consisted of i.i.d. observations. It is shown that the EL ratio statistic for $EY$ has a limiting distribution of a scaled $\chi^2_1$ with unknown weight, so it cannot be applied directly to make inference for $EY$. An adjusted EL is thus needed to obtain a confidence interval for $EY$, which also leads to a loss of accuracy of the confidence interval. To solve this problem, Xue (2009) combined the EL method and the inverse probability weighted imputation technique to study the construction of confidence intervals and regions for $EY$ and $\beta$. It is shown that the EL ratios based on the inverse probability weighted imputation are asymptotically standard chi-squared, which can be used directly to construct confidence intervals and regions for $EY$ and $\beta$. This is a nice feature. However, somewhat strong conditions are


required in Xue (2009), which restrict the applicable scope of the approach. In this paper, we employ some new conditions under which the applicable scope of the approach is broadened.

The EL method for constructing confidence intervals was proposed by Owen (e.g., Owen, 1988, 1990, 1991, 2001). The EL method has many advantages over its counterparts, such as the normal-approximation-based method and the bootstrap method (e.g., Hall and La Scala, 1990). Some progress has been made in making inference for linear, nonparametric and semiparametric regression models with missing data; see Wang and Rao (2001, 2002a, 2002b), Qin et al. (2008), Wang et al. (2004) and Xue (2009), among others.

The rest of the paper is organized as follows. In Section 2, we introduce two imputation methods: linear regression imputation and inverse probability weighted imputation. In Section 3, we develop an EL approach based on the inverse probability weighted imputation technique and show that the resulting empirical log-likelihood is asymptotically standard chi-squared under conditions different from those in Xue (2009); this result is used to obtain EL based confidence intervals and regions on $EY$ and $\beta$. A short discussion on bandwidth selection is given in Section 4. Section 5 reports some simulation results on the performance of the proposed confidence intervals. The proofs of the main results are presented in the Appendix.

## 2. Imputation methods

To implement regression imputation, we first need an initial estimator of $\beta$. Then two imputation methods are introduced, namely linear regression imputation and inverse probability weighted imputation.

### 2.1. Initial estimator of $\beta$

Based on the completely observed pairs $\{(X_i, Y_i) : \delta_i = 1,\ 1 \le i \le n\}$, define the weighted least squares estimator of $\beta$ as

$$\hat\beta_r = \left( \sum_{i=1}^n \frac{\delta_i X_i X_i'}{\nu_0^2(X_i)} \right)^{-1} \sum_{i=1}^n \frac{\delta_i X_i Y_i}{\nu_0^2(X_i)}.$$
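The estimator $\hat\beta_r$ is an ordinary weighted least squares computation on the complete cases. A minimal NumPy sketch (our own illustration, not the authors' code; the names `beta_hat_r` and `nu0` are ours, and `nu0` is assumed to be vectorized over the rows of the design matrix):

```python
import numpy as np

def beta_hat_r(X, Y, delta, nu0):
    """Complete-case weighted least squares estimator of beta.

    X: (n, p) design matrix; Y: (n,) responses (entries with delta == 0 are
    ignored and may even be NaN); delta: (n,) 0/1 non-missingness indicators;
    nu0: the known scale function, assumed vectorized over rows of X.
    """
    w = delta / nu0(X) ** 2                  # delta_i / nu_0^2(X_i)
    Yc = np.where(delta > 0, Y, 0.0)         # guard: missing Y may be NaN
    A = (X * w[:, None]).T @ X               # sum_i delta_i X_i X_i' / nu_0^2(X_i)
    b = (X * w[:, None]).T @ Yc              # sum_i delta_i X_i Y_i / nu_0^2(X_i)
    return np.linalg.solve(A, b)
```

The factor $\delta_i$ zeroes out the incomplete cases, so no explicit subsetting of the data is needed.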

### 2.2. Linear regression imputation

For a missing $Y_i$, a commonly used method is to impute by the predicted response, i.e. to use

$$Y_{i1}^* = X_i'\hat\beta_r$$

in place of the missing $Y_i$. A 'complete' data set for $Y$ is then obtained as

$$\tilde Y_{i1} = \delta_i Y_i + (1 - \delta_i) Y_{i1}^*, \qquad i = 1, \ldots, n. \qquad (2.1)$$

### 2.3. Inverse probability weighted imputation

Denote by $\pi(X) = P(\delta = 1 \mid X)$ the conditional probability that $Y$ is not missing given $X$, and write $\pi_i = \pi(X_i)$. If all the $\pi_i$ were known, we would use

$$\frac{\delta_i}{\pi_i} Y_i + \left(1 - \frac{\delta_i}{\pi_i}\right) X_i'\hat\beta_r, \qquad i = 1, \ldots, n,$$

as the 'complete' data set for $Y$, which can be viewed as a combination of the Horvitz–Thompson inverse-selection weighting method and the imputation method. In general the $\pi_i$ are unknown, and we adopt the weight function method to estimate them. Take $0 < h = h_n \to 0$ and a nonnegative kernel function $K(x)$, $x \in R^p$, and let $K_h(x) = K(x/h)$. Then $\pi(X)$ is estimated by

$$\hat\pi(X) = \sum_{j=1}^n W_{nj}(X)\delta_j,$$

where $W_{ni}(x)$ is the Nadaraya–Watson weight

$$W_{ni}(x) = K_h(x - X_i) \Big/ \sum_{j=1}^n K_h(x - X_j).$$

We thus use

$$\tilde Y_{i2} = \frac{\delta_i}{\hat\pi_i} Y_i + \left(1 - \frac{\delta_i}{\hat\pi_i}\right) X_i'\hat\beta_r, \qquad i = 1, \ldots, n, \qquad (2.2)$$

as the 'complete' data set for $Y$, where $\hat\pi_i = \hat\pi(X_i)$. Throughout this paper, we define $0/0$ as $0$.
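The two ingredients above, the kernel estimate $\hat\pi$ and the inverse probability weighted 'complete' data, can be sketched as follows. This is our own illustration, assuming a radial kernel evaluated at $\|x - X_j\|/h$; the paper's $0/0 := 0$ convention is implemented explicitly, and all function names are ours.

```python
import numpy as np

def pi_hat(X, delta, h, K):
    """Nadaraya-Watson estimate of pi(X_i) at every sample point:
    hat-pi(x) = sum_j W_nj(x) delta_j, W_nj(x) = K_h(x - X_j) / sum_k K_h(x - X_k).

    K is assumed radial: it is evaluated at ||x - X_j|| / h.
    """
    diff = (X[:, None, :] - X[None, :, :]) / h      # (n, n, p) pairwise gaps
    Kmat = K(np.linalg.norm(diff, axis=2))          # (n, n) kernel weights
    denom = Kmat.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (Kmat @ delta) / denom
    return np.where(denom > 0, out, 0.0)            # convention 0/0 = 0

def ipw_impute(X, Y, delta, pi, beta_hat):
    """'Complete' data (2.2): (delta_i/pi_i) Y_i + (1 - delta_i/pi_i) X_i' beta_hat."""
    r = np.divide(delta, pi, out=np.zeros_like(delta, dtype=float), where=pi > 0)
    return r * np.where(delta > 0, Y, 0.0) + (1.0 - r) * (X @ beta_hat)
```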


## 3. EL confidence intervals

Wang and Rao (2002a) developed an imputed EL method to construct confidence intervals for the mean $EY$, where the 'complete' data set $\{\tilde Y_{i1}, 1 \le i \le n\}$ for $Y$ was used and the EL ratio statistic has a limiting distribution of a scaled $\chi^2_1$ with unknown weight. Based on the 'complete' data set $\{\tilde Y_{i2}, 1 \le i \le n\}$ for $Y$, we construct EL statistics on $EY$ and $\beta$ and show that these statistics have $\chi^2$-type limiting distributions, which are used to construct EL confidence intervals/regions without adjustment.

### 3.1. EL confidence interval on $EY$

Let $\theta = EY$ and $Z_{in}(\theta) = \tilde Y_{i2} - \theta$. Similar to Owen (1990), we define the empirical log-likelihood ratio on $\theta$ as

$$\ell_{n1}(\theta) = -2\max \sum_{i=1}^n \log(np_i),$$

where the maximum is taken over all sets of nonnegative numbers $p_1, \ldots, p_n$ summing to 1 and such that $\sum_{i=1}^n p_i Z_{in}(\theta) = 0$. It can be shown, by using the Lagrange multiplier method, that

$$p_i = \frac{1}{n}\,\frac{1}{1 + \lambda_{n1} Z_{in}(\theta)}, \qquad (3.1)$$

$$\ell_{n1}(\theta) = 2\sum_{i=1}^n \log\{1 + \lambda_{n1} Z_{in}(\theta)\}, \qquad (3.2)$$

where $\lambda_{n1}$ is the solution of the equation

$$\frac{1}{n}\sum_{i=1}^n \frac{Z_{in}(\theta)}{1 + \lambda_{n1} Z_{in}(\theta)} = 0. \qquad (3.3)$$
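Because the left-hand side of (3.3) is strictly decreasing in $\lambda_{n1}$ wherever all weights $1 + \lambda_{n1} Z_{in}(\theta)$ are positive, the equation can be solved by bisection on the standard interval that keeps every $np_i \le 1$. The helper below is our own sketch (not the authors' code); the test lines show how the confidence set $\{\theta : \ell_{n1}(\theta) \le z_\alpha\}$ of this subsection can be traced over a grid of candidate $\theta$ values.

```python
import numpy as np

def el_ratio_mean(Z):
    """Empirical log-likelihood ratio l_n1(theta) = 2 * sum log(1 + lam * Z_i),
    where Z_i = Z_in(theta) and lam solves (1/n) sum Z_i / (1 + lam Z_i) = 0.

    Returns +inf when 0 is outside the convex hull of the Z_i (theta infeasible).
    """
    Z = np.asarray(Z, dtype=float)
    n = len(Z)
    if Z.min() >= 0.0 or Z.max() <= 0.0:
        return np.inf
    # lam must keep every n*p_i in (0, 1], i.e. 1 + lam*Z_i >= 1/n
    lo = (1.0 / n - 1.0) / Z.max()
    hi = (1.0 / n - 1.0) / Z.min()
    for _ in range(200):      # bisection: the left side of (3.3) decreases in lam
        lam = 0.5 * (lo + hi)
        if np.mean(Z / (1.0 + lam * Z)) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * Z))
```

A confidence interval for $\theta$ is then read off by keeping every grid candidate $t$ with `el_ratio_mean(Y_tilde - t)` below the $\chi^2_1$ quantile $z_\alpha \approx 3.84$.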

Use $\|x\|$ to denote the $L_2$-norm in $R^p$. Assume that the probability density function $f(\cdot)$ of $X$ exists. We first list some regularity conditions needed in Theorems 1 and 2.

(C1) $f(\cdot)$ is bounded and there exist constants $a, b > 0$ such that

$$\int_{y \in S(x,r) \cap A} f(y)\,dy \ge ar^p \qquad (3.4)$$

for all $r \in [0, b]$ and all $x \in A$, where $A$ is the support of $X$ and $S(x, r)$ is the closed sphere (under the $L_2$ norm) with center $x$ and radius $r$.

(C2) The probability function $\pi(x)$ is uniformly continuous on $A$ and there exists some positive constant $C_0$ such that $\min_{1 \le i \le n} \pi(X_i) \ge C_0 > 0$, a.s.

(C3) There exist positive constants $C_1$, $C_2$ and $\rho$ such that $C_1 I(\|u\| \le \rho) \le K(u) \le C_2 I(\|u\| \le \rho)$, where $I$ is the indicator function.

(C4) $h \to 0$ and $nh^p/\log n \to \infty$ as $n \to \infty$.

(C5) $E(\varepsilon \mid X) = 0$, $E(|\varepsilon|^3) < \infty$, $E(\|X\|^3) < \infty$, $\sigma_0^2 > 0$ and $\Sigma_0 > 0$, where

$$\sigma_0^2 = E\{\sigma^2(X)\nu_0^2(X)/\pi(X)\} + \mathrm{Var}\{m(X)\}, \qquad \Sigma_0 = E\{\sigma^2(X)\nu_0^2(X)XX'/\pi(X)\},$$
$$m(X) = X'\beta, \qquad \sigma^2(X) = E(\varepsilon^2 \mid X).$$

Remark 1. Eq. (3.4) in condition (C1) holds if $\inf_A f(x) > 0$ and

$$\int_{y \in S(x,r) \cap A} dy \ge ar^p \qquad (3.5)$$

for all $r \in [0, b]$ and all $x \in A$. Eq. (3.5) is a mild condition on support sets, which holds true for all support sets used in the field of probability and statistics. However, if we impose the condition $\inf_A f(x) > 0$, the restriction that the support $A$ of $X$ is bounded is usually necessary in condition (C1). We note that the supports of all uniform and two-sided truncated distributions are bounded, and in practice the supports of most distributions can be viewed as bounded. Thus, combined with Xue (2009), our results broaden the applicable scope of the approach.


Remark 2. We significantly weaken the conditions of Xue (2009) on the kernel function, the bandwidth and the response probability function $\pi(\cdot)$. The condition that $f(\cdot)$ has partial derivatives up to order $r$ is also removed.

Remark 3. Xue (2009) did not assume that the support of $X$ is bounded, and condition (C5) in this paper is stronger than the corresponding condition (C2) in Xue (2009).

Theorem 1. Suppose that conditions (C1)–(C5) hold. Then as $n \to \infty$,

$$\ell_{n1}(\theta) \xrightarrow{d} \chi^2_1, \qquad (3.6)$$

where $\chi^2_1$ is the chi-squared distribution with one degree of freedom.

Let $z_\alpha$ satisfy $P(\chi^2_1 \le z_\alpha) = 1 - \alpha$. It follows from (3.6) that an EL based confidence interval on $\theta$ with asymptotically correct coverage probability $1 - \alpha$ can be constructed as $\{\theta : \ell_{n1}(\theta) \le z_\alpha\}$.

### 3.2. EL confidence region on $\beta$

Let

$$\omega_{in}(\beta) = X_i(\tilde Y_{i2} - X_i'\beta).$$

Similar to Owen (1991), we define the empirical log-likelihood ratio on $\beta$ as

$$\ell_{n2}(\beta) = -2\max \sum_{i=1}^n \log(np_i),$$

where the maximum is taken over all sets of nonnegative numbers $p_1, \ldots, p_n$ summing to 1 and such that $\sum_{i=1}^n p_i \omega_{in}(\beta) = 0$. It can be shown, by using the Lagrange multiplier method, that

$$p_i = \frac{1}{n}\,\frac{1}{1 + \lambda_{n2}'\omega_{in}(\beta)}, \qquad (3.7)$$

$$\ell_{n2}(\beta) = 2\sum_{i=1}^n \log\{1 + \lambda_{n2}'\omega_{in}(\beta)\}, \qquad (3.8)$$

where $\lambda_{n2}$ is the solution of the equation

$$\frac{1}{n}\sum_{i=1}^n \frac{\omega_{in}(\beta)}{1 + \lambda_{n2}'\omega_{in}(\beta)} = 0. \qquad (3.9)$$
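Equation (3.9) is the stationarity condition of the concave function $\lambda \mapsto \sum_{i=1}^n \log\{1 + \lambda'\omega_{in}(\beta)\}$, so for vector-valued $\omega_{in}(\beta)$ a damped Newton iteration is a natural solver. The following is our own hedged sketch of $\ell_{n2}(\beta)$ (not the authors' implementation; the iteration counts and feasibility safeguard are heuristic choices):

```python
import numpy as np

def el_ratio_beta(X, Y_tilde, beta):
    """Empirical log-likelihood ratio l_n2(beta) = 2 * sum log(1 + lam' w_i),
    with w_i = X_i (tilde-Y_i2 - X_i' beta) and lam solving (3.9).

    lam is found by damped Newton steps; step halving keeps all weights
    1 + lam' w_i strictly positive.
    """
    W = X * (Y_tilde - X @ beta)[:, None]      # rows are w_in(beta), shape (n, p)
    n, p = W.shape
    lam = np.zeros(p)
    for _ in range(50):
        d = 1.0 + W @ lam
        grad = (W / d[:, None]).mean(axis=0)   # left side of (3.9)
        hess = -(W / d[:, None] ** 2).T @ W / n
        step = np.linalg.solve(hess, grad)     # Newton direction
        t = 1.0
        for _ in range(60):                    # backtrack to stay feasible
            if np.all(1.0 + W @ (lam - t * step) > 1e-10):
                break
            t *= 0.5
        lam = lam - t * step
        if np.linalg.norm(grad) < 1e-10:
            break
    return 2.0 * np.sum(np.log(1.0 + W @ lam))
```

When $\beta$ is far from the data (so that $0$ leaves the convex hull of the $\omega_{in}(\beta)$), the iteration drives the ratio to a very large value, which is the behavior wanted when scanning a confidence region.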

Theorem 2. Suppose that conditions (C1)–(C5) hold. Then as $n \to \infty$,

$$\ell_{n2}(\beta) \xrightarrow{d} \chi^2_p, \qquad (3.10)$$

where $\chi^2_p$ is the chi-squared distribution with $p$ degrees of freedom.

Let $z_{\alpha p}$ satisfy $P(\chi^2_p \le z_{\alpha p}) = 1 - \alpha$. It follows from (3.10) that an EL based confidence region on $\beta$ with asymptotically correct coverage probability $1 - \alpha$ can be constructed as $\{\beta : \ell_{n2}(\beta) \le z_{\alpha p}\}$.

## 4. Bandwidth selection

We recommend choosing the bandwidth by cross-validation: select $h$ by minimizing

$$CV(h) = \sum_{i=1}^n \{\delta_i - \hat\pi_i(X_i)\}^2,$$

where $\hat\pi_i(\cdot)$ is a 'leave-one-out' version of $\hat\pi(\cdot)$, computed without the $i$-th observation.

## 5. Simulations

We conducted a small simulation study of the finite sample performance of the EL based confidence intervals on $\theta = EY$ and $\beta$. We used the model $Y = X\beta + |X|^{1/2}\varepsilon$, where $\beta = 1$, $X \sim U(1,2)$, and $\varepsilon$ was generated from the standard normal distribution $N(0,1)$ or the uniform distribution $U[-0.5, 0.5]$. The weight function $W_{ni}(x)$ used in Section 2 was


chosen as

$$W_{ni}(x) = K((x - X_i)/h) \Big/ \sum_{j=1}^n K((x - X_j)/h),$$

where $K(x) = (15/16)(1 - x^2)^2 I(|x| \le 1)$, and $h$ was chosen by the cross-validation method introduced in Section 4. We considered the following three cases of response probabilities under the MAR assumption:

- Case 1: $\pi_1(x) = P(\delta = 1 \mid X = x) = 0.8 + 0.2|x - 1|$ if $|x - 1| \le 1$, and $= \max\{1 - 0.05|x - 1|, 0.001\}$ elsewhere.
- Case 2: $\pi_2(x) = P(\delta = 1 \mid X = x) = 0.9 - 0.2|x - 1|$ if $|x - 1| \le 4.49$, and $= 0.1$ elsewhere.
- Case 3: $\pi_3(x) = P(\delta = 1 \mid X = x) = 0.6$ for all $x$.

The average missing rates corresponding to the preceding three cases are approximately 9.9%, 20.1% and 40.0%, respectively. For each of the three cases, we generated 5000 random samples of incomplete data $\{X_i, Y_i, \delta_i,\ i = 1, \ldots, n\}$ for $n = 60$, 100, 150, 200, 300, 400 and 500 from the model and the specified response probability function. For nominal confidence level $1 - \alpha = 0.95$, using the simulated samples, we evaluated the coverage probability (CP) and average length (AL) of the EL based confidence intervals on $\theta = EY$ and $\beta$ proposed in Section 3. Table 1 reports the simulation results for $\theta$ with $\varepsilon \sim N(0,1)$.

Table 1. Coverage probabilities (CP) and average lengths (AL) of confidence intervals on $\theta$ under different response functions $\pi(x)$, sample sizes $n$, and $\varepsilon \sim N(0,1)$.

| $n$ | CP, $\pi_1(x)$ | CP, $\pi_2(x)$ | CP, $\pi_3(x)$ | AL, $\pi_1(x)$ | AL, $\pi_2(x)$ | AL, $\pi_3(x)$ |
|----:|------:|------:|------:|-------:|-------:|-------:|
| 60  | 0.913 | 0.905 | 0.892 | 1.1324 | 1.1577 | 1.4214 |
| 100 | 0.921 | 0.917 | 0.904 | 0.9257 | 1.1133 | 1.1324 |
| 150 | 0.925 | 0.917 | 0.913 | 0.8571 | 0.9102 | 1.1051 |
| 200 | 0.930 | 0.922 | 0.915 | 0.7077 | 0.8203 | 0.8450 |
| 300 | 0.935 | 0.926 | 0.918 | 0.6355 | 0.7802 | 0.8363 |
| 400 | 0.937 | 0.930 | 0.921 | 0.5448 | 0.6782 | 0.7669 |
| 500 | 0.947 | 0.941 | 0.932 | 0.5050 | 0.5362 | 0.6507 |

Table 2. Coverage probabilities (CP) and average lengths (AL) of confidence intervals on $\beta$ under different response functions $\pi(x)$, sample sizes $n$, and $\varepsilon \sim N(0,1)$.

| $n$ | CP, $\pi_1(x)$ | CP, $\pi_2(x)$ | CP, $\pi_3(x)$ | AL, $\pi_1(x)$ | AL, $\pi_2(x)$ | AL, $\pi_3(x)$ |
|----:|------:|------:|------:|--------:|-------:|-------:|
| 60  | 0.910 | 0.892 | 0.856 | 0.7235  | 0.8271 | 0.9200 |
| 100 | 0.915 | 0.902 | 0.876 | 0.61233 | 0.6597 | 0.7182 |
| 150 | 0.921 | 0.915 | 0.880 | 0.5122  | 0.6014 | 0.6773 |
| 200 | 0.933 | 0.920 | 0.903 | 0.4915  | 0.5723 | 0.6180 |
| 300 | 0.934 | 0.924 | 0.910 | 0.4402  | 0.5090 | 0.5984 |
| 400 | 0.936 | 0.927 | 0.918 | 0.4104  | 0.4694 | 0.5482 |
| 500 | 0.941 | 0.932 | 0.925 | 0.3847  | 0.3809 | 0.4799 |

Table 3. Coverage probabilities (CP) and average lengths (AL) of confidence intervals on $\theta$ under different response functions $\pi(x)$, sample sizes $n$, and $\varepsilon \sim U[-0.5, 0.5]$.

| $n$ | CP, $\pi_1(x)$ | CP, $\pi_2(x)$ | CP, $\pi_3(x)$ | AL, $\pi_1(x)$ | AL, $\pi_2(x)$ | AL, $\pi_3(x)$ |
|----:|------:|------:|------:|-------:|-------:|-------:|
| 60  | 0.927 | 0.915 | 0.902 | 0.4192 | 0.4267 | 0.4400 |
| 100 | 0.930 | 0.917 | 0.906 | 0.3745 | 0.4122 | 0.4365 |
| 150 | 0.932 | 0.923 | 0.910 | 0.3092 | 0.3232 | 0.3870 |
| 200 | 0.935 | 0.926 | 0.915 | 0.2978 | 0.3083 | 0.3248 |
| 300 | 0.940 | 0.931 | 0.924 | 0.2786 | 0.2834 | 0.2852 |
| 400 | 0.940 | 0.933 | 0.925 | 0.2337 | 0.2638 | 0.2725 |
| 500 | 0.942 | 0.938 | 0.931 | 0.2005 | 0.2372 | 0.2588 |


Table 4. Coverage probabilities (CP) and average lengths (AL) of confidence intervals on $\beta$ under different response functions $\pi(x)$, sample sizes $n$, and $\varepsilon \sim U[-0.5, 0.5]$.

| $n$ | CP, $\pi_1(x)$ | CP, $\pi_2(x)$ | CP, $\pi_3(x)$ | AL, $\pi_1(x)$ | AL, $\pi_2(x)$ | AL, $\pi_3(x)$ |
|----:|------:|------:|------:|-------:|-------:|-------:|
| 60  | 0.911 | 0.875 | 0.805 | 0.2613 | 0.3024 | 0.3553 |
| 100 | 0.917 | 0.900 | 0.887 | 0.2039 | 0.2874 | 0.3111 |
| 150 | 0.920 | 0.905 | 0.900 | 0.1923 | 0.2275 | 0.2938 |
| 200 | 0.930 | 0.915 | 0.903 | 0.1604 | 0.1899 | 0.2233 |
| 300 | 0.935 | 0.919 | 0.911 | 0.1456 | 0.1672 | 0.1835 |
| 400 | 0.938 | 0.920 | 0.917 | 0.1229 | 0.1503 | 0.1787 |
| 500 | 0.939 | 0.923 | 0.921 | 0.1175 | 0.1430 | 0.1507 |

Table 2 reports the simulation results for $\beta$ with $\varepsilon \sim N(0,1)$; Table 3 reports the results for $\theta$ with $\varepsilon \sim U[-0.5, 0.5]$; and Table 4 reports the results for $\beta$ with $\varepsilon \sim U[-0.5, 0.5]$. Tables 1–4 reveal the following facts:

1. For every response rate and sample size, the coverage probabilities (CP) of the EL confidence intervals are close to the nominal level 95%, and the average lengths (AL) of the intervals are small.
2. The coverage probabilities move closer to the nominal level 95% as the sample size increases or the response rate becomes higher.
3. In almost all situations, the lengths of the intervals also improve (become smaller) as the sample size increases or the response rate becomes higher.
4. The performance of the EL confidence intervals suggested in this paper is quite robust against the choice of the bandwidth $h$. Extra simulation results (unreported to save space) indicated that the performance is similar when $h$ varies between $n^{-1/5}$ and $n^{-1/2}$.

Compared with the confidence intervals proposed in Wang and Rao (2002a), the confidence interval in this paper is easy to implement and performs better for large sample sizes.
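The Monte Carlo design of this section can be sketched compactly. The code below is our own illustration restricted to Case 3, where $\pi(x) \equiv 0.6$; as shortcuts it estimates the constant $\pi$ by the sample response rate rather than the kernel estimator, and uses 200 replications instead of the paper's 5000, so its empirical coverage is only a rough analogue of Table 1.

```python
import numpy as np

def el_ratio(Z):
    """Univariate EL ratio (3.2), with lambda solved from (3.3) by bisection."""
    n = len(Z)
    if Z.min() >= 0 or Z.max() <= 0:
        return np.inf
    lo, hi = (1.0 / n - 1.0) / Z.max(), (1.0 / n - 1.0) / Z.min()
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.mean(Z / (1.0 + lam * Z)) > 0:
            lo = lam
        else:
            hi = lam
    return 2.0 * np.sum(np.log1p(0.5 * (lo + hi) * Z))

rng = np.random.default_rng(0)
z95 = 3.8415                       # chi^2_1 0.95 quantile
n, reps, theta0, cover = 100, 200, 1.5, 0
for _ in range(reps):
    X = rng.uniform(1, 2, size=n)
    Y = X + np.abs(X) ** 0.5 * rng.normal(size=n)      # beta = 1, nu_0(x) = |x|^(1/2)
    delta = (rng.uniform(size=n) < 0.6).astype(float)  # Case 3: pi(x) = 0.6
    # complete-case WLS for beta with nu_0^2(x) = |x|
    beta_hat = np.sum(delta * X * Y / np.abs(X)) / np.sum(delta * X * X / np.abs(X))
    pi_c = delta.mean()                                # shortcut: constant-pi estimate
    Yt = (delta / pi_c) * Y + (1.0 - delta / pi_c) * X * beta_hat
    cover += el_ratio(Yt - theta0) <= z95              # theta0 = EY = E X = 1.5
print(cover / reps)                                    # empirical coverage (compare 0.95)
```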

## Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (10971038) and the Natural Science Foundation of Guangxi (2010 GXNSFA 013117). The authors are thankful to the referees for constructive suggestions.

## Appendix A. Proofs of Theorems 1 and 2

We use $C$ to denote a positive constant not depending on $n$, which may take a different value at each appearance.

Lemma 1 (Bernstein's inequality; Serfling, 1980, p. 95). Let $\{Z_i\}$ be independent random variables satisfying $P(|Z_i| \le M) = 1$, $i \ge 1$. Then, for any $t > 0$,

$$P\left( \left| \sum_{i=1}^n (Z_i - EZ_i) \right| > t \right) \le 2\exp\left\{ -\frac{t^2}{2\left(\sum_{j=1}^n \mathrm{Var}\,Z_j + Mt/3\right)} \right\}.$$

Lemma 2. Suppose that conditions (C1), (C3) and (C4) hold. Then there exist positive constants $C_3$ and $C_4$ such that

$$\lim_{n \to \infty} P(A_n) = 1, \qquad (A.1)$$

where $A_n = \left\{ \min_{1 \le i \le n} (nh^p)^{-1} \sum_{j=1}^n K_h(X_i - X_j) \ge C_3,\ \max_{1 \le i \le n} (nh^p)^{-1} \sum_{j=1}^n K_h(X_i - X_j) \le C_4 \right\}$, and on the set $A_n$,

$$(nh^p) \max_{1 \le i,j \le n} W_{ni}(X_j) = O(1) \quad \text{a.s.}, \qquad (A.2)$$

where

$$W_{ni}(x) = K_h(x - X_i) \Big/ \sum_{j=1}^n K_h(x - X_j).$$

Proof. Let

$$T_n(X_i) = (nh^p)^{-1} \sum_{j=1}^n K_h(X_i - X_j).$$


Use $P^X$ and $E^X$ to denote the conditional probability and conditional expectation given $X$. From Bernstein's inequality and conditions (C3) and (C4), for any $t > 0$, we have

$$P\left( \max_{1 \le i \le n} |T_n(X_i) - E^{X_i} T_n(X_i)| > t \right) \le \sum_{i=1}^n P(|T_n(X_i) - E^{X_i} T_n(X_i)| > t)$$
$$= \sum_{i=1}^n E\,P^{X_i}\left( \left| \sum_{j=1}^n \left( K_h(X_i - X_j) - E^{X_i} K_h(X_i - X_j) \right) \right| > tnh^p \right) \le 2n\exp\{-Cnh^p\} \le Cn^{-2}.$$

Thus,

$$\max_{1 \le i \le n} |T_n(X_i) - E^{X_i} T_n(X_i)| = o(1) \quad \text{a.s.} \qquad (A.3)$$

On the other hand, from conditions (C1), (C3) and (C4), we have

$$E^{X_i} T_n(X_i) = \frac{n-1}{n} h^{-p} \int K_h(X_i - u) f(u)\,du + (nh^p)^{-1} K(0) \ge C_1 \frac{n-1}{n} h^{-p} \int_{u \in S(X_i, \rho h) \cap A} f(u)\,du + (nh^p)^{-1} K(0) \ge C_5 \quad \text{a.s.} \qquad (A.4)$$

for some constant $C_5 > 0$. It is clear that

$$E^{X_i} T_n(X_i) \le C_6 \quad \text{a.s.} \qquad (A.5)$$

for some constant $C_6 > 0$. From (A.3), (A.4) and (A.5), we have (A.1). Eq. (A.2) follows from (A.1) and the boundedness of $K$. □

The following result concerns the uniform consistency of $\hat\pi_i$, $1 \le i \le n$.

Lemma 3. Suppose that conditions (C1)–(C4) hold. Then as $n \to \infty$,

$$\max_{1 \le i \le n} |\hat\pi_i - \pi_i| = o_p(1). \qquad (A.6)$$

Proof. Write $E^{\{X_i\}}$ for the conditional expectation given $\{X_i, 1 \le i \le n\}$, and similarly $P^{\{X_i\}}$. By Bernstein's inequality and Lemma 2, for $A_n$ in Lemma 2 and any $t > 0$, we have

$$P\left( \left\{ \max_{1 \le i \le n} |\hat\pi_i - E^{\{X_i\}}\hat\pi_i| > t \right\} \cap A_n \right) = P\left( \left\{ \max_{1 \le i \le n} \left| \sum_{j=1}^n W_{nj}(X_i)(\delta_j - \pi_j) \right| > t \right\} \cap A_n \right)$$
$$\le \sum_{i=1}^n E\,P^{\{X_i\}}\left( \left\{ \left| \sum_{j=1}^n W_{nj}(X_i)\,nh^p(\delta_j - \pi_j) \right| > tnh^p \right\} \cap A_n \right) \le 2n\exp\{-Cnh^p\} \le Cn^{-1} \to 0, \qquad (A.7)$$

where we note that, on $A_n$,

$$\sum_{j=1}^n W_{nj}^2(X_i) \le C(nh^p)^{-2} \sum_{j=1}^n K_h^2(X_i - X_j) \le C(nh^p)^{-2} \sum_{j=1}^n K_h(X_i - X_j) \le C(nh^p)^{-1} \quad \text{a.s.}$$

Further,

$$P\left( \max_{1 \le i \le n} |\hat\pi_i - E^{\{X_i\}}\hat\pi_i| > t \right) \le P\left( \left\{ \max_{1 \le i \le n} |\hat\pi_i - E^{\{X_i\}}\hat\pi_i| > t \right\} \cap A_n \right) + P(A_n^c),$$

where $A_n^c$ is the complement of $A_n$. Eq. (A.7) and Lemma 2 imply

$$\max_{1 \le i \le n} |\hat\pi_i - E^{\{X_i\}}\hat\pi_i| = o_p(1). \qquad (A.8)$$

By conditions (C2) and (C3), we have

$$\max_{1 \le i \le n} |E^{\{X_i\}}\hat\pi_i - \pi_i| = \max_{1 \le i \le n} \left| \sum_{j=1}^n W_{nj}(X_i)(\pi(X_j) - \pi(X_i)) \right|$$
$$= \max_{1 \le i \le n} \left| \sum_{j=1}^n W_{nj}(X_i)(\pi(X_j) - \pi(X_i)) \left( I(\|X_i - X_j\| > \rho h) + I(\|X_i - X_j\| \le \rho h) \right) \right| = o(1) \quad \text{a.s.} \qquad (A.9)$$

This completes the proof of Lemma 3. □


We also need the following convergence rate of $\hat\beta_r$.

Lemma 4. Suppose conditions (C1)–(C4) hold. Then as $n \to \infty$,

$$\hat\beta_r - \beta = O_p(n^{-1/2}). \qquad (A.10)$$

Proof. It is clear that

$$\sqrt{n}(\hat\beta_r - \beta) = \left( n^{-1} \sum_{i=1}^n \frac{\delta_i X_i X_i'}{\nu_0^2(X_i)} \right)^{-1} n^{-1/2} \sum_{i=1}^n \frac{\delta_i X_i \varepsilon_i}{\nu_0(X_i)} = \left\{ \left( E\frac{\delta XX'}{\nu_0^2(X)} \right)^{-1} + o_p(1) \right\} n^{-1/2} \sum_{i=1}^n \frac{\delta_i X_i \varepsilon_i}{\nu_0(X_i)} = O_p(1), \qquad (A.11)$$

where we have used the fact that

$$n^{-1/2} \sum_{i=1}^n \frac{\delta_i X_i \varepsilon_i}{\nu_0(X_i)} \xrightarrow{d} N\left( 0, E\frac{\delta XX'\varepsilon^2}{\nu_0^2(X)} \right). \quad \Box$$

Lemmas 5 and 6 are the key results in proving Theorems 1 and 2.

Lemma 5. Suppose conditions (C1)–(C5) hold. Then as $n \to \infty$,

$$n^{-1/2} \sum_{i=1}^n Z_{in}(\theta) \xrightarrow{d} N(0, \sigma_0^2), \qquad (A.12)$$

$$n^{-1} \sum_{i=1}^n Z_{in}^2(\theta) = \sigma_0^2 + o_p(1), \qquad (A.13)$$

and

$$\max_{1 \le i \le n} |Z_{in}(\theta)| = o(n^{1/2}) \quad \text{a.s.}, \qquad (A.14)$$

where $\sigma_0^2 = E\{\sigma^2(X)\nu_0^2(X)/\pi(X)\} + \mathrm{Var}\{m(X)\}$, $\sigma^2(X) = E(\varepsilon^2 \mid X)$, $m(X) = X'\beta$.

Proof. Eq. (A.14) is obvious. We first prove (A.13). Note that $\hat\pi_i = \pi_i + o_p(1)$ uniformly in $i$, $\hat\beta_r = \beta + o_p(1)$, and $\min_{1 \le i \le n} \pi(X_i) \ge C_0 > 0$. One can show that

$$n^{-1} \sum_{i=1}^n Z_{in}^2(\theta) = n^{-1} \sum_{i=1}^n \left\{ \frac{\delta_i}{\pi_i} Y_i + \left( 1 - \frac{\delta_i}{\pi_i} \right) X_i'\beta - EY \right\}^2 + o_p(1)$$
$$= n^{-1} \sum_{i=1}^n \left\{ \frac{\delta_i}{\pi_i}\nu_0(X_i)\varepsilon_i + m(X_i) - Em(X_i) \right\}^2 + o_p(1) = E\{\sigma^2(X)\nu_0^2(X)/\pi(X)\} + \mathrm{Var}\{m(X)\} + o_p(1),$$

which implies (A.13). We now prove (A.12). Observe that

$$n^{-1/2} \sum_{i=1}^n Z_{in}(\theta) = n^{-1/2} \sum_{i=1}^n \left\{ \frac{\delta_i}{\hat\pi_i} Y_i + \left( 1 - \frac{\delta_i}{\hat\pi_i} \right) X_i'\hat\beta_r - EY \right\}$$
$$= n^{-1/2} \sum_{i=1}^n \left\{ \frac{\delta_i \nu_0(X_i)\varepsilon_i}{\hat\pi_i} + m(X_i) - Em(X_i) + \left( 1 - \frac{\delta_i}{\hat\pi_i} \right) X_i'(\hat\beta_r - \beta) \right\}$$
$$= n^{-1/2} \sum_{i=1}^n \left\{ \frac{\delta_i \nu_0(X_i)\varepsilon_i}{\pi_i} + \frac{\delta_i \nu_0(X_i)(\pi_i - \hat\pi_i)\varepsilon_i}{\pi_i \hat\pi_i} + m(X_i) - Em(X_i) + \left( 1 - \frac{\delta_i}{\hat\pi_i} \right) X_i'(\hat\beta_r - \beta) \right\}, \qquad (A.15)$$

where we have used the equality

$$\frac{1}{\hat\pi_i} = \frac{1}{\pi_i} + \frac{\pi_i - \hat\pi_i}{\hat\pi_i \pi_i}.$$

Note that

$$n^{-1} \sum_{i=1}^n \left( 1 - \frac{\delta_i}{\pi_i} \right) X_i' = o_p(1).$$

From Lemmas 3 and 4, we can see that

$$n^{-1/2} \sum_{i=1}^n \left( 1 - \frac{\delta_i}{\hat\pi_i} \right) X_i'(\hat\beta_r - \beta) = n^{-1/2} \sum_{i=1}^n \left( 1 - \frac{\delta_i}{\pi_i} \right) X_i'(\hat\beta_r - \beta) + o_p(1) = o_p(1). \qquad (A.16)$$

Use E to denote the conditional expectation given fðdi ,Xi Þ : 1 r ir ng. Let I(Bn) and I(Dn) be the indicator functions of Bn P and Dn, where Bn ¼ fmin1 r i r n p^ i 4 C0 =2g and Dn ¼ fn1 ni¼ 1 n20 ðXi Þs2 ðXi Þ r 2En20 ðXÞs2 ðXÞg with C0 in condition (C2). Then


by the MAR assumption and condition (C2),

$$E^*\left\{ n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \right\}^2 I(B_n)I(D_n) = n^{-1} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)^2}{\pi_i^2 \hat\pi_i^2} \nu_0^2(X_i)\sigma^2(X_i)\,I(B_n)I(D_n)$$
$$\le C \max_{1 \le i \le n} (\pi_i - \hat\pi_i)^2 \cdot n^{-1} \sum_{i=1}^n \nu_0^2(X_i)\sigma^2(X_i)\,I(D_n) \le C \max_{1 \le i \le n} (\pi_i - \hat\pi_i)^2.$$

Since $\max_{1 \le i \le n} (\pi_i - \hat\pi_i)^2 \le 1$, we have $E\{\max_{1 \le i \le n} (\pi_i - \hat\pi_i)^2\} \to 0$ by Lemma 3 and the dominated convergence theorem. It follows that

$$E\left\{ n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \right\}^2 I(B_n)I(D_n) \to 0.$$

Thus,

$$n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \cdot I(B_n)I(D_n) = o_p(1).$$

On the other hand, $P(B_n^c) = o(1)$ by Lemma 3 and condition (C2), and $P(D_n^c) = o(1)$ by the law of large numbers. It follows, for any $t > 0$, that

$$P\left( \left| n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \right| > t \right) \le P\left( \left\{ \left| n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \right| > t \right\} \cap (B_n \cap D_n) \right) + P((B_n \cap D_n)^c)$$
$$= P\left( \left| n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i \right| I(B_n)I(D_n) > t \right) + P((B_n \cap D_n)^c).$$

Therefore,

$$n^{-1/2} \sum_{i=1}^n \frac{\delta_i(\pi_i - \hat\pi_i)}{\pi_i \hat\pi_i} \nu_0(X_i)\varepsilon_i = o_p(1). \qquad (A.17)$$

From (A.15), (A.16) and (A.17), we have

$$n^{-1/2} \sum_{i=1}^n Z_{in}(\theta) = n^{-1/2} \sum_{i=1}^n \left\{ \frac{\delta_i \nu_0(X_i)\varepsilon_i}{\pi_i} + m(X_i) - Em(X_i) \right\} + o_p(1).$$

Eq. (A.12) follows from the central limit theorem. □

Lemma 6. Suppose conditions (C1)–(C4) hold. Then as $n \to \infty$,

$$n^{-1/2} \sum_{i=1}^n \omega_{in}(\beta) \xrightarrow{d} N(0, \Sigma_0), \qquad (A.18)$$

$$n^{-1} \sum_{i=1}^n \omega_{in}(\beta)\omega_{in}'(\beta) = \Sigma_0 + o_p(1), \qquad (A.19)$$

and

$$\max_{1 \le i \le n} \|\omega_{in}(\beta)\| = o(n^{1/2}) \quad \text{a.s.}, \qquad (A.20)$$

where $\Sigma_0 = E\{\sigma^2(X)\nu_0^2(X)XX'/\pi(X)\}$, $\sigma^2(X) = E(\varepsilon^2 \mid X)$.

Proof. Lemma 6 can be shown along the same lines as Lemma 5. □

Proof of Theorem 1. From (A.12), (A.13) and (A.14), and arguments similar to the proof of (2.14) in Owen (1990), we can show that

$$\lambda_{n1} = O_p(n^{-1/2}). \qquad (A.21)$$

From (3.3), it is readily seen by a direct calculation and (A.21) that

$$\lambda_{n1} = \left\{ n^{-1} \sum_{i=1}^n Z_{in}^2(\theta) \right\}^{-1} \left\{ n^{-1} \sum_{i=1}^n Z_{in}(\theta) \right\} + o_p(n^{-1/2}). \qquad (A.22)$$

Thus, using a Taylor expansion, we have

$$\ell_{n1}(\theta) = n \left\{ n^{-1} \sum_{i=1}^n Z_{in}^2(\theta) \right\}^{-1} \left\{ n^{-1} \sum_{i=1}^n Z_{in}(\theta) \right\}^2 + o_p(1). \qquad (A.23)$$

The proof is complete from (A.12), (A.13) and (A.23). □

Proof of Theorem 2. Using Lemma 6, Theorem 2 can be proved similarly to Theorem 1. □

## References

- Hall, P., La Scala, B., 1990. Methodology and algorithms of empirical likelihood. Internat. Statist. Rev. 58, 109–127.
- Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. John Wiley, New York.
- Owen, A.B., 1988. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.
- Owen, A.B., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90–120.
- Owen, A.B., 1991. Empirical likelihood for linear models. Ann. Statist. 19, 1725–1747.
- Owen, A.B., 2001. Empirical Likelihood. Chapman & Hall, New York.
- Qin, Y., Rao, J.N.K., Ren, Q., 2008. Confidence intervals for marginal parameters under fractional linear regression imputation for missing data. J. Multivariate Anal. 99, 1232–1259.
- Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York.
- Wang, Q., Rao, J.N.K., 2001. Empirical likelihood for linear regression models under imputation for missing responses. Canad. J. Statist. 29, 597–608.
- Wang, Q., Rao, J.N.K., 2002a. Empirical likelihood-based inference in linear models with missing data. Scand. J. Statist. 29, 563–576.
- Wang, Q., Rao, J.N.K., 2002b. Empirical likelihood-based inference under imputation for missing response data. Ann. Statist. 30, 896–924.
- Wang, Q., Linton, O., Härdle, W., 2004. Semiparametric regression analysis with missing response at random. J. Amer. Statist. Assoc. 99, 334–345.
- Xue, L.G., 2009. Empirical likelihood for linear models with missing responses. J. Multivariate Anal. 100, 1353–1366.