A two-stage spline smoothing method for partially linear models

Journal of Statistical Planning and Inference 27 (1991) 187-201, North-Holland


Hung Chen*
Department of Applied Mathematics and Statistics, State University of New York-Stony Brook, Stony Brook, NY 11794, U.S.A.

Jyh-Jen Horng Shiau**
Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan 30043, R.O.C.

Received 7 August 1989; revised manuscript received 19 January 1990
Recommended by W.J. Studden

Abstract: Rice (1986) showed that the partial spline estimate of the parametric component in a semiparametric regression model is generally biased and it is necessary to undersmooth the nonparametric component to force the bias to be negligible with respect to the standard error. We propose a new two-stage spline smoothing method for estimating the parametric and nonparametric components in a semiparametric model. By appropriately choosing rates for smoothing parameters, we show that the parametric component can be estimated at the parametric rate without undersmoothing the nonparametric component. We also show that the same result holds for the partial regression estimate proposed independently by Denby (1986) and Speckman (1988). Asymptotic normality results for the parametric component are also shown for both estimates. Furthermore, we associate these estimates with Wellner's (1986) efficient scores methods.

AMS Subject Classification: Primary 62G05; secondary 62G99.

Key words and phrases: Partial splines; partial regression; additive regression; smoothing splines, rate of convergence; efficient scores; semiparametric models.

1. Introduction and summary

In this paper, we consider the following semiparametric regression model

$$y_{in} = x_{in}^T\beta + g(t_{in}) + e_{in}, \qquad i = 1, \ldots, n, \tag{1}$$

where both the $x_{in} = (x_{i1n}, \ldots, x_{idn})^T$ (a $d$-vector) and $t_{in}$ (a real number) are known, $\beta = (\beta_1, \ldots, \beta_d)^T$ is a vector of unknown regression coefficients, $g$ is a smooth function to be estimated, and the $\{e_{in}\}$ are independent noise terms with mean zero and variance $\sigma^2$.

* This work was sponsored by the National Science Foundation under Grant No. DMS-8901556.
** Current address: AT&T Bell Laboratories, Engineering Research Center, P.O. Box 900, Princeton, NJ 08540, U.S.A.

0378-3758/91/$03.50 © 1991 Elsevier Science Publishers B.V. (North-Holland)

H. Chen, J.-J.H. Shiau / Spline smoothing for partially linear models

There have been several approaches to estimating $\beta$ and $g$ from noisy data $\{y_{in}\}$. One approach is the so-called partial spline estimation method proposed by Engle et al. (1986), Wahba (1984, 1986) and Shiau et al. (1986) among others. The partial spline estimate is the minimizer of the following variational problem:

$$\min_{\beta \in R^d,\ g \in W_2^m}\ \frac{1}{n}\sum_{i=1}^n \bigl(y_{in} - x_{in}^T\beta - g(t_{in})\bigr)^2 + \lambda J(g), \tag{2}$$

where $W_2^m$ is the Sobolev space $\{f \mid f$ has $m-1$ absolutely continuous derivatives and $f^{(m)} \in L_2[0,1]\}$ and $J(g) = \int_0^1 (g^{(m)}(t))^2\,dt$ is a penalty functional measuring the smoothness of $g$. The smoothing parameter $\lambda$ controls the tradeoff between fidelity to the data and roughness of the solution.

Let $X = (x_{ijn})$ be the $n \times d$ design matrix for the parametric part of (1) and $y = (y_{1n}, \ldots, y_{nn})^T$. It can be shown easily (e.g., Shiau and Wahba (1988), Speckman (1988)) that the partial spline estimates for $\beta$ and $g = (g(t_{1n}), \ldots, g(t_{nn}))^T$ obtained from (2) are

$$\hat\beta = (X^T(I-S_\lambda)X)^{-1}X^T(I-S_\lambda)y \quad\text{and}\quad \hat g = S_\lambda(y - X\hat\beta), \tag{3}$$

where $S_\lambda$ is the smoother matrix for ordinary spline smoothing in (2) with $\beta = 0$.

There have been several studies on the asymptotic behavior of $\hat\beta$. Heckman (1986) considered the case where $x_{in}$ is 'white noise' and showed that $\sqrt{n}(\hat\beta - \beta)$ is asymptotically normal under mild assumptions. However, Rice (1986) considered a simple model where the $x_{in}$ and $t_{in}$ are not independent. For $d = 1$, he let $x_{in} = h(t_{in}) + z_{in}$ where $h$ is a smooth function and $z_{in}$ behaves like white noise. He found that the parametric rate of convergence $O(n^{-1/2})$ for $\hat\beta$ can be achieved in general only at the expense of undersmoothing the nonparametric component $g$. Thus the use of the generalized cross-validation (GCV) method proposed by Craven and Wahba (1979) for choosing $\lambda$ is questionable in this case. Similar results have been obtained in the case when $x_{in}$ is deterministic, say, $x_{in} = h(t_{in})$.

Shiau and Wahba (1988) (SW) studied convergence rates for the mean square errors of the partial spline estimate in (3) as well as the partial regression estimates (called Denby/Speckman-type estimates in SW since they were proposed independently by Denby (1986) and Speckman (1988)) defined as follows:

$$\hat\beta_1 = (X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2y \quad\text{and}\quad \hat g_1 = S_\lambda(y - X\hat\beta_1). \tag{4}$$

For $d = 1$, they reported parametric convergence for the variance but nonparametric convergence for the bias for both $\hat\beta$ and $\hat\beta_1$. However, $\hat\beta_1$ has a faster convergence rate than that of $\hat\beta$ in some cases. They also pointed out that data-based estimates for $\lambda$ that are optimal for predictive mean square error in the function, such as GCV, may not be optimal for the mean square error of $\hat\beta$. Eubank and Whitney (1989) also reported results similar to those in SW.
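The estimates (3) and (4) are closed-form once a smoother matrix is available, so they can be sketched in a few lines. The sketch below is illustrative only: `second_difference_smoother` is a hypothetical stand-in (a symmetric second-difference penalized least-squares smoother on equally spaced points, with an assumed `lam * n**4` penalty scaling), not the smoothing spline smoother $S_\lambda$ studied in the paper, and the function names are not from the source.

```python
import numpy as np


def second_difference_smoother(n, lam):
    """Assumed stand-in for S_lambda: (I + lam * n^4 * D'D)^{-1}.

    D is the (n-2) x n second-difference matrix; the n^4 factor is a rough
    discretisation of the integrated squared second derivative (m = 2)
    for equally spaced design points. The result is symmetric.
    """
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * n**4 * (D.T @ D))


def partial_spline(X, y, S):
    """Estimate (3): beta = (X'(I-S)X)^{-1} X'(I-S)y, g = S(y - X beta)."""
    M = np.eye(len(y)) - S
    beta = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
    return beta, S @ (y - X @ beta)


def partial_regression(X, y, S):
    """Estimate (4): as (3) but with (I-S)^2 in place of (I-S); S symmetric."""
    M = np.eye(len(y)) - S
    W = M @ M
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta, S @ (y - X @ beta)
```

With noise-free data and a linear $g$, both functions recover $\beta$ exactly under this stand-in, because a second-difference smoother reproduces linear functions; that property is specific to the assumed smoother.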

We propose a new two-stage spline smoothing method motivated by the following ideas. First, we note that the partial spline method does not penalize the roughness of the smooth term in the parametric component, $h(t)$. In the case of $x_{in} = h(t_{in}) + z_{in}$, it is natural to consider including $h(t)$ in the penalty term. Second, we notice that independence of $x_{in}$ and $t_{in}$ gives Heckman (1986) the asymptotic normality result, while Rice (1986), without it, reported the negative result described above. This gives us the idea of extracting $z_{in}$ from $x_{in}$ such that $z_{in}$ is more or less orthogonal to, and hence roughly independent of, $t$. We remark that Eubank and Speckman (1989) also suggested the same idea of orthogonalization, motivated by the partial regression method. Thus we propose the following two-stage spline smoothing procedure.

Procedure.

Stage 1. For $j = 1, \ldots, d$, smooth the $j$-th column vector $x_j$ of $X$ with respect to $t$ to obtain the residual vectors $r_j$ and form the block matrix $R = [r_1, \ldots, r_d]$.

Stage 2. Apply partial spline smoothing with $X$ replaced by $R$ to obtain the estimate as in (3).
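The two stages can be sketched as follows. This is a minimal illustration under stated assumptions: `second_difference_smoother` is a hypothetical stand-in for the smoothing spline smoother (equally spaced $t$, assumed `lam * n**4` scaling), `two_stage` is an illustrative name, and the adjustment that turns the stage-2 fit back into an estimate of $g$ (subtracting the stage-1 fitted values times the coefficient estimate) is derived here from the model rather than quoted from the text.

```python
import numpy as np


def second_difference_smoother(n, lam):
    """Assumed stand-in smoother: (I + lam * n^4 * D'D)^{-1}, symmetric."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * n**4 * (D.T @ D))


def two_stage(X, y, lam, lams):
    """Two-stage estimate with one smoothing parameter per regressor.

    Stage 1: smooth each column x_j against t and keep residuals r_j.
    Stage 2: partial spline smoothing as in (3) with X replaced by R.
    """
    n, d = X.shape
    S = second_difference_smoother(n, lam)
    M = np.eye(n) - S
    # Stage 1: fitted[:, j] = S_{lam_j} x_j, and R = X - fitted.
    fitted = np.column_stack(
        [second_difference_smoother(n, lams[j]) @ X[:, j] for j in range(d)]
    )
    R = X - fitted
    # Stage 2: partial spline estimate with design matrix R.
    beta0 = np.linalg.solve(R.T @ M @ R, R.T @ M @ y)
    # The stage-2 smooth estimates g plus the smooth part of X beta,
    # so the stage-1 fitted values are removed to estimate g itself.
    g0 = S @ (y - R @ beta0) - fitted @ beta0
    return beta0, g0, R
```

Passing a separate `lams[j]` per regressor allows each column of `X` its own degree of smoothness.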

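In this paper, we study the case in which the smoother is the smoothing spline smoother. Note that in stage two we are basically penalizing the roughness of the whole function rather than just the function $g$ as in partial spline smoothing. If we use the same smoother $S_{\lambda_0}$ for each regressor, we obtain

$$\hat\beta_0 = (X^T(I-S_{\lambda_0})(I-S_\lambda)(I-S_{\lambda_0})X)^{-1}X^T(I-S_{\lambda_0})(I-S_\lambda)y, \qquad \hat g_0 = S_\lambda(y - X\hat\beta_0) - (I-S_\lambda)S_{\lambda_0}X\hat\beta_0. \tag{5}$$

However, the appropriateness of using the same smoothing parameter is doubtful since the regressors may have different smoothness, not to mention quite different scales and variations among themselves. It is more natural to use a different smoothing parameter for each regressor. In this case our new estimate can be written as

$$\hat\beta_0 = (\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)(\tilde I-\tilde S)\tilde X)^{-1}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)y, \qquad \hat g_0 = S_\lambda(y - X\hat\beta_0) - (I-S_\lambda)\tilde S\tilde X\hat\beta_0, \tag{6}$$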

where $\tilde X$, $\tilde I$, and $\tilde S$ are block matrices to be defined in Section 2. We show that under regularity conditions, the negative result reported in Rice (1986) disappears for these new estimates of $\beta$ and $g$. More specifically, by choosing appropriate smoothing parameters, the convergence rate of $\hat\beta_0$ reaches the parametric rate $O(n^{-1/2})$ while $\hat g_0$ keeps the same optimal convergence rate as that of the ordinary smoothing splines.

This estimate has another interpretation. Wellner (1986) gave two general schemes using scores for constructing asymptotically efficient estimates for semiparametric models. It can be shown that Wellner's first method leads to an estimate of the form (6). This gives the estimate (6) a flavor of efficiency. However, we must warn the reader that Wellner's approaches do not guarantee efficiency for every semiparametric model, and it is not our goal to discuss efficiency in this paper. It is interesting to note that if the second method in Wellner (1986) is used, we obtain estimates for $\beta$ and $g$ identical to (3), the partial spline estimate. Also, a variation of Wellner's second approach leads to the partial regression estimate. It will be shown in Section 3.2 that $\hat\beta_1$ in (4) achieves the parametric rate without undersmoothing the nonparametric component. Speckman (1988) gave this result in the context of kernel smoothing.

Buja, Hastie, and Tibshirani (1989) proposed another general scheme to construct estimates for additive models. Model (1) can be treated as a special case of additive models. In this approach, the system of normal equations in the population version (in terms of random variables and conditional expectations) is first derived and then converted into the data version (in terms of the realizations of the random variables and the smoother). The smoother in the sample version can be quite general, e.g., a smoothing spline smoother, a kernel smoother, a running mean smoother, etc. A backfitting algorithm is then used to solve the normal equations. This leads to the estimate in (3), the partial spline estimate, if the smoother $S$ is the smoothing spline smoother. For the estimate (3), it is shown in Rice (1986) and Theorem 1 in Speckman (1988) that it is not possible to estimate $\beta$ at the parametric rate $n^{-1/2}$ while the average mean square error of $\hat g$ achieves the optimal rate of convergence when we use either the smoothing spline smoother or the kernel smoother. On the other hand, Chen (1988) using the estimate (3), Speckman (1988) using the estimate (4), and this paper using the new estimate (5) or (6) show that, by choosing appropriate smoothing parameter(s), $\beta$ can be estimated at the parametric rate $n^{-1/2}$ while the average mean square error of $\hat g$ achieves the optimal rate of convergence. However, we note that the regression spline smoother used in Chen (1988) is a projection, so that his estimate (3) is in fact a special case of (4) or (5). Since many general estimation schemes have been proposed recently with the smoother left to the user's free choice, we think it is important to warn users that different smoothers can behave quite differently even for the same scheme.

The choice of the smoothing parameters is well known to be crucial to the solution. For the semiparametric model (1), one thought is to estimate both $\beta$ and $g$ well by a two-stage method, say, first by undersmoothing $g$ to get a good estimate of $\beta$ and then smoothing the data with the estimated parametric component taken off to get the 'right' smoothness for $g$. Two essential questions arise: (1) How much undersmoothing is appropriate? (2) How can undersmoothing be done automatically (i.e., data-driven)? The two-stage smoothing method we propose in this paper does not require undersmoothing. Furthermore, our results hold for a wide range of rates for smoothing parameters. We conjecture that our results hold for the $\lambda$'s chosen by the generalized cross-validation method proposed by Craven and Wahba (1979). Research on this aspect is underway.

The remainder of this paper is organized as follows. In Section 2, we introduce notation. In Section 3, under regularity conditions, we derive the convergence rate results and show that $\hat\beta_0$ and $\hat\beta_1$ attain the parametric rate $O(n^{-1/2})$ without undersmoothing the nonparametric component. We also derive the asymptotic normality of $\hat\beta_0$ and $\hat\beta_1$.


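2. Notation

Let $\tilde I = [I, \ldots, I]$ be an $n \times nd$ matrix composed by putting $d$ $n \times n$ identity matrices $I$ in a row. Similarly, $\tilde S(\lambda_1, \ldots, \lambda_d) = [S_{\lambda_1}, \ldots, S_{\lambda_d}]$ where $S_{\lambda_j}$ is an $n \times n$ smoother matrix. Finally, let $\tilde X$ be the $nd \times d$ block diagonal matrix defined as follows:

$$\tilde X = \begin{pmatrix} x_1 & 0_{n\times 1} & \cdots & 0_{n\times 1}\\ 0_{n\times 1} & x_2 & \cdots & 0_{n\times 1}\\ \vdots & \vdots & \ddots & \vdots\\ 0_{n\times 1} & 0_{n\times 1} & \cdots & x_d \end{pmatrix}, \tag{7}$$

where $x_j = (x_{1jn}, \ldots, x_{njn})^T$, the $j$-th column vector of $X$, and $0_{n\times 1} = (0, \ldots, 0)^T$, an $n$-vector. Throughout the rest of this paper, we will denote $\tilde S(\lambda_1, \ldots, \lambda_d)$ by $\tilde S$ when there is no confusion. Simple algebra shows that $X = \tilde I\tilde X$, that the $(i,j)$-th element of $\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)(\tilde I-\tilde S)\tilde X$ is

$$x_i^T(I-S_{\lambda_i})(I-S_\lambda)(I-S_{\lambda_j})x_j,$$

and that the $i$-th element of $\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)y$ is $x_i^T(I-S_{\lambda_i})^T(I-S_\lambda)y$.

3. Convergence rates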

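Asymptotic analysis parallel to that in Rice (1986) is used to study the behavior of our new estimate (6) and the partial regression estimate (4). It is well known that the solution to (2) when $\beta = 0$ is in the space of natural splines of order $m$ on $[0,1]$. According to Demmler and Reinsch (1975), a basis for this space is $\{\phi_{kn}(t)\}_{1 \le k \le n}$ with the following biorthogonality property:

$$n^{-1}\sum_{i=1}^n \phi_{jn}(t_{in})\phi_{kn}(t_{in}) = \delta_{jk}, \qquad \int_0^1 \phi_{jn}^{(m)}(t)\phi_{kn}^{(m)}(t)\,dt = \lambda_{kn}\delta_{jk}.$$

Here $\{\lambda_{kn}\}$ is a nondecreasing sequence of nonnegative numbers and the eigenvalues of $S_\lambda$ are $(1+\lambda\lambda_{kn})^{-1}$. Set $r_{kjn} = n^{-1/2}\sum_{i=1}^n z_{ijn}\phi_{kn}(t_{in})$ and $h_{kjn} = n^{-1/2}\sum_{i=1}^n h_j(t_{in})\phi_{kn}(t_{in})$. Note that $x_{ijn} = h_j(t_{in}) + z_{ijn}$. For $i, j = 1, \ldots, d$, we assume:

A1. [...]

A2. $n^{-1}\sum_{k=1}^n r_{kin}r_{kjn} \to \sigma_{ij}$ as $n \to \infty$; $\Sigma = (\sigma_{ij})$ is positive definite.

A3. $\sup_{1 \le k \le n} |r_{kjn}| = O(\log n)$.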

A4. The points $t_{in}$ are regular in the sense that $(2i-1)/2n = \int_0^{t_{in}} p(t)\,dt$ for some continuous density function $p(t)$ on $[0,1]$.

A5. $m \ge 2$.

Set $h_j = (h_j(t_{1n}), \ldots, h_j(t_{nn}))^T$. Let $B^2_{g\lambda} = n^{-1}g^T(I-S_\lambda)^2g$, $B^2_{j\lambda_j} = n^{-1}h_j^T(I-S_{\lambda_j})^2h_j$, and $B^2_{j\lambda} = n^{-1}h_j^T(I-S_\lambda)^2h_j$. Note that $B^2_{g\lambda}$ is the averaged squared bias of the ordinary smoothing spline estimate of $g$ when $\beta = 0$. A similar interpretation is applicable to $B^2_{j\lambda_j}$ and $B^2_{j\lambda}$. The following lemma is due to Speckman (1981).

Lemma 1. Under A4 and A5,
(a) $\lambda_{kn} = ck^{2m}(1+o(1))$ where $c$ is a constant depending on $p(\cdot)$, and $o(1)$ denotes a term tending to zero as $n \to \infty$ uniformly for $k_{1n} \le k \le k_{2n}$ for any sequence $k_{1n} \to \infty$ and $k_{2n} = o(n^{2/(2m+1)})$;
(b) $B^2_{g\lambda} = O(\lambda)$ if $g \in W_2^m$.

Thus Lemma 1(b) also implies that $B^2_{j\lambda_j} = O(\lambda_j)$ and $B^2_{j\lambda} = O(\lambda)$ if $h_j \in W_2^m$. Throughout the rest of this paper, we will write $a(n) \asymp b(n)$ when $a(n)/b(n)$ is bounded away from zero and infinity.

Lemma 2. Assume that A4 and A5 hold and $\lambda \asymp n^{-\delta}$ for $0 < \delta < 1$. Then

$$\sum_k (1+\lambda_{kn}\lambda)^{-1} = O(\lambda^{-1/2m}) + O(n^{-1/2}\lambda^{-1}) = O(n^{1/2-\xi})$$

for some positive constant $\xi$ depending on $\delta$ only. Also, $\sum_k (1+\lambda_{kn}\lambda)^{-2} = O(\lambda^{-1/2m})$.

Proof. The second equality is given in Speckman (1981). To show the first equality, we mimic the argument used in Speckman (1981) and split the range of summation into $[1, \lambda^{-1/2m}]$, $[\lambda^{-1/2m}, n^{3/4m}]$, and $[n^{3/4m}, n]$. The sum over the first range is bounded by $\lambda^{-1/2m}$. Over the second range, by Lemma 1(a), we can approximate $\lambda_{kn}$ by $ck^{2m}$ and approximate the second sum by an integral, which gives $\lambda^{-1/2m}$ as a bound. Since $\lambda_{kn}$ is nondecreasing, each summand on the third range is bounded by the first term. Thus the third sum is bounded by $O(n \cdot n^{-3/2}\lambda^{-1}) = O(n^{-1/2}\lambda^{-1})$. This completes the proof. □

To study the asymptotic behavior of the two estimates (4) and (6), we first obtain convergence rates for some terms which will be used later in the proofs of Theorems 1 through 6. The proof of Lemma 3 is given in the Appendix.

Lemma 3. Assume that A1-A5 hold, $\lambda \asymp n^{-\delta}$ and $\lambda_i \asymp n^{-\delta_i}$ for $0 < \delta_i, \delta < 1$, and that $g, h_i \in W_2^m$ for $i = 1, \ldots, d$. Then for $1 \le i, j \le d$,
(a) $n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)(I-S_{\lambda_j})x_j = \sigma_{ij} + o(1)$,
(b) $n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)^2(I-S_{\lambda_j})x_j = \sigma_{ij} + o(1)$,
(c) $n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)g = o(n^{-1/2}) + O((\lambda_i\lambda)^{1/2})$,
(d) $n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)S_{\lambda_j}x_j = o(n^{-1/2}) + O((\lambda_i\lambda)^{1/2})$,
(e) $x_i^TS_{\lambda_i}(I-S_\lambda)^2S_{\lambda_j}x_j = O(n\lambda) + O((\lambda_i\lambda_j)^{-1/4m}\log^2 n)$,
(f) $\mathrm{tr}\,S_\lambda^2 = O(\lambda^{-1/2m})$,
(g) $x_i^T[S_\lambda + (I-S_\lambda)S_{\lambda_i}]^T[S_\lambda + (I-S_\lambda)S_{\lambda_j}]x_j = O(n)$,
(h) $n^{-1}x_i^T(I-S_\lambda)^2x_j = \sigma_{ij} + o(1)$,
(i) $n^{-1}x_i^TS_\lambda^2x_j = O(1)$.

We are now ready to show the main results.

3.1. Two-stage smoothing spline estimate

For the parametric component $\hat\beta_0$, we have

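Theorem 1. Suppose that the same assumptions as in Lemma 3 hold. Then
(a) $E(\hat\beta_0) = \beta + o(n^{-1/2}) + O((\max_j \lambda_j\lambda)^{1/2})$,
(b) $n\,\mathrm{Var}(\hat\beta_0) \to \sigma^2\Sigma^{-1}$ as $n \to \infty$.

Proof. We show (b) first. By (6), we have $n\,\mathrm{Var}(\hat\beta_0) = \sigma^2A_1^{-1}A_2A_1^{-1}$ where

$$A_1 = n^{-1}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)(\tilde I-\tilde S)\tilde X \quad\text{and}\quad A_2 = n^{-1}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)^2(\tilde I-\tilde S)\tilde X.$$

Note that both $A_1$ and $A_2$ converge to $\Sigma$ as $n \to \infty$ by Lemma 3(a) and 3(b). This proves (b). To show (a), observe that

$$E(\hat\beta_0) - \beta = A_1^{-1}[n^{-1}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)\tilde S\tilde X\beta] + A_1^{-1}[n^{-1}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)g].$$

Thus (a) is obtained by Lemma 3(c) and 3(d). □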
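The finite-sample form $n\,\mathrm{Var}(\hat\beta_0) = \sigma^2A_1^{-1}A_2A_1^{-1}$ used in the proof can be evaluated directly from the block matrices of Section 2. The sketch below is illustrative: the function names are hypothetical, the smoother matrices are supplied by the caller and $S_\lambda$ is assumed symmetric, and $\sigma^2$ is treated as known.

```python
import numpy as np


def block_matrices(X, smoothers):
    """Build I~ (n x nd), S~ (n x nd) and X~ (nd x d) as in Section 2."""
    n, d = X.shape
    I_tilde = np.hstack([np.eye(n)] * d)
    S_tilde = np.hstack(smoothers)
    X_tilde = np.zeros((n * d, d))
    for j in range(d):
        # Column j of X occupies the j-th diagonal block of X~.
        X_tilde[j * n:(j + 1) * n, j] = X[:, j]
    return I_tilde, S_tilde, X_tilde


def sandwich_covariance(X, smoothers, S, sigma2):
    """Var(beta_0) = sigma^2 * A1^{-1} A2 A1^{-1} / n for symmetric S."""
    n = X.shape[0]
    I_tilde, S_tilde, X_tilde = block_matrices(X, smoothers)
    R = (I_tilde - S_tilde) @ X_tilde      # columns are (I - S_{lam_j}) x_j
    M = np.eye(n) - S
    A1 = R.T @ M @ R / n
    A2 = R.T @ M @ M @ R / n
    A1_inv = np.linalg.inv(A1)
    return sigma2 * A1_inv @ A2 @ A1_inv / n
```

When every smoother is the zero matrix, the expression collapses to the ordinary least squares covariance $\sigma^2(X^TX)^{-1}$, which gives a quick sanity check.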

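Next, we study the asymptotic behavior of $\hat g_0$. Define the average squared bias of $\hat g_0$ at the data points, $B^2(\lambda, \tilde\lambda) = n^{-1}\sum_{i=1}^n (E\hat g_0(t_{in}) - g(t_{in}))^2$, and the average variance of $\hat g_0$, $V(\lambda, \tilde\lambda) = n^{-1}\sum_{i=1}^n \mathrm{Var}(\hat g_0(t_{in}))$, where $\tilde\lambda = (\lambda_1, \ldots, \lambda_d)$. Convergence rates of $B^2(\lambda, \tilde\lambda)$ and $V(\lambda, \tilde\lambda)$ are given in the following theorem.

Theorem 2. Suppose the same assumptions as in Lemma 3 hold. Then
(a) $B^2(\lambda, \tilde\lambda) = O(\lambda) + O(n^{-1}(\max_i \lambda_i^{-1})^{1/2m}\log^2 n)$,
(b) $V(\lambda, \tilde\lambda) = O(n^{-1}\lambda^{-1/2m})$.

Proof. Recall that $\hat g_0 = S_\lambda y - [S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X\hat\beta_0$. We have

$$E\hat g_0 - g = -(I-S_\lambda)g - (I-S_\lambda)\tilde S\tilde X\beta - [S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X(E\hat\beta_0 - \beta).$$

Then

$$(nB^2(\lambda, \tilde\lambda))^{1/2} = \|E\hat g_0 - g\| \le \|(I-S_\lambda)g\| + \|(I-S_\lambda)\tilde S\tilde X\beta\| + \|[S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X(E\hat\beta_0 - \beta)\|. \tag{8}$$

Note that $\|(I-S_\lambda)g\|^2 = g^T(I-S_\lambda)^2g = nB^2_{g\lambda} = O(n\lambda)$ by Lemma 1(b) and $\|(I-S_\lambda)\tilde S\tilde X\beta\|^2 = O(n\lambda) + O((\max_i \lambda_i^{-1})^{1/2m}\log^2 n)$ by Lemma 3(e). The square of the third term of (8) equals

$$[E(\hat\beta_0) - \beta]^T\tilde X^T[S_\lambda\tilde I + (I-S_\lambda)\tilde S]^T[S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X[E(\hat\beta_0) - \beta] = [o(n^{-1}) + O(\max_i \lambda_i\lambda)]\,O(n) = o(1) + O(n\max_i \lambda_i\lambda)$$

by Theorem 1(a) and Lemma 3(g). Putting these three terms together, we have $nB^2(\lambda, \tilde\lambda) = O(n\lambda) + O((\max_i \lambda_i^{-1})^{1/2m}\log^2 n) + O(n\max_i \lambda_i\lambda)$, hence (a) holds. To show (b), we note that

$$\begin{aligned} nV(\lambda, \tilde\lambda) &= E(\|\hat g_0 - E\hat g_0\|^2)\\ &= E(\|S_\lambda e - [S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X(\hat\beta_0 - E\hat\beta_0)\|^2)\\ &\le 2E(\|S_\lambda e\|^2 + \|[S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X(\hat\beta_0 - E\hat\beta_0)\|^2)\\ &= 2\sigma^2\,\mathrm{tr}\,S_\lambda^2 + 2\,\mathrm{tr}(\tilde X^T[S_\lambda\tilde I + (I-S_\lambda)\tilde S]^T[S_\lambda\tilde I + (I-S_\lambda)\tilde S]\tilde X\,\mathrm{Var}(\hat\beta_0)), \end{aligned}$$

where $e = (e_{1n}, \ldots, e_{nn})^T$. By Lemma 3(f), 3(g) and Theorem 1(b), we have $V(\lambda, \tilde\lambda) = O(n^{-1}\lambda^{-1/2m})$. □

Define the average mean square error of $\hat g_0(t)$ as

$$\mathrm{AMSE}(\lambda, \tilde\lambda) = n^{-1}\sum_{i=1}^n \mathrm{MSE}(\hat g_0(t_{in})) = B^2(\lambda, \tilde\lambda) + V(\lambda, \tilde\lambda).$$

Let $\lambda^*$, $\tilde\lambda^*$ be the 'optimal' $\lambda$, $\tilde\lambda$ which 'minimize' $\mathrm{AMSE}(\lambda, \tilde\lambda)$ in the sense of convergence rates. In fact, we equate the convergence rates in $B^2(\lambda, \tilde\lambda)$ and $V(\lambda, \tilde\lambda)$ to get:

Corollary 1. $\lambda_j^* = O(n^{-2m/(2m+1)}(\log n)^{4m})$ for $j = 1, \ldots, d$, $\lambda^* = O(n^{-2m/(2m+1)})$, and $\mathrm{AMSE}(\lambda^*, \tilde\lambda^*) = O(n^{-2m/(2m+1)})$ where $\tilde\lambda^* = (\lambda_1^*, \ldots, \lambda_d^*)$.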
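As a worked instance, take $m = 2$ (the cubic smoothing spline case, which satisfies A5) and suppose the rates in Corollary 1 hold with equality of order. The arithmetic below only substitutes $m = 2$ into the stated rates and checks the bias term of Theorem 1(a):

```latex
\begin{align*}
\lambda^* &\asymp n^{-4/5}, \qquad
\lambda_j^* \asymp n^{-4/5}(\log n)^{8}, \qquad
\mathrm{AMSE}(\lambda^*, \tilde\lambda^*) = O(n^{-4/5}),\\
\lambda_j^*\lambda^* &\asymp n^{-8/5}(\log n)^{8} = o(n^{-1}),\\
\bigl(\max_j \lambda_j^*\lambda^*\bigr)^{1/2} &\asymp n^{-4/5}(\log n)^{4} = o(n^{-1/2}).
\end{align*}
```

So with these choices the bias of $\hat\beta_0$ is negligible relative to $n^{-1/2}$ while $\hat g_0$ is smoothed at the usual optimal rate.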

Remark 1. By Theorem 1(a), we know that $E\hat\beta_0 = \beta + o(n^{-1/2})$ if $\max_i \lambda_i\lambda = o(n^{-1})$. Suppose $\lambda$ can be chosen so that it achieves the optimal rate $O(n^{-2m/(2m+1)})$. Then we only need $\max_i \lambda_i = O(n^{-\delta_0})$ with $1 > \delta_0 > 1/(2m+1)$. However, by Theorem 2, for $\mathrm{AMSE}(\lambda, \tilde\lambda)$ to achieve the optimal rate, we need $\max_i \lambda_i = O(n^{-2m/(2m+1)}(\log n)^{4m})$, which goes to zero faster than $O(n^{-2m/(2m+1)+\varepsilon})$ for any positive constant $\varepsilon$. There exists $\varepsilon$ such that $2m/(2m+1) - \varepsilon > 1/(2m+1)$, say, $\varepsilon = 1/(2m+1)$. Thus we have shown that by choosing appropriate rates for $\lambda$ and $\lambda_j$, we can estimate $\beta$ at the parametric rate $n^{-1/2}$ while the average mean square error of $\hat g_0$ can achieve the optimal rate of convergence at the 'about right' degree of smoothness.

By applying the Markov inequality to $n^{-1}\sum_{i=1}^n (\hat g_0(t_{in}) - g(t_{in}))^2$, Theorem 2 immediately gives us

Theorem 3. Under the same conditions as in Lemma 3,

$$n^{-1}\sum_{i=1}^n (\hat g_0(t_{in}) - g(t_{in}))^2 = O_p\bigl(\lambda + n^{-1}\lambda^{-1/2m} + n^{-1}(\max_i \lambda_i^{-1})^{1/2m}(\log n)^2\bigr).$$

When $\lambda = \lambda^*$ and $\lambda_j = \lambda_j^*$, $n^{-1}\sum_{i=1}^n (\hat g_0(t_{in}) - g(t_{in}))^2 = O_p(n^{-2m/(2m+1)})$.

Next, we will show that $\sqrt{n}(\hat\beta_0 - \beta)$ converges in distribution to $N(0, \sigma^2\Sigma^{-1})$ under the following additional conditions.

A6. $z_i$ is a random vector with mean zero, covariance matrix $\Sigma$, and finite absolute third moment.

A7. The $\{e_{in}\}$ are independent random variables with uniformly bounded absolute third moments and $e_{in}$ is independent of $z_i$.

Theorem 4. Under all the conditions of Lemma 3 and A6 and A7, $\sqrt{n}(\hat\beta_0 - \beta)$ converges in distribution to $N(0, \sigma^2\Sigma^{-1})$ if $\max_i \lambda_i\lambda = o(n^{-1})$.

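Proof. Since $\hat\beta_0 - \beta = n^{-1}A_1^{-1}[\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)(\tilde S\tilde X\beta + g + e)]$, by Lemma 3(c) and 3(d), it remains to show that $n^{-1/2}\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)e$ converges in distribution to $N(0, \sigma^2\Sigma)$. Define $\tilde H$, $\tilde Z$ to be the $dn \times d$ matrices of the form (7) with $x_i$ replaced by $h_i$ and $z_i$, respectively, and let $Z = \tilde I\tilde Z$. Write

$$\tilde X^T(\tilde I-\tilde S)^T(I-S_\lambda)e = \tilde H^T(\tilde I-\tilde S)^T(I-S_\lambda)e + \tilde Z^T[(\tilde I-\tilde S)^T(I-S_\lambda) - \tilde I^T]e + Z^Te.$$

Since $Eh_j^T(I-S_{\lambda_j})(I-S_\lambda)e = 0$ and

$$\mathrm{Var}(n^{-1/2}h_j^T(I-S_{\lambda_j})(I-S_\lambda)e) = n^{-1}\sigma^2h_j^T(I-S_{\lambda_j})(I-S_\lambda)^2(I-S_{\lambda_j})h_j \le \sigma^2B^2_{j\lambda_j} = o(1),$$

we have

$$n^{-1/2}\tilde H^T(\tilde I-\tilde S)^T(I-S_\lambda)e = o_p(1). \tag{9}$$

Let $v$ be any unit $d$-vector and $a = Zv$. Then $v^TZ^Te = \sum_{i=1}^n a_ie_{in}$ where $a_i$ is the $i$-th element of $a$. By A6, the $|a_i|$ are i.i.d. random variables with finite third moments. Observe that

$$n^{-3/2}\sum_{i=1}^n E|a_ie_{in}|^3 \le n^{-3/2}\sum_{i=1}^n E|a_i|^3 \sup_i E|e_{in}|^3 = o(1),$$

which means $v^TZ^Te$ satisfies a Lindeberg condition of order 3. By A6 and the law of large numbers, $n^{-1}Z^TZ \to \Sigma$. Then by Theorem 9.1 of Chow and Teicher (1978),

$$n^{-1/2}Z^Te \to N(0, \sigma^2\Sigma) \quad\text{in distribution.} \tag{10}$$

By (9) and (10), Theorem 4 holds if we can show that for all $i$,

$$n^{-1/2}z_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]e \to 0 \quad\text{in probability.} \tag{11}$$

Observe that $Ez_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]e = 0$ and

$$\begin{aligned} \mathrm{Var}(z_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]e) &= E\,\mathrm{Var}\{z_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]e \mid z_i\} + \mathrm{Var}(E(z_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]e \mid z_i))\\ &= \sigma^2E(z_i^T[(I-S_{\lambda_i})(I-S_\lambda) - I]^2z_i) = \sigma^2\sigma_{ii}\,\mathrm{tr}[(I-S_{\lambda_i})(I-S_\lambda) - I]^2\\ &\le M\,\mathrm{tr}(S_\lambda^2 + S_{\lambda_i}^2) = O(\lambda^{-1/2m} + \lambda_i^{-1/2m}) = o(n^{1/2}), \end{aligned}$$

where $M$ is a constant. Again by the Markov inequality, (11) holds. □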
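The limit in Theorem 4 can be probed with a small Monte Carlo sketch for $d = 1$. Everything concrete here is an assumption made for illustration: the stand-in second-difference smoother (not the smoothing spline smoother), the choices $h(t) = \sin 2\pi t$ and $g(t) = \cos 2\pi t$, standard normal $z$, and the fixed smoothing parameters. With $\Sigma = 1$ the theorem suggests a spread of roughly $\sigma$ for $\sqrt{n}(\hat\beta_0 - \beta)$, though finite-sample bias is not ruled out.

```python
import numpy as np


def second_difference_smoother(n, lam):
    """Assumed stand-in smoother: (I + lam * n^4 * D'D)^{-1}, symmetric."""
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.inv(np.eye(n) + lam * n**4 * (D.T @ D))


def simulate_root_n_errors(n=100, reps=200, beta=1.0, sigma=0.5,
                           lam=1e-4, lam1=1e-4, seed=0):
    """Return reps draws of sqrt(n) * (beta_0_hat - beta) for d = 1."""
    rng = np.random.default_rng(seed)
    t = (2 * np.arange(1, n + 1) - 1) / (2 * n)   # regular design as in A4
    S = second_difference_smoother(n, lam)
    S1 = second_difference_smoother(n, lam1)
    M = np.eye(n) - S
    h, g = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
    out = np.empty(reps)
    for b in range(reps):
        x = h + rng.normal(size=n)                # x = h(t) + z
        y = beta * x + g + sigma * rng.normal(size=n)
        r = x - S1 @ x                            # stage 1 residuals
        beta0 = (r @ M @ y) / (r @ M @ r)         # stage 2, scalar case
        out[b] = np.sqrt(n) * (beta0 - beta)
    return out
```

A histogram or normal quantile plot of the returned values is the natural next step; the sketch stops short of that to stay dependency-free beyond NumPy.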

Remark 2. Based on Theorem 4, the asymptotic variance of $\hat\beta_0$ is $n^{-1}\sigma^2\Sigma^{-1}$. When $e_{in} \sim N(0, \sigma^2)$, it can be shown that $\sigma^2\Sigma^{-1}$ is the smallest possible achievable asymptotic variance among those estimators of $\beta$ which utilize the information of $g \in W_2^m$ only. See Remark 1 in Chen (1988) for further detail.

3.2. Partial regression estimate

In this section, we study the asymptotic behavior of the partial regression estimate defined in (4) and summarize the results in the following theorems.

Theorem 5. Suppose the same assumptions as in Lemma 3 hold. Then

$$n^{-1}\sum_{i=1}^n (\hat g_1(t_{in}) - g(t_{in}))^2 = O_p(n^{-2m/(2m+1)})$$

when $\lambda \asymp n^{-2m/(2m+1)}$.

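Proof. Define $B^2(\lambda) = n^{-1}\sum_{i=1}^n (E\hat g_1(t_{in}) - g(t_{in}))^2$ and $V(\lambda) = n^{-1}\sum_{i=1}^n \mathrm{Var}(\hat g_1(t_{in}))$. Observe that

$$\hat g_1 = S_\lambda[I - X(X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2](g + e).$$

We have

$$E\hat g_1 - g = -(I-S_\lambda)g - S_\lambda X(n^{-1}X^T(I-S_\lambda)^2X)^{-1}n^{-1}X^T(I-S_\lambda)^2g$$

and

$$nV(\lambda) = \sigma^2[\mathrm{tr}\,S_\lambda^2 - 2\,\mathrm{tr}(X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2S_\lambda^2X + \mathrm{tr}(X^T(I-S_\lambda)^2X)^{-1}X^TS_\lambda^2X(X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^4X].$$

Note that $\|(I-S_\lambda)g\|^2 = O(n\lambda)$ by Lemma 1(b) and

$$\|S_\lambda X(n^{-1}X^T(I-S_\lambda)^2X)^{-1}n^{-1}X^T(I-S_\lambda)^2g\|^2 = n\,[n^{-1}X^T(I-S_\lambda)^2g]^T(n^{-1}X^T(I-S_\lambda)^2X)^{-1}(n^{-1}X^TS_\lambda^2X)(n^{-1}X^T(I-S_\lambda)^2X)^{-1}n^{-1}X^T(I-S_\lambda)^2g.$$

Then by the proof of Lemma 3(c), $|n^{-1}x_j^T(I-S_\lambda)^2g|$ is bounded above by

$$O(\log n)[O(n^{-1}\lambda^{-1/2m}) + O(n^{-3/2}\lambda^{-1})]^{1/2} + O(\lambda).$$

Hence, $B^2(\lambda) = O(\lambda)$ by the above arguments and Lemma 3(h) and (i). By Lemma 3(b), (f), (h) and (i), we have $nV(\lambda) = O(\lambda^{-1/2m})$. Hence,

$$n^{-1}\sum_{i=1}^n E(\hat g_1(t_{in}) - g(t_{in}))^2 = B^2(\lambda) + V(\lambda) = O(\lambda + n^{-1}\lambda^{-1/2m}).$$

By applying the Markov inequality to $n^{-1}\sum_{i=1}^n (\hat g_1(t_{in}) - g(t_{in}))^2$, Theorem 5 follows easily. □

Theorem 6. Under all the conditions of Lemma 3 and A6 and A7, $\sqrt{n}(\hat\beta_1 - \beta)$ converges in distribution to $N(0, \sigma^2\Sigma^{-1})$ if $\lambda^2 = o(n^{-1})$.

Proof. Observe that

$$\hat\beta_1 - \beta = (X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2g + (X^T(I-S_\lambda)^2X)^{-1}X^T(I-S_\lambda)^2e.$$

Write

$$n^{-1/2}X^T(I-S_\lambda)^2e = n^{-1/2}H^T(I-S_\lambda)^2e + n^{-1/2}Z^T[(I-S_\lambda)^2 - I]e + n^{-1/2}Z^Te.$$

By (9), (10), and (11), $n^{-1/2}X^T(I-S_\lambda)^2e$ converges to $N(0, \sigma^2\Sigma)$ in distribution. By Lemma 3(c),

$$n^{-1/2}X^T(I-S_\lambda)^2g = o_p(1) + O_p(n^{1/2}\lambda).$$

Combining the above results and Lemma 3(h), Theorem 6 holds. □

Appendix.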

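Proof of Lemma 3

We first show (a). Note that

$$n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)(I-S_{\lambda_j})x_j = n^{-1}\sum_k (h_{kin}+r_{kin})(h_{kjn}+r_{kjn})\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{\lambda_{kn}\lambda_j}{1+\lambda_{kn}\lambda_j}.$$

Recall that $B^2_{j\lambda_j} = n^{-1}\sum_k h_{kjn}^2(\lambda_{kn}\lambda_j/(1+\lambda_{kn}\lambda_j))^2 = O(\lambda_j)$ for $h_j \in W_2^m$. By $\lambda_{kn}\lambda/(1+\lambda_{kn}\lambda) \le 1$, we have

$$n^{-1}\sum_k |h_{kin}h_{kjn}|\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{\lambda_{kn}\lambda_j}{1+\lambda_{kn}\lambda_j} \le B_{i\lambda_i}B_{j\lambda_j} = O((\lambda_i\lambda_j)^{1/2}) = o(1).$$

Next, observe that

$$\Bigl|\sum_k r_{kin}r_{kjn}\Bigl(1 - \frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{\lambda_{kn}\lambda_j}{1+\lambda_{kn}\lambda_j}\Bigr)\Bigr| \le \sum_k |r_{kin}r_{kjn}|\Bigl(\frac{1}{1+\lambda_{kn}\lambda_i} + \frac{1}{1+\lambda_{kn}\lambda} + \frac{1}{1+\lambda_{kn}\lambda_j}\Bigr). \tag{A.1}$$

By A3 and Lemma 2,

$$\sum_k |r_{kin}r_{kjn}|(1+\lambda_{kn}\lambda)^{-1} = O((\log n)^2)\,O(n^{1/2-\xi}) = o(n). \tag{A.2}$$

(A.2) holds for $\lambda$ replaced by $\lambda_i$ or $\lambda_j$. Thus the right-hand side of (A.1) equals $o(n)$. Then by A2, we have

$$n^{-1}\sum_k r_{kin}r_{kjn}\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{\lambda_{kn}\lambda_j}{1+\lambda_{kn}\lambda_j} = \sigma_{ij} + o(1). \tag{A.3}$$

By the Cauchy-Schwarz inequality, the cross product term satisfies

$$n^{-1}\Bigl|\sum_k r_{kin}h_{kjn}\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{\lambda_{kn}\lambda_j}{1+\lambda_{kn}\lambda_j}\Bigr| \le \Bigl(n^{-1}\sum_k r_{kin}^2\Bigr)^{1/2}B_{j\lambda_j} = (O(\lambda_j)\sigma_{ii})^{1/2} = o(1).$$

Putting the pieces together, we have shown (a). Note that the cross product term will never dominate the rate, due to the Cauchy-Schwarz inequality. Therefore, there is no need to obtain the convergence rate for it in the subsequent equalities (b)-(i). Using the same argument, the proof of (b) is straightforward.

Set $c_{kn} = n^{-1/2}\sum_{i=1}^n g(t_{in})\phi_{kn}(t_{in})$. To show (c), we note that

$$n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)g = n^{-1}\sum_k (h_{kin}+r_{kin})c_{kn}\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}.$$

Observe that

$$n^{-1}\Bigl|\sum_k r_{kin}c_{kn}\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr| \le n^{-1}\sup_k |r_{kin}|\Bigl(\sum_k c_{kn}^2\lambda_{kn}\Bigr)^{1/2}\Bigl(\sum_k \frac{\lambda_{kn}\lambda^2}{(1+\lambda_{kn}\lambda)^2}\Bigr)^{1/2}. \tag{A.4}$$

Note that $n^{-1}\sum_k c_{kn}^2\lambda_{kn} < \infty$ for $g \in W_2^m$ and that $\lambda_{kn}\lambda(1+\lambda_{kn}\lambda)^{-2} \le 1/(1+\lambda_{kn}\lambda)$. Then by Lemma 2 and A3, the right-hand side of (A.4) is bounded above by $\lambda^{1/2}O(\log n)(O(n^{-1}\lambda^{-1/2m}) + O(n^{-3/2}\lambda^{-1}))^{1/2} = o(n^{-1/2})$. Also by the Cauchy-Schwarz inequality, we have

$$n^{-1}\Bigl|\sum_k h_{kin}c_{kn}\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr| \le B_{i\lambda_i}B_{g\lambda},$$

which is of the order $O((\lambda_i\lambda)^{1/2})$ by Lemma 1(b). Thus (c) is proved.

Next, we show (d). Note that

$$n^{-1}x_i^T(I-S_{\lambda_i})(I-S_\lambda)S_{\lambda_j}x_j = n^{-1}\sum_k (h_{kin}+r_{kin})(h_{kjn}+r_{kjn})\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{1}{1+\lambda_{kn}\lambda_j}.$$

By A3, Lemma 2, and $m \ge 2$, it follows that

$$n^{-1}\sum_k |r_{kin}r_{kjn}|\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\cdot\frac{1}{1+\lambda_{kn}\lambda_j} = O(n^{-1}(\log n)^2(\lambda_j^{-1/2m} + n^{-1/2}\lambda_j^{-1})) = o(n^{-1/2}).$$

Recall that

$$n^{-1}\sum_k |h_{kin}h_{kjn}|\frac{\lambda_{kn}\lambda_i}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda} \le B_{i\lambda_i}B_{j\lambda} = O((\lambda\lambda_i)^{1/2}).$$

Putting these terms together, we have shown (d).

We show (e) by observing the following inequalities:

$$\sum_k |r_{kin}r_{kjn}|\frac{1}{1+\lambda_{kn}\lambda_i}\Bigl(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr)^2\frac{1}{1+\lambda_{kn}\lambda_j} \le O(\log^2 n)\Bigl(\sum_k \frac{1}{(1+\lambda_{kn}\lambda_i)^2}\Bigr)^{1/2}\Bigl(\sum_k \frac{1}{(1+\lambda_{kn}\lambda_j)^2}\Bigr)^{1/2} = O((\lambda_i\lambda_j)^{-1/4m}\log^2 n)$$

and

$$\sum_k |h_{kin}h_{kjn}|\frac{1}{1+\lambda_{kn}\lambda_i}\Bigl(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr)^2\frac{1}{1+\lambda_{kn}\lambda_j} \le nB_{i\lambda}B_{j\lambda} = O(n\lambda).$$

Thus $x_i^TS_{\lambda_i}(I-S_\lambda)^2S_{\lambda_j}x_j = O(n\lambda) + O((\lambda_i\lambda_j)^{-1/4m}\log^2 n)$.

(f) is a direct consequence of Lemma 2 since $\mathrm{tr}\,S_\lambda^2 = \sum_k (1+\lambda_{kn}\lambda)^{-2}$.

To show (g), we first show that (i) holds. This is shown by noting

$$\sum_k |r_{kin}r_{kjn}|(1+\lambda_{kn}\lambda)^{-2} = O(\lambda^{-1/2m}\log^2 n)$$

and

$$\sum_k |h_{kin}h_{kjn}|(1+\lambda_{kn}\lambda)^{-2} \le \sum_k |h_{kin}h_{kjn}| \le \Bigl(\sum_k h_{kin}^2\Bigr)^{1/2}\Bigl(\sum_k h_{kjn}^2\Bigr)^{1/2} = O(n).$$

The last equality holds since $h_i(t)$ and $h_j(t)$ are continuous functions with compact domain and hence bounded. Hence $n^{-1}x_i^TS_\lambda^2x_j = O(1)$. Next, we have

$$x_i^TS_{\lambda_i}(I-S_\lambda)S_\lambda x_j = O(\lambda^{-1/2m}\log^2 n + n^{-1/2}\lambda^{-1}\log^2 n + n\lambda^{1/2})$$

by the fact that

$$\sum_k |r_{kin}r_{kjn}|\frac{1}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{(1+\lambda_{kn}\lambda)^2} = O(\log^2 n)\,O(\lambda^{-1/2m} + n^{-1/2}\lambda^{-1})$$

and

$$\sum_k |h_{kin}h_{kjn}|\frac{1}{1+\lambda_{kn}\lambda_i}\cdot\frac{\lambda_{kn}\lambda}{(1+\lambda_{kn}\lambda)^2} \le \Bigl(\sum_k h_{kin}^2\Bigr)^{1/2}(nB^2_{j\lambda})^{1/2} = O(n\lambda^{1/2}).$$

Combining these results and (e), we have (g).

To show (h), observe that

$$n^{-1}x_i^T(I-S_\lambda)^2x_j = n^{-1}\sum_k (h_{kin}+r_{kin})(h_{kjn}+r_{kjn})\Bigl(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr)^2.$$

We have

$$n^{-1}\sum_k |h_{kin}h_{kjn}|\Bigl(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr)^2 \le B_{i\lambda}B_{j\lambda} = O(\lambda) = o(1).$$

By (A.2), we have

$$n^{-1}\sum_k r_{kin}r_{kjn}\Bigl(\frac{\lambda_{kn}\lambda}{1+\lambda_{kn}\lambda}\Bigr)^2 = n^{-1}\sum_k r_{kin}r_{kjn} + o(1).$$

Hence (h) holds by A2.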

Acknowledgements

We thank the referee for comments which helped improve the presentation of the paper.

References

Buja, A., T. Hastie and R. Tibshirani (1989). Linear smoothers and additive models. Ann. Statist. 17, 453-555.

Chen, H. (1988). Convergence rates for parametric components in a partly linear model. Ann. Statist. 16, 136-146.

Chow, Y.S. and H. Teicher (1978). Probability Theory. Springer-Verlag, New York.

Craven, P. and G. Wahba (1979). Smoothing noisy data with spline functions. Numer. Math. 31, 377-403.

Demmler, A. and C. Reinsch (1975). Oscillation matrices with spline smoothing. Numer. Math. 24, 375-382.

Denby, L. (1986). Smooth regression function. Statistical Research Report #26, AT&T Bell Laboratories, Princeton, NJ.

Engle, R.F., C.W. Granger, J. Rice and A. Weiss (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 81, 310-320.

Eubank, R.L. and P. Speckman (1989). Discussion to "Linear smoothers and additive models", by Buja, A., T. Hastie and R. Tibshirani. Ann. Statist. 17, 525-529.

Eubank, R.L. and P. Whitney (1989). Convergence rates for estimation in certain partially linear models. J. Statist. Plann. Inference 23, 33-43.

Heckman, N.E. (1986). Spline smoothing in partly linear models. J. Roy. Statist. Soc. Ser. B 48, 244-248.

Rice, J. (1986). Convergence rates for partially splined models. Statist. Probab. Lett. 4, 203-208.

Shiau, J. (1985). Smoothing spline estimation of functions with discontinuities. Tech. Rep. #768, Dept. of Statistics, Univ. of Wisconsin-Madison.

Shiau, J., G. Wahba and D.R. Johnson (1986). Partial spline models for the inclusion of tropopause and frontal boundary information in otherwise smooth two and three dimensional objective analysis. J. Atmos. Ocean. Technol. 3, 713-725.

Shiau, J. and G. Wahba (1988). Rates of convergence of some estimators for a semiparametric model. Comm. Statist. 17 (4), 1117-1133.

Speckman, P. (1981). The asymptotic integrated mean square error for smoothing noisy data by splines. Manuscript.

Speckman, P. (1988). Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B 50, 413-436.

Wahba, G. (1984). Partial spline models for the semiparametric estimation of functions of several variables. In: Statistical Analysis of Time Series, 312-329. Institute of Statistical Mathematics, Tokyo.

Wahba, G. (1986). Partial and interaction splines for the semiparametric estimation of functions of several variables. In: T.J. Boardman, Ed., Computer Science and Statistics: Proceedings of the 18th Symposium on the Interface. American Statistical Association, Washington, DC, 75-80.

Wellner, J. (1986). Semiparametric models: progress and problems. In: R.D. Gill and M.N. Voors, Eds., ISI Centenary Session. Center for Mathematics and Computer Science, Amsterdam.