Journal of Econometrics 23 (1983) 275-283. North-Holland
PARTIALLY GENERALIZED LEAST SQUARES AND TWO-STAGE LEAST SQUARES ESTIMATORS*

Takeshi AMEMIYA

Stanford University, Stanford, CA 94305, USA
Received April 1982, final version received November 1982

A class of partially generalized least squares estimators and a class of partially generalized two-stage least squares estimators in regression models with heteroscedastic errors are proposed. By using these estimators a researcher can attain higher efficiency than that attained by the least squares or the two-stage least squares estimators without explicitly estimating each component of the heteroscedastic variances. However, the efficiency is not as high as that of the generalized least squares or the generalized two-stage least squares estimator calculated using the knowledge of the true variances. Hence the use of the term partial.
1. Introduction
In this paper I propose a class of partially generalized least squares estimators and a class of partially generalized two-stage least squares estimators in regression models with heteroscedastic errors. By using these estimators a researcher can attain higher efficiency than that attained by the least squares or the two-stage least squares estimators without explicitly estimating each component of the heteroscedastic variances. However, the efficiency is not as high as that of the generalized least squares or the generalized two-stage least squares estimator calculated using the knowledge of the true variances. This is why I use the term partially above.

This paper is motivated by Chamberlain (1982), who suggests a way to improve on the least squares or two-stage least squares estimator in heteroscedastic regression models without explicitly estimating each variance.¹ In this paper I carry Chamberlain's idea further to define a class of estimators more efficient than his. Chamberlain assumes that the exogenous variables are i.i.d. random variables, but I work with the more standard assumption that the exogenous variables are known constants. So Chamberlain's estimators are reinterpreted to conform to my setting. Chamberlain also considers a modification of the three-stage least squares estimator, but I do not, because the results regarding the two-stage least squares estimator can be easily generalized to this situation.

*This work was supported by NSF Grant SES 7912965 to Stanford University. The paper has greatly improved through numerous discussions with Tom MaCurdy.

¹The idea should also be credited to White (1982), who proposed an instrumental variables estimator more efficient than the two-stage least squares estimator in a heteroscedastic stratified cross-section model.

2. A heteroscedastic regression model

In this section I will consider a heteroscedastic
regression model,
$$y = X\beta + u, \qquad (2.1)$$
where $X$ is a $T \times K$ matrix of known constants with full column rank and the elements $\{u_t\}$ of the $T$-vector $u$ are independent but heteroscedastic with bounded second moments and finite fourth moments. We define $\Sigma = Euu'$. Note that $\Sigma$ is a diagonal matrix by our assumption. In order to prove certain asymptotic results later, I will assume that the elements of $X$ are bounded and that $\lim T^{-1}X'X$ exists and is nonsingular. The boundedness of $X$ is not necessary and can easily be replaced by a set of slightly more general assumptions, but I will not do so because it seems to be a rather uninteresting mathematical exercise. I assume that there are $q$ linear constraints on $\beta$, written as
$$Q\beta = 0, \qquad (2.2)$$
where $Q$ is a $q \times K$ matrix of known constants with full row rank. The possibility of no constraint is subsumed under this assumption. First, I will assume that $\Sigma$ is known, and later I will consider the more interesting case where $\Sigma$ is unknown. I will consider three estimators: the constrained least squares (CLS), the constrained generalized least squares (CGLS), and Chamberlain's estimator. The first two are well known. They are defined as follows:
$$\hat\beta^{+} = \{I - (X'X)^{-1}Q'[Q(X'X)^{-1}Q']^{-1}Q\}\hat\beta, \qquad (2.3)$$

where $\hat\beta = (X'X)^{-1}X'y$, and

$$\hat\beta_G^{+} = \{I - (X'\Sigma^{-1}X)^{-1}Q'[Q(X'\Sigma^{-1}X)^{-1}Q']^{-1}Q\}\hat\beta_G, \qquad (2.4)$$

where $\hat\beta_G = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$.
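As a computational aside, both constrained estimators are straightforward to code from (2.3) and (2.4). The following numpy sketch is my own construction; all function and variable names are assumptions, not from the paper:

```python
import numpy as np

def cls_cgls(y, X, Q, Sigma):
    """Constrained LS, eq. (2.3), and constrained GLS, eq. (2.4),
    for y = X beta + u subject to Q beta = 0, with Sigma = E[uu'] known."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ls = XtX_inv @ X.T @ y                                   # unconstrained LS
    adj = XtX_inv @ Q.T @ np.linalg.inv(Q @ XtX_inv @ Q.T) @ Q
    b_cls = b_ls - adj @ b_ls                                  # eq. (2.3)

    Sigma_inv = np.linalg.inv(Sigma)
    G_inv = np.linalg.inv(X.T @ Sigma_inv @ X)
    b_gls = G_inv @ X.T @ Sigma_inv @ y                        # unconstrained GLS
    adj_g = G_inv @ Q.T @ np.linalg.inv(Q @ G_inv @ Q.T) @ Q
    b_cgls = b_gls - adj_g @ b_gls                             # eq. (2.4)
    return b_cls, b_cgls
```

Both estimators satisfy the constraint $Q\beta = 0$ exactly, by construction.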
To define Chamberlain's estimator $\tilde\beta$, premultiply (2.1) by $X'$ to obtain

$$X'y = X'X\beta + X'u. \qquad (2.5)$$
Then, $\tilde\beta$ is CGLS applied to (2.5). Thus,

$$\tilde\beta = \{I - \bar{A}Q'(Q\bar{A}Q')^{-1}Q\}\hat\beta, \qquad (2.6)$$

where $\bar{A} = (X'X)^{-1}X'\Sigma X(X'X)^{-1}$.
All three estimators are unbiased, and their variance-covariance matrices can be easily derived from their definitions. A direct comparison of the variance-covariance matrices will show

$$V\hat\beta^{+} \ge V\tilde\beta \ge V\hat\beta_G^{+}, \qquad (2.7)$$

where the inequalities are in the sense of matrices. Strict inequalities generally hold, except in the special case where there is no constraint, in which case $\hat\beta^{+} = \tilde\beta = \hat\beta$.
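Chamberlain's estimator (2.6) depends on $\Sigma$ only through the least squares variance matrix $(X'X)^{-1}X'\Sigma X(X'X)^{-1}$. A minimal sketch (my own naming, assuming $\Sigma$ known):

```python
import numpy as np

def chamberlain(y, X, Q, Sigma):
    """Chamberlain's estimator, eq. (2.6): constrained GLS applied to
    X'y = X'X beta + X'u."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ls = XtX_inv @ X.T @ y
    A_bar = XtX_inv @ X.T @ Sigma @ X @ XtX_inv   # variance of b_ls
    return b_ls - A_bar @ Q.T @ np.linalg.inv(Q @ A_bar @ Q.T) @ Q @ b_ls
```

As discussed below around (2.9), when $\Sigma$ is unknown one may pass in its place the diagonal matrix of squared least squares residuals.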
I will give an alternative, more intuitive explanation of (2.7). That $\tilde\beta$ is better than $\hat\beta^{+}$ (assuming there is a constraint) can be shown by noting that $\hat\beta^{+}$ is CLS applied to (2.5), whereas $\tilde\beta$ is CGLS applied to (2.5), as I stated earlier. To show that $\hat\beta_G^{+}$ is superior to $\tilde\beta$, define a $T \times (T-K)$ matrix of constants $W$ such that $[X, W]$ is nonsingular and $W'X = 0$. Premultiply (2.1) by $W'$ to obtain

$$W'y = W'u. \qquad (2.8)$$
Then, $\hat\beta_G^{+}$ can be interpreted as CGLS applied to (2.5) and (2.8) jointly, and is hence superior to $\tilde\beta$.

Now, I will consider the case where $\Sigma$ is unknown. Let $y_t$ and $x_t'$ be the $t$th rows of $y$ and $X$, respectively, and let $D$ be the $T$-dimensional diagonal matrix whose $t$th diagonal element is equal to $(y_t - x_t'\hat\beta)^2$. Then, under our assumptions we have

$$\mathrm{plim}\,(X'DX/T) = \lim\,(X'\Sigma X/T), \qquad (2.9)$$
which I will prove below.² Therefore, it is clear from the definition of $\tilde\beta$ that

²Eq. (2.9) was first demonstrated, and used in estimating the variance-covariance matrix of the least squares estimator in a heteroscedastic regression model, by Eicker (1963). Eicker's idea was further developed by White (1980).
if we replace $X'\Sigma X$ with $X'DX$ in the definition (2.6), we obtain an asymptotically equivalent estimator. Note that even if we cannot estimate $\Sigma$, we can consistently estimate $\lim T^{-1}X'\Sigma X$, which is all that is needed for the present purpose.

Now, a proof of (2.9). Consider the $(i, j)$th element of the right-hand side of (2.9). We have

$$T^{-1}x_i'\Sigma x_j = T^{-1}\sum_{t=1}^{T} E\,x_{it}x_{jt}u_t^2. \qquad (2.10)$$
Consider the same for the left-hand side. We have

$$T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}(y_t - x_t'\hat\beta)^2 = T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}u_t^2 - 2T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}u_t x_t'(\hat\beta - \beta) + T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}[x_t'(\hat\beta - \beta)]^2. \qquad (2.11)$$
The first term on the right-hand side of (2.11) converges in probability to the limit of the right-hand side of (2.10) by a law of large numbers under the assumptions I stated earlier. Also under our assumptions, the second and third terms on the right-hand side of (2.11) converge to zero in probability. Thus, (2.9) is proved.

Now I want to ask: (1) Can I define an estimator which is more efficient than Chamberlain's and yet does not require the estimation of $\Sigma$? (2) Can such an estimator be asymptotically as efficient as CGLS? The answer to the first question is yes and the answer to the second is generally no, as I will show below.

When we define CGLS as in (2.4), it seems that we cannot calculate it unless we can estimate $\Sigma$, since there does not seem to be a consistent estimator of $\lim T^{-1}X'\Sigma^{-1}X$. Suppose we rewrite (2.4) using the interpretation of $\hat\beta_G^{+}$ given after (2.8). Then we have

$$\hat\beta_G^{+} = \{I - (X'X)^{-1}A(X'X)^{-1}Q'[Q(X'X)^{-1}A(X'X)^{-1}Q']^{-1}Q\}\hat\beta_G, \qquad (2.12)$$

where

$$A = X'\Sigma X - X'\Sigma W(W'\Sigma W)^{-1}W'\Sigma X, \qquad (2.13)$$

and

$$\hat\beta_G = \hat\beta - (X'X)^{-1}X'\Sigma W(W'\Sigma W)^{-1}W'y. \qquad (2.14)$$
An equivalence of (2.12) to (2.4) can also be directly demonstrated by using the identity

$$\Sigma - \Sigma W(W'\Sigma W)^{-1}W'\Sigma = X(X'\Sigma^{-1}X)^{-1}X'. \qquad (2.15)$$
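Identity (2.15) is easy to verify numerically once $W$ is taken as a basis of the null space of $X'$. A quick check (the construction and all names are my own, not from the paper):

```python
import numpy as np

def identity_check(T=8, K=3, seed=0):
    """Max abs residual of eq. (2.15):
    Sigma - Sigma W (W'Sigma W)^{-1} W'Sigma = X (X'Sigma^{-1} X)^{-1} X',
    with W spanning {w : X'w = 0}."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(T, K))
    _, _, Vh = np.linalg.svd(X.T)        # full SVD of the K x T matrix X'
    W = Vh[K:].T                          # T x (T-K), columns orthogonal to col(X)
    Sigma = np.diag(rng.uniform(0.5, 2.0, size=T))
    Sigma_inv = np.linalg.inv(Sigma)
    lhs = Sigma - Sigma @ W @ np.linalg.inv(W.T @ Sigma @ W) @ W.T @ Sigma
    rhs = X @ np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T
    return np.max(np.abs(lhs - rhs))
```

The residual is at machine-precision level for any diagonal positive definite $\Sigma$.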
We have made progress, since the right-hand side of (2.12) depends only on $\Sigma$ and not on $\Sigma^{-1}$. However, a difficulty in calculating $\hat\beta_G^{+}$ still exists: each element of the matrices $T^{-1}X'DW$ and $T^{-1}W'DW$ converges in probability to the limit of the corresponding element of $T^{-1}X'\Sigma W$ and $T^{-1}W'\Sigma W$, respectively, for the same reason that (2.9) holds, but replacing $X'\Sigma W$ and $W'\Sigma W$ with $X'DW$ and $W'DW$ does not produce an asymptotically equivalent estimator, because the sizes of these matrices increase with the sample size $T$.

The above consideration suggests that we should use only a subset $W_1$ of $W$ in defining an estimator of the form (2.12). I assume that $W_1$ is a $T \times N$ matrix of full column rank, where $N$ is a finite fixed number, such that its elements are bounded and $\lim T^{-1}W_1'W_1$ exists and is nonsingular. I define the class of constrained partially generalized least squares estimators (CPGLS) by

$$\hat\beta_P^{+} = \{I - (X'X)^{-1}A_1(X'X)^{-1}Q'[Q(X'X)^{-1}A_1(X'X)^{-1}Q']^{-1}Q\}\hat\beta_P, \qquad (2.16)$$

where

$$A_1 = X'\Sigma X - X'\Sigma W_1(W_1'\Sigma W_1)^{-1}W_1'\Sigma X, \qquad (2.17)$$

and

$$\hat\beta_P = \hat\beta - (X'X)^{-1}X'\Sigma W_1(W_1'\Sigma W_1)^{-1}W_1'y. \qquad (2.18)$$
One can replace $\Sigma$ with $D$ in the above without changing the asymptotic distribution of the estimator. I can show that $\hat\beta_P^{+}$ is more efficient than Chamberlain's $\tilde\beta$ in exactly the same way as I earlier showed the superiority of $\hat\beta_G^{+}$ over $\tilde\beta$. If there is no constraint, $\hat\beta_P^{+}$ reduces to $\hat\beta_P$. Note that $\hat\beta_P$, with $D$ in place of $\Sigma$, is asymptotically more efficient than least squares, even though Chamberlain's estimator cannot do any better than least squares in the case of no constraint. More precisely, we have

$$V\hat\beta - V\hat\beta_P = (X'X)^{-1}X'\Sigma W_1(W_1'\Sigma W_1)^{-1}W_1'\Sigma X(X'X)^{-1}. \qquad (2.19)$$
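To illustrate (2.19), the following Monte Carlo sketch (my own construction; the two-variance-group design and all names are assumptions) computes the feasible $\hat\beta_P$ of (2.18), with $D$ in place of $\Sigma$, and compares its sampling variance with that of least squares:

```python
import numpy as np

def simulate_pgls(T=400, a=4.0, reps=2000, seed=0):
    """Monte Carlo variances of LS and of feasible partially generalized LS,
    eq. (2.18) with D (squared LS residuals) in place of Sigma.
    Design: beta scalar, X a vector of ones, variances 1 and a in two groups."""
    rng = np.random.default_rng(seed)
    x = np.ones(T)
    w1 = np.r_[np.ones(T // 2), -np.ones(T // 2)]   # a simple choice of W1, W1'X = 0
    sd = np.sqrt(np.r_[np.ones(T // 2), a * np.ones(T // 2)])
    b_ls = np.empty(reps)
    b_p = np.empty(reps)
    for r in range(reps):
        y = 1.0 + rng.normal(size=T) * sd           # true beta = 1
        bl = y.mean()                                # LS when X is a vector of ones
        e2 = (y - bl) ** 2                           # diagonal of D
        # eq. (2.18): b_P = b_LS - (X'X)^{-1} X'D W1 (W1'D W1)^{-1} W1'y
        b_p[r] = bl - (x * e2 @ w1) / T / (w1 * e2 @ w1) * (w1 @ y)
        b_ls[r] = bl
    return b_ls.var(), b_p.var()
```

With the values above, the simulated variances should be near $(1+a)/(2T)$ for least squares and $2a/(T(1+a))$ for $\hat\beta_P$, consistent with the example given below.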
The above equality suggests that $W_1$ should be chosen so as to maximize the correlations between the columns of $W_1$ and $\Sigma X$. Unfortunately, I cannot at the moment offer any concrete formula which is generally useful for finding
the optimal $W_1$. I will only give a simple example below, where the optimal $W_1$ can be easily found.

Consider a special case of the model (2.1) where $\beta$ is a scalar and $X$ is a $T$-vector of ones. Assume that the first $T/2$ diagonal elements of $\Sigma$ are ones and the remaining $T/2$ elements have the same value $a$, which is an unknown parameter. Actually, the number of elements in each of the two groups may differ from $T/2$ by any finite number without affecting our asymptotic results. Then the optimal $W_1$ is the vector whose first $T/2$ elements are ones and whose remaining $T/2$ elements are minus ones. Then, using (2.19), we can easily show

$$V\hat\beta_P = (1/T)\,[2a/(1+a)], \qquad (2.20)$$

whereas

$$V\hat\beta = (1/T)\,[(1+a)/2]. \qquad (2.21)$$

3. A heteroscedastic simultaneous equations model
In this section I will consider a limited information simultaneous equations model with heteroscedastic errors defined by

$$y = Y\gamma + X_1\beta + u = Z\alpha + u, \qquad (3.1)$$

and

$$Y = X\Pi + V, \qquad (3.2)$$
where $X_1$ is a subset of the columns of $X$, $X$ satisfies the same conditions as in section 2, and $u$ also has the same properties as in section 2, with $Euu' = \Sigma$ diagonal as before. As for $V$, I assume that the elements of its $t$th row $v_t'$ may be correlated among one another and also with $u_t$, but are serially independent with bounded variances. Here I do not assume any constraints on the parameters $\gamma$ and $\beta$, though such constraints can be easily handled.

As in section 2, I first assume that $\Sigma$ is known and compare the following three estimators: the two-stage least squares (2SLS), the generalized two-stage least squares (G2SLS), and Chamberlain's estimator. The 2SLS estimator of $\alpha$ is defined by

$$\hat\alpha = (Z'PZ)^{-1}Z'Py, \qquad (3.3)$$

where $P = X(X'X)^{-1}X'$, and its asymptotic variance-covariance matrix is given by

$$V\hat\alpha = (\bar{Z}'\bar{Z})^{-1}\bar{Z}'\Sigma\bar{Z}(\bar{Z}'\bar{Z})^{-1}, \qquad (3.4)$$
where $\bar{Z} = (X\Pi, X_1)$. I define the G2SLS estimator as
8,=(z’PC_‘PZ)_lz’PC_‘Py,
(3.5)
and its asymptotic variancecovariance
matrix is given by
VB,=(Z’c~‘Z)‘.
(3.6)
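Both (3.3) and (3.5) are simple to compute when $\Sigma$ is known; a numpy sketch of my own (all names and the test design are assumptions, not from the paper):

```python
import numpy as np

def tsls_g2sls(y, Z, X, Sigma):
    """2SLS, eq. (3.3), and G2SLS, eq. (3.5), with known diagonal Sigma."""
    P = X @ np.linalg.solve(X.T @ X, X.T)        # projection onto col(X)
    a_2sls = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
    Sigma_inv = np.linalg.inv(Sigma)
    PZ, Py = P @ Z, P @ y
    a_g2sls = np.linalg.solve(PZ.T @ Sigma_inv @ PZ, PZ.T @ Sigma_inv @ Py)
    return a_2sls, a_g2sls
```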
Theil (1961, p. 345) defined G2SLS as $(Z'P\Sigma^{-1}Z)^{-1}Z'P\Sigma^{-1}y$, which has the same asymptotic distribution as (3.5). I defined it as (3.5) in order to rewrite it in a certain way, which I will show later. Chamberlain's estimator can be derived by premultiplying (3.1) by $X'$ to obtain

$$X'y = X'Z\alpha + X'u, \qquad (3.7)$$

and then applying GLS to (3.7), as

$$\tilde\alpha = [Z'X(X'\Sigma X)^{-1}X'Z]^{-1}Z'X(X'\Sigma X)^{-1}X'y. \qquad (3.8)$$

Its asymptotic variance-covariance matrix is given by

$$V\tilde\alpha = [\bar{Z}'X(X'\Sigma X)^{-1}X'\bar{Z}]^{-1}. \qquad (3.9)$$

It is straightforward to show

$$V\hat\alpha \ge V\tilde\alpha \ge V\hat\alpha_G. \qquad (3.10)$$
If $\Sigma$ is unknown, one can replace $X'\Sigma X$ by $X'DX$ in (3.8) without changing the asymptotic distribution, because of (2.9), where $D$ is now defined as the diagonal matrix whose $t$th diagonal element is equal to $(y_t - z_t'\hat\alpha)^2$. Using the identity (2.15), we can rewrite (3.5) as

$$\hat\alpha_G = (Z'XA^{-1}X'Z)^{-1}Z'XA^{-1}X'y, \qquad (3.11)$$
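One way to see the step from (3.5) to (3.11) (a derivation of mine, with $A$ the matrix defined in (2.13) and $W'X = 0$): by the identity (2.15),

```latex
A = X'\bigl[\Sigma - \Sigma W(W'\Sigma W)^{-1}W'\Sigma\bigr]X
  = X'X\,(X'\Sigma^{-1}X)^{-1}X'X,
\quad\text{so}\quad
XA^{-1}X' = X(X'X)^{-1}X'\Sigma^{-1}X(X'X)^{-1}X' = P\Sigma^{-1}P.
```

Hence $Z'XA^{-1}X'Z = Z'P\Sigma^{-1}PZ$ and $Z'XA^{-1}X'y = Z'P\Sigma^{-1}Py$, which gives (3.11).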
where $A$ is as defined in (2.13). In either the form (3.5) or (3.11), however, one cannot replace $\Sigma$ with $D$ without changing the asymptotic distribution, for the same reason I explained in section 2. As in section 2, I define the class of partially generalized two-stage least squares estimators (PG2SLS) by

$$\hat\alpha_P = (Z'XA_1^{-1}X'Z)^{-1}Z'XA_1^{-1}X'y, \qquad (3.12)$$

where $A_1$ is as defined in (2.17). Its asymptotic variance-covariance matrix is given by

$$V\hat\alpha_P = (\bar{Z}'XA_1^{-1}X'\bar{Z})^{-1}. \qquad (3.13)$$
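A feasible version of (3.12) can be sketched as follows, with $\Sigma$ in $A_1$ of (2.17) replaced by the diagonal matrix $D$ of squared 2SLS residuals (a numpy sketch of my own; all names and the test design are assumptions):

```python
import numpy as np

def pg2sls(y, Z, X, W1):
    """Feasible PG2SLS, eq. (3.12), using D built from 2SLS residuals.
    W1 must be a T x N matrix with W1'X = 0."""
    P = X @ np.linalg.solve(X.T @ X, X.T)
    a_2sls = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)   # eq. (3.3)
    e2 = (y - Z @ a_2sls) ** 2                           # diagonal of D
    XDX = (X * e2[:, None]).T @ X
    XDW = (X * e2[:, None]).T @ W1
    WDW = (W1 * e2[:, None]).T @ W1
    A1 = XDX - XDW @ np.linalg.solve(WDW, XDW.T)         # eq. (2.17), D in place of Sigma
    B = Z.T @ X @ np.linalg.solve(A1, X.T @ Z)
    return np.linalg.solve(B, Z.T @ X @ np.linalg.solve(A1, X.T @ y))
```

A convenient way to build a valid $W_1$ in practice is to residualize candidate columns on $X$, which enforces $W_1'X = 0$.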
It is easy to show

$$V\tilde\alpha \ge V\hat\alpha_P \ge V\hat\alpha_G. \qquad (3.14)$$

The asymptotic distribution of $\hat\alpha_P$ is unchanged if $\Sigma$ is replaced with $D$ in its definition.
All the estimators considered in this section can be straightforwardly generalized to the full information simultaneous equations model to yield the G3SLS, the PG3SLS, and the corresponding Chamberlain estimator. I will briefly indicate how this can be done. Write the $n$ structural equations as

$$\mathbf{y} = \mathbf{Z}\boldsymbol{\alpha} + \mathbf{u}, \qquad (3.15)$$

where

$$\mathbf{y} = (y_1', y_2', \ldots, y_n')', \qquad \boldsymbol{\alpha} = (\alpha_1', \alpha_2', \ldots, \alpha_n')', \qquad \mathbf{u} = (u_1', u_2', \ldots, u_n')',$$

and

$$\mathbf{Z} = \mathrm{diag}(Z_1, Z_2, \ldots, Z_n).$$

Also define

$$\mathbf{X} = I \otimes X, \qquad \mathbf{W}_1 = I \otimes W_1,$$

where $\otimes$ is the Kronecker product.
So far, everything is essentially the same as in the model (3.1), using these newly defined matrices, which appear in boldface. The only significant new feature of (3.15) as compared to (3.1) is that here $\boldsymbol{\Sigma} = E\mathbf{u}\mathbf{u}'$ is not diagonal,
but is of the form $\boldsymbol{\Sigma} = (\Sigma_{ij})$, $i, j = 1, \ldots, n$, where each $\Sigma_{ij}$ is a diagonal matrix. However, this does not create any significantly new problem because, for example, $\mathrm{plim}\,T^{-1}X'D_{ij}X = \mathrm{plim}\,T^{-1}X'\Sigma_{ij}X$, where the $t$th diagonal element of $D_{ij}$ is $(y_{it} - z_{it}'\hat\alpha_i)(y_{jt} - z_{jt}'\hat\alpha_j)$.
References

Chamberlain, Gary, 1982, Multivariate regression models for panel data, Journal of Econometrics 18, 5-46.
Eicker, F., 1963, Asymptotic normality and consistency of the least squares estimators for families of linear regressions, Annals of Mathematical Statistics 34, 447-456.
Theil, Henri, 1961, Economic forecasts and policy, 2nd rev. ed. (North-Holland, Amsterdam).
White, Halbert, 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
White, Halbert, 1982, Instrumental variables regression with independent observations, Econometrica 50, 483-499.