# A Generalised Eigenvalue Problem and The Lanczos Algorithm

## A Generalised Eigenvalue Problem and The Lanczos Algorithm

Large Scale Eigenvalue Problems J. Cullum and R.A. Willoughby (Editors) 0 Elsevier Science Publishers B.V. (North-Holland), 1986 95 A GENERALISED EI...

Large Scale Eigenvalue Problems J. Cullum and R.A. Willoughby (Editors) 0 Elsevier Science Publishers B.V. (North-Holland), 1986

95

A GENERALISED EIGENVALUE PROBLEM AND THE LANCZOS ALGORITHM Thomas Ericsson visiting the Center for Pure and Applied Mathematics University of California Berkeley, California U.S.A.

Some properties of the generalised eigenvalue problem, Kx = X M z , where K and M a r e r e a l and symmetric matrices, and K is nonsingular and M is positive sermdefinite (and singular), a r e examined. We s t a r t by listing some basic properties of the problem, such as the Jordan normal form of K I M . The next p a r t deals with three standard algorithms (inverse iteration, the power method, and the Lanczos method) the focus being on the Lanczos method. I t is shown t h a t if the Lanczos starting vector is contaminated by a component, in a certain subspace, this component may grow rapidly and cause large e r r o r s in the computed approximations of the eigenvectors. Some ways to refine the approximations a r e presented. The last p a r t deals with the problem of estimating the quality of an approximate eigenpair, given a residual T

=m-xm.

I . Introduction The properties of the generalised eigenvalue problem

Kx = hMx where K and M a r e real, symmetric matrices, and M is positive definite, a r e wellknown. Since M is positive definite we can transform the generalised roblem into a symmetric standard eigenvalue problem A z = hz, where A = M-#KM- and 2 = Mx. This implies, for example, t h a t there exist real matrices X and A, where X i s nonsingular and A diagonal such t h a t K X = MXA. Since M is positive definite, x T M y defines a n inner product and (zTMz)% a norm. We can, f o r example, choose X such t h a t the eigenvectors a r e orthogonal with respect t o M ,i.e. XTMX = I . Given a n algorithm f o r the standard eigenvalue problem, we can also construct the corresponding algorithm f o r the generalised problem. The main computation in = Ayk. Reversing the transformathe power method, for example, h a s the form where gi = M-HY,, i = k , k + l , f o r the power tion above, we get the [email protected]+l = method in the g e n e r a k e d case. However, in practice one often encounters the case when M is only positive semidefinite. When this is the case most of what we did above need not be true. Not only may we lose the properties t h a t depend on the fact t h a t M is positive definite (some of those will remain if K is positive definite) but we can g e t a n eigenvalue problem with new properties, a s shown in the following example.

!f

mk,

Example. LetK=b Y ] , M = [ i

:].

Here we have one h = 1 and one i n f i n i t e A, (i.e. a zero eigenvalue of K I M . )

T.Ericsson

96

Let

K=

01

[l O].

0 0

M = [o

then K I M =

1i]

In this case Kx = XMx has a defective. infinite. eigenvalue

Here K

I] k], = AM

f o r any A, a n d we s a y t h a t t h e pencil (K, M ) is singular.

To summarise one could say t h a t this paper deals with some of the problems t h a t arise when we have a positive semidefinite a n d singular M . We s t a r t by analysing the eigenvalue problem, in section 2, to find out when we have a defective, singular etc. problem. In section 3 the behaviour of t h r e e algorithms (inverse iteration, power iteration, and the Lanczos method) is examined. Most of the section deals with a M inner product Lanczos algorithm for a shifted and inverted problem. It is shown t h a t problems occur (the Lanczos vectors may become contaminated by large components in a unwanted subspace). Section 4 deals with t h e problems of estimating the accuracy of computed results, given a residual. When Y is positive definite t h e r e a r e several bounds using t h e & -n!-'orm. When M is singular we must, of course, look for other alternatives. This paper does n o t deal with the implementational details of the algorithms (e.g. roundoff is neglected). A more practical view c a n be found in [ 13. For easy reference, we have collected some notation and definitions below.

1.1. Notation.

The notation corresponds to t h a t of Wilkinson , with the following additions.

Em'" denotes t h e vector space of all m-by-n r e a l matrices. If A E I R ~ ' " , t h e n the range space T ( A ) = { ye!Rm1 y = Ax for some z E I R " ~ ,and t h e null space of A is n ( A ) = [EIR" s I A z = Oj. If d i m ( ) denotes t h e dimension of a space, then rank ( A ) = d i m (T ( A ) ) ,null ( A ) = dim (n( A ) ) . If A = ( a l , .. . , %), t h e n s p a n ( a l , . . . , %) = T ( A ) . Let A , B E En’", t h e n A , , ( A , B ) is a n eigenvalue of t h e pencil ( A , B ) , i.e. d e t ( A - A k ( A , B ) B ) = 0. If B=Z, i.e. we have a standard eigenvalue problem, we write A k ( A ) instead. If it is clear from the context, what matrices a r e involved, we will use Ak.

If A €IRnXn, A = A T , A j ( A ) > 0, i = 1, . . . , n , we say t h a t A is positive definite. If & ( A ) > 0 , f o r all i. A is positive semidefinite. If A o r -A is positive definite, A is definite. If h i @ ) may be arbitrary, A is indefinite. If A is positive definite, zTAy defines a n inner product, and I ) x ] I A = ( x T A z ) g a vector norm. If zTAy = 0 , we s a y t h a t x a n d y a r e A-orthogonal. The n o r m having A = I is denoted 1 1 1 1 . The subordinate matrix n o r m is denoted in t h e same way, so 1 1 B 1 1 = ( m a z A(B*B))%. A+ is the Moore-Penrose pseudo-inverse of A If & , i = 1.. . . , n a r e square matrices, C = d i a g ( A 1 . .. . denotes the block , = 4. diagonal matrix where Cijis the zero matrix f o r all i f j , and C

,&I

Since we deal with K and M throughout this paper, we will give some often used factorisations:

A Generalised Eigewalue Problem und the Lanczos Algorithm

91

M has t h e eigen decomposition M(R, N ) = ( R ,N ) d i a g (R2, O),where the square matrix ( R ,N) is orthogonal, R = d i a g (a1,. . . , a?), T = rank ( M ) and u, > 0. ( R stands for Range space part, and N for Null space p a r t ) If K is nonsingular then

where H3 is a square matrix of order n u l l ( M ) . H I has the eigen decomposition H I ( Q ~QN) , = (QR,QN)d i a g ( h - ' , 0 ) where (QR, QN) is orthogonal.

C = X-'M, has the Jordan normal form CS = S J . The projections Px,PN and the subspaces X, N a r e defined in section 2.3 2. Some Basic Properties of ( K ,M ) . In this section we will give criteria for when t h e pencil (K, M ) is singular, defective e t c . Some of the theorems a r e well-known, but we have included them for completeness. These subjects a r e also dealt with in [ 5 ] , "71, f o r example, but from a different point of view. A s before we assume t h a t K and M E IRnX", and t h a t M is positive semidefinite and singular. We assume t h a t K is symmetric, but otherwise arbitrary. To make things easier to follow, we will first transform Kz = AMz into a n equivalent problem where the transformed M has a simple form.

Lemma 2.1. Kxi = XiMzi c a n be transformed into Rsi = hi&si,where R, & a r e r e a l = bij then a n d symmetric, a n d fi = d i a g ( I , 0 ) with T a n k (a)= T a n k (M).If z:Mzj s:[email protected] = 6, Proof. Using the factorisation of M (please see section 1.1) with U = ( R ,N ) , A = d i a g then Kx = AMx H A-'UTKUA-'AUTz

(n, I ) ,

M = U A d i a g ( I , O)AUT

= A diag ( I ,O)AUTz H

R = A - ~ u ~ K u A -z^~=. A U T X .

Assume M = d i a g ( I , 0 ) and partition

Rz^= ha?, with

K as

and let K ( a ) = K+aM, a EIR.

Lemma 2.2. If lar.

k]

has full column rank then there is a a such t h a t K ( a ) is nonsingu-

Proof. Assume t h e r e is no such a, and let Q = d i a g ( 1 , U ) , where U T K 3 U = d i a g ( A , 0 ) , where A is nonsingular and U is orthogonal. Let K 2 U = ( A , B ) .

2 ’

Having assumed t h a t @ K ( a Q IS singular for all a, there must exist nonzero w T = ( x T , y T , z T ) , s u c h t h a t QK ( a ) @ -0,or

98

T. Ericsson

(Kl+aZ)x+Ay +Bz = 0 A ~ X + A Y=

o

BTz = 0

(22)j

AA-'ATz+Ay = 0. This, together with (1) x ( K l + a l ) z - z T A A - ' A T x = 0, so z T ( K 1 - A A - I A T ) z= - a x T z .

and

(3),

gives

Now, take a such t h a t l a l > m a x I A ( K l - A A - ' A T ) I , then x = 0 is necessary (since -a would be a Rayleigh quotient otherwise).

= 0 a n d ( 2 ) > y = 0 , s o (1) gives Bz = 0 , a n d since w rank.

z

Using lemma 2.2 i t is easy t o prove:

Theorem 2.3. ( K , M ) singular H equivalent to t h a t

El

#

0 , B cannot have full column

does n o t have full column rank. (Which also is

K and M have a nontrivial intersection between their null spaces.)

We leave the proof to the r e a d e r We will now characterise t h e definite case ((K, M ) is said to be a definite pencil if t h e r e exist r e a l numbers a and p such t h a t aK+pM is a definite matrix). If (K, M ) is nondefective t h a t does n o t imply t h a t t h e pencil is definite. Let, f o r example,

K = diag(1,1 , -l), M = diag(l,O,0), [email protected] = d i a g ( a t @ a, , -a) which is indefinite for all a,8 . The pencil is n o t defective, however. We have t h e following theorem.

Theorem 2.4. (K, M ) definite

K3 definite

Proof. If (K, M ) is definite, then K ( a ) is definite for some a , and in particular K3 must be definite.

If, on t h e other hand, K3 is definite (let us assume it is positive definite) take a > I min A(Kl )I (so Kl+aI is positive definite). The block LDLT-decomposition of K ( a ) is given by K ( a ) = LDLT, where D has the diagonal blocks Kl+aI and K ~ - K ~ ( K ~ + c x I ) Since - ' K ~ .K ( a ) and D a r e congruent we can, by taking a large enough, make t h e Dzz-block positive definite (the Dl,-block is already positive definite). H One special case is t h e one t r e a t e d in t h e following corollary

Corollary2.5. If (K, M ) is nonsingular and if K is positive semidefinite then definite.

El

(K,M ) is

Proof. It is obviously t r u e if K3 is definite. Assume K3 is positive semidefinite and singular. We now prove:

has full column r a n k and K3 is singular 3 K has negative

a n d positive eigenvalues. (So if K is positive semidefinite K3 must be positive definite.) Take

z# 0

such

that

K3x = O ,

and

let

z T =(xTKl,uzT),

then

A Generalised Eigenvalue Problem and the Lanczos Algorithm

99

z T K z = z T K ~ K l K z z + 2 ~K1 Ip l12. Since K2z f 0 (we would have rank deficiency otherwise), z ' K z can take any value with a suitable choice of 0 .

=

All the above results hold also when M is nondiagonal ( M is still real, symmetric and positive semidefinite). To generalise the theorems we use the matrix N E RXnU''((iU) introduced before (see section 1.1) and reverse the transformation in lemma 2 . 1 . Our two theorems then become:

( K , M ) singular H KN does not have full column rank ( K , M ) definite H NTKN definite.

(2.6 a ) (2.6 b)

In the coming sections we will not look a t the case when ( K , M) is singular. One way to avoid the singular case is t o assume t h a t K is nonsingular. Another advantage, t h a t results from the assumption, is t h a t we c a n study the transformed problem Cz = u z , where C = K I M , and u = A-'. To justify t h a t assumption we need, however, to show t h a t K being nonsingular does not r e s t r i c t our problem t o o much, a n d t h a t the theorems holding for a nonsingular K also a r e t r u e for a singular K and nonsingular ( K ,M). This is done in the next two paragraphs. In the next chapter we will construct the Jordan normal form, CS = SJ of the matrix C = K ' M . I t will be shown t h a t J = diag (A-l, B), where A is diagonal and non-

I"

singular, a n d B is a block diagonal matrix having and 0 blocks on the diagonal. S ll c a n be partitioned a s S = ( X , R ) , where XTMX = I , so t h a t K x = MXA. For the B-block we have KRB = MR. Now, assume t h a t ( K ,M ) is nonsingular and t h a t K is singular. Then there exists a such t h a t K+aM is nonsingular. From the Jordan normal form of (K+aM)-'M, (K+aM)-'MS = S J , we get M: = MX(A-aZ) and KRB = MR (since B(Z-aB)-' = B). S o t h e only change we can get, when allowing a singular K , is t h a t ( K , M) can have one o r more zero eigenvalues. As we have seen, NTKN is a n important quantity and since M has not changed (when forming K ( a ) ) ,N T K ( a ) N = NTKN. Putting these things (and those in the next section) together, we get the following picture:

T. Ericsson

100

KN

/\

r a n k deficient

singular

full r a n k

\

( K , M ) is nonsingular

I

N~KN

We will not analyse this branch f u r t h e r

/\

definite

I

not definite

\

(K,M ) is n o t definite

( K ?M ) is definite

I

NTKN

/\

singular

nonsingular

( K , M ) is not

(K,M )

is defective

defective

2.1. The Jordan NormalFormof K I M . In the sections t h a t remain we will assume t h a t K is nonsingular and t h a t M is positive semidefinite and singular. So our problem Kz = AMx can be written a s Cz = u z , where v = A-', C = K ' M . We know t h a t M is singular, s o some u-eigenvalues must be zero (corresponding t o infinite A-eigenvalues). In this section we will cons t r u c t t h e Jordan normal form of C, CS = SJ, where S is nonsingular and J is block diagonal. To s t a r t with we assume t h a t M = diug ( I , 0 ) . Suppose now t h a t M=diag ( I , O), K-’ = H =

Ei 2

then C = K - ' M =

E:

:],where

H , has order T = r a n k (M), and we c a n s t a t e our-main theorem in this seciion. ' Theorem2.7. C has a zero eigenvalue of algebraic multiplicity n u l l ( M ) + n u l l (H,) and The order of the Jordan blocks, corresponding to the geometric multiplicity n u l l (M). defective eigenvalues, is two. The remaining m = n - ( n u l L ( H l ) + n u l L(M))nonzero eigenvalues a r e nondefective, and we c a n find diagonal A E RmXm, a n d X E IRnXm, such t h a t K x = MXA and XTMX = I . Proof. The eigenvalues of C a r e Ak(Hl) and n u l l ( M ) zeroes (looking a t the diagonal blocks C, Czz). Suppose t h a t H 1 has the spectral factorisation

HI(@, QN) = (QR,QN) d i a g

0)

101

A Generalised Eigenvalue Problem and the Lanczos Algorithm

where (QR,QN) is orthogonal, a n d A = diug(hl,. . . , As), for some s = rank(H1),and where t h e hi # 0. Then CX = XA-’, where

PI

*= H z Q ~ A

Obviously XTMX = I,and lix = MXA. This t a k e s c a r e of the nonzero eigenvalues of C . The zero eigenvalues (corresponding to t h e Czz-block), have eigenvectors

k],

where

U is any nonsingular matrix of o r d e r null ( M ) . If H , is nonsingular this would be it. But if H1 is singular, we c a n n o t find eigenvec-

t o r s corresponding t o these e x t r a zero eigenvalues, since if u = I

implies H , u , = 0, H z v l = 0. which gives u 1 = 0 (otherwise H

\

l“bl

tI:

, t h e n CV = O

v z m u s t be zero too, since we have used all these vectors (for U-above.)

Another way to s e e the same thing, is by looking a t t h e principal vectors in t h e Jordan of g r a d e o n e , corresponding to the zero eigenn o r m a l form. The values, a r e given QN

n ( C z ) - n ( C )= T ( [ o

1).

The principal vectors of grade two, a r e given by There a r e no principalvectors of higher grade, since

I t follows t h a t t h e Jordan normal form, C S = S J , is given by S =( X , N ) and J = diug(A- ,O) in the nondefective c a s e , a n d ( X , R ) (for some R) and J = diug (A-l,B) where B = diug ( B , , . .

,

,

B,, 0), B1 =

, in the defective case.

Let us now look a t a problem where M # diug ( I , 0) By reversing t h e transformation, of t h e g e n e r a l eigenvalue problem, we g e t the corresponding H,= R R T K I R R , a n d so null ( R T K ’ R ) is t h e interesting quantity. We c a n use the Iollowing lemma to c h a r a c t e r i s e null ( H , ) in t e r m s of K ,r a t h e r t h a n K .

Lemma 2.8. Let Q = lar, then

(al,Qz) be a n orthogonal matrix, and assume t h a t A is nonsingunull (QTA-’ Q1)= null (Q\$4Qz)

Proof.

have t h e same nullity.

W e c a n now reformulate our t h e o r e m :

T. Ericsson

102

Corollary2.9. Given Kz = AMz ( M not necessarily equal to d i a g ( I , O ) ) , there a r e + null ( N T K N ) infinite eigenvalues (zero A - l ) of which null ( N T K N ) a r e defective. The remaining eigenvalues a r e finite, nonzero, and nondefective, and we can find h a n d X s u c h t h a t KX = MXh, XTMX = I .

null ( M )

Example. Let K l , K2, and M , E IR"’",

M , positive definite, Kl symmetric, t h e n with

all the eigenvalues a r e infinite (and located in 2-by-2 blocks). In this c a s e

,

#

Let M = d i u g ( l , 1,0,0)

and K =

1 0 0 0 0 1 1 0

0 0 1 1

That there a r e null (M)+ nu11 ( N T K N ) infinite eigenvalues, of which null ( N T K N ) lack eigenvectors, should be interpreted in t e r m s of zero eigenvalues of C. We can t h e n avoid a discussion of eigenvalues in plus a n d minus infinity, as in the following example.

K = diag (1, 1, - l ) , M = diag(Z,O,0 ) . Here c = diug (2, 0,O) I

the

eigenvalues

are

5, + m ,

-m,

but

2.2. Another Word on Definite Pencils. We have already seen t h a t ( K , M ) nondefective does not imply t h a t the pencil is definite, i.e. a linear combination a K t g M , a2+P2 = 1, is positive definite. Let us now consider the equivalent pencil

(R,a)= ([email protected]+aM, aK+BM).

( R ,a)is a positive definite pencil, then RS 3 = S diag((aA+pI)-X, (aNTKN)-B), [email protected] =I A = diug ((aA+PI)-l([email protected]),-(@/ a ) I ) . Theorem 2.10. If

Proof.

=I

@ ~ Xwhere ,

A Generalised Eigerivalue Problem and the Lanczos Algorithm

103

f2S = ( - p K + a M ) S = -pKS+aKSJ = K S ( - p I + a J )

[email protected]= ( a K + p M ) S = aKS+pKSJ = K S ( a I + p J ) Since

(R,8)is

definite, aI+pJ must be nonsingular, and hence

h = (aI+pJ)-'(-pI+aJ) = diu ((aA+pI)-'(aI-pA), - ( p / a ) I ) .

Q

RS

=

where

diug_(aA+pI,a N K N ) , so with D = diug ((aA+pI)-%,(aNTKN)-H), 3 = SD, Now [email protected]_= [email protected]= I , AD = DA. which proves the theorem. I

2-3.Projections.

Since we will r e f e r t o T ( X ) quite often we introduce t h e following notation:

x=T(X)

If (K,M ) is nondefective it will later be shown t h a t X = ' f ( ( K - p M ) - ' M ) . The null s p a c e of (K-pM)-'M is denoted N. Obviously N = n(M).N a n d X a r e orthogonal with r e s p e c t t o K (if K is indefinite we mean t h a t z T K y = 0 for all x ~ X a n yd EN.)This follows f r o m K X = MXA and y T M = 0 if y EN.Also X @ N = iR".This is easily s e e n f r o m the proof of theorem 2.7 or by the fact that if S = (X,N) t h e n

sT= (m(xTm)-l, KN(N~KN)-’).

We also introduce t h e oblique projections

Px = X(XTKX)-'XTK = X X T M pN = N ( N ~ K N ) - ' N ~ K We have Px(Xu+Nb) = X a ,PN(Xa+Nb)= Nb. Also Px+PN = I (since ( P x ,P N )= (X, N)S-' = I ) , P\$ = Px? P\$ = PN,a n d P\$KPN = 0

Example. In the following examples it c a n be s e e n how Xis made u p of components in T (M) a n d n (M). Note that eigenvectors , in general, s h o u l d have components In n (M). Note also t h a t X a n d N a r e orthogonal t o e a c h other with r e s p e c t to K and n o t with r e s p e c t to I. 1)

Let K = I a n d 2M = diug (1, 0 ) , t h e n A = 1 and X = s p u n ( (1, O)T).

t h e n h = 1.5 a n d X = s p a n ( ( 1 ,-)\$)T) 3)

t h e n h = 1.5andX=spun((l,-)\$,0)T).

If (K, M ) is defective t h e n we will show t h a t X = T((K-pM)-*M(K-uM)-'M). I t is no longer t r u e t h a t [email protected] n(u)= En (it is easily seen from t h e proof of t h e o r e m 2.7, we

T. Ericsson

104

miss

the

principal

vectors

of

2).

If

we,

however,

define

N = ~ ( ( K - , U M ) - ~ M ( K - C F M ) - ' Mwe ) , have [email protected] N = En.We c a n no longer define PN using

t h e formula above (since NTKN is singular), but t h e definition for Px still holds a n d s o PN = I - p x .

3. Some Algorithms. In this section we will look a t some standard algorithms (inverse iteration, the power method, a n d a Lanczos algorithm) for solving the generalised eigenvalue problem. We will take t h e algorithms for a positive definite M , a n d then s e e how they behave when M is singular. As we will s e e , the major problem will be to keep vectors in the right subspace X.

3.1. Inverse Iteration. The algorithmis given by (see  page 317).

In one s t e p we have 2 = (K-pM)-'& (we assume t h a t K-pM is nonsingular). The algorithm is self correcting (given y 1B x then yk E x f o r k > l o r 2) as shown in the following the0 rem.

Theorem 3.1. T ( ( K - p M ) - ' M ) = X if ( K , M ) is nondefective.

r((K-pulW)-’M(K-~hf)-’M) = X if ( K , M ) is defective Proof. Let C = K I M a n d let C = SJS-' be the Jordan normal f o r m of C .

(K-pH)-'M = (I-pC)-'C = S(I-pJ)-’JS-’ = S diap ((A-pZ)-', 0)S-l. In tne defective c a s e we g e t in t h e same way: (K-~M)-~M(K-UM)-~M = S ( Z - ~ J ) - ~ J ( Z - ~ J ) - ~=J - ~ sdiag ( ( ( A - ~ z ) ( A - ~ I ) ) - ~0 , )s-1 since, with B =

i"

, (I-~B)-’B(I-UB)-~B = 0.

W

This implies t h a t if y, has a nonzero component inX, after at most two iterations, inverse iteration will work a s the standard algorithm (when M is positive definite).

A Generalised Eigenvalue Problem and the Lanczos Algoritlim

105

3.2. The Power Iteration. The algorithm is given by (see  page 317). In this section we will assume t h a t t h e problem is nondefective. Given y p1 do For k = 1 , 2 , ... Mz = f K - p k M ) y k Yk+l = = / I / z 11 choose & + I In one step we have M z = ( K - p M ) y say. Now, if M is singular this equation need n o t have a solution. In fact, using theorem 3.1 (assuming K - p M is nonsingular), we see t h a t regardless of what the possible solution z is, t h e known vector y must lie in X. If we have access to ( K - p M ) - ’ M we c a n produce a vector in the right space (X). But if may use the inverted operator we may switch to inverse iteration anyhow. If we do not have access to t h e inverted operator we may consider a n iteration having t h e equation z = A ( K - p M ) y a s its main step. One may be tempted to t r y to use t h e pseudo-inverse, A = M+,of M (if M is diagonal, for example). That choice will n o t work in general, since M + ( K - p M ) y E T ( M ) , a n d we do n o t g e t t h e required components in N. The right choice is to take A = XX* since

X X ~ ( K - ~ M )=YX ( A X ~ M - I . L X ~ M ) Y= X ( A - ~ I ) X ~ M Y a n d any y c a n be written Sin, for some a.This gives

X T M S a = X T M ( X , N ) n = ( I ,0 ) n a n d we would pick o u t the right p a r t of y The drawback with this approach is, of course, t h a t we do n o t know X,a n d t h e r e is no way we c a n find X f r o m M alone. We will r e t u r n to t h e matrix X X T in section 4, but it should be mentioned t h a t i t is a generalised inverse (though not, in general, t h e pseudo-inverse) of M .

3.3. The Lanczos Algorithm We will, in this paragraph, study the Lanczos algorithm for t h e generalised eigenvalue problem (see  for more details). We will, however, s t a r t to look a t the algorithm for a shifted and inverted standard problem, i.e given A r e a l and symmetric ( A - p 1 ) - ’ ~= ( A - ~ ) - ’ z AZ = AZ

assuming t h a t t h e inverse exists. The Lanczos algorithm for this problem c a n be written:

T. Er icsson

106

Given v l , urvl = 1 , compute

For i = 1 t o p do u = (A-pI)-bi if( i > 1) then u = u -/3i-lv,-l

ai = v r u

u = u-a,v, Pi

=l l ~ l l

if ( pi = 0) then va+1 = 0 stop else

endif

= u/Bi

The algorithm produces, in exact arithmetic,

V = ( v l , . . . , u p ) , with V T V = I and a tridiagonal matrix T , 'a1

81

81

a2

T=

.

> '

. . . .

. 4

. ap-1 a,-1

a,-1 ap

,

such t h a t

( A - ~ J ) - ' v = VT+ppvp+lepT If ( s , v) is an eigenpair of T , i.e. Ts = v s ,then I I ( A - p u l ) - l V s - v V ~ I / = lt3,~,

1

Suppose t h a t M is positive definite, then the problem Kx = h M x is equivalent to A2 = h z , where z = \$z, and A = M-5KM-g. Let us now reverse this transformation in the Lanczos algorithm above by introducing the new vectors 4 + u := M%L and This gives us the following algorithm in M-inner product. ui:=

A Generalised Eigenvalue Problem and the Lanczos Algorithm

107

Given v 1, v ?Mu = I,compute For i = 1 to p do 4 = (K-pUM)-lMVi if( i > 1) then 4 = di-&-lvt-, ai = v,'Mdi d, = d,-aivi Bi

= 114 I I M

i f ( Bi = 0) then stop else vi+l =

endif

di/@i

Again we get tridiagonal T and V such t h a t

VTMV = I (K-pM)-'WV =

If ( s , u ) is a n eigenpairs of T , then II(K-pM)-'MVs-vVs

vT+% e,'

llM

=

Isp

I lid, l l d d

(3)

We will now use the above algorithm when M is singular Formulas (1) and (2), will still hold, but (3) will not give us the whole t r u t h (since M does not define a norm over the whole space). I t should also be noted t h a t a, = 0 is possible even if dp f 0 (if dp EN).If this is the case, up+' is not defined, b u t % is. (The only reason we have a n index on d, is to be able to refer to the vector. In a implementation we would use the same vector, a s shown in the Lanczos algorithm for A , ) If I g p > 0 then % = v p + l &., We will use T and V in the coming sections without specific reference t o p If we must refer to submatrices of T or V we use Ti to denote the leading principal submatrix, of order j , of T ,and to denote the first j columns of V (so in particular T = Tp and v = VP). Sections 3.3.1-3.3.4.2 deal with the nondefective case. The defective case is t r e a t e d in section 3.3.5.

3.3.1. Contamination of Vectors. We s t a r t this section by an example.

Example. Let K = d i a g ( h l , . , , l), M = d i a g ( 1 , . . . , l , O ) and take v T = ( l , O , . . . , O , E ) . This gives al = (Al-p)-l, p, = 0, and d , = -(A1--p)-’&en. This means t h a t the eigenvalue is e x a c t but the Ritz vector Vs = v l can be arbitrarily bad depending on E . The Ritz vector c a n be refined, and in this case V s + a ; ' d l = e l is the exact eigenvector. The cause of this problem is t h a t v i X,i.e. in this case PNV,= t e n . We will examine the cause of this problem and a try to find a c u r e f o r it in the following sections.

T. Ericsson

108

We will s t a r t by examining how components n o t being in X may affect t h e algorithm in t h e nondefective case. A s was proved f o r inverse iteration (K -pM )-’%, f o r any y ,will always produce a vector in X, hence

Corollary 3.2. If v EX t h e n all t h e vi and any s a n d y .

4

lie in X, so t h a t Vs a n d Vs + y d , lie in X,for

Proof. Using induction and the f a c t t h a t dk =

((K-/LM)-lMvk--(XkV1,-pk-,vk-l) gives theproof. W We will now look a t how a i , pz, vi, a n d \$ change when we use a staring vector

v,@.xx.

have been produced by the Theorem33 Suppose t h a t a i , p i t v i i = 1 , . . . , p , and algorithm using v,EX. Let G i , & , Gi i = 1 , . . . , p , and%= denote the corresponding quantities produced by the algorithm, using t h e starting vector cl = v l + z , where z

EN,z

# 0.

Then

3E. = aI. ipi = pi , 8,= v i + y i z , i = I , . . . , p

ap = 4 + 6 , z

where 7 . = -(a. r-,Ti-,+Bi-27i-2)/Bi-l

6,

If

I

2

sP

8

= -(“pYp+Bp-IYp-l) pk f

70

= 0,

71 = 1

0, then 6 k = 7 k + l p k .

p k a r e formed by a&tf(K-pM)-’Mak and (blmk)# f o r some vectors a, Since Mu = 0 for any u E N,ak and pk a r e n o t affected by components in N. To prove the recursion use:

Proof. ak and and

bk.

= (K-/LLM)-’Kk which gives

6k

=

We leave it to r e a d e r to fill in the details.

-cxkYk-pk7k-l.

To s e e more clearly how Tk.

* -akGk-pk-IVk-,

6k

behaves, we express bA

In

terms of the eigenpairs of

A Generalised Eigenvalue Problem and t h e Lanczos Algorithm

109

Theorem35 L e t Tk have the spectral decomposition TkSi =

ViSi

, i = 1,. . . , k

then

Proof. Let

, page 129#gives sliskip'(vt)

and sincep'(vi) =

= @1'...'@k-l > 0 , 1 i

k,

k

(ui-vj) we g e t 3=1 1 #i

6k = P (o)(sliskiP'(ui))-l which proves the theorem. W This expression for bk holds for all i < k , s o let us pick a i such t h a t vi has settled down (converged) so IskiI is small. Let us also assume t h a t vi is well separated (the component of t h e Ritz vector from the other eigenvalues and t h a t s l i = ( VS,)~MV, in vl) is n o t particularly small. (All these assumptions a r e quite reasonable in a practical setting.) Then bk = s g l T k for some -rk, where it follows from the assumptions t h a t 7 k behaves reasonably, and s o I b k I can become quite large. Let us now look a t how the contamination affects a Ritz vector y^ = v s .

Theorem 3.6. Let g

= (rl,. . . ,y p ) then Tg = -bpep fj=v s = v s + z g T s = y - 6 p s p v - 1 z ,

(a) (VfO)

(b)

Proof. We know t h a t P = V + Z ? ~(the e r r o r is always in the same direction). Using this a n d (K-pM)-'MV = v T + a , ep we get (K-pM)-IMV = VT+\$ eT+z ( g T T + 6 pepT proving (a). (b) follows directly from (a). H

ck.

From (b) we see t h a t the Ritz vector y^ is not so badly affected by z a s the In a r e less affected t h a n fact, good Ritz vectors (smaller Isp I and usually larger 1.1) bad ones. In the next section we will show t h a t we can get rid of t h e contamination of the Ritz vectors altogether by adding a suitable multiple of the next Lanczos vector.

3.3.2. Correcting the Ritz Vector. We will now look a t a n alternative to v s . Is there any linear combination Pa or \$’a++,, t h a t does not have the unwanted z-component?

T. Ericsson

110

Theorem3.7. The vector

g = vs +spv-12p , v

# 0

has no component along z , and gives a residual (K-wM)S E T (M)for any w. Forming this vector is equivalent to taking one s t e p of inverse iteration

g = ~-'(K-puM)-l&Proof. Using ( K - p M ) - ' M v = v T + \$

Ts = u s , gives

, :e

(K-~M)-IMVS =

For the residual (K-pM)g = v - ' M V s so ( K - u M ) ~= ( K - , u M ) ~ + ( ~ - c J ) W = M(u-'

VY

+sP

dp = ~g E X

VS + ( p - - ~ ) g )

I

(The vector was introduced in [Z] b u t for a different purpose.) When computing &? in a program t h e r e a r e better ways of doing i t t h a n using t h e formula above. One

reason is t h a t s t a n d a r d programs (like t h e ones in EISPACK) may produce a sp t h a t is completely wrong when Isp I is small. See [ l ] for more details. It should be noted t h a t g is not t h e Rayleigh Ritz approximation to (K-pM, M ) from span(V,\$ ) o r s p a n ( y , \$) (for more details about these topics see [l]). Nor is PxG = g~ in general (T E R).Using (K-pM)-'&- = V \$ + S ~ I ? ~= ug w e have

pXg = V X ( A - , U I ) X ~ ~

v-’ap

However, given = 9 s !\$ X and a,, g = \$+sp is the only (up to a scalar factor) linear combination of y^ a n d a, t h a t lies in X. W e may also note t h a t it is impossible to find linearly independent t,, . . . ,tp such that

P&

E

x,

i = 1,. . . , p

since this implies t h a t gTti is zero for all ti,which is impossible unless g = 0. The reason i t works with ps +sp is because we take a linear combination of p + 1 vectors and get p vectors in X. It would, of course, not be possible t o g e t p + 1 vectors in X by using V,,,. We have unfortunately not found any practical way t o refine the G k . One would like to hit t h e Lanczos vectors by Px = I-PN = I-N(NTKN)-’NTK, but the expense is usually t o high.

.-'ap,

We end this section with a look a t t h e case when T is singular. Since T is unreduced the eigenvalues a r e distinct and s o if T is singular u = 0 is a simple eigenvalue. Due to interlacing and Gcl cannot both be singular. Assume Tk is singular, then from lemma 3.4 it follows t h a t 61, = 0 ( s o 2, EX) and Tg = 0 (from 3.6 a) s o g is a n eigenvector corresponding to v1 = 0 say. Hence psi = Vsi+zgTsi = Vsi E X provided i > 1 and t h e r e is no need to c o r r e c t these Ritz vectors. The Ritz vector vsl does not lie i n X ( b u t it is of little practical interest). Singular T c a n , of course, be produced by the Lanczos a1 orithm. One example is given by K = d i a g (A, -A, I ) , M = diag ( I , I , 0), y = 0 , a n d v f = (aT,a T ,bT), Then all ak = 0 which implies t h a t all Ti of odd order will be singular. (Proof: Let A be a square matrix satisfying uij = 0 if i + jis even. Let C = diag(1,-1, 1 , - 1 , ...) then C A E = -A s o if A z = X z then A ( & ) = -X(Cz), and in o u r tridiagonal case this implies t h a t eigenvalues occur in +- pairs.) For a numerical example see section 3.3.4.

G

111

A Generalised Eigenvalue Problem and the Lanczos Algorithm

3.3.3. The Nonsingular Case. In this section we will t r e a t t h e case when M is nonsingular but ill conditioned. We will limit the study to one simple problem, and we s t a r t by listing some of the properties of the problem. Let

W e a s s u m e t h a t I I K l j l I 1 , 11q11 = 1 , / u I s l , a n d t h a t O < ~ < < l . I t is not difficult to sh?w t h a t (K, M) has o n e large eigenvalue FY u / E (with corresponding eigenvector 2 , say) and n-1 eigenvalues in [ - 2 , 2 ] . If we partition ZT = ( z : , Z N ) , Z N E IR (the notation ZR, Z N should remind about range and null space in the singular case) we can show t h a t

x

/ Z N / // I z R I I

Id/&

This means t h a t z^ is very rich in the eigenvector of M corresponding to X ( M ) = E . This is not true f o r eigenvectors corresponding to the small eigenvalues of ( K , A!?). Here / z N / / lISR//

1/(1-2&)

As will be s e e n in the example in the next section T (G)will function a s N did when was singular. When M was singular we had P N = N ( N ~ K N ) - ~ N ~and K / y k I = / I PNGk I I I I PNG, I I - I . Now when M is nonsingular we define the projection

M

P;. = z^(ZT=)-'STK = z ^ s T M , STM? = 1 a n d provided z^Th?vl# 0

Iz^Tm

1% I = I I P Z V k II IIPP, 11-l = I lZTh,l-' ( ~ i g n ( 7 is~ defined ) below). Letc = VT&, so w i t h g T = ( y l , . . . , y p ) then 1g I = / c I lz^TM-u,I-l,and I I C / I = 1 1 V T M q c 1 1 VTk4lI IIk4z^II = 1

/ / g11 is not bounded in the same way since 119 I / I I z ^ T ~ I-', which can be made arbitrarily large by taking zi almost MI-orthogonalt o z^ Unlike the singular case where the norm of the contamination I I PNck 11 = Iyk I 11 11 1s not bounded ( 1 1 z 1 1 is arbitrary), it is bounded in the nonsingular case.

I]

liP;.Vll =

I I ~ c ~ I5I

11Z11 % (min h(M))-#

1, V c a n contain large numbers. This follows from the f a c t Even though / j c t h a t VTMV = I , and s o llzik / / can be a s large as (min X(M))-#. If llvk 1 1 is large then the vector must have a large component along z^. The above conclusions can be eneralised to the case when M has more t h a n one small eigenvalue (if lix = MXA. XM ' X = I then t h e r e is a t least one xk such t h a t jlzk I / 2 (n min x ( A ! ? ) ) - ~ ) .

Another way t o see the similarity between the two cases is by using (K-pM)-'MV = VT+\$ e,' which gives (multiply by z^'M) (T-(X-p)-lI)c = -STAT\$ ep Taking c = gZTh4u (defining sign(-yk)) (T-(X-p)-'I)g = - Z T ~ a i (ZTMuI)-’ep , which should be compared with t h e singular case (lemma 3.6 a).

T. Ericsson

112

Multiplying (K-pM)-'MV = V T + d , e J by 2^TMIvT& I-' and s we get (with y = V s ) a n expression for the relative growth of 2 in the Ritz vectors

IIPfY/ I 11%~111-1= I Y T E l / v T W - ' = I ~ p + 1 B p S p llu-(x-P)-ll-lM l~p+\$pspll~l-l where M hoIds for extreme u. This result can be compared with theorem 3.6 (b). When using f,?' instead of y the component along 2^ will decrease (but not be deleted as when M is singular). Using 3C^TM(K-pM)-'y = [email protected] we get (provided ZTMy # 0) 77 =

l l w l l ( i l e Y il ilf,?'IlMr1 = l ~ T ~ l ( Il l l f~ , ? 'TI l M~ r '

=(I4R-d

llf,?'llM)-'

IA-Pl IX-PI-'

where the M holds if u has converged (so h M p+u-l for some A). For our special example we would thus get 7 M E IX-pI ] CTI-', quite a substantial reduction. We would also like t o point out t h a t similar phenomena c a n occur when using standard Lanczos on the standard eigenvalue problem Ax = Ax, since we can transform the generalised problem if M is positive definite. An example is presented in the next section.

3.3.4. Two Examples.

In this section we present two examples. In the first Y is singular and in the second Y is nonsingular and moderately ill conditioned. The examples a r e completely artificial and have been included only to give the reader some idea of how the Lanczos algorithm may behave. In both examples n = 30 and

K = d i a g ( 5 - ' , 6-', and the starting vector was

...,

32-',

[i

I1

),

v f = (1, 1 , . , , , 1, - 0 . 5 + [ )

M = d i a g ( I Z z eE,) T

(where T had been chosen so t h a t vTMu = 1.) So (K, M ) has eigenpairs ((4+i)-', ei), i = 1 , . . . , 28. For the remaining ones we have to look a t the subproblem:

The results below were computed on a VAX in double precision (relative machine precision M 1.4.10-17) using a Lanczos algorithm with full reorthogonalisation implemented in Matlab . We r a n the algorithm for 15 steps, i.c. p = 15. The tridiagonal matrices satisfied: 5.10-'< lak I < 2 and 1.6,10-2 Ifi2 d 0.36. M 1.6.10-'.

3.3.4.1. Singular M Here E = 0 s o the subproblem (*) has eigenpairs (g, (1, -5)') and ( m , (0, l ) T ) . The starting vector had equal components of all eigenvectors corresponding t o finite eigenvalues. The contamination, z = [ T e 3 0 . Since we know t h a t P p k = ykz the size of 6 is not important if we just wish to find yk. (In practice we would like to have a

small ] [ I , of course, i.e. we would like t o have a w 1 t h a t lies in X.As we have seen (theorem 3.1) this can be accomplished by hitting any vector by (K-pM)-'M. Using

A Generaked Eigenvalue Problem and the Lanczos Algorithm

113

this technique on a computer would give a [ depending on the relative machine precision and t h e methods involved in forming (K-pM)M-'r, f o r some T .) Below a r e listed the eigenvalues, v k , of T and the absolute values of the bottom elements. I s p k 1 , of the corresponding eigenvectors (k , t h e index, equals the number of the row in the table). The third column contains the absolute values of the growth f a c t o r s (the 7k). The last column contains 1 1 P N I I ~ 1 1 P N~. ^ ~I I i.e. the relative growth of the contamination in the Ritz vectors. (All t h e values have been put in one table, although they should not necessarily be compared row-wise. A yk is not related to a n y specific u i , for example.)

-',

uk

3.1533d-02 3.4399d-02 3.9433d-02 4.6556d-02 5.5515d-02 6.5894d-02 7.71 1 Id-02 8.8464d-02 9.9504d-02 1.1108d-01 1.2500d-01 1.4286d-01 1.6667d-01 2.0000d-01 2.0000d+00

iSpk

I

7.5432d-02 1.8132d-01 2.7978d-01 3.6344d-01 4.2417d-01 4.5223d-01 4.3532d-01 3.5642d-01 2.1 15 Id-0 1 7.2107d-02 1.2412d-02 1.0479d-03 3.8827d-05 4.4893d-07 2.2286d-23

IYk

1 1 PNGk I I

I

1.0000d+00 3.8661d-01 1.7416d+00 3.8896d+00 8.8019d+00 2.0639d+01 5.0314d+01 1.2781d+02 3.3887d+O2 9.3915d+02 2.7241d+03 8.2803d+03 2.6409d+04 8.8501 d+04 3.1213d+05 1.1606d+06

~

~

~

N

4.4725d+04 9.8550d+04 1.3265d+05 1.4596d+05 1.4286d+05 1.2831d+05 1.0555d+05 7.5329d+04 3.9743d+04 1.2137d+04 1.8566d+03 1.3714d+02 4.3556d+00 4.1967d-02 2.0834d-19

6

1

~

~

In theory PNg = 0 but in this case it is of the order of lo-'' - la-" but t h a t is not of any r e a l significance since the problem is so trivial. Looking a t columns t h r e e and four it c a n be seen how the Lanczos vectors and t h e Ritz vectors a r e affected (theorems 3.5 and 3.6). In particular, it c a n be seen t h a t t h e good Ritz vectors a r e not affected very much.

3.3.4.2. Nonsingular M. Here one change was made, now changed to M fii

E

=

The eigenpairs of the subproblem (*) have

(4.999875.10-', (9.999875.10-', -5.000062.10-1)T) a n d (2.00 0 0 50.10 4 , (5.00 0 0 6 2.10-3,9.99 9 8 75.10 ') T,

(Which c a n be compared with the theory in section 3.3.3: U E - ' = 2.104 and /zNlj l ~ R 1 1 - 1M 2.104.) The [ in v1 does matter in this case (since it is measured by the M-norm) and we took = 0.4. The table below consists of v k a n d ispk 1 a s before. The third c o ~ u m n contains the growth factors as defined in section 3.3.3. The next two columns contain the relative growths of 5 in the Ritz vectors and the n o r r n a l i s e d g-vectors, i.e. using t h e notation in section 3.3.3 the columns contain 1Ipzyk 1 1 llppl 11-l and /)PzgkI/I/Ppll/-l(wherevhasbeennormalisedsothat ]Iglln= 1).

T. Ericsson

114

vk

5.0880d-05 3.1753d-02 3.5610d-02 4.2256d-02 5.1327d-02 6.2274d-02 7.4399d-02 8.6874d-02 9.8957d-02 1.1 101d-01 1.2500d-01 1.4286d-01 1.6667d-0 1 2.0000d-0 1 2.0001d+00

bpk

I

1.291ld-02 9.5617d-02 2.1961d-01 3.2745d-0 1 4.0956d-01 4.5697d-0 1 4.5933d-01 4.0 196d-0 1 2.7082d-0 1 1.1076d-01 2.2542d-02 2.1628d-03 8.8950d-05 1.1252d-06 7.5966d-23

I7k

I

1.0000d+00 3.8646d-0 1 1.7402d+00 3.8842d+00 8.7839d+00 2.0580d+01 5.0098d+01 1.2652d+02 3.2382d+02 7.2302d+02 9.1762d+02 5.3057d+02 1.9481d+02 6.2529d+01 1.8886d+0 1 5.4101d+00

I I p2Yk 1 1 I I P P 1 II

1.3463d+03 2.7726d-01 5.6772d-01 7.1321 d-0 1 7.3425d-01 6.7511d-01 5.6793d-01 4.2558d-01 2.5 171d-01 9.1768d-02 1.6585d-02 1.3922d-03 4.9077d-05 5.1731d-07 3.49 17d-24

I I PSVk I I 1 I p2v 11I

2.9890d+02 4.3600d-04 7.9278d-04 8.3667d-04 7.0076d-04 5.3787d-04 3.7959d-04 2.4418d-04 1.2704d-04 4.1327d-05 6.6339d-0 6 4.8726d-07 1.4723d-08 1.2932d-10 8.7288d-29

Comparing the two examples we see t h a t t h e 7k a r e roughly the same for small values of k . For larger values t h e boundedness forces t h e yk t o decrease in the nonsingular case. If we had t a k e n a smaller ( 1 the values would have followed e a c h other for even larger values of k . Looking a t t h e two last columns one c a n s e e how much b e t t e r fJ is. From section 3.3.3 it follows t h a t t h e quotient, between values in the two columns, is roughly 5.10-5v-’ (except for the first row). (To g e t t h e absolute values of t h e growths (instead of the relative values) use IIPpIjI M 7.4279.10-2 or Iz^TiCLIUl I fil 7.4280.10-4.)

3.3.5. The Defective Case. In this section we will briefly look a t the more complicated situation when ( K , M ) is defective. To make things easier to follow we assume t h a t M = &Lag ( I , 0) and so with o u r previous notation

i 1

= H zQR QRh

We may have unwanted components not only in n (M) b u t also in T (M). The following t h e o r e m describes how these components may grow in t h e v k .The --notation used is t h e s a m e a s in t h e o r e m 3 . 3 . To make the notation easier we have assumed t h a t BP # 0.

A Generalised Eigenvalue Problem and the Lanczos Algorithm

115

and

Since 6 k / tion, instead

E~

i s not constant, the contamination does not stay in the same direc-

pNv^k

=

[ ]

Yk Q N ~ wk

where wk E span ( H z Q N a , b ). From theorem 3.3 we see t h a t the recurrences f o r Y k and &k coincide with the previous r e c u r r e n c e for Yk (in theorem 3.3). SO withg = (yl,...., Tg = -S3p7p+leq. Theorem 3.8 now gives ~ k - l b k - I + \$ k b k + B k b k + l = Yk O r with d = (bl,...., 6 p ) , Fd = g -flp6p+lep Using this we could compute P N ~ s . As before the starting vector should be chosen inXwhich can be accomplished by hitting any vector twice by (K-pM)-'M. To refine the Ritz vectors it is n o t enough to a d d a multiple of w , + ~ (that will only ensure t h a t the refined vector lies in T ( ( K - p M ) - ' M ) . If we, however, r u n the Lanczos algorithmp + 1 steps and compute the Ritz vector, y = V p s ,from Vp then

(K-PM)-~H\$ = ~ j j + s p \$ G p + l

(K-pM)-1MVp

=

vp

c1

A T T p +BP +]UP +1ep + I

(1)

(2)

T. Ericsson

116

so ( ( K - p d f ) - ' ~ V ) ~ y=^~ ( K - p B ) - ~ i @ + s ~(& K - ~ J & ) - ' M V ~ +which ~, c a n be computed from (1) and ( 2 ) .

4. A Choice of Norm. When M is positive definite, we have several s t a n d a r d bounds in t h e M-'-norm, e.g. f o r arbitrary y # 0 and p , t h e r e is a n eigenvalue A (where Kx = A M z ) such t h a t

I A - p I l l m l l ~ -II([email protected])y l~ IlM-1 (see 141 Ch. 15). In our c a s e , when M is singular, this does not hold, of course. It is possible to find relevant norms, I I I I say, also in this c a s e , but t h a t is t h e subject of a coming report. We will, however, quote (without proofs) some of the more practical results from t h a t r e p o r t , and in particular those when W is singular (only positive semidefinite). In t h a t case we have a semi n o r m (there a r e vectors x # 0 s u c h t h a t 115

I / Iy = 0).

When M is singular a n a t u r a l choice of W would, perhaps, be M', the pseudo-inverse of Y. There is, however, a more n a t u r a l choice. If M is ositive definite, t h e n XTMX = I , so M-I = AXT.So one might guess t h a t I I T l l x x ~ = (T XX*r)%could be a reasonable

P

alternative also in t h e c a s e when M is singular. It is not difficult to prove t h a t W = XYT satisfies:

WMW = W , M W M = M when ( K , M ) is nondefective (WMW = W holds in the defective case). So W is a generalised inverse, b u t since WM and MW usually a r e not symmetric matrices, W is n o t in general the pseudo-inverse.

To justify this latter choice of W let us quote the following theorem: Theorem4.1. Given any y

E Rn a n d

IA-PI

r e a l p r there is X = A(K, M ) such t h a t

llMv I l m T g II(K-I,LM)y I I m T

This is not, of course, a particularly practical result, since we do n o t usually know

X.We cannot replace XYT by M', in general, as t h e following example shows. Example. Let K =

1:]. :1 i],

I A-p I 1 1 My I I

a n d y = ( - 1 , 2 - ~ ) ~ Then .

M =

= I 1-p

hold unless p = 1.

1,

and

1 1 ( K - p M ) y 1 1 M+ = 0 ,

and the norm inequality does n o t

If we, however, r e s t r i c t t h e choice of y we c a n regain the inequality: Corollary 4.2. If y E X then IA-pI

llMv I l y c c lI(K-PM)y

IIMt

If y E X then both My and (K-pM)y lie in T ( M ) = T ( M + ) . It may be noted t h a t this bound holds in t h e defective case too. The n o r m of the residual can be computed in practice a t least when M is dia i o d . 1 1 . 1 I M + defines a n o r m on X.In h e following theorem we also use (z , Y ) = ~zTMy which defines a n inner product on X.Using this inner product we c a n get a decomposition of

Q

117

A Generalised Eigenvalue Problem and the Lanczos Algorithm

y into z and e , where z is the eigenvector of the eigenvalue closest t o p and e is a vector orthogonal to z ( ( 2 , e)M = 0). Assuming t h a t y . z , and e have unit length ((5, z ) =~1 , etc.) we can define the error-angle cp such t h a t y = cosp z +sinp e . With these definitions we can prove theorems similar to those in  page 2 2 2 .

Theorem4.3. Let y EX, where yTh& = 1, and p be given. Let X be the eigenvalue closest to p and define the gap y = min I X i -p 1 . A' # A

L e t y =coscpz+sincpe,whereKx =hMx,zTMz = l , e T M e = l , a n d e T M x = O . Then the following bound holds for sincp:

I sinv I Y-’ I I (K-PM)Y I I M4 If y equals the Rayleigh quotient, y TKy,it is also true that IA-P I 5 7-' I I (K-PWY I I+;

These bounds also hold in the defective case. Using the above theorems we can give bounds on the accuracy of the approximations computed by the Lanczos algorithm. Before applying these bounds to the Lanczos case we prove the following theorem in which the residual of a inverted problem is used to construct another vector f o r the noninverted problem. (This is a slight generalisation of how the vector M was constructed in theorem 3 . 7 . ) From pow on we will only deal with the nondefective problem.

Theorem4.4. Let T = (K-pM)-'My -uy , y T M y = 1, y T& = I I (K-(P+w-1)M)i7 I I M + I I 1 I i!.

w7

Then q(w) is minimised for w o = u+ Ilr Il\$u-'

Hi)-'

= 0. Take

g = uy +r

and let

and the minimum value, v(w0), is

IIT I l M ( ~ 2 + I I ~

Proof. Let (v2+11r

US

first note t h a t

Hi)%= I I W I I M + = IlVllnr =

s i n c e y T m = 1. From the definition of r^

5

lI(K-PM)-'h&

IIM f 0

f7 we see t h a t My = ( K - p M ) q , s o

(K-(p+u-')M)V = My -o-'W = (1- u w - ~ ) & -w-’Ah

which gives

1 1 r^ 11*; = (1 - u w - ' ) 2 + W - Z r T m = (w-’(U2+T TMr)?+-u(uZtrT&

)-"Z+

1-u2(

U2+T T M r ) -1

whichisminimisedfor w o = v + u - ' T ~ M T giving I I r ^ I I ~ + = r T A h ( v Z + r T ~ I )-l. (Of course wO1 = gT(K-pM)g/[email protected] Note t h a t u = yTM(K-pM)-'My.) (p+w{’, g ) need not be a better approximation than (p,+u-', y ) . So let us find out when 77(wo) = T

It ( K - W w g ' ) M ) V I I M + I I w I I

= (K-pM)-'IWy-vy

5 I I (K-b+v-')M)y II M + IIM y I I s o (K-(p+u-')M)y = -u-'(K-pM)r and (*) is equivalent to:

1 1 ~ 1 1 ~ ( ~ 2 + 1 l 1l i~ ) - ' s l l ( K - ~ M ) rI I M + 1 4 - ' which is equivalent to I w o I - ' ~ //(K-&Wr l l M + l l M Ilia!

(*)

T. Ericsson

118

In other words, for the inequality (*) to hold, T should have large components in the eigenvectors corresponding to A ( K , M) f a r f r o m y o r I uoI should be large, i.e. g should approximate a n eigenvector corresponding to a n eigenvalue close to y. (In the same way a s was shown in section 3.3.3 we get (provided A ,utu-')

I&?I([email protected] which shows how

I IIflII)-l

IX-WI

1Ak-PI-l

is affected by other eigenvectors.)

Example. Let K = d i a g ( 5 , 10, loo), M = I , and p = 0. Take y T = (0, 10, l ) l O l - n . In this case we g e t a much better approximation using q T = (0, 10, 0 . 1 ) ~(for some 7). Furthermore, 10-u-’ fil -9.0.10-2, 1O-wO’ W -9.0,10-3, l\(K-(y+u-’)I)y IIy I \ - ' M 8.9, ) V l l M 0.90 and l l ( ~ - ( ~ + ~ O ~ VllVll-l If we, however pick y T = (1, 10, 0)lOl-5 t h e picture changes completely. Now 10-v-' FY 9.8,10-2, 10-wO1 FY 0.19, and g* = (2, 10, 0 ) ~ . II (K-(p+u-l)Z)y 1 1 I I Y 11-1 M 0 . 5 0 , and l I ( K - ( ~ u + w ~ ~ ) I ) f lIIMII-I I FY 0.96 We now apply t h e theorem to the approximations produced by the Lanczos algorithm in the nondefective case (the notation follows t h e theorems above). In this c a s e y = v s and I I T 1 1 = ~ ,3 1s I for some s , so w o = v + B ~ s ~ u -and ' the corresponding q(oo)= a, Isp I (u2+,3;sEf'. %e c a n also see t h a t the changes, when using g , will not be s o dramatic (except f o r the contamination), since:

which would be quite small. Also Iwo-u(

=

IlT

I~~luI-'=fSp2sp2IuI-~

which usually is even less. Since EXwe c a n apply 4.2 and 4.3 and get: Corollary4.5. L e t p = a, I sp I (u2+,3;sE)-'

t h e n sin

I sin p I

is bounded by

I py-'

and there is a n eigenvalue A t h a t satisfies I~-(yu+o;')

I

s min ( p , p2y-')

Acknowledgements. This work was completed during a visit to CPAM a t the University of California, Berkeley. I thank Professor B.N. P a r l e t t for making my stay so pleasant and interesting. This research was supported by the Swedish Natural Science Research Council.

5. References. 113 Ericsson, T., Jensen, P.S., Nour-Omid, B., and Parlett, B.N., How to implement the spectral transformation, Math. of Comp., to appear.

A Generalised Eigenvalue Problem and the Lanczos Algorithm

119

 Ericsson, T. a n d Ruhe, A,, The spectral transformation Lanczos method for the numerical solution of large sparse generalised eigenvalue problems, Math. of Comp. 35:152 (1980) 1251-1268.  Moler, C., An interactive matrix laboratory, Tech. rep., Univ. of New Mexico (1980, 2nd ed).

[41 Parlett, B.N., The symmetric eigenvalue problem (Prentice Hall Inc., 1980)  Uhlig, F., A recurring t h e o r e m about pairs of quadratic forms and extensions: A survey, Lin. Alg. Appl. 25 (1979) 219-237.

 Wilkinson, J.H.. The algebraic eigenvalue problem (Oxford U.P., 1965). [ 7 ] Wilkinson, J.H., Kronecker's canonical form a n d t h e QZ algorithm, Lin. Alg. Appl. 28

(1979) 285-303.

This manuscript was prepared using troff a n d eqn a n d printed on a Versatec printer using a 11 point Hershey font.