Chapter 3

Lanczos algorithm

The previous chapter on linear algebra for Hankel matrices was mainly intended to give a linear algebra interpretation of the Euclidean algorithm and its variants. It turned out that this gives rise to block factorizations of indefinite Hankel matrices and their inverses. In this chapter, we start with the iterative solution of linear systems with an arbitrary matrix or the computation of the eigenvalues of such a matrix. The relation of the results in the previous chapter with what is known as the Lanczos algorithm [178] to tridiagonalize a matrix is striking. The correspondence is explicitly given in a paper by Gragg [119] for the normal case, i.e., for the case that all the degrees $\alpha_k$ are equal to 1. The relation between the Lanczos polynomials and Padé approximation is used in a paper by Cybenko [72]. In another paper by Gragg and Lindquist [121] the connection is mentioned again as corresponding to the normal case, but it is not difficult to give the formulation in the general case as a generalization of the usual Lanczos algorithm. It is in fact a variant of the Look-Ahead Lanczos (LAL) algorithm of Parlett, Taylor and Liu [201]. The Lanczos method and related algorithms like conjugate gradient methods have been a very popular research topic recently. See for example [94, 124, 125, 127, 128, 160, 170], and the many recent references therein. Much more on iterative methods in linear algebra is discussed in [4, 213]. The classical Lanczos method for positive definite matrices is discussed in [67]. We shall give in this chapter only the basics of the method, since there are by now many variants and implementations of this basic idea. Best known are QMR and its variants by Freund et al. [95, 96, 97] and the algorithms by Brezinski et al. [30, 31, 32]. We refer to the literature for more information.
3.1 Krylov spaces
Suppose a matrix $A \in \mathbb{F}^{(m+1)\times(m+1)}$ is given. This matrix should not be confused with the upper triangular matrix $A$ of the previous chapters. In this chapter we shall also need the submatrices $A_n$ of that upper triangular matrix, but since they will always carry an index here, there should be no confusion. Let there be an involution defined in $\mathbb{F}$, which we shall denote by an upper bar (think of it as the complex conjugation in $\mathbb{C}$), and we shall use the superscript $H$ to mean $A^H = \bar{A}^T$. We say that $A$ is selfadjoint if $A^H = A$. With some arbitrary vectors $x_0$ and $y_0$, one generates the Krylov sequences

$$x_k = A x_{k-1} = A^k x_0, \qquad k = 1, 2, \ldots$$

$$y_k = A^H y_{k-1} = [A^H]^k y_0, \qquad k = 1, 2, \ldots$$

These vectors define Krylov spaces $\mathcal{X}_n = \mathrm{span}\{x_0, \ldots, x_n\}$ and $\mathcal{Y}_n = \mathrm{span}\{y_0, \ldots, y_n\}$. There will be a smallest integer $n_x \le m$ such that $\dim \mathcal{X}_n = n_x + 1$ for all $n \ge n_x$. This is indeed true: if we denote the matrix $[x_0|x_1|\cdots|x_n]$ as $X_n$, then $\mathcal{X}_n = \mathrm{range}\, X_n$, and extending $X_n$ to $X_{n+1}$ or further does not add anything, as the following lemma shows. We have used there the notation $\alpha$ instead of $n_x + 1$ for simplicity.

Lemma 3.1 Let $X_k = [x_0|Ax_0|\cdots|A^k x_0]$, $k = 0, 1, \ldots$ be Krylov matrices. Then $\mathrm{rank}\, X_\infty = \alpha$ if and only if the first $\alpha$ columns are linearly independent and the first $\alpha + 1$ columns are linearly dependent.

Proof. Let us denote $x_k = A^k x_0$. Suppose there exists an $\alpha$ such that

$$\sum_{i=0}^{\alpha} a_i x_i = 0, \qquad a_\alpha \ne 0.$$

Then the first $\alpha + 1$ columns are linearly dependent and we can express $x_\alpha$ in terms of the previous columns. But it then follows by induction that all the following columns are also linear combinations of the first $\alpha$ columns and therefore the rank of $X_\infty$ is at most $\alpha$. Again by choosing a minimal $\alpha$, we catch the rank of $X_\infty$. □
It follows from this lemma that the Krylov spaces are strictly included in each other until a maximal space is reached:

$$\mathcal{X}_0 \subset \mathcal{X}_1 \subset \cdots \subset \mathcal{X}_{n_x} = \mathcal{X}_{n_x+1} = \cdots = \mathcal{X}_\infty. \qquad (3.1)$$

Thus we can give the following definition.

Definition 3.1 For a given vector $x_0 \in \mathbb{F}^{m+1}$ and a square $(m+1)\times(m+1)$ matrix $A$, we can generate the Krylov spaces as in (3.1). The space

$$\mathcal{X}_{n_x} = \mathcal{X}_\infty = \mathrm{span}\{x_0, Ax_0, \ldots, A^{n_x}x_0\} = \mathrm{range}\,[x_0|Ax_0|\cdots|A^{n_x}x_0]$$

is called the maximal Krylov subspace generated by $x_0$. We define the grade of $x_0$ (with respect to the matrix $A$) as the dimension $n_x + 1$ of the maximal Krylov subspace of $A$ generated by $x_0$.

At the same time, $\mathcal{X}_{n_x}$ is an $A$-invariant subspace, as one can readily see. It is the smallest $A$-invariant subspace containing $x_0$. Indeed, if it has to contain $x_0$ and be $A$-invariant, it should also contain $Ax_0$. Therefore, it should also contain $A^2 x_0$, etcetera, until we get some $A^{n_x+1}x_0$ which is already in the space. The first number $n_x + 1$ for which this happens is precisely the grade of $x_0$. Note that the minimality of the $A$-invariant subspace and the maximality of the Krylov subspace are no contradiction, but designate the same thing. A thorough account of invariant subspaces for matrices and their applications can be found in [112]. By analogy, the spaces $\mathcal{Y}_k$ will terminate in an $A^H$-invariant subspace of dimension $n_y + 1$. It is these invariant subspaces that we want to catch. That this can be done by the Euclidean algorithm is explained in the next section.
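The grade of a vector can be computed directly from Lemma 3.1 by growing the Krylov matrix until adding the next power of $A$ no longer raises the rank. The following is a minimal NumPy sketch of this idea; the function name `grade` and the rank tolerance are our own choices, not from the text:

```python
import numpy as np

def grade(A, x0, tol=1e-10):
    """Grade of x0 w.r.t. A (Definition 3.1): grow the Krylov matrix until,
    as in Lemma 3.1, appending the next power A^k x0 no longer raises the rank."""
    X = np.asarray(x0, float).reshape(-1, 1)
    v = np.asarray(x0, float)
    while True:
        v = A @ v                              # next Krylov vector A^k x0
        Xn = np.hstack([X, v.reshape(-1, 1)])
        if np.linalg.matrix_rank(Xn, tol=tol) == X.shape[1]:
            return X.shape[1]                  # = n_x + 1, the grade
        X = Xn

# e1 has grade 2 for this matrix (A^2 e1 already lies in span{e1, A e1})
A = np.array([[2.0, 0, 0], [1, 2, 0], [0, 0, 3]])
```

The `while` loop always terminates, since the rank is bounded by the dimension of the space.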
3.2 Biorthogonality
To describe the invariant subspaces, we have to find some bases for them. Of course the Krylov sequences form bases, but because they are not orthogonal, they do not show at what point we lose linear independence, unless we examine the linear independence explicitly, which could be done by an orthogonalization, or, in this case, more efficiently by a biorthogonalization procedure. Thus the idea is to find nested biorthogonal bases $\{\hat{x}_k\}$ and $\{\hat{y}_k\}$ such that

$$\hat{\mathcal{X}}_k \stackrel{\mathrm{def}}{=} \mathrm{span}\{\hat{x}_i\}_{i=0}^{k} = \mathcal{X}_k, \qquad k = 0, \ldots, n_x,$$

$$\hat{\mathcal{Y}}_k \stackrel{\mathrm{def}}{=} \mathrm{span}\{\hat{y}_i\}_{i=0}^{k} = \mathcal{Y}_k, \qquad k = 0, \ldots, n_y. \qquad (3.2)$$
Note that this condition is equivalent with requiring that $\hat{x}_k$ has the form $\hat{x}_k = p(A)x_0$ with $p(z)$ some polynomial of degree $k$. A similar relation holds of course for $\hat{y}_k$. We also require some formal biorthogonality relation

$$\hat{y}_i^H B \hat{x}_j = \tau_i \delta_{i,j}, \qquad i, j = 0, 1, \ldots$$

with $\tau_i$ nonzero and $B$ a square nonsingular matrix that commutes with $A$. The above product is a genuine inner product if $B$ is a selfadjoint positive definite matrix; however, we shall allow an arbitrary nonsingular matrix here. Relevant examples are $B = I$, $B = A$ or $B = A^{-1}$. This matrix $B$ was introduced by Gutknecht in [125]. However, we shall discard it by assuming that $B$ has been taken care of by replacing $x_0$ by $Bx_0$ or by replacing $y_0$ by $B^H y_0$. It can at any instance be reintroduced by taking it out of the initial vector again and using the commuting property with $A$.

If we define the matrices $\hat{X}_n = [\hat{x}_0|\cdots|\hat{x}_n]$ and $\hat{Y}_n = [\hat{y}_0|\cdots|\hat{y}_n]$, which are both in $\mathbb{F}^{(m+1)\times(n+1)}$, then the biorthogonality condition can be written as

$$\hat{Y}_n^H \hat{X}_n = D_n = \mathrm{diag}(\tau_0, \ldots, \tau_n).$$

It will turn out that these matrices reduce $A$ to a tridiagonal matrix, i.e., $\hat{Y}_n^H A \hat{X}_n$ is a tridiagonal matrix. We shall show that the extended Euclidean algorithm as we have seen it before solves this problem. To begin with, we show that it solves the problem at least in the generic case, by which we mean that all the degrees $\alpha_k$ of the polynomials $a_k$ are one. This implies that $\kappa_{n+1} = \sum_{k=1}^{n+1} \alpha_k$ will be equal to $n+1$. First, define the numbers $f_i$ by the relations
$$y_i^H x_j = y_0^H A^{i+j} x_0 = f_{i+j+1}, \qquad i, j = 0, 1, 2, \ldots$$

Thus when we set $X_n = [x_0|\cdots|x_n]$ and $Y_n = [y_0|\cdots|y_n]$, then $Y_n^H X_n$ is a Hankel matrix based on the coefficients $f_i$. In general these matrices had dimensions $\kappa_{n+1}$, which, in our generic case, is just $n+1$. The extended Euclidean algorithm, applied to the data $f = \sum_{i \ge 1} f_i z^{-i}$, will generate, among other things, the polynomials $a_k$ (supposed to be of degree 1) and the polynomials $a_{0,k}$ whose coefficients appear in the matrix $A_n$ as defined in (2.5, 2.7). Because this matrix is (unit) upper triangular, the matrices (recall that the bar denotes the involution in $\mathbb{F}$)

$$\hat{X}_n = X_n A_n \qquad\text{and}\qquad \hat{Y}_n = Y_n \bar{A}_n$$
shall have columns that satisfy the conditions (3.2). We verify next the biorthogonality relation. We can express the previous relations as

$$\hat{x}_k = [x_0|Ax_0|\cdots|A^k x_0]\, a_{0,k} = a_{0,k}(A)\, x_0$$

and similarly we find that

$$\hat{y}_k = \bar{a}_{0,k}(A^H)\, y_0 = [a_{0,k}(A)]^H y_0.$$

We can easily verify that

$$\hat{Y}_n^H \hat{X}_n = A_n^T Y_n^H X_n A_n = A_n^T H_n A_n = D_n \qquad (3.3)$$

with $D_n$ a diagonal matrix as we have seen in Theorem 2.1. Recall that $\alpha_k = 1$ for all $k$, which implies that $D_n$ is an $(n+1)\times(n+1)$ diagonal matrix and its diagonal blocks $D_{k,k}$ are scalar. Thus, in the notation of the previous sections: $D_{k,k} = \tau_k = d_{2k+1}$. Writing down the $(i,j)$th element of the previous relation (3.3) we get

$$\hat{y}_i^H \hat{x}_j = [a_{0,i}^T\; 0 \cdots 0]\, H_n\, [a_{0,j}^T\; 0 \cdots 0]^T = \delta_{i,j} D_{i,i} \qquad (3.4)$$

which shows how the orthogonality of $a_{0,i}$ and $a_{0,j}$ with respect to the symmetric indefinite weight matrix $H_n$ is equivalent with the biorthogonality of

$$\hat{y}_i = [a_{0,i}(A)]^H y_0 \qquad\text{and}\qquad \hat{x}_j = [a_{0,j}(A)]\, x_0$$

with respect to the inner product with weight $I$.
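The Hankel structure $y_i^H x_j = f_{i+j+1}$ that underlies (3.3) is easy to observe numerically: the Gram matrix of the two Krylov sequences has constant antidiagonals. A small NumPy sketch (random data; all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                                 # vectors live in F^(m+1); n+1 of them
A = rng.standard_normal((m + 1, m + 1))
x0 = rng.standard_normal(m + 1)
y0 = rng.standard_normal(m + 1)

# Krylov matrices X_n = [x0 | A x0 | ... | A^n x0] and Y_n likewise with A^H
X = np.column_stack([np.linalg.matrix_power(A, k) @ x0 for k in range(n + 1)])
Y = np.column_stack([np.linalg.matrix_power(A.T, k) @ y0 for k in range(n + 1)])

M = Y.T @ X                                 # Gram matrix of the Krylov sequences
f = [y0 @ np.linalg.matrix_power(A, k) @ x0 for k in range(2 * n + 1)]
H = np.array([[f[i + j] for j in range(n + 1)] for i in range(n + 1)])
```

Here `M` and the explicitly built Hankel matrix `H` coincide up to rounding.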
with respect to the inner product with weight I. Now we should show that yTApf~ is tridiagonal. relations zk+l = Azk for k = 0 , . . . , n 1
We can write the
and z , , + l  ao,,,+l ( A ) x o  A n + l zo + [ Zo I "'" ] z,, ]ao,n+ 1
as one matrix relation
AXn  Ix1 I "'"
]  X,.,F(ao,,.,+l)+[O . . . 01
]
where as before, F(ao,,.,+l) represents the Frobenius matrix of the polynomial a0,n+l. Because we have shown in Theorem 2.3 that
F(ao,,.,+l )A,.,  ANT,., with Tn a tridiagonal Jacobi matrix, we find that
AX,.,A,.,  X,.,F(ao,,.,+I)AT, + [ 0
.
.
.
0 I ~,~+1 ]An
or, by definition of $\hat{X}_n$ and because $A_n$ has diagonal elements equal to 1,

$$A \hat{X}_n = \hat{X}_n T_n + [0 \cdots 0\,|\,\hat{x}_{n+1}]. \qquad (3.5)$$

As a side result we note that by analogy

$$A^H \hat{Y}_n = \hat{Y}_n \bar{T}_n + [0 \cdots 0\,|\,\hat{y}_{n+1}]. \qquad (3.6)$$

Hence we get the symmetric Jacobi matrix $J_n$ in

$$\hat{Y}_n^H A \hat{X}_n = \hat{Y}_n^H \hat{X}_n T_n + [0 \cdots 0\,|\,\hat{Y}_n^H \hat{x}_{n+1}] = D_n T_n = J_n$$

because $\hat{x}_{n+1}$ is orthogonal to all $\hat{y}_k$, $k = 0, \ldots, n$ by the relations (3.4). Thus $\hat{Y}_n^H A \hat{X}_n$ is tridiagonal and we are done.

We do not have that $\hat{Y}_n^H \hat{X}_n$ is the unit matrix, but we can always write the diagonal matrix $D_n$ as a product of diagonal matrices $D_n = \bar{D}_n' D_n''$ and replace $\hat{Y}_n$ by the normalized $\hat{Y}_n [D_n']^{-1}$ and $\hat{X}_n$ by $\hat{X}_n [D_n'']^{-1}$. In that case $\hat{Y}_n$ and $\hat{X}_n$ will be biorthonormal. More generally, we could have imposed any normalization on these vectors by taking for $D_n'$ and $D_n''$ arbitrary nonsingular diagonal matrices.
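The existence argument of this section can be replayed numerically with a naive two-sided Gram-Schmidt pass over the Krylov sequences. The sketch below is our own illustrative code, not the book's algorithm: it works in real arithmetic, and the test data are chosen so that the moments come from a positive measure (distinct nodes, positive weights), which guarantees that no breakdown can occur:

```python
import numpy as np

def biorthogonalize(A, x0, y0, n):
    """Naive two-sided Gram-Schmidt: x̂_k is the raw Krylov vector A^k x0 minus
    its components along x̂_0..x̂_{k-1}, measured against ŷ_0..ŷ_{k-1} (and
    symmetrically for ŷ_k).  O(k) work per step; the next section shows that
    three terms would suffice."""
    Xh, Yh, tau = [], [], []
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    for k in range(n + 1):
        xh, yh = x.copy(), y.copy()
        for i in range(k):
            xh = xh - Xh[i] * (Yh[i] @ x) / tau[i]
            yh = yh - Yh[i] * (Xh[i] @ y) / tau[i]
        t = yh @ xh
        assert abs(t) > 1e-13, "breakdown: tau_k = 0"
        Xh.append(xh); Yh.append(yh); tau.append(t)
        x, y = A @ x, A.T @ y                  # next raw Krylov vectors
    return np.column_stack(Xh), np.column_stack(Yh), np.array(tau)

# distinct positive nodes and positive weights x0_i * y0_i: no breakdown
A = np.diag(np.linspace(0.5, 1.5, 8))
Xh, Yh, tau = biorthogonalize(A, np.ones(8), np.linspace(1.0, 2.0, 8), 4)
J = Yh.T @ A @ Xh                              # comes out tridiagonal
```

With this data, $\hat Y_n^H \hat X_n$ is diagonal with positive entries and $\hat Y_n^H A \hat X_n$ is tridiagonal, exactly as the text asserts.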
3.3 The generic algorithm
Now that we know that the extended Euclidean algorithm can solve the problem, let us reformulate it in terms of the data of the present problem. Rather than computing all the $f_i$ first, then applying the algorithm to $f$ to get the $a_{0,k}$ from which $\hat{x}_k = a_{0,k}(A)x_0$ follows, we want to translate the algorithm directly in terms of a recursion for the $\hat{x}_k$. The result corresponds to a two-sided Gram-Schmidt biorthogonalization of the initial Krylov sequences formed by the columns of $X_n$ and $Y_n$. The fact that $a_k$ is of degree 1 means that the orthogonalization process can be obtained in a very efficient way. It is indeed sufficient to subtract from $A\hat{x}_{k-1}$ its components along $\hat{x}_{k-1}$ and $\hat{x}_{k-2}$, since its components along $\hat{x}_i$ for $i = 0, 1, \ldots, k-3$ are zero. This is how in the recurrence relation

$$\hat{x}_k = a_k(A)\, \hat{x}_{k-1} + \beta_k \hat{x}_{k-2}$$

only three terms appear. This three-term recurrence is an immediate translation of the three-term recurrence relation for the polynomials $a_{0,k}$, as indeed by definition $\hat{x}_k = a_{0,k}(A)x_0$ and because $a_{0,k} = a_k a_{0,k-1} + \beta_k a_{0,k-2}$ as we know from (1.13). Recall that $\beta_k = u_{k-1} c_k$. For the monic normalization
(see Theorem 1.14) we should choose $u_{k-1} = \tau_{k-1}^{-1}$ and $c_k = -\tau_{k-2}$, so that

$$\beta_k = -\tau_{k-2}/\tau_{k-1}. \qquad (3.7)$$

The $\tau_{k-1}$ elements are found as

$$\tau_{k-1} = D_{k-1,k-1} = [a_{0,k-1}^T\; 0 \cdots 0]\, H_n\, [a_{0,k-1}^T\; 0 \cdots 0]^T = [a_{0,k-1}]^T\, Y_{k-1}^H X_{k-1}\, [a_{0,k-1}] = \hat{y}_{k-1}^H \hat{x}_{k-1}.$$

The polynomial coefficients $a_k = [\underline{a}_k^T\; 1]^T$ are given by the solution of

$$D_{k-1,k-1}\, \underline{a}_k = -\underline{d}_k$$

as in (2.19). In the present case, where all $\alpha_k$ are assumed to be 1, this gives a very simple expression, because $\underline{a}_k$ is just a scalar and so is $D_{k-1,k-1} = d_{2k-1} = \tau_{k-1}$. The vector $\underline{d}_k$ also reduces to a scalar, viz. $d_{2\kappa_{k-1}+2} = d_{2k}$, which, according to (2.11), is given by $\hat{y}_{k-1}^H A \hat{x}_{k-1}$. The algorithm can start with $\hat{x}_0 = x_0$, $\hat{y}_0 = y_0$ and $\hat{x}_{-1} = 0$, $\hat{y}_{-1} = 0$. The general iteration step is
$$\hat{x}_k = A\hat{x}_{k-1} - \frac{\hat{y}_{k-1}^H A \hat{x}_{k-1}}{\hat{y}_{k-1}^H \hat{x}_{k-1}}\, \hat{x}_{k-1} - \frac{\hat{y}_{k-2}^H A \hat{x}_{k-1}}{\hat{y}_{k-2}^H \hat{x}_{k-2}}\, \hat{x}_{k-2}.$$

Of course a similar relation holds for the $\hat{y}_k$ recursion. This is closely related to the recursion of the ORTHODIR variant of the conjugate gradient method as described by Young and Jea in [242] and in the thesis of Joubert [162]. It is possible to find other expressions for the inner products. For example, it follows from the biorthogonality relations and because the $a_{0,k}$ are monic that $\hat{y}_{k-2}^H A \hat{x}_{k-1} = \hat{y}_{k-1}^H \hat{x}_{k-1} = \tau_{k-1}$.
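The saving of the generic case shows up in code as an inner loop that collapses to two terms. The following is our own sketch of the resulting three-term recursion (real arithmetic; the test data again give a positive definite moment matrix, so no breakdown occurs). Note that, by the monicity identity just mentioned, the same two coefficients serve both the $\hat x$ and the $\hat y$ updates:

```python
import numpy as np

def lanczos3(A, x0, y0, n):
    """Generic two-sided Lanczos via the three-term recursion (real case):
    A x̂_{k-1} only has components along x̂_{k-1} and x̂_{k-2} left to remove,
    being automatically orthogonal to ŷ_0, ..., ŷ_{k-3}."""
    Xh = [np.zeros_like(np.asarray(x0, float)), np.asarray(x0, float)]
    Yh = [np.zeros_like(np.asarray(y0, float)), np.asarray(y0, float)]
    tau = [1.0, float(y0 @ x0)]            # tau[0] is a harmless dummy for k = 1
    for k in range(1, n + 1):
        v, w = A @ Xh[-1], A.T @ Yh[-1]
        a = (Yh[-1] @ v) / tau[-1]         # component along x̂_{k-1} (and ŷ_{k-1})
        b = (Yh[-2] @ v) / tau[-2]         # equals tau_{k-1}/tau_{k-2}; 0 when k = 1
        Xh.append(v - a * Xh[-1] - b * Xh[-2])
        Yh.append(w - a * Yh[-1] - b * Yh[-2])
        tau.append(float(Yh[-1] @ Xh[-1]))
        assert abs(tau[-1]) > 1e-13, "breakdown: a look-ahead cluster is needed"
    return np.column_stack(Xh[1:]), np.column_stack(Yh[1:]), np.array(tau[1:])

A = np.diag(np.linspace(0.5, 1.5, 8))      # positive measure: breakdown-free
Xh, Yh, tau = lanczos3(A, np.ones(8), np.linspace(1.0, 2.0, 8), 4)
```

Each step costs two matrix-vector products and a handful of inner products, independently of $k$, instead of the $O(k)$ orthogonalizations of the naive pass.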
3.4 The Euclidean Lanczos algorithm
Note that a zero division will cause a breakdown of the generic algorithm. That is when one of the $\tau_{k-1}$ is zero. This will certainly happen when a maximal subspace is reached. For example, when the maximal $\mathcal{X}_{k-1}$ is reached, then $x_k \in \mathcal{X}_{k-1}$ and thus also $\hat{x}_k \in \mathcal{X}_{k-1}$. Because $\hat{y}_k$ is chosen orthogonal to $\mathcal{X}_{k-1}$, it will also be orthogonal to $\hat{x}_k$. Thus $\hat{y}_k^H \hat{x}_k = 0$. By symmetry the same holds when a maximal subspace $\mathcal{Y}_{k-1}$ is reached. If we are looking for the invariant subspaces, then this breakdown does not matter, because we have then reached our goal. So the question is in which situations a breakdown (i.e., $\tau_{k-1} = 0$) can occur.
If we are in the generic situation where all the blocks have size 1, then by the determinant relation $\det H_n = \det D_n$, we see that this will happen if and only if one of the leading principal submatrices of the Hankel matrix becomes singular.

For a positive definite selfadjoint matrix $A$ and with $x_0 = y_0$, it turns out that $Y_n = X_n$ and consequently $H_n = X_n^H X_n$ is selfadjoint positive semidefinite. Thus $\det H_n = 0$ if and only if $n \ge \mathrm{rank}\, H = n_x + 1 = n_y + 1$, where $n_x + 1$ is the maximal dimension of the Krylov spaces $\mathcal{X}_n = \mathcal{Y}_n$. Thus in this case, at the moment of a breakdown, we have always reached the rank of the Hankel matrix and at the same time a maximal Krylov subspace. Note that reaching the rank of the Hankel matrix means that we know its symbol exactly. This does not mean, though, that we know everything about $A$; only the information about $A$ that is caught in the symbol $f(z) = \sum_k f_k z^{-k}$, where $f_k = y_0^H A^{k-1} x_0$, $k = 1, 2, 3, \ldots$, will be available. How much of $A$ is kept in $f$ will be explained below. Since we are after these invariant subspaces, this case is not really a breakdown; the algorithm just comes to an end because everything has been computed.

However, when $A$ is not selfadjoint or when $x_0 \ne y_0$, it may happen that the algorithm breaks down before $\mathrm{rank}\, H$ is reached. This is a situation that can be remedied by a nongeneric Lanczos algorithm. Because the generic algorithm breaks down but the situation can be saved by the nongeneric algorithm, it is said that we have a "curable" breakdown. This shortcoming of the generic version explains why for some time the Lanczos algorithm has not been very popular, except for positive definite matrices, where a breakdown theoretically never happens except at the end of the computations.

So we can always continue the computations until we reach the rank of $H$. Of course we then always have a breakdown: either we have then reached an invariant subspace, i.e., a maximal Krylov subspace, or not. The first case is what we wanted, and the algorithm stops normally as in the positive definite case. However, in the second case, we have an "incurable breakdown". That is a breakdown where the rank of $H$ is reached before a maximal Krylov subspace is obtained. Note that $\mathrm{rank}\, H < \min\{n_x, n_y\} + 1$ is perfectly possible, also for a nonsingular matrix $A$.

Example 3.1 As a simple example, consider $x_0 = [1\;\; 0]^T$, $y_0 = [0\;\; 1]^T$ and $A$ the $2\times2$ identity matrix. Clearly $n_x = n_y = 0$, while $H = 0$ and thus $\mathrm{rank}\, H = 0 < 1$.
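Example 3.1 can be checked in a few lines of NumPy (our own sketch): the very first inner product $\tau_0$ vanishes and every moment $f_k$ is zero, even though $A$ is nonsingular and each Krylov space is already maximal at dimension 1:

```python
import numpy as np

A = np.eye(2)                               # nonsingular; even the identity
x0 = np.array([1.0, 0.0])
y0 = np.array([0.0, 1.0])

tau0 = y0 @ x0                              # = 0: breakdown at the very first step
f = [y0 @ np.linalg.matrix_power(A, k) @ x0 for k in range(6)]   # the symbol is 0
rank_X1 = np.linalg.matrix_rank(np.column_stack([x0, A @ x0]))   # n_x + 1 = 1
```

Here rank $H = 0$ is reached immediately, while the maximal Krylov subspaces have dimension 1: an incurable breakdown in the sense above.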
A more detailed analysis of breakdown will be given in Section 3.5. First let us show that the extended Euclidean algorithm also works in the nongeneric case, i.e., the case where some of the $\alpha_k$ can be larger than 1, and that it really heals the curable breakdown.

A most important step in the design of a nongeneric Lanczos algorithm was made in 1985, when Parlett, Taylor and Liu proposed the Look-Ahead Lanczos (LAL) algorithm [201]. They propose to anticipate the case where $\tau_{k-1} = 0$. If it occurs that $\hat{y}_{k-1}^H \hat{x}_{k-1} = 0$, then they look at the number $\hat{y}_{k-1}^H A \hat{x}_{k-1}$, and when this is nonzero they can compute both pairs $\hat{y}_k, \hat{y}_{k+1}$ and $\hat{x}_k, \hat{x}_{k+1}$ simultaneously. The price paid is that $T_n$ is not quite tridiagonal, but it has some bump or bulge where a vanishing $\tau_k$ occurs. For numerical reasons, it is best to do such a block pivoting every time some $\tau_k$ is not zero but small. This is precisely what Parlett et al. propose. Computing couples of vectors simultaneously may not be sufficient, because if the $2\times2$ block

$$[\hat{y}_{k-1}\,|\,A^H \hat{y}_{k-1}]^H\, [\hat{x}_{k-1}\,|\,A\hat{x}_{k-1}]$$

is still singular, one should take a $3\times3$ block, etc.

If we go through the previous train of ideas again, but leave out the assumption that $\alpha_i = 1$ for all $i$, then we see that the Euclidean algorithm does precisely something like the LAL algorithm. We should stress, however, that the correspondence is purely algebraic. There are no numerical claims whatsoever. The Euclidean algorithm has no sound numerical basis, but at least in theory it gives a reliable algorithm.

We start as before from a matrix $A \in \mathbb{F}^{(m+1)\times(m+1)}$ and some initial vectors $x_0, y_0 \in \mathbb{F}^{m+1}$, to define the Krylov sequences
$$x_k = A^k x_0 \qquad\text{and}\qquad y_k = (A^H)^k y_0 \qquad\text{for}\qquad k = 0, 1, \ldots$$

as well as the corresponding spaces $\mathcal{X}_k$ and $\mathcal{Y}_k$. Defining the elements

$$f_{k+1} = y_0^H A^k x_0, \qquad k = 0, 1, 2, \ldots,$$

we can associate with $f(z) = \sum_{k \ge 1} f_k z^{-k}$ the Hankel matrices

$$H_{k,k} = H_{k,k}(f) = [y_0\; y_1 \cdots y_k]^H [x_0\; x_1 \cdots x_k] = Y_k^H X_k.$$

Let $\kappa_k$ and $\alpha_k$ be the Kronecker and Euclidean indices respectively which are associated with the sequence $\{f_k\}$. It may happen that $\hat{y}_k^H \hat{x}_k = 0$ without having reached an invariant subspace, thus $k < \min\{n_x, n_y\}$, with $n_x + 1$ and $n_y + 1$ the dimensions of the maximal Krylov subspaces. Then
we have to deal with some $\alpha_k > 1$ and the simple recursion of the generic case does not hold anymore. The solution to this problem is the following. We generate the $\hat{x}_k$ and $\hat{y}_k$ in clusters of $\alpha_k$ elements. To describe a general step computing such a cluster, it is useful to renumber the vectors, so that the grouping into clusters is reflected in the notation. We shall use the following notation to indicate the first vector of the $k$th cluster:

$$\hat{x}_{0,k} = \hat{x}_{\kappa_k} = a_{0,k}(A)\, x_0, \qquad k = 0, 1, \ldots$$

where we define $\kappa_0 = \alpha_0 = 0$. Similar notations are also used for other vector sequences like $x_k$, $y_k$, $\hat{y}_k$, etc. We also use the abbreviation we used in Section 2.2 to denote $H_k = H_{\kappa_{k+1}-1,\kappa_{k+1}-1}(f)$. Suppose that we have computed the vectors

$$\hat{x}_{0,i} = a_{0,i}(A)\, x_0 \qquad\text{and}\qquad \hat{y}_{0,i} = [a_{0,i}(A)]^H y_0 \qquad\text{for}\qquad i = 0, 1, \ldots, k-1$$

and that $\hat{y}_{0,k-1}^H \hat{x}_{0,k-1} = 0$. The generic version described above breaks down at this stage, since we have to divide by this number. This is the place where a new cluster of more than one element starts. The number of vectors in the cluster is defined by the smallest number $i$ such that the Hankel matrix

$$[\hat{y}_{0,k-1}\,|\,A^H \hat{y}_{0,k-1}\,|\,\cdots\,|\,(A^H)^{i-1}\hat{y}_{0,k-1}]^H\, [\hat{x}_{0,k-1}\,|\,A\hat{x}_{0,k-1}\,|\,\cdots\,|\,A^{i-1}\hat{x}_{0,k-1}] \qquad (3.8)$$

is nonsingular. The first nonsingular such matrix is precisely $D_{k-1,k-1}$, which we need in the computation of $a_k$. The $(i,j)$th element in this matrix is

$$\hat{y}_{0,k-1}^H A^{i+j-2}\, \hat{x}_{0,k-1} = y_0^H [a_{0,k-1}(A)]\, A^{i+j-2}\, [a_{0,k-1}(A)]\, x_0 = d_{2\kappa_{k-1}+i+j-1}.$$

As we know from Section 2.2, the first nonzero element in the sequence $d_{2\kappa_{k-1}+l}$, $l = 1, 2, \ldots$ is

$$\tau_{k-1} = d_{2\kappa_{k-1}+\alpha_k} = \hat{y}_{0,k-1}^H A^{\alpha_k-1}\, \hat{x}_{0,k-1}.$$

So the first $i$ for which (3.8) is nonsingular is $i = \alpha_k$, since the Hankel matrix (3.8) then takes the form

$$D_{k-1,k-1} = \begin{pmatrix} 0 & \cdots & 0 & d_{2\kappa_{k-1}+\alpha_k} \\ \vdots & & d_{2\kappa_{k-1}+\alpha_k} & \\ 0 & d_{2\kappa_{k-1}+\alpha_k} & & \vdots \\ d_{2\kappa_{k-1}+\alpha_k} & & \cdots & d_{2\kappa_k-1} \end{pmatrix}, \quad\text{with}\quad d_{2\kappa_{k-1}+\alpha_k} = \tau_{k-1}. \qquad (3.9)$$
We then choose, e.g., $\hat{x}_{\kappa_{k-1}+i} = A^i \hat{x}_{0,k-1}$ and $\hat{y}_{\kappa_{k-1}+i} = (A^H)^i \hat{y}_{0,k-1}$ for $i = 1, 2, \ldots, \alpha_k - 1$. This defines the $(k-1)$st clusters

$$\hat{X}_{k-1} = [\hat{x}_{0,k-1}\,|\,A\hat{x}_{0,k-1}\,|\,\cdots\,|\,A^{\alpha_k-1}\hat{x}_{0,k-1}]$$

and

$$\hat{Y}_{k-1} = [\hat{y}_{0,k-1}\,|\,A^H\hat{y}_{0,k-1}\,|\,\cdots\,|\,(A^H)^{\alpha_k-1}\hat{y}_{0,k-1}]$$

so that

$$\mathrm{span}\{\hat{X}_0, \ldots, \hat{X}_{k-1}\} = \mathrm{span}\{\hat{x}_0, \ldots, \hat{x}_{\kappa_k-1}\} = \mathrm{span}\{x_0, \ldots, x_{\kappa_k-1}\}$$

and

$$\mathrm{span}\{\hat{Y}_0, \ldots, \hat{Y}_{k-1}\} = \mathrm{span}\{\hat{y}_0, \ldots, \hat{y}_{\kappa_k-1}\} = \mathrm{span}\{y_0, \ldots, y_{\kappa_k-1}\}$$

while

$$\hat{Y}_i^H \hat{X}_j = \delta_{i,j} D_{i,i}, \qquad 0 \le i, j \le k-1.$$

The first vector of the next ($k$th) block can be computed as

$$\hat{x}_{0,k} = a_k(A)\, \hat{x}_{0,k-1} + \beta_k \hat{x}_{0,k-2}$$

with $\beta_k$ given by $\beta_k = -\tau_{k-2}/\tau_{k-1}$ as in (3.7), where

$$\tau_{k-1} = y_0^H [a_{0,k-1}(A)]\, A^{\alpha_k-1}\, [a_{0,k-1}(A)]\, x_0 = \hat{y}_{0,k-1}^H A^{\alpha_k-1}\, \hat{x}_{0,k-1} \ne 0,$$

and the polynomial coefficients $a_k = [\underline{a}_k^T\; 1]^T$ are given by the solution of

$$D_{k-1,k-1}\, \underline{a}_k = -\underline{d}_k$$

with $D_{k-1,k-1}$ the antitriangular Hankel matrix (3.9) and $\underline{d}_k$ the vector

$$\underline{d}_k = [d_{2\kappa_{k-1}+\alpha_k+1} \cdots d_{2\kappa_k}]^T$$

with the $d_l$ sequence obtained from

$$d_{2\kappa_{k-1}+l} = \hat{y}_{0,k-1}^H A^{l-1}\, \hat{x}_{0,k-1}, \qquad l = 1, 2, \ldots, 2\alpha_k.$$
Algorithm 3.1: Lanczos

    Given A in F^((m+1)x(m+1)), x_0, y_0 in F^(m+1)
    Set x̂_{0,-1} = ŷ_{0,-1} = 0, τ_{-1} = -1
    Set x̂_{0,0} = x_0 and ŷ_{0,0} = y_0, κ_0 = 0
    for k = 0, 1, 2, ...
        Set i = κ_k
        for l = 1, 2, ..., m - i
            d_{2i+l} = ŷ_i^H x̂_{i+l-1} = ŷ_{i+l-1}^H x̂_i
            if d_{2i+l} ≠ 0 then
                Set α_{k+1} = l
                Leave this for-loop
            else
                x̂_{i+l} = A x̂_{i+l-1};  ŷ_{i+l} = A^H ŷ_{i+l-1}
            endif
        endfor
        if α_{k+1} not defined then stop endif
        κ_{k+1} = κ_k + α_{k+1};  τ_k = d_{2i+α_{k+1}}
        for l = 1, ..., α_{k+1}
            d_{2i+α_{k+1}+l} = ŷ_{i+l-1}^H (A x̂_{i+α_{k+1}-1}) = (A^H ŷ_{i+α_{k+1}-1})^H x̂_{i+l-1}
        endfor
        Solve D_{k,k} a_{k+1} = -d_{k+1} with
            D_{k,k} = [d_{2i+p+q-1}]_{p,q=1}^{α_{k+1}}
            and d_{k+1} = [d_{2i+α_{k+1}+1} ... d_{2κ_{k+1}}]^T
        Set β_{k+1} = -τ_{k-1}/τ_k   (cf. (3.7))
        Set
            x̂_{0,k+1} = [a_{k+1}(A)] x̂_{0,k} + β_{k+1} x̂_{0,k-1}
            ŷ_{0,k+1} = [a_{k+1}(A)]^H ŷ_{0,k} + β̄_{k+1} ŷ_{0,k-1}
    endfor
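Algorithm 3.1 can be sketched in NumPy as follows. This is our own illustrative transcription, not the book's code: real arithmetic, a crude zero tolerance, and none of the numerical safeguards (pivoting on small rather than zero moments) that the text stresses. Run on the data of Example 3.2 below, it reproduces the cluster widths 1 and 3:

```python
import numpy as np

def look_ahead_lanczos(A, x0, y0, tol=1e-12):
    """Illustrative transcription of Algorithm 3.1 (real arithmetic).
    Vectors inside a cluster are A-shifts of the cluster head; the next head
    solves the antitriangular Hankel system D_{k,k} a = -d (monic a_{k+1})."""
    m = len(x0) - 1
    xs = [np.asarray(x0, float)]           # x̂_0, x̂_1, ...
    ys = [np.asarray(y0, float)]           # ŷ_0, ŷ_1, ...
    hx = [np.zeros(m + 1), xs[0]]          # cluster heads x̂_{0,-1} = 0, x̂_{0,0}
    hy = [np.zeros(m + 1), ys[0]]
    taus, alphas = [-1.0], []              # tau_{-1} = -1 makes beta_1 uniform
    while True:
        i = len(xs) - 1                    # i = kappa_k: index of the cluster head
        d, alpha = {}, None
        for l in range(1, m - i + 1):      # search for the first nonzero moment
            d[2 * i + l] = ys[i + l - 1] @ xs[i]
            if abs(d[2 * i + l]) > tol:
                alpha = l
                break
            xs.append(A @ xs[i + l - 1])   # shift: stay inside the cluster
            ys.append(A.T @ ys[i + l - 1])
        if alpha is None:
            return xs, ys, alphas          # no room or no nonzero moment: stop
        alphas.append(alpha)
        taus.append(d[2 * i + alpha])      # tau_k
        z, w = A @ xs[i + alpha - 1], A.T @ ys[i + alpha - 1]
        for l in range(1, alpha + 1):      # remaining moments d_{2i+alpha+l}
            d[2 * i + alpha + l] = ys[i + l - 1] @ z
        D = np.array([[d[2 * i + p + q - 1] for q in range(1, alpha + 1)]
                      for p in range(1, alpha + 1)])
        rhs = -np.array([d[2 * i + alpha + l] for l in range(1, alpha + 1)])
        a = np.linalg.solve(D, rhs)        # low-order coefficients of monic a_{k+1}
        beta = -taus[-2] / taus[-1]        # beta_{k+1} = -tau_{k-1}/tau_k, cf. (3.7)
        xs.append(z + sum(a[j] * xs[i + j] for j in range(alpha)) + beta * hx[-2])
        ys.append(w + sum(a[j] * ys[i + j] for j in range(alpha)) + beta * hy[-2])
        hx.append(xs[-1]); hy.append(ys[-1])

# the data of Example 3.2: 5x5 circular shift, x0 = e1 + e5, y0 = e1
A = np.eye(5, k=1); A[4, 0] = 1.0
xs, ys, alphas = look_ahead_lanczos(A, np.array([1.0, 0, 0, 0, 1]),
                                    np.array([1.0, 0, 0, 0, 0]))
```

At most $m + 1$ vectors are generated, matching the loop bound of Algorithm 3.1.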
So we can reformulate the extended Euclidean algorithm in terms of the data of the current problem as given in Algorithm 3.1. We note that the initialization $\tau_{-1} = -1$ corresponds to our earlier choice of setting $s_{-1} = -1$. This allows us to define $\beta_1$ like all the other $\beta_k$. However, the value of $\beta_1$ is not important, since it multiplies $\hat{x}_{-1}$ and $\hat{y}_{-1}$, which are both zero anyway. We note also that we have given an implementation that corresponds to the normalization of Theorem 1.14, which yielded monic polynomials $a_{0,k}$. It is left as an exercise to translate this to the more general form. Besides this type of normalization, one can also implement a normalization of the $\hat{x}_k$ and $\hat{y}_k$ vectors by multiplying each one by an arbitrary nonzero constant. We have also taken a special choice for the vectors within a cluster. We have just chosen them to be $\hat{x}_{0,k-1}, A\hat{x}_{0,k-1}, \ldots, A^{\alpha_k-1}\hat{x}_{0,k-1}$. The subspace that corresponds to such a block can, however, be spanned by any other set of basis vectors for this subspace. That would lead to the same block structure, but the diagonal blocks $D_{k-1,k-1}$ would in general not be Hankel anymore.

Example 3.2 (Ex. 1.5 continued) Consider the following data
$$A = \begin{bmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \\ 1&0&0&0&0 \end{bmatrix}; \qquad x_0 = \begin{bmatrix}1\\0\\0\\0\\1\end{bmatrix}; \qquad y_0 = \begin{bmatrix}1\\0\\0\\0\\0\end{bmatrix}.$$

Then

$$X_4 = \begin{bmatrix} 1&0&0&0&1 \\ 0&0&0&1&1 \\ 0&0&1&1&0 \\ 0&1&1&0&0 \\ 1&1&0&0&0 \end{bmatrix}; \qquad Y_4 = \begin{bmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{bmatrix}.$$

Note that

$$\{f_k\}_{k \ge 1} = \{y_0^T A^{k-1} x_0\}_{k \ge 1} = \{1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, \ldots\}.$$
Note that this is the same sequence as in Example 2.2. The Lanczos algorithm produces the following results:

$$\hat{x}_0 = x_0; \qquad \hat{y}_0 = y_0; \qquad d_1 = \hat{y}_0^T \hat{x}_0 = 1 = \tau_0 \ne 0.$$

Hence the degree $\alpha_1 = 1$ and the recursion coefficient $\beta_1 = -\tau_{-1}/\tau_0 = -(-1)/1 = 1$. The other recursion coefficient requires

$$d_2 = \hat{y}_0^T A \hat{x}_0 = 0.$$
The solution of the system $D_{0,0}\,\underline{a}_1 = -\underline{d}_1$ with $D_{0,0} = \tau_0 = 1$ and $\underline{d}_1 = d_2 = 0$ gives $\underline{a}_1 = 0$ and therefore

$$\hat{x}_{0,1} = \hat{x}_1 = A\hat{x}_{0,0} + 1\cdot 0 = x_1, \qquad \hat{y}_{0,1} = \hat{y}_1 = A^H\hat{y}_{0,0} + 1\cdot 0 = y_1.$$

The next step of the algorithm indicates a breakdown of the generic situation, since $d_3 = \hat{y}_{0,1}^T \hat{x}_{0,1} = 0$. So we look further to find the first nonzero $d_l$:

$$d_4 = \hat{y}_{0,1}^T A\, \hat{x}_{0,1} = 0, \qquad d_5 = \hat{y}_{0,1}^T A^2\, \hat{x}_{0,1} = 1.$$
Thus we found that $\alpha_2 = 3$ and consequently $\kappa_2 = \kappa_1 + \alpha_2 = 4$. Using $\tau_0 = 1$ and $\tau_1 = d_5 = 1$, we find $\beta_2 = -\tau_0/\tau_1 = -1$. The next three vectors are just shifted versions of $\hat{x}_{0,1}$ and $\hat{y}_{0,1}$:

$$\hat{x}_2 = A\hat{x}_{0,1} = x_2; \quad \hat{x}_3 = A^2\hat{x}_{0,1} = x_3; \quad \hat{y}_2 = A^T\hat{y}_{0,1} = y_2; \quad \hat{y}_3 = (A^T)^2\hat{y}_{0,1} = y_3.$$

We need three more $d_l$ values to define $D_{1,1}$ and $\underline{d}_2$. These are

$$d_6 = \hat{y}_{0,1}^T A^3\, \hat{x}_{0,1} = 1, \qquad d_7 = \hat{y}_{0,1}^T A^4\, \hat{x}_{0,1} = 0, \qquad d_8 = \hat{y}_{0,1}^T A^5\, \hat{x}_{0,1} = 0.$$

So we solve $D_{1,1}\,\underline{a}_2 = -\underline{d}_2$, i.e.

$$\begin{bmatrix} 0&0&1 \\ 0&1&1 \\ 1&1&0 \end{bmatrix} \underline{a}_2 = -\begin{bmatrix}1\\0\\0\end{bmatrix} \qquad\text{to get}\qquad \underline{a}_2 = \begin{bmatrix}-1\\1\\-1\end{bmatrix}.$$
Thus the next vectors are given by

$$\hat{x}_{0,2} = \hat{x}_4 = (A^3 - A^2 + A - I)\, \hat{x}_{0,1} - 1\cdot\hat{x}_{0,0} = [0\;\; 0\;\; 0\;\; 0\;\; -2]^T$$

$$\hat{y}_{0,2} = \hat{y}_4 = (A^3 - A^2 + A - I)^T\, \hat{y}_{0,1} - 1\cdot\hat{y}_{0,0} = [-1\;\; -1\;\; 1\;\; -1\;\; 1]^T.$$
At this moment the next moment is $d_9 = \hat{y}_4^T \hat{x}_4 = -2 = \tau_2 \ne 0$, so that $\alpha_3 = 1$ and $\kappa_3 = 5 = \mathrm{rank}\, H$ is reached, and the algorithm stops. We have computed now the matrices

$$\hat{X}_4 = \begin{bmatrix} 1&0&0&0&0 \\ 0&0&0&1&0 \\ 0&0&1&1&0 \\ 0&1&1&0&0 \\ 1&1&0&0&-2 \end{bmatrix}; \qquad \hat{Y}_4 = \begin{bmatrix} 1&0&0&0&-1 \\ 0&1&0&0&-1 \\ 0&0&1&0&1 \\ 0&0&0&1&-1 \\ 0&0&0&0&1 \end{bmatrix}.$$
Note that these matrices are indeed found as

$$\hat{X}_4 = X_4 A_2 \qquad\text{and}\qquad \hat{Y}_4 = Y_4 A_2$$

where $A_2$ is the upper triangular matrix containing the coefficients of the polynomials $a_{0,k}$ and their shifts as in Example 2.2. We verify that

$$\hat{Y}_4^T \hat{X}_4 = \begin{bmatrix} 1&0&0&0&0 \\ 0&0&0&1&0 \\ 0&0&1&1&0 \\ 0&1&1&0&0 \\ 0&0&0&0&-2 \end{bmatrix} = D_2.$$

This is the same block diagonal as in Example 2.2. Finally we check that

$$\hat{Y}_4^T A \hat{X}_4 = \begin{bmatrix} 0&0&0&1&0 \\ 0&0&1&1&0 \\ 0&1&1&0&0 \\ 1&1&0&0&-2 \\ 0&0&0&-2&2 \end{bmatrix} = J_2,$$

which is indeed $J_2 = D_2 T_2$ with $T_2$ the block tridiagonal matrix of Example 2.3. Compare also with what was obtained in Example 2.5.
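The final checks of this example are easy to replay in NumPy. The sketch below is our own; it hard-codes the data of the example ($A$ the $5\times5$ circular shift, $x_0 = e_1 + e_5$, $y_0 = e_1$) and the cluster polynomial $a_2(z) = z^3 - z^2 + z - 1$ found above:

```python
import numpy as np

A = np.eye(5, k=1); A[4, 0] = 1.0           # 5x5 circular shift
x0 = np.array([1.0, 0, 0, 0, 1]); y0 = np.array([1.0, 0, 0, 0, 0])

# the periodic moment sequence 1,0,0,0,1,1,0,0,0,1,...
f = [y0 @ np.linalg.matrix_power(A, k) @ x0 for k in range(10)]

x1, y1 = A @ x0, A.T @ y0                   # the heads x̂_{0,1} and ŷ_{0,1}
a2 = lambda M: M @ M @ M - M @ M + M - np.eye(5)   # a_2(z) = z^3 - z^2 + z - 1
Xh = np.column_stack([x0, x1, A @ x1, A @ A @ x1, a2(A) @ x1 - x0])
Yh = np.column_stack([y0, y1, A.T @ y1, A.T @ A.T @ y1, a2(A.T) @ y1 - y0])

D = Yh.T @ Xh                               # block diagonal (block sizes 1, 3, 1)
J = Yh.T @ A @ Xh                           # block tridiagonal J_2 = D_2 T_2
```

Both `D` and `J` come out exactly as displayed in the example, and `J` is symmetric.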
Suppose that at a certain stage in the algorithm we have computed $\hat{x}_{\nu+1}$ and $\hat{y}_{\nu+1}$ with

$$\nu + 1 = \kappa_{n+1} \qquad (3.10)$$

for some $n$, where $\kappa_{n+1}$ is a Kronecker index of the sequence $f_k$, $k = 1, 2, \ldots$. In this case we have the precise decomposition of the Hankel matrix $H_n$ as
discussed in Theorem 2.1. The matrix $H_n = H_{\nu,\nu}(f)$ is then nonsingular. However, when $H_{\nu,\nu}$ is singular, there is no such $n$ satisfying (3.10), and then the last block computed in the decomposition of $H_{\nu,\nu}$ is not complete.

Let us resume some of the results we have obtained so far.

Theorem 3.2 Let $A \in \mathbb{F}^{(m+1)\times(m+1)}$ be a square matrix. Choose two nonzero vectors $x_0, y_0 \in \mathbb{F}^{m+1}$. Define the Krylov matrices

$$X_\nu = [x_0\,|\,Ax_0\,|\,\cdots\,|\,A^\nu x_0] = [x_0\,|\,x_1\,|\,\cdots\,|\,x_\nu]$$

and

$$Y_\nu = [y_0\,|\,A^H y_0\,|\,\cdots\,|\,(A^H)^\nu y_0] = [y_0\,|\,y_1\,|\,\cdots\,|\,y_\nu].$$

We also define

$$f_k = y_0^H A^{k-1} x_0, \qquad k = 1, 2, \ldots$$

and the Hankel matrix

$$H_{\nu,\nu}(f) = [f_{i+j+1}]_{i,j=0}^{\nu} = Y_\nu^H X_\nu \qquad\text{with}\qquad f = \sum_{k \ge 1} f_k z^{-k}.$$

Associate with the sequence $\{f_k\}_{k \ge 1}$ the same quantities as in Corollary 2.2. We also need the Jacobi matrix $T_n$ as in Theorem 2.3. Finally, we define $\hat{X}_\nu = X_\nu A_\nu$ and $\hat{Y}_\nu = Y_\nu \bar{A}_\nu$. Then the following relations hold:

$$\mathrm{span}\, Y_\nu = \mathrm{span}\, \hat{Y}_\nu \qquad\text{and}\qquad \mathrm{span}\, X_\nu = \mathrm{span}\, \hat{X}_\nu, \qquad (3.11)$$

$$\hat{Y}_\nu^H \hat{X}_\nu = A_\nu^T H_{\nu,\nu}(f)\, A_\nu, \qquad (3.12)$$

$$\hat{Y}_\nu^H A \hat{X}_\nu = A_\nu^T H_{\nu,\nu}(zf)\, A_\nu, \qquad H_{\nu,\nu}(zf) = [f_{i+j+2}]_{i,j=0}^{\nu}. \qquad (3.13)$$

Moreover, if $\nu + 1 = \kappa_{n+1}$, then

$$\hat{Y}_\nu^H A \hat{X}_\nu = J_n = D_n T_n = T_n^T D_n \qquad (3.14)$$

and also

$$Y_\nu^H A \hat{X}_\nu = R_n T_n = F^T(a_{0,n+1})\, R_n. \qquad (3.15)$$

Proof. All these relations follow from what we have seen above and some simple algebra. For example, the last equality in (3.14) is obtained from (3.6), which after taking the $H$-transform and multiplication with $\hat{X}_n$ gives

$$\hat{Y}_n^H A \hat{X}_n = \bar{T}_n^H \hat{Y}_n^H \hat{X}_n + [0 \cdots 0\,|\,\hat{y}_{n+1}]^H \hat{X}_n = T_n^T D_n.$$
For the last relation we have used $D_n = A_n^T R_n$ (with $R_n = H_n A_n = Y_n^H \hat{X}_n$), so that from (3.14)

$$Y_n^H A \hat{X}_n = A_n^{-T} D_n T_n = R_n T_n = A_n^{-T} T_n^T D_n.$$

Now from (2.29) we get $A_n^{-T} T_n^T = F^T(a_{0,n+1})\, A_n^{-T}$. Hence

$$Y_n^H A \hat{X}_n = A_n^{-T} T_n^T D_n = F^T(a_{0,n+1})\, A_n^{-T} D_n = F^T(a_{0,n+1})\, R_n.$$

This is the relation (3.15). □

So far we did not say much about the termination of the algorithm. This shall be investigated in the next section. It will also explain why we computed only $m+1$ of the vectors $\hat{x}_k$, $\hat{y}_k$ in Algorithm 3.1.
3.5 Breakdown in the Lanczos algorithm
We investigate in more detail the situations in which a breakdown can occur in the Lanczos algorithm. Before that, we shall recapitulate what we had so far and give a number of simple lemmas that we shall need.

Recall that $H = H(f)$ with

$$f(z) = \sum_{k=1}^{\infty} f_k z^{-k}, \qquad f_{k+1} = y_0^H A^k x_0 = y_i^H x_{k-i}, \quad 0 \le i \le k, \quad k = 0, 1, \ldots$$

so that

$$f(z) = y_0^H (zI - A)^{-1} x_0$$

is a rational function (if $A$ is finite dimensional) and therefore the rank of $H(f)$ will be finite. Let $\kappa_k$, $k = 0, 1, \ldots$ be the Kronecker indices associated with the sequence $\{f_k\}$; then the rank of the submatrices $H_{\nu,\nu}$ will go up in jumps like

$$\mathrm{rank}\, H_{\nu,\nu} = \kappa_k \qquad\text{for}\qquad \nu + 1 = \kappa_k, \kappa_k + 1, \ldots, \kappa_{k+1} - 1; \qquad k = 0, 1, \ldots.$$

If the rank of $H$ is $\kappa_{n+1}$, we can think of the last block as being infinite, i.e. $\kappa_{n+2} = \infty$ and hence also $\alpha_{n+2} = \infty$. Thus the rank of $H_{\nu,\nu}$ is $\kappa_{n+1}$ for all $\nu \ge \kappa_{n+1} - 1$. The nonsingular submatrices $H_{\nu,\nu}$, of rank $\nu + 1$, are found for $\nu + 1$ equal to a Kronecker index, i.e. $\nu + 1 = \kappa_1, \kappa_2, \ldots, \kappa_{n+1}$. These are the matrices that we denoted as $H_k$, $k = 0, \ldots, n$.
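The rank jumps can be observed by testing the leading principal Hankel submatrices: the sizes at which they are nonsingular are exactly the Kronecker indices. A determinant-based sketch of this (our own code; adequate for small exact integer data, but not numerically robust for general floating-point sequences):

```python
import numpy as np

def kronecker_indices(f, max_size):
    """Sizes nu+1 at which the leading principal Hankel submatrix
    [f_{i+j+1}] built from f = (f_1, f_2, ...) is nonsingular."""
    out = []
    for size in range(1, max_size + 1):
        H = np.array([[f[i + j] for j in range(size)] for i in range(size)], float)
        if abs(np.linalg.det(H)) > 1e-9:   # crude exact-data test
            out.append(size)
    return out

# the moment sequence of Example 3.2 (period 5)
f = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
```

For this sequence the nonsingular sizes are 1, 4 and 5, i.e. $\kappa_1 = 1$, $\kappa_2 = 4$, $\kappa_3 = 5$, matching the cluster widths $\alpha_1 = 1$, $\alpha_2 = 3$, $\alpha_3 = 1$ of Example 3.2.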

We also had the Krylov matrices ($A \in \mathbb{F}^{(m+1)\times(m+1)}$)

$$X_\nu = [x_0\,|\,Ax_0\,|\,\cdots\,|\,A^\nu x_0] \qquad\text{and}\qquad Y_\nu = [y_0\,|\,A^H y_0\,|\,\cdots\,|\,(A^H)^\nu y_0]$$

and the Krylov spaces

$$\mathcal{X}_\nu = \mathrm{range}\, X_\nu = \mathrm{span}\{x_k = A^k x_0 : k = 0, \ldots, \nu\}$$

$$\mathcal{Y}_\nu = \mathrm{range}\, Y_\nu = \mathrm{span}\{y_k = (A^H)^k y_0 : k = 0, \ldots, \nu\}$$

where

$$\dim \mathcal{X}_\nu = \mathrm{rank}\, X_\nu \le \mathrm{rank}\, X_{n_x} = \mathrm{rank}\, X_\infty = n_x + 1,$$

$$\dim \mathcal{Y}_\nu = \mathrm{rank}\, Y_\nu \le \mathrm{rank}\, Y_{n_y} = \mathrm{rank}\, Y_\infty = n_y + 1.$$

Because $H = Y_\infty^H X_\infty$, we clearly have

$$\kappa_{n+1} = \mathrm{rank}\, H \le \min\{n_x, n_y\} + 1 \le m + 1.$$

We start with a simple lemma (see e.g. [87, §1.2]).

Lemma 3.3 The infinite Hankel matrix $H$ has rank $\alpha$ if and only if its first $\alpha$ columns (rows) are linearly independent while the first $\alpha + 1$ columns (rows) are linearly dependent.

Proof. If $h_k$, $k = 0, 1, \ldots$ represent the columns of $H$, then

$$h_1 = Z^{-1} h_0, \quad \ldots, \quad h_k = Z^{-k} h_0, \quad \ldots$$

where $Z^{-1}$ represents the upward shift. If $\alpha$ is an integer such that

$$a_0 h_0 + a_1 h_1 + \cdots + a_\alpha h_\alpha = 0, \qquad a_\alpha \ne 0,$$

then the columns $h_0, \ldots, h_\alpha$ are linearly dependent. Thus there exist constants $b_i$ such that $h_\alpha$ can be written as

$$h_\alpha = \sum_{i=0}^{\alpha-1} b_i h_i.$$

We can plug this into

$$h_{\alpha+1} = Z^{-1} h_\alpha = \sum_{i=0}^{\alpha-1} b_i h_{i+1}.$$

This shows that also $h_{\alpha+1}$ is a linear combination of the first $\alpha$ columns. By induction we find that all the other columns are also in this span, which
proves that there can be no more than $\alpha$ linearly independent columns. Thus the rank can be at most $\alpha$. If $\alpha$ is the smallest such number, then it is equal to the rank. By symmetry considerations, this holds also for the rows. □

Note the similarity with the result of Lemma 3.1, which we showed earlier for Krylov matrices. Any function of the form $f(z) = y_0^H(zI - A)^{-1}x_0$ with $A$ finite dimensional will be rational, and conversely, for any rational $f(z)$ there exists a triple $(A, x_0, y_0)$ so that the previous relation holds. It holds that

$$f(z) = y_0^H (zI - A)^{-1} x_0 = \frac{y_0^H\, \mathrm{adj}(zI - A)\, x_0}{\det(zI - A)}.$$

Here $\mathrm{adj}(zI - A)$ represents the adjugate matrix containing the cofactors. Hence it follows that the poles of $f(z)$ will be zeros of the denominator, which are clearly eigenvalues of $A$. However, the converse is not true: it may well be that zeros of numerator and denominator cancel. In that case it is possible to write $f(z)$ as $f(z) = (y_0')^H(zI - A')^{-1}x_0'$ with the size of $A'$ smaller than the size of $A$. We give in this connection the following definitions.

Definition 3.2 ((minimal) realization) If $f(z)$ is a rational function and $B, C \in \mathbb{F}^{(m+1)\times 1}$ and $A \in \mathbb{F}^{(m+1)\times(m+1)}$ are such that $f(z) = C^H(zI - A)^{-1}B$, then we call the triple $(A, B, C)$ a realization of $f(z)$. If the dimension $m+1$ is the smallest possible for which a realization exists, then the realization is called minimal.

Definition 3.3 (Equivalent realizations) If $(A, B, C)$ and $(\tilde{A}, \tilde{B}, \tilde{C})$ are two realization triples of the same size, then they are called equivalent if they represent the same function: $f(z) = C^H(zI - A)^{-1}B = \tilde{C}^H(zI - \tilde{A})^{-1}\tilde{B}$.

The following is true. Two realizations related by $\tilde{C} = F^H C$, $\tilde{B} = F^{-1}B$ and $\tilde{A} = F^{-1}AF$, with $F$ some invertible transformation matrix, are equivalent, and if the realizations are minimal, the converse also holds [165, Theorem 2.47].

Definition 3.4 (McMillan degree) The dimension of $A$ in a minimal realization $(A, B, C)$ of $f(z)$ is called the McMillan degree of $f(z)$, and also of the sequence $\{f_k\}$ if $f(z) = \sum_{k=1}^{\infty} f_k z^{-k}$.
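Kronecker's theorem below (Theorem 3.4) identifies the McMillan degree with the rank of $H(f)$, which gives a quick numerical check. Our sketch uses the rational function $f(z) = 1/(z-1) + 1/(z-2)$, of McMillan degree 2:

```python
import numpy as np

# f(z) = 1/(z-1) + 1/(z-2) = (2z-3)/((z-1)(z-2)): McMillan degree 2,
# with expansion coefficients f_k = 1 + 2^(k-1)
f = [1 + 2 ** k for k in range(8)]          # f_1, f_2, ... = 2, 3, 5, 9, ...

def hankel(seq, n):
    return np.array([[seq[i + j] for j in range(n)] for i in range(n)], float)

ranks = [int(np.linalg.matrix_rank(hankel(f, n))) for n in range(1, 5)]
```

The Hankel ranks saturate at 2 once the matrix is at least $2\times2$, as the theorem predicts.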
Clearly if f ( z ) = N ( z ) / D ( z ) w i t h N ( z ) and D(z) coprime, then the McMillan degree of f is also equal to deg D. We can now give the Kronecker theorem.
Theorem 3.4 (Kronecker) The rank of the Hankel matrix H(f) is equal to the McMillan degree of f.

Proof. The entries of the Hankel matrix are the coefficients fk in the asymptotic expansion

f(z) = f1 z^{-1} + f2 z^{-2} + f3 z^{-3} + ...

If f has McMillan degree α, then f has a coprime representation f(z) = N(z)/D(z) with the degree of D(z) equal to α. It then follows from equating the coefficients of the negative powers of z in the equality f(z)D(z) = N(z) that there exists a nontrivial linear combination of the columns h0, h1, ..., hα of the Hankel matrix which gives zero. Thus the rank can not be larger than α by Lemma 3.3. On the other hand, the rank can not be smaller than α, say α' < α, because then

h0 q0 + h1 q1 + ... + hα' qα' = 0

would mean that there exists a relation of the form f(z)D'(z) = N'(z), or equivalently f(z) = N'(z)/D'(z), with D' and N' polynomials and with the degree of D' less than α. This is impossible because f = N/D was irreducible. □

We shall now describe how one can, at least in principle, find a minimal realization triple for some rational f(z). We consider the space S = F^{m+1} and the Krylov matrices X∞ and Y∞ whose columns are in S. The space S is called the state space for the realization triple (A, x0, y0). On the other hand we have the space
F^N = { a = (f0, f1, ...) : fk ∈ F }

of one sided infinite sequences of elements of F where only a finite number of the fk are nonzero. We can see the infinite Hankel matrix H = H(f) = Y∞^H X∞ as an operator on the space F^N. Some input sequence u ∈ F^N is mapped onto some output sequence w ∈ F^N by first mapping it to a state su: u ↦ su = X∞ u ∈ S, and then mapping this state to the output w: su ↦ w = Y∞^H su.
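The operator identity H(f) = Y∞^H X∞ can be checked on finite sections. The following Python sketch (with randomly generated, purely illustrative data) builds Krylov matrices and compares Y^H X with the Hankel matrix of the moments y0^H A^k x0:

```python
import numpy as np

# Illustrative data (not from the text): any square A and starting vectors work.
rng = np.random.default_rng(1)
A  = rng.standard_normal((4, 4))
x0 = rng.standard_normal(4)
y0 = rng.standard_normal(4)

n = 4
# Finite sections of the Krylov matrices X and Y (real data, so ^H = ^T).
X = np.column_stack([np.linalg.matrix_power(A, k) @ x0 for k in range(n)])
Y = np.column_stack([np.linalg.matrix_power(A.T, k) @ y0 for k in range(n)])

# Moments f_{k+1} = y0^H A^k x0 and the Hankel matrix built from them.
f = [y0 @ np.linalg.matrix_power(A, k) @ x0 for k in range(2 * n - 1)]
H = np.array([[f[i + j] for j in range(n)] for i in range(n)])

print(np.allclose(Y.T @ X, H))   # True: (Y^H X)_{ij} = y0^H A^{i+j} x0
```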
The subspace Sr = range X∞ is the part of the state space that can be reached by applying all possible inputs. The subspace Sō = ker Y∞^H will be mapped to zero by Y∞^H and hence can not be observed in the output. The subscripts r and o stand for reachable and observable respectively; a bar stands for the negation. These terms originate from linear system theory and we refer to Chapter 6 for further explanation. The state space can obviously be decomposed as S = range X∞ ⊕ ker X∞^H or as S = range Y∞ ⊕ ker Y∞^H.

Recall that range X∞ (range Y∞) is the smallest A-invariant (A^H-invariant) subspace containing x0 (y0). On the other hand, ker X∞^H (ker Y∞^H) is the largest A^H-invariant (A-invariant) subspace which is orthogonal to x0 (y0). However, for our purposes, we want to split the state space as

S = Sr ⊕ Sr̄ = So ⊕ Sō,

where Sr = range X∞ and Sō = ker Y∞^H as above, but where Sr̄ and So are subspaces complementary to Sr and Sō respectively; they need not be the orthogonal complements. So we define them to be complementary, i.e.

dim Sr + dim Sr̄ = m + 1 = dim So + dim Sō  and  Sr ∩ Sr̄ = {0} = So ∩ Sō,

and such that we maximize the dimensions of Sro = So ∩ Sr and Sr̄ō = Sr̄ ∩ Sō.
By taking intersections, we can define four complementary spaces which subdivide the state space. We set

Srō = Sr ∩ Sō,  Sro = Sr ∩ So,  Sr̄ō = Sr̄ ∩ Sō,  Sr̄o = Sr̄ ∩ So.

We have obviously

Srō = Sr ∩ Sō = range X∞ ∩ ker Y∞^H,
Sro = Sr ∩ So = range X∞ ∩ So,
Sr̄ō = Sr̄ ∩ Sō = Sr̄ ∩ ker Y∞^H,
Sr̄o = Sr̄ ∩ So.
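For small examples these subspaces can be computed numerically. The following Python sketch uses assumed 2x2 data (our own, not from the text) and the dimension formula dim(U ∩ V) = dim U + dim V - dim(U + V):

```python
import numpy as np

def krylov(A, v, n):
    """Columns v, Av, ..., A^{n-1}v of a Krylov matrix."""
    cols, w = [], v.astype(float)
    for _ in range(n):
        cols.append(w)
        w = A @ w
    return np.column_stack(cols)

def colspace(M, tol=1e-10):
    """Orthonormal basis of range M via the SVD."""
    U, s, _ = np.linalg.svd(M)
    return U[:, : int(np.sum(s > tol))]

def nullspace(M, tol=1e-10):
    """Orthonormal basis of ker M via the SVD."""
    _, s, Vh = np.linalg.svd(M)
    return Vh[int(np.sum(s > tol)) :].conj().T

# Assumed 2x2 example, in the spirit of this section.
A  = np.array([[1.0, -1.0], [0.0, 2.0]])
x0 = np.array([1.0, 0.0])
y0 = np.array([1.0, 1.0])

Sr    = colspace(krylov(A, x0, 3))                      # S_r    = range X
Sobar = nullspace(krylov(A.conj().T, y0, 3).conj().T)   # S_obar = ker Y^H

# dim(S_r ∩ S_obar) = dim S_r + dim S_obar - dim(S_r + S_obar)
dim_cap = Sr.shape[1] + Sobar.shape[1] \
          - np.linalg.matrix_rank(np.column_stack([Sr, Sobar]))
print(Sr.shape[1], Sobar.shape[1], dim_cap)   # 1 1 0, so S_{r,obar} = {0}
```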
Lemma 3.5 With the definitions given above,

Sr = Srō ⊕ Sro,
Sō = Srō ⊕ Sr̄ō,
Sr̄ = Sr̄ō ⊕ Sr̄o,
So = Sro ⊕ Sr̄o,
S  = Srō ⊕ Sro ⊕ Sr̄ō ⊕ Sr̄o.
Proof. We first note that these four spaces Srō, Sro, Sr̄ō and Sr̄o are complementary because the spaces So and Sō as well as Sr and Sr̄ are complementary. Therefore, they can only have the zero vector in common. To prove the first relation, assume that vr is an arbitrary vector from Sr. Since Sro ⊆ Sr, there is a unique way of writing vr = vro + v' with vro ∈ Sro and v' in a complementary subspace of Sro within Sr. Obviously v' ∈ Sr. Thus it can have no components along Sr̄ō or along Sr̄o, since these are both subsets of Sr̄. This v' can not have a component along So, because that component would be in Sro by definition, hence contained in vro. Therefore v' is completely contained in Sō. In other words, v' ∈ Sr ∩ Sō = Srō. The next three relations follow for similar reasons and the last one is a direct consequence. □

This construction is somewhat more complicated than the one in [200, p. 584] which uses the orthogonal complements. However, the latter does not seem to be quite correct, since each of the four spaces could have dimension zero, which can never give the complete state space of course. Also the proof of this result in [172, p. 23] is not completely correct. If we now choose a basis for these four subspaces, we can obtain a realization triple with a special structure, which is known as the Structure Theorem or the Canonical Decomposition Theorem. See for example [165,
p. 133].

Theorem 3.6 (Structure Theorem) Let (A, x0, y0) be a realization triple for f(z) = y0^H (zI - A)^{-1} x0. Let the subspaces Srō, Sro, Sr̄ō and Sr̄o be as described above. Choosing bases for these subspaces, then clearly, with respect to this basis, f(z) can be realized by an equivalent triple (Ã, x̃0, ỹ0) which has the form

      [ Ã11 Ã12 Ã13 Ã14 ]
Ã  =  [  0  Ã22  0  Ã24 ]  =  F^{-1} A F                    (3.16)
      [  0   0  Ã33 Ã34 ]
      [  0   0   0  Ã44 ]

for some invertible matrix F and

      [ x̃01 ]                     [  0  ]
x̃0 =  [ x̃02 ]  =  F^{-1} x0  and  ỹ0 =  [ ỹ02 ]  =  F^H y0.   (3.17)
      [  0  ]                     [  0  ]
      [  0  ]                     [ ỹ04 ]
Proof. This follows immediately from the fact that the columns of F represent the basis vectors of the subspaces. Using the invariant nature of these spaces, we get the zeros at the correct places. First, since Srō = range X∞ ∩ ker Y∞^H is A-invariant as the intersection of two A-invariant subspaces, we get the three zeros in the first column of Ã. Because also range X∞ = Sr = Srō ⊕ Sro is A-invariant, we have the block of four zeros in the lower left corner. Furthermore, a vector in the third subspace Sr̄ō = Sr̄ ∩ Sō is part of Sō = ker Y∞^H, which is A-invariant, and thus its image under A will be in the same space Sō = Srō ⊕ Sr̄ō. This explains the two zeros in the third column of Ã. The format of the vectors x̃0 and ỹ0 can be seen as follows. The vector x0 is in range X∞ = Srō ⊕ Sro. Thus it has no components in the complementary space, which gives the two bottom zeros in x̃0. The vector y0 belongs to range Y∞ ⊥ ker Y∞^H = Sō. Thus it can have no components along the subspaces in Sō. Since both Srō and Sr̄ō are subspaces of Sō, we get the two zeros as in ỹ0. □

Note that some of the blocks in this theorem could be empty. We give an example.

Example 3.3 Assume
A = [ 1  -1 ],   x0 = [ 1 ],   y0 = [ 1 ].
    [ 0   2 ]         [ 0 ]         [ 1 ]

Then

X∞ = [ 1 1 1 ... ]   and   Y∞ = [ 1 1 1 ... ].
     [ 0 0 0 ... ]              [ 1 1 1 ... ]

Thus Sr = range X∞ = span{ [1, 0]^T } and Sō = ker Y∞^H = span{ [1, -1]^T }. By our definition of the complementary subspaces, we have Sr̄ = span{ [1, -1]^T } and So = span{ [1, 0]^T }.
C H A P T E R 3. LANCZOS A L G O R I T H M
122 Note that
s an [01 1
}
and
[ ]
9
S~  r a n g e Y o o  s p a n {
1
1 }"
Then we define the following subspaces by taking intersections 0
}'
S~ospan{
[,]~ },
s~~pan{
S~o{
0
}'
[0]
o }.
Note however that
S, NS~ Sr nS~  S~ NS~ S~ NS~  {0}. Taking the basis vectors of the decomposition
S = S~z  S~o  S ~  S~o as the colums of F: F
we find after applying the equivalence transformation on the triple (A, x0, yo) that 
Ã = [ 1  0 ],   x̃0 = [ 1 ],   ỹ0 = [ 1 ].
    [ 0  2 ]         [ 0 ]         [ 0 ]

This is indeed as predicted by the Structure Theorem, with empty rows and columns in the first and fourth position. ◇

The Structure Theorem leads to the following corollary.

Corollary 3.7 Let (A, x0, y0) be a realization for f(z) = y0^H (zI - A)^{-1} x0 and let (Ã, x̃0, ỹ0) be an equivalent realization as described in the Structure Theorem. Then
f(z) = y0^H (zI - A)^{-1} x0 = (ỹ02)^H (zI - Ã22)^{-1} x̃02.          (3.18)
Note that Ã22 = P_ro A|_ro, where A|_ro represents the restriction of the operator A to the subspace Sro and P_ro the projection on this space. The size of the matrix Ã22, which is the same as dim Sro, is the McMillan degree of f(z). Thus dim Sro = deg f(z) = rank H. In other words, the second realization of f(z) is minimal. It also holds that the grade of x̃02 with respect to Ã22 and the grade of ỹ02 with respect to Ã22^H are both equal to the rank of H.

Proof. That (3.18) is true can be simply verified by filling in the transformed quantities. That it is indeed the minimal size follows from the following considerations. We have the mappings u ↦ su = X∞ u ↦ w = Y∞^H su introduced before. The rank of H = Y∞^H X∞, which is the dimension of its range, will be given by the number of linearly independent vectors that can get through from left to right. Since all the vectors which are in the kernel of Y∞^H are annihilated by Y∞^H, we should look in the range of X∞ for the vectors that are not in the kernel of Y∞^H. These are precisely the vectors in Sro, with dimension equal to the size of Ã22. Since by Kronecker's theorem the rank of a Hankel matrix H(f) is equal to the McMillan degree of its symbol f(z), it follows that there is no matrix smaller than Ã22 to represent f(z). All the poles of f(z) are given as the zeros of the determinant det(zI - Ã22); there is no cancellation of poles and zeros. The statement about the grades can be seen as follows. Let X̃∞ and Ỹ∞ be the Krylov matrices that are generated by the pairs (Ã22, x̃02) and (Ã22^H, ỹ02) respectively; then H = Ỹ∞^H X̃∞ and therefore rank H is at most equal to these grades. On the other hand, the grades can never be larger than the size of the matrix Ã22. Since the size of the latter is precisely equal to the rank of H, the assertion follows. □
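The content of the corollary and of Kronecker's theorem can be illustrated numerically: for a non-minimal realization, the Hankel matrix built from the moments y0^H A^k x0 has rank equal to the McMillan degree, not to the size of A. A Python sketch with hypothetical data:

```python
import numpy as np

# Hypothetical non-minimal realization: A is 3x3, but y0 has no component
# along the eigenvector for eigenvalue 3, so that pole cancels in f(z).
A  = np.diag([1.0, 2.0, 3.0])
x0 = np.array([1.0, 1.0, 1.0])
y0 = np.array([1.0, 1.0, 0.0])

def moments(A, x, y, n):
    """Moments f_{k+1} = y^H A^k x, the entries of the Hankel matrix H(f)."""
    out, v = [], x.copy()
    for _ in range(n):
        out.append(y.conj() @ v)
        v = A @ v
    return np.array(out)

f = moments(A, x0, y0, 8)
H = np.array([[f[i + j] for j in range(4)] for i in range(4)])  # 4x4 section
print(np.linalg.matrix_rank(H))   # 2 = McMillan degree, although A is 3x3

# A minimal realization of the same f(z): drop the unobservable direction.
A2, x2, y2 = np.diag([1.0, 2.0]), np.array([1.0, 1.0]), np.array([1.0, 1.0])
print(np.allclose(f, moments(A2, x2, y2, 8)))   # True: same moments
```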
The ideal situation is when the Lanczos process can continue until the very end. That is when the following quantities are all the same:

1. the McMillan degree of f(z) = the rank of the Hankel matrix H(f): κn+1;
2. the grade of x0 with respect to A: nx + 1;
3. the grade of y0 with respect to A^H: ny + 1;
4. the size of the matrix A: m + 1.

In that case, we have X̂m^{-1} A X̂m = Tn, because Ŷm^H X̂m = Dn and thus Ŷm^H = Dn X̂m^{-1}. Therefore A and Tn are similar matrices and as such have the same eigenstructure. The characteristic polynomials of A and of Tn are the same, and the principal vectors of A can be easily transformed into the corresponding ones for Tn and conversely. This is more than we can hope for in general. Recall that the inequalities

κn+1 ≤ min{nx, ny} + 1 ≤ max{nx, ny} + 1 ≤ m + 1

hold. Since the Lanczos algorithm will compute all the polynomials a0,k from the information in H(f) or, equivalently, in f(z), as we have seen in the previous sections, one can not expect that more information about the eigenvalues of A can be found than what is contained in f(z). This means that if at a certain step of the algorithm we reach the degree of f(z), thus when we have extracted all the information stored in the Hankel matrix H(f), we have an exact approximation of f(z), and thus everything that can be found out about the eigenvalues of A using f(z) only should be known at that instance. So what is this information? Well, the poles of f(z) are certainly eigenvalues of A, but the information about the different Jordan blocks that go along with a multiple eigenvalue is entirely lost. The multiplicity of a pole can at most be equal to the size of the largest Jordan block that is attached to this eigenvalue. This means that, by letting x0 have components along the (right) principal vectors that are associated with the largest Jordan blocks of that pole, and similarly for y0, we shall be able to extract the largest possible multiplicity of that eigenvalue as a pole of f(z), if ever that is really contained in f(z). If the pole does not have this maximal multiplicity, then we can still find it with x0 and y0 chosen as described above. Let us have a look at this in some detail. We suppose without loss of generality that A is already in its Jordan form; this can always be obtained by an equivalent realization. Thus A = diag(J0, J1, ..., JI) where the Ji are the different Jordan blocks.
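The loss of the Jordan structure in f(z) can be seen numerically. In the following Python sketch (data chosen by us for illustration), A has eigenvalue 1 with two Jordan blocks, yet the Hankel rank only reveals the largest one:

```python
import numpy as np

# Hypothetical A: eigenvalue 1 with TWO Jordan blocks (sizes 2 and 1),
# i.e. geometric multiplicity 2; f(z) = y^H (zI - A)^{-1} x can only show
# a single pole at 1 of order at most 2 (the size of the largest block).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x0 = np.array([1.0, 1.0, 1.0])
y0 = np.array([1.0, 1.0, 1.0])

def moments(A, x, y, n):
    out, v = [], x.copy()
    for _ in range(n):
        out.append(y.conj() @ v)
        v = A @ v
    return out

f = moments(A, x0, y0, 8)          # here f_k = k + 3 (an arithmetic sequence)
H = np.array([[f[i + j] for j in range(4)] for i in range(4)])
# rank H = McMillan degree = 2 = size of the largest Jordan block, even
# though A is 3x3: the second block for eigenvalue 1 is invisible in f(z).
print(np.linalg.matrix_rank(H))    # 2
```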
Let x0 and y0 be subdivided correspondingly: x0^T = [x0^T, x1^T, ..., xI^T] and y0^T = [y0^T, y1^T, ..., yI^T]. Thus we can decompose f(z) as

f(z) = Σ_{i=0}^{I} yi^H (zI - Ji)^{-1} xi.

The contribution of the i-th term is obtained as follows (we drop temporarily
the index i). First note that for a J of size k + 1 with eigenvalue λ, we get

(zI - J)^{-1} = (z - λ)^{-1} I + (z - λ)^{-2} Z + ... + (z - λ)^{-(k+1)} Z^k
             = (z - λ)^{-1} [ I + (z - λ)^{-1} Z + ... + (z - λ)^{-k} Z^k ],

where, as before, Z is the down shift operator. Hence

y^H (zI - J)^{-1} x = Σ_{j=0}^{k} (y^H Z^j x) (z - λ)^{-(j+1)}
  = (z - λ)^{-1} [η̄0 ξ0 + η̄1 ξ1 + ... + η̄k ξk]
  + (z - λ)^{-2} [η̄1 ξ0 + η̄2 ξ1 + ... + η̄k ξk-1]
  + ...
  + (z - λ)^{-(k+1)} [η̄k ξ0],

where we have set x^T = [ξ0, ..., ξk] and y^T = [η0, ..., ηk]. In other words, if ρj = y^H Z^j x, then it holds that

[ η̄0  η̄1  ...  η̄k ] [ ξ0 ]   [ ρ0 ]
[ η̄1  η̄2  ...  0  ] [ ξ1 ] = [ ρ1 ]                        (3.19)
[ ...  ...      ...] [ ...]   [ ...]
[ η̄k  0   ...  0  ] [ ξk ]   [ ρk ]
is an upper triangular Hankel system. From this expression for the i-th term, we can see that the partial fraction decomposition of f(z) has the form

f(z) = Σ_{i=0}^{I} Σ_{j=0}^{k(i)} ρij (z - λi)^{-(j+1)}.

In this expression the sum for i should only range over all the different λi, i.e., I + 1 is equal to the number of different poles. The ρij will be the sum of all the ρj that were obtained above for the different Jordan blocks that were associated with λ = λi. The k(i) is then the maximal k (the size of the maximal Jordan block minus 1) associated with one particular λi. Also observe that the order of the pole λi can be at most k(i) + 1, but it may be smaller. The precise order is k̃(i) + 1 where

k̃(i) = max{ j : ρij ≠ 0 }.

The McMillan degree of f(z) is Σ_i (k̃(i) + 1). This shows that the best possible choice for x0 and y0 that one can dream of is the one which gives an f(z) whose denominator, after reduction, is equal to

Π_{i=0}^{I} (z - λi)^{k(i)+1}

with λi, i = 0, ..., I all the different eigenvalues of A and k(i) + 1 the size of the corresponding maximal Jordan block. This is the minimal polynomial of A, that is, the monic polynomial p of smallest possible degree for which p(A) = 0. The geometric multiplicity of λi is lost and only part of the algebraic multiplicity can be recovered. Thus we see that the ideal situation will occur when the triple (A, x0, y0) is a minimal realization of f(z). This implies that A can not have more than one Jordan block per eigenvalue; thus there is only one Jordan chain per eigenvalue, and the geometric multiplicity of all eigenvalues is 1. It also implies that k̃(i) = k(i) for all i, so all the ρi,k(i) are different from zero. But for a minimal realization there is more: we do not only get the maximal information about the eigenvalues, but also the maximal possible information about the left and right principal vectors. Indeed, we have the following classical result about minimal realizations.

Theorem 3.8 Let (A, x0, y0) be a realization triple for f(z). Then it is a minimal realization if and only if the following three numbers are equal:

1. the McMillan degree of f(z) = the rank of the Hankel matrix H(f): κn+1;
2. the grade of x0 with respect to A: nx + 1;
3. the grade of y0 with respect to A^H: ny + 1.

They are then all equal to the size of the matrix A: m + 1.
Proof. If the triple is minimal, then the size of A should be equal to the rank of H(f) by definition. On the other hand, both nx + 1 and ny + 1 are bounded from below by the rank of H(f) and from above by the size of A. Hence the equality of all four numbers follows. Conversely, if the three numbers are the same, then the rank of H(f) is equal to the rank of X∞ and also equal to the rank of Y∞, while H = Y∞^H X∞. Therefore, both X∞ and Y∞ must have full rank, which is equal to the number of their columns, which is the size of A. Thus we find that the size of A equals the rank of H(f), which is the size of the minimal one. □

The previous theorem implies that for a minimal realization, the square matrices Xm and Ym which consist of the first columns of X∞ and Y∞ are invertible matrices whose columns span an A-invariant, respectively an A^H-invariant subspace. This situation is equivalent with the fact that the whole state space S = Sro, the other spaces reducing trivially to {0}. Given an arbitrary matrix A, it will now be obvious that it is not always possible to find starting vectors x0 and y0 which turn (A, x0, y0) into a minimal realization triple for some f(z), the condition being that A should have eigenvalues with geometric multiplicity 1.

Now let us come back to the Lanczos algorithm and discuss the breakdown situations that can occur. We should give an answer to the question: if we choose arbitrary starting vectors x0 and y0, and the Lanczos algorithm breaks down, how much have we discovered about the eigenstructure of A? Suppose we are at stage k of the Lanczos algorithm. For simplicity of notation we set as before ν + 1 = nk. At this stage we compute first x̂0,k = x̂ν+1 and ŷ0,k = ŷν+1 and get matrices X̂ν+1 = [x̂0 | ... | x̂ν+1] and Ŷν+1 = [ŷ0 | ... | ŷν+1], and it holds that range X̂ν+1 = range Xν+1 and range Ŷν+1 = range Yν+1. Suppose that in the next step we have a breakdown, i.e. ŷν+1^H x̂ν+1 = 0. Thus it holds that

Ŷν+1^H X̂ν+1 = Dν+1 = diag(D0,0, ..., Dk-1,k-1, 0).

There are two possibilities.

1. k < n + 1 with κn+1 = rank H. In this case there is always some ŷμ with μ = ν + αk+1 such that

ŷμ^H x̂ν+1 = ŷν+1^H x̂μ = rk ≠ 0,
which follows from Lemma 3.3. This αk+1 = μ - ν < ∞ defines the size of the next block and the algorithm can go on. In the terminology of Taylor [223] this is called a curable breakdown.

2. k = n + 1 with κn+1 = rank H. In this case there does not exist a ŷμ such that

ŷμ^H x̂ν+1 = ŷν+1^H x̂μ ≠ 0.

This follows again from Lemma 3.3. With eν+1 = [0 ... 0 1]^T the (ν + 1)st unit vector, we now find

Y∞^H x̂ν+1 = Y∞^H Xν+1 a0,n+1 = H∞,ν+1 Aν+1 eν+1 = Rν+1 eν+1 = rn+1 = 0.
Thus we see that, in the terminology of Section 1.5, this corresponds to an exact fit of the series f. The next block is infinite. The last case can be subdivided into two subcases which correspond to success or failure of the algorithm: an A- or A^H-invariant subspace is found or not. We consider the following two cases, called 2.A and 2.B below.

2.A. k = n + 1 and ν + 1 = κn+1 = min{nx, ny} + 1. Suppose min{nx, ny} = nx; the other case is similar. We shall show that x̂0,n+1 = x̂ν+1 = 0. Since ν = nx, we now have x̂ν+1 ∈ Xν+1 = Xν. Thus x̂ν+1 can be written as a linear combination of the columns of Xν: x̂ν+1 = Xν a. Furthermore it holds that Ŷν^H x̂ν+1 = 0, thus

Ŷν^H Xν a = 0.

Since in this situation Aν = An and Hν,ν = Hn, we can write this simpler as

An^T Hn a = 0.

Because both An and Hn are nonsingular, it follows that a = 0 and hence also x̂ν+1 = Xν a = 0. On the other hand, x̂ν+1 = x̂0,n+1 = a0,n+1(A) x0 = 0.
Multiplying with powers of A shows that also

a0,n+1(A) A^k x0 = a0,n+1(A) xk = 0,   k = 0, 1, ..., nx,

where {xk, k = 0, ..., nx} spans the A-invariant subspace Xν. Thus a0,n+1(z) is the minimal polynomial of A restricted to this subspace. The converse is also true: if x̂ν = a0,n(A) x0 = 0, with a0,n monic of degree ν, then xν = A^ν x0 can be written as a linear combination of the previous ν vectors x0, ..., xν-1. By Lemma 3.1 this means that we have an invariant subspace. We have the following theorem for case 2.A.

Theorem 3.9 In the Lanczos algorithm, case 2.A occurs, i.e. we have a breakdown at stage n + 1 with ν + 1 = rank H = κn+1 = min{nx, ny} + 1 = min{rank X∞, rank Y∞}, if and only if

x̂ν+1 = x̂0,n+1 = 0   and/or   ŷν+1 = ŷ0,n+1 = 0.

Moreover, if x̂ν+1 = 0, then Xν = range Xν = range X̂ν is an A-invariant subspace and the spectrum of the Jacobi matrix Tn = Dν^{-1} Ŷν^H A X̂ν is the same as the spectrum of A restricted to Xν. It is given by the zeros of the polynomial a0,n+1. Likewise, if ŷν+1 = 0, then Yν = range Yν = range Ŷν is an A^H-invariant subspace and the spectrum of the Jacobi matrix Tn^H is the same as the spectrum of A^H restricted to Yν. It is given by the zeros of the polynomial ā0,n+1.

Proof. We already proved the first part and the fact that range Xν is A-invariant if x̂ν+1 = 0. That a0,n+1 is the characteristic polynomial of Tn was shown in Corollary 2.4. The rest follows from

zI - Tn = zI - Dν^{-1} Dν Tn = zI - Dν^{-1} Ŷν^H A X̂ν.

Now the matrix Ã = Dν^{-1} Ŷν^H A X̂ν is the restriction of A to the space range Xν = range X̂ν. And this proves the case where x̂ν+1 = 0. The remaining case is similar. □

The case 2.A is the one that occurred in Example 3.2. We can continue the computations for one more step to get
Example 3.4 (Ex. 3.2 continued) The next step in the previous example gives d9 = ŷ0,2^H x̂0,2 = r2 = -2. Using this value together with d8 = r1 = 1 gives β3 = -r2/r1 = 2. The next value we compute is d10 = ŷ0,2^H A x̂0,2 = 2. Thus, with D2,2 = r2 = -2 and d_2 = d10, we can solve D1,1 a_2 = d_2 to find a2(z) = z + 1. Therefore

x̂0,3 = (A + I) x̂0,2 + 2 x̂0,1 = [0 ... 0]^T,
ŷ0,3 = (A^H + I) ŷ0,2 + 2 ŷ0,1 = [0 ... 0]^T.

Since both x̂0,3 and ŷ0,3 are zero, we have in this case reached an A-invariant as well as an A^H-invariant subspace. ◇
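The breakdown mechanism can be sketched in code. The following is a bare-bones two-sided Lanczos iteration in Python (our own simplified formulation, not the look-ahead algorithm of the text), which stops as soon as ŷk^H x̂k vanishes; all data is illustrative:

```python
import numpy as np

def two_sided_lanczos(A, x0, y0, tol=1e-12):
    """Build biorthogonal Krylov bases until y_k^H x_k = 0 (breakdown)."""
    X, Y, ds = [x0], [y0], []
    for k in range(A.shape[0]):
        x, y = X[-1], Y[-1]
        d = y.conj() @ x
        if abs(d) < tol:                  # breakdown or invariant subspace
            return X, Y, ds, k
        ds.append(d)
        alpha = (y.conj() @ (A @ x)) / d
        xn = A @ x - alpha * x            # three-term recurrences
        yn = A.conj().T @ y - np.conj(alpha) * y
        if k > 0:
            beta = (Y[-2].conj() @ (A @ x)) / ds[-2]
            xn = xn - beta * X[-2]
            yn = yn - np.conj(beta) * Y[-2]
        X.append(xn)
        Y.append(yn)
    return X, Y, ds, A.shape[0]

# A full sweep: the symmetric case runs to the end (case 2.A, x_3 = 0).
A = np.diag([1.0, 2.0, 3.0])
X, Y, ds, steps = two_sided_lanczos(A, np.ones(3), np.ones(3))
print(steps)        # 3

# An immediate breakdown: y0^H x0 = 0 although neither vector is zero.
B = np.array([[0.0, 1.0], [1.0, 0.0]])
_, _, _, steps2 = two_sided_lanczos(B, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(steps2)       # 0
```

In exact arithmetic the computed vectors satisfy ŷi^H x̂j = 0 for i ≠ j; in floating point this biorthogonality slowly deteriorates, which is one of the numerical issues mentioned at the end of this chapter.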
The second case is the following.

2.B. k = n + 1 and ν + 1 = κn+1 < min{nx, ny} + 1. This is the problematic situation. In the terminology of Taylor [223] this is called an incurable breakdown. However, not everything is hopeless in this situation. Taylor proved in his thesis [223] that when the algorithm has an incurable breakdown, the eigenvalues of the block tridiagonal Tn will be eigenvalues of A and that a0,n+1(z) is still a minimal polynomial of A restricted to some invariant subspace. The characterization of this subspace is a tedious matter in general. We prove these claims in the next theorem, which Taylor called the Mismatch Theorem.

Theorem 3.10 (Mismatch Theorem) If an incurable breakdown occurs in the Lanczos algorithm at step ν + 1 = κn+1, then a0,n+1 is the characteristic polynomial of the block tridiagonal matrix Tn of Theorem 2.3 and at the same time it is a minimal polynomial of A restricted to some invariant subspace.

Proof. The first statement was proved in Corollary 2.4. The second statement will be proved if we can find some x0' and y0' which generate the same entries in the Hankel matrices, i.e. for which

(y0')^H A^k x0' = y0^H A^k x0   for k = 0, 1, 2, ...
and such that κn+1 = min{nx', ny'} + 1. Indeed, we are then back in case 2.A for which the result has been proved. Suppose that the partial fraction decomposition of f(z) is given by

f(z) = Σ_{i=0}^{I} Σ_{j=0}^{k̃(i)} ρij (z - λi)^{-(j+1)},

where λi, i = 0, 1, ..., I are all the different poles of f(z) and the multiplicity of λi is k̃(i) + 1. We shall construct some starting vectors that will satisfy the previously mentioned requirements. We do this as we did earlier, using the Jordan block form of A. Let F be the transformation which brings A into the Jordan form F^{-1} A F = Λ. Suppose we order the Jordan blocks as follows. The first I + 1 Jordan blocks are the largest Jordan blocks that correspond to eigenvalues of A that are effective poles of f(z). Suppose we denote them as Ji, i = 0, ..., I, with Ji of size k(i) + 1 where k(i) ≥ k̃(i). The McMillan degree of f(z) is thus Σ_i (k̃(i) + 1) and this is equal to the rank of the Hankel matrix H(f), which is κn+1. Now we choose the components of x̃ and ỹ which are not related to these largest Jordan blocks equal to zero and subdivide x̃ and ỹ as

x̃^T = [x̃0^T, ..., x̃I^T, 0, ..., 0]   and   ỹ^T = [ỹ0^T, ..., ỹI^T, 0, ..., 0];

then we may write

ỹ^H (zI - Λ)^{-1} x̃ = Σ_{i=0}^{I} ỹi^H (zI - Ji)^{-1} x̃i.

Now if this function has to be the same as f(z), we should require that each x = x̃i and y = ỹi, whose components we denote as x^T = [ξ0, ..., ξk] and y^T = [η0, ..., ηk], should satisfy a system like (3.19) where the ρij for j > k̃(i) are supposed to be zero. This system can always be solved by choosing ξi and ηi equal to zero for k̃ < i ≤ k. Because ρi,k̃(i) ≠ 0, it follows that both ξ0 and η0 are nonzero. There are at most

Σ_i (k̃(i) + 1) = κn+1

nonzero elements in x̃. This x̃ is invariant under Λ, by which we mean that Λ x̃ can have only nonzero elements at the same positions where x̃ could have them. This means that the grade of x̃ with respect to Λ can never be larger than this number of nonzero elements, which is κn+1. On the other hand, the grade can not be smaller, because the grade can not be less than the rank of the Hankel matrix, which is κn+1. Thus the grade is precisely equal to κn+1. This property is maintained when we undo the transformation by F. Thus if we set

x0' = F x̃   and   y0' = F^{-H} ỹ,

then we still have a realization triple (A, x0', y0') for f(z) and the grade of x0' with respect to A is precisely κn+1. Thus we are back in case 2.A. □

In the proof we have chosen a vector x0' which has the minimal grade. It is also possible, using the Jordan form for A^H, to construct y0' having the minimal grade. In general, however, it is not always possible to make them both have the minimal grade, as the following counterexample illustrates.

Example 3.5 Let the triple (A, x0, y0) be given by
(A, x0, y0) = ( [ 1  1 ],  [ 1 ],  [ 1 ] ).
              ( [ 0  1 ]   [ 0 ]   [ 0 ] )

The function f(z) is obviously equal to (z - 1)^{-1}, which has McMillan degree equal to 1. The only vector that has grade 1 with respect to A is a multiple of x0' = [1, 0]^T, while the only vector which has grade 1 with respect to A^T is a multiple of y0' = [0, 1]^T. However, (y0')^H (zI - A)^{-1} x0' = 0, so that it becomes impossible to give both x0' and y0' the minimal grade and still realize the required function. ◇

The construction of x0' and y0' as in the previous theorem is also found in Gutknecht [125] and [124]. It is also possible in all cases to reconstruct eigenvectors and principal vectors of A from those of Tn. We shall not pursue this any further at this place.
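The claims of this counterexample are easy to check numerically; a Python sketch:

```python
import numpy as np

# Numerical check of Example 3.5: A is the 2x2 Jordan block for eigenvalue 1.
A  = np.array([[1.0, 1.0],
               [0.0, 1.0]])
x0 = np.array([1.0, 0.0])   # original starting vectors of the example
y0 = np.array([1.0, 0.0])

def grade(A, v):
    """Grade of v w.r.t. A = dimension of the Krylov space of (A, v)."""
    n = A.shape[0]
    K = np.column_stack([np.linalg.matrix_power(A, k) @ v for k in range(n)])
    return np.linalg.matrix_rank(K)

def f(z, A, x, y):
    return y.conj() @ np.linalg.solve(z * np.eye(A.shape[0]) - A, x)

print(f(2.0, A, x0, y0))                # 1.0 = 1/(2-1): f(z) = (z-1)^{-1}
print(grade(A, x0), grade(A.T, y0))     # 1 2: y0 does not have minimal grade

# The unique (up to scale) minimal-grade vectors:
x0p = np.array([1.0, 0.0])              # eigenvector of A
y0p = np.array([0.0, 1.0])              # eigenvector of A^T
print(grade(A, x0p), grade(A.T, y0p))   # 1 1
print(f(2.0, A, x0p, y0p))              # 0.0: this pair realizes the zero function
```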
3.6 Note of warning
In this chapter we have only elucidated an elementary link between the polynomials generated in the Euclidean algorithm and the orthogonalization procedure that is performed in the process of the Lanczos algorithm for arbitrary matrices. It was shown that in principle the Euclidean algorithm, when properly translated into matrix-vector terminology, is a very natural formulation of the look-ahead version of the Lanczos algorithm. In the next chapter, we shall explain that these polynomials are formal (block) orthogonal polynomials and that the orthogonalization is a fast Gram-Schmidt procedure for the case of a Gram matrix with a Hankel structure. The link with system theory and rational approximation allowed us to explain the precise mechanism of breakdown. However, our only objective was to give a brief introduction to these connections and there is no numerical ambition whatsoever. For a practical numerical implementation of the Lanczos algorithm and the determination of the size of a look-ahead step (i.e., of the Euclidean index), appropriate condition numbers should be estimated. In numerical computations, exact breakdown will almost never occur; instead, very ill conditioned steps may be performed, which in practical computations can be disastrous. The strategy is then to perform look-ahead steps even if they are theoretically not necessary, in order to avoid the ill conditioned computations: such steps are performed not only to step from one nonsingular situation to the next, but from one well conditioned situation to the next. The practical implementation is usually not simple, but the basic principles remain the same. Another important numerical problem arising from the implementation is that the theoretical (bi)orthogonality of the computed bases for the Krylov spaces, which has been assumed here, will deteriorate during numerical computations. A remedy for this is to restart the procedure after a while or to perform reorthogonalization steps. For much more information on these numerical aspects, we refer to the vast literature that appeared lately on these topics [4, 9, 195, 197, 213] and the many papers, too many even to give a partial selection here.
In Chapter 7, we give some ideas about look-ahead type strategies in the context of general rational interpolation problems. Because of the previous connections, such strategies can also be applied to the Lanczos or block Lanczos algorithm.