Multiscale Statistical Signal Processing and Random Fields on Homogeneous Trees

Multiscale Statistical Signal Processing and Random Fields on Homogeneous Trees

SIGNAL MODELLING AND ADAPTIVE ARRA YS Copyright © IFAC Identification and System Parameter Estimation, Budapest, Hungary 1991 MUL TISCALE STATISTICA...

1MB Sizes 1 Downloads 83 Views

SIGNAL MODELLING AND ADAPTIVE ARRA YS

Copyright © IFAC Identification and System Parameter Estimation, Budapest, Hungary 1991

MUL TISCALE STATISTICAL SIGN AL PROCESSING AND RANDOM FIELDS ON HOMOGENEOUS TREES A. Benveniste*, M. Basseville*, A. S. WiIIsky**, K. C. Chou** and R. Nikoukhah*** *IRISA. Campus de Beaulieu. 35042 Rennes Cedex. France **UDS. MIT. Cambridge. MA 02139. USA ***INRIA. Rocquencourt. BP 105. 78153 Le Chesnay Cedex. France

Abstrac t

'IjJ(x) = Lg(n)4> (2x - n)

In many applicati ons, it is of interest to analyze and recognize phenome na occurring at different scales. The recently introduce d wavelet transforms provide a time-and -scale decompo sition of signals that offers the possibilit y of such an analysis. Until recently, however, there has been no correspon ding statistica l framewor k to support the developm ent of optimal, multiscal e statistica l signal processin g algorithm s. In this paper we investigate some of the fundame ntal issues that are relevant to this topic.

1

where g(n) and h(n) form a conjugate mirror filter pair [l1J, and that

fm+l(X) = fm(x)

fm(x) =

L

'IjJ(x) =

and

h(n) = {

~

n =0 otherwi se

(1.6)

(1.7)

(1.8)

and {2m/2'IjJ(2mx - n)} is the Haar ba3i3. From the precedin g remarks we see that we have a dynamical relations hip between the coefficients f(rn,n) at one scale and those at the next scale. Indeed this relationship defines a lattice on the points (rn, n), where (rn+l, k) is connecte d to (m,n) if f(m,n) influences f(rn + l,k). In particul ar the Haar represen tation naturall y defines a dyadic tree structur e on the points (rn,n) in which each point has two descend ents correspo nding to the two subm divisions of the support interval of 4>(2 x - n), namely - 2n - 1). This x 4>(2(m+l) and 2n) x 4>(2(m+l) of those observat ion provides the motivati on for the developm ent of models for stochast ic processe s on dyadic trees and associated system theory as the basis for a statistic al theory of multi resolutio n stochast ic processes.

(1.1)

(1.2)

For this equation to define a wavelet, h( n) must be the impulse response of a quadratu re mirror filter [7, 11J. The simplest example of such a 4>, h pair is the Haar approximation with 1 O:Sx< 1 4>(x) = { 0 otherwi se

I 0 < x < 1/2 -1 1/2:S x < 1 { o otherwi se

n=O g(n) = { -1 n = 1 o otherwi se

As m -+ 00 the approxim ation consists of a sum of many highly compres sed, weighted , and shifted versions of the function 4>( x) whose choice is far from arbitrary . In particular in order for the (m + 1 )st approxim ation to be a refineme nt of the mth, we require that 4>(x) be exactly represen table at the next scale:

4>(x) = Lh(n)4> (2x - n)

n)

fm(x) is simply the partial orthono rmal expansio n of f(x), up to scale m, with respect to the basis defined by 'IjJ. For example if 4> and h are as in eq. (1.3), eq. (1.4), then

The multi-sc ale represen tation [9, 10J of a continuo us signal f( x) consists of a sequence of approxim ations of that signal at finer and finer scales where the approxim ation of f(x) at the mth scale is given by

m f(m,n)4>(2 x - n)

+ Ld(m,n) 'IjJ(2mx n

Mult iscale Repr esent ation s and Wave let Trans form s

+00

(1.5)

n

Dyna mic Mode ls on Dyad ic Trees

2

In this section we discuss linear models of random processes on the dyadic tree. Our first objectiv e is to introduce models that can be 3pecified by finitely many parameter3 in order to provide associat ed effective algorithm s. Another relevant feature of these models is their possible connecti ons with useful statistic al notions on signals such as "statistic al time invarian ce" I or "statisti cal selfsimilarit y". These notions should apply to the ordinary signals we get by consider ing the restricti on of a process on the tree to a given level of resolutio n as depicted on

(1.3)

(1.4)

Multisca le represen tations are closely related to wavelet transform s. Such a transfor m is based on a single function 'IjJ( x) that has the property that the full set of its scaled translate s {2m/2'IjJ(2mx - n)} form a complet e orthonorm al basis for L2. In [7J it is shown that 4> and 'IjJ are related via an equation of the form

the figure 1. prefer to avoid using here the word "stationary" which will be used with a different meaning in the sequel

1 we

405

to finer scales

to coarser

translational shift Figure 1: Level-by-level drawing of the dyadic tree

2

There are basically two different ways to specify such models of processes on the tree, namely by

T 3 ,t

des, t)

Both approaches will be sketchily presented in this paper, and we discuss associated basic issues next. On the other hand, the theory of ordinary time series suggests that both approaches require an infinite tree be used as an index set. Unfortunately, the figure 1 gives little insight on how the considered tree could be made infinite toward the top. In contrast, the figure 2 provides the adequate solution: just take T as being the infinite acyclic graph such that every node has 3 branches to other nodes! This will be the proper definition of T we shall use in the sequel, and both drawings of figures 1 and 2 will be used for convenience.

#{branches linking s to t}

(d(s,t) is thus a distance on T). Before stating the corresponding theorem, let us briefly discuss the corresponding theorem for ordinary time series, namely Bochner's theorem. Bochner's theorem states that a sequence rn,n = 0,1, ... is the covariance function of a stationary time series if and only if there exists a nonnegative, symmetric spectral measure S( dw) so that

.2..j~ eiwnS(dw)

...

1

=

rt t

[ •. ']i.i=I .... .l

_~

2IT

~ IT

r cos(wn)S(dw)

Jo

If we perform the change of variables x = cos wand note that cos (n w) Cn(cos w), where Cn(x) is the nth Chebychev polynomial, we have

As we discussed previously, this requires a characterization of which doubly indexed functions can be the covariance function of some stochastic process on T. Assume {r •• .}( •• t)ETxT be the covariance function of a process (yd. Then r •. t must satisfy the following: select an arbitrary finite family {t;}i=I .... .l of nodes, we have Yt,

(2. 10)

rd( •• t)

Specifying via covariance function

[



Criteria for positive definiteness are not available in general, but have been provided by Arnaud, Dunau, and Letac in the area of harmonic analysis on combinatorial structures [1] in the particular case of isotropic covariance functions of the following form:

• specifying a linear dynamical system on the tree ; the associated process is obtained by considering t.he output of this system when white noise is provided as input. This amounts to study linear system theory on the dyadic tree.

cov

o

~igu~e 2:. The dyadic tree redrawn as a "fractal" object, m thIck IS shown a subtree which is obviously isometric to the level-by-level tree

• specifying the process on the tree directly via its covariance function. This amounts to characterize such {r •• ,}( •• t)ETxT that can be covariance functions of stochastic processes on the tree (a Bochner-like theorem), and then to find finite parametrizations for these covariance functions.

2.1

successive horocycles:

rn =

[I Cn(x)/-l(dx)

(2.11)

where /-l(dx) is a nonnegative measure on [-1,1] (also referred to as the spectral measure) given by 1

(2 .9)

2'

/-l(dx) = - (1- x t'S(dw)

(2.12)

IT

Yt!

For example, for the white noise sequence with rn = lino,

where [ai.iki=I .... .l denotes the square matrix with ai.i as the (i, j)-th entry. From (2.9) follows that the right handside of this equation must be a nonnegative definite matrix for any finite collection of nodes ti : this property of r will be referred to as positive definiteness in the sequel.

1 , /-l( dx) = - (1 - x 2 ) - , IT

406

(2.13)

The analogous theorem for isotropic processes on dyadic trees requires the introduction of the Dunau polynomials

Boundary points. An important concept here is the notion of a boundary point [1, 3J of a tree. Consider the set of infinite sequences of T where any such sequence consists of a sequence of distinct nodes t\, t 2 , ••• where

[1,4J: Po(x) = 1 , PI(x) = x 2 x Pn(x) = 3"Pn+l (x)

+

1 3"Pn- l (x)

(2.14)

d(t" t,+I) = 1. A boundary point is an equivalence class of such sequences where two sequences are equivalent if they differ by a finite number of nodes. Let us choose one boundary point which we denote by -00.

(2.15)

Theorem 1 (generalized Bochner's theorem) [IJ: A sequence rn, n = 0,1,2, ... is the covariance function of an isotropic process on a dyadic tree if and only if there exists a nonnegative measure f1. on [-1, IJ so that rn

= LI P n (x)J1(dx)

Isometries. The following classification of isometries may be found in [4, 12], see [12J for a proof and related lemmas on the geometry of the homogeneous tree :

(2.16)

Lemma 1 (classification of isometries) Given an isometry f of the homogeneous tree T , three cases are possible, namely:

The simplest isotropic process on the tree is again white noise, i.e. a collection of uncorrelated random variables indexed by T , with rn = 8no , and the spectral measure f1. in (2.16) in this case is [8J

1 f1.(dx) = 21T X

[_¥ .¥]

(8 - 9x2)t (x) l-x 2 dx

3s E T 3s, t E T 3(Sn)nEZ ET, 3i > 0

(2.17)

where XA (x) is the characteristic function of the set A. A key point here is that the support of this spectral measure is smaller than the interval [-1, IJ . This appears to be a consequence of the large size of the boundary of the tree, which also leads to the existence of a far larger class of singular processes than one finds for time series. While this theorem provides a necessary and sufficient condition for a sequence r n to be the covariance of an isotropic process on T, it does not provide any way to derive finite parametrizations of them : this will be the purpose of section 3.

2.2

f(s) = s d(s,t)

= 1 and

f(s)

For obvious reasons, isomet ries of the third type associated with paths originating from -00 will be called translations. Horocycles. Being at the "same distance" of -00 is an equivalence relation on nodes. The corresponding equivalence classes are the horocycles, horocycles are shown in the figure 2. These equivalence classes are best visualized as in Figure 1 by redrawing the tree, in essence by picking the tree up at -00 and letting the tree "hang" from this boundary point. In this case the horocycles appear as points on the same horizontal level and s ::S t means that s lies on a horizontal level above or at the level of t. Note that in this way we make explicit the dyadic structure of the tree. With regard to multiscale signal representations, a shift on the tree toward -00 corresponds to a shift from a finer to a coarser scale and points on the same horocycle correspond to the points at different translational shifts in the signal representation at a single scale.

Specifying via dynamical systems

Here we put emphazis on systems while processes will follow subsequently, as outputs of systems driven by white noise. A full account of system theory on T is presented in [12J : in this latter paper the general issue of how linear operators acting on signals can be encoded via finitely many parameters (i.e. the issue of rationality) is investigated. We refer the interested reader to [12J for this topic. Since our main purpose here is to discuss models for stochastic processes, we shall rather concentrate on the following issues: 1. defining proper notions of stationarity on T for both linear systems and stochastic processes with the desirable feature that the output of a stationary system driven by a stationary process should itself be a stationary process;

2. introducing rational systems, i.e. stationary systems that can be encoded via finitely many parameters. We shall discuss point 1 next, while point 2 will be briefly investigated in section 4. 2_2.1

= t,j(t) = s

d(Sn,Sn+l) = 1, f(sn) = sn+,

Some results on the geometry of the tree.

Since "stationari ty" generally refers to invariance via translations (cf. the case of 1 - D and 2 - D time series) our primary objective is to investigate the notion of translation on T.

Figure 3: Primitive translations

407

Translations and primitive translations. Translations certainly are the isometries of the third class according to the classification of lemma 1. However, for the sequel, we shall need primitive tran3lations encoding "moving away from -00", i.e. the counterpart of the shift operator z on Z. These are defined as follows:

Theorem 3 1. If the process y satisfies (2.21) for any primitive translation T, then it must be of the form E (y,Yt) == r[d(s, sAt), d(t, sAt)]

ft . Conversely if the covariance /unction of y satisfies (2.22) then y is stationary, i.e. satisfies (2.21) for any translation .

1. select an infinite path (tn)nEZ originating from -00, call it the skeleton of the primitive translation,

9. If the process u and the transfer /unction H are both stationary, so is the process H u .

2. denote by Sn the unique point outside the skeleton such that d(sn, t n ) == 1

Note that the last statement is an imediate consequence of the former ones.

3. denote by T.! the semi infinite dyadic tree with root Sn composed of the semi infinite paths originating at Sn and moving away from -00

COMMENTS: Arnaud-Dunau-Letac isotropic processes are stationary in the above sense. Also from (2.22) follows that the distribution of the restriction {ydtETo of a stationary process y to a given horocycle To does not depend upon To : stationarity on the tree is a convenient mathematical formulation of statistical self-similarity (or statistical fractalness). Finally, it is worth noticing that no other relevant notion of stationarity can be properly defined on the tree, in particular no model class is available which be both amenable of a convenient mathematical treatment and be connected to the usual notion of time invariance for time series. Last but not least , Theorem 3 has the following interesting result as a consequence. Collect the data at a given level of resolution of a finite subtree (as in the figure 1) into a vector Y. Then the covariance matrix ~y of Y has the following recursively defined struct ure:

4. then the primitive translation with skeleton (tn) is the unique isometry T such that (cf. Figure 3)

(2 .18)

2.2.2

Stationary systems and stochastic processes

The following results are proved in [12]. Given a translation T of T, by abuse of notation, we also denote by Tits action on signals defined by T(Y)t

==

Yr(t)

Definition 1 (stationary systems) A linear operator H acting on 3ignal3 is said to be a stationary system if 2 HOT==TOH

for any tran3lation

(2.22)

(2.19)

T.

The following fundamental result is proved in [12] : Theorem 2 1. If the linear operator H satisfies (2.19) for any primitive translation T, then it can be written a3 follows

where Um is a 2m x 2m-matrix whose entries are 1. It is then easy to show that the eigenvectors of ~y are the discrete Haar basis, cf. [2, 5] for more details.O

(Hu)t h[d(t,sAt),d(sAt,s)]

(2.20)

3

Isotropic Processes

where sAt is depicted in the figure 1. ft. Conver3ely, any H of the form (2 .20) is stationary, i. e. satisfies (ft.19) for any translation T.

Definition 2 (stationary stochastic processes) A zero mean (scalar) 3tocha.,tic process Y is said to be stationary if its covariance /unction is translation-invariant, I. e. (2.21) for any translation

T.

In this section we briefiy review how isotropic processes may be finitely parametrized and how properties of processes may be checked on their associated parametrization. The interested reader is referred to [2] for details. Models for Isotropic Processes on Trees : a first guess. As for time series it is of considerable interest to develop white-noise-driven models for processes on trees. The most general input-output form for such a model is simply Yt ==

L

Ct" W,

(3 .23)

'ET

where W t is a white noise process with unit variance. In general the output of this system is not isotropic and it is of interest to find models that do produce isotropic processes. One class introduced in (1] has the form

The following theorem shows that this definition of stationarity for processes is consistent with that of stationarity for transfer functions:

Yt = 20

L

sET

denotes the composition of maps.

408

Cd("t) W,

(3.24 )

It is readily verified that such a process is isotropic, see [2J. The class of systems of the form of (3.24) are the generalization of the class of zero-phase LTI systems (i .e. systems with impulse responses of the form h( t, s) = h( Its!)). An alternative guess would be the proper version of AR processes in our case, namely

y,

=

L.00(.

a".y.

+ uW,

(3.25)

d(.,t)$p

where s :::5 t means that s lies on the same level as or above t (i .e. at the same scale or at a coarser one) . However, even if we take a, .• to be "shift invariant" in some sense, the form (3.25) involves O((2 P/ 2 ) many parameters for AR(p) processes, while p + 1 degrees of freedom are indeed available, namely the p + 1 first covariances. Thus unlike in t he case of standard time series, another parametrization is needed. Figure 4: Information space of order 3 (0), its boundary ( {O, <> }) and the isometries leaving the set of 0 globally invariant.

Back to ordinary time series, and deriving a SchurLevinson parametrization of isotropic AR processes on trees. Consider now an ordinary time serie Xk and introduce the spaces Xk,n = 7i{Xk' .. . ,Xk_n}. Forward and backward prediction errors or "residuals" are defined as ek,n = xk-E{xkIXk_1,n_d andh,n = Xk-n-E{Xk-nIXk,n-d respectively. The formulae

1. Characterization of AR processes: an isotropic process is AR(n) if and only if its reflection coeffi-

cients of order

Xk - E{XkIXk_l,n}

> n are all zero.

2. Schur criterion: if the sequence (rn) is the covariance function of an isotropic process, t hen the Schur recursions must yield reflection coefficients satisfying the inequalities

Xk - E{XkIXk-l,n-d +E{XkIXk-l,n-d - E{XkIXk_l,n} ek,n - E{XkIXk_l ,n 8 Xk-1,n-d} ek,n - E{ek,nlfk-l,n}

(3.26)

show that the key to the Levinson recursion is the computation of the prediction of the forward residual ek,n given the backward one fk-l,n' Similarly, the prediction error of the backward residual given the forward one is needed for updating the backward residual fk,n' It is a remarkable property of time series that both prediction operators are identical: this is due to the invariance of the distribution of any time serie under a symmetry k -> -k. The correlation coefficient of the two involved residuals is also known as the PARCOR coefficient of Xk and Xk_n given Xk-1,n-l' This is illustrated in the following diagram: X"

X"_l,n_1

Xk_n



00000



3. Parametrizing AR processes conversely, a finite family of coefficients satisfying the above strict inequalities (3.26) defines a unique AR process. 4. Regular and singular processes : If the sequence (k n ) satisfies the strict inequalities (3.26) and furthermore the condition 00

L

k~n+l

+ Ik2nl < 00

n=l

holds true, then it is the reflection coefficient sequence of a regular (i.e. purely nondeterministic) isotropic process.

The corresponding patterns are shown in the figure 4, with corresponding isometries that leave the set of 0 globally invariant and exchange the {O, <> }. As the reader may guess, the key to Schur-Levinson recursions are all PARCOR coefficients involving an arbitrary pair {O, <>} given the space spaned by the 0 in the figure 4. The key consequence of t his fact is that all pairs {O, <>} possess the same PARCOR coefficients given the space spaned by the circles. Hence, as for time series, a single PARCOR or reflection coefficient will be involved in eac h stage of the Levinson recursions. Details may be found in [2J. We also summarize the kind of result and characterization of isotropic processes we may get using these PARCOR or reflection coefficients (kn )n>O :

4

Systems Theory and Estimation for Stationary Processes

From formula (2.20) of theorem 2 follws easily the following coding of stationary systems:

H

=

L i,j~O

409

Si,j -r-yj

(4.27)

where 'Y is a "backward" shift towards -00 whereas, is a "forward-and-average" shift (the "Haar smoother" , adjoint of the former one). These operators are depicted in the figure 5: A system theory for such formal power series is presented in [12J : rational systems are introduced and properly parametrized, state space forms are developped, and a spectral calculus is introduced. Finally, the interested reader is referred to [6J for smoothing and filtering algorithms for a restricted subclass of these models.

Publications du Laboratoire de Statistique et Probabilites, Univ. Paul Sabatier, Toulouse, no. 02-83, June 1983. [9J S.G. MALLAT, "A theory for multiresolution signal decomposition: the wavelet representation," Dept. of Computer and Info. Science- U. of Penn., MS-CIS87- 22, GRASP LAB 103, May 1987. [10J S .G. MALLAT, "Multiresolution approximation and wavelets," Dept. of Computer and Info. Science-U. of Penn., MS-CIS-87-87, GRASP LAB 80, Sept . 1987. [11J M.J. SMITH AND T.P. BARNWELL , "Exact reconstruction techniques for tree-structured subband coders," IEEE Trans. on ASSP, vo!. 34 , pp. 434-441, 1986. [12J A. BENVENISTE , R . NIKOUKHAH, A.S . WILLSKY, "MuItiscale system theory", CDC '90, Hawai, 1990.

Figure 5: The operators 'Y and ,.

References [lJ J.P. ARNAUD, G. LETAC, "La formule de representation spectrale d'un processus gaussien stationnaire sur un arbre homogime," Publi. Labo. Stat. and Proba., UA 745, Toulouse. [2J M. BASSEVILLE , A. BENVENISTE , AND A.S . WILLSKY, "Multi scale Autoregressive Processes", IRISA and LIDS (MIT) report, submitted for publication. [3J P. CARTIER, "Harmonic analysis on trees," Proc. Sympos. Pure Math ., Vo!. 26, Amer. Math. Soc., Providence, pp. 419-424, 1974. [4J P . CARTIER, "Geometrie et analyse sur les arbres". Seminaire Bourbaki, n407, Feb. 1972. [5J K.C. CHOU, A .S . WILLSKY, A. BENVENISTE, AND M . BASSEVILLE, "Recursive and Iterative Estimation Algorithms for Multi-Resolution Stochastic Processes," Proc. of the IEEE Conf. on Decision and Control, Dec. 1989. [6J K. CHOU, A Stochastic Modeling Approach to Multiscale Signal Processing, MIT, Dept. of Electrical Engineering and Computer Science, Ph.D . Thesis, 1990 (in preparation). [7J 1. DAUBECHIES, "Orthonormal bases of compactly supported wavelets," Communications on Pure and Applied Math., vo!. 91, pp. 909- 996, 1988. [8J J .L. DUNAU, "Etude d 'une classe de marches aleatoires sur l'arbre homogene," in Arbres homogenes et Couples de Gel/and, J.P. Arnaud, J.L. Dunau, G. Letac.

410