Finding multiple abrupt change points


Computational Statistics & Data Analysis 22 (1996) 481-504

J.H. Venter a,*, S.J. Steel b

a Department of Statistics, Potchefstroom University, Private Bag X6001, Potchefstroom 2520, South Africa
b Department of Statistics, Stellenbosch University, Stellenbosch, South Africa

Received March 1995; revised November 1995

Abstract

The problem of identifying multiple abrupt change points in a sequence of observations is approached via hypothesis testing. The null hypothesis of no change points is tested and, if it is rejected, the number of change points present is indicated by the procedures introduced here. These procedures allow an arbitrary number of change points and patterns of change. Apart from being computationally intensive, they are easy to apply and interpret and do not require a great deal of user input.

Keywords: Change points; Hypothesis testing; Non-parametric tests; Computer intensive methods

1. Introduction

Let X_1, X_2, ..., X_n be an observed sequence of n independent random variables with means μ_i = EX_i, i = 1, ..., n, and common variance σ² = Var(X_i). The simplest model for the μ_i's postulates that they are all the same. We shall refer to the hypothesis that this model is true as H0, i.e.

H0: μ_i = θ_1 for i = 1, ..., n,   (1.1)

for some (unknown) θ_1. When the observations are produced in time (or in some other ordered fashion) and factors influencing the observations may change abruptly, the model specified by H0 may not be appropriate and may need to be

* Corresponding author.
0167-9473/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. PII S0167-9473(96)00007-2


replaced by a change points model. The m change points model postulates that for some (generally unknown) integers τ_1, ..., τ_m with 1 ≤ τ_1 < ··· < τ_m < n we have

μ_i = θ_j  if τ_{j-1} < i ≤ τ_j,  j = 1, ..., m + 1,   (1.2)

where θ_1, ..., θ_{m+1} are distinct and where we define τ_0 = 0 and τ_{m+1} = n for notational convenience. This means that the μ_i's are equal within m + 1 consecutive groups of indices, the jth group beginning at index τ_{j-1} + 1 and ending at τ_j, j = 1, ..., m + 1. Hence, in this parameterization, the τ_j's are the group end points while the τ_{j-1} + 1 are the group begin points. Notice also that the m change points model formally reduces to the H0 model given by (1.1) if we take m = 0.

The aim of this paper is to study tests of H0 against the alternative of some m change points model. Of course, the tests must have a prespecified level of significance, i.e. if H0 is in fact true, the probability of rejecting it must be α, say. More importantly, we wish to structure the test such that if H0 is rejected, the m change points model which caused rejection must also be indicated. In addition, estimators of the relevant parameters must be supplied.

Many papers on change point problems have appeared in recent years. Most of them are concerned with "at most one change (AMOC)" alternatives. A review of such procedures is given by Hušková and Sen (1989) and further references can be found in, for example, Hušková (1994). By contrast, much less has appeared on procedures dealing with multiple change points models. A useful recent reference on contributions dealing with the hypothesis testing approach is that of Antoch and Hušková (1994), where further references can be found. Contributions not mentioned there were made by Haccou and Meelis (1988) and Meelis et al. (1991), who deal with at most three change points, and Gombay (1994), who deals with two change points and a special change pattern.
The current state of affairs on the multiple change points problem is not satisfactory, in the sense that there seem to be no procedures available for selecting the number of change points that can be routinely applied in practice and routinely relied on to perform well without substantial user input and judgement. Our aim in this paper is to contribute procedures that meet this goal, at least in the case of abrupt change points. In Section 2 we review the maximum likelihood estimators of the parameters of an m change points model in the normal case and in Section 3 we construct a test of H0 in the normal case which produces a value for the number of change points if H0 is rejected. This test operates entirely differently from any existing proposal in the literature: the computation of the relevant statistics is numerically intensive, but the interpretation of the results is straightforward and does not require tables of critical values or large sample approximations. In Section 4 we show that the normal test is highly non-robust with respect to type I error rate against incorrectness of the normality assumption and we introduce non-parametric tests which do not have this drawback. In Section 5 we introduce tests which are appropriate if there is a lower bound on the group sizes that may be allowed. These tests are also shown to be robust with respect to type I error rate. In Section 6 we study various aspects of the procedures introduced here by way of simulation and Section 7 ends with some remarks and conclusions. Illustrations of the procedures are made


throughout the paper and we feel that this will help to convince the reader that the proposed procedures represent a significant step towards reaching the goal set out in the previous paragraph. Finally, the appendices contain some technical results, the most important of which is the optimization algorithm which forms the basis of the computational methods required here.

2. Maximum likelihood estimators in the normal case

Assume that X_i is N(μ_i, σ²)-distributed, independently for i = 1, ..., n. To begin with, also assume that an m change points model is appropriate for the μ_i's for some given m. Then the following maximum likelihood estimators (MLEs) of the parameters of the model may easily be derived. For given τ_1, ..., τ_m the MLE of the jth group mean θ_j is

G(τ_{j-1}, τ_j) = Σ_{τ_{j-1} < i ≤ τ_j} X_i / (τ_j − τ_{j-1}).   (2.1)

Let

S(m, τ_1, ..., τ_m) = Σ_{j=1}^{m+1} Σ_{τ_{j-1} < i ≤ τ_j} [X_i − G(τ_{j-1}, τ_j)]²   (2.2)

denote the within groups sum of squares associated with given m and τ_1, ..., τ_m. The associated unbiased estimator of σ² is

σ̂²(m, τ_1, ..., τ_m) = S(m, τ_1, ..., τ_m)/(n − m − 1).   (2.3)

Further, the MLEs of τ_1, ..., τ_m are those choices τ*_1, ..., τ*_m of τ_1, ..., τ_m which minimize S(m, τ_1, ..., τ_m). Put

S*_m = S(m, τ*_1, ..., τ*_m) = min{S(m, τ_1, ..., τ_m): 1 ≤ τ_1 < ··· < τ_m < n}.   (2.4)

An efficient optimal partitioning algorithm to compute S*_m and τ*_1, ..., τ*_m is introduced in Appendix A. S*_m will be called the optimal within groups sum of squares associated with m change points. The MLE of θ_j, assuming only m given, is G(τ*_{j-1}, τ*_j) and an estimator of σ² is

σ̂²_m = σ̂²(m, τ*_1, ..., τ*_m) = S*_m/(n − m − 1).   (2.5)

This estimator of σ² is no longer unbiased, even if in fact an m change points model is correct. We return to estimation of σ² in Section 6 below.

The sequence S*_0, S*_1, S*_2, ... is decreasing with S*_{n-1} = 0. Thus, the MLE of m is m* = n − 1, but this is an unsatisfactory solution. It always chooses the most complicated model, thus violating the principle of parsimony (Box and Jenkins, 1976), which requires that the simplest possible model providing an adequate description be used. In practice, a procedure which routinely selects the most complicated model, in effect deciding that every point is a change point, is of no use. The tests described below will be used to find a value m* for m, and estimators for the other parameters are obtained as above using m = m*.
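The minimization in (2.4) can be carried out by dynamic programming over group end points. The sketch below is an illustrative reimplementation in that spirit, not the Appendix A algorithm itself; the function name and the min_size argument (which anticipates the minimum group size constraint of Section 5) are our own.

```python
import numpy as np

def _segment_sse(prefix, prefix_sq, i, j):
    # Sum of squared deviations of x[i:j] (0-based, half-open) about its mean.
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

def optimal_partition(x, m, min_size=1):
    """Minimize the within groups sum of squares S(m, tau_1, ..., tau_m)
    of (2.2) over all placements of m change points (m + 1 groups), each
    group containing at least min_size observations.  Returns (S*_m, taus)
    with taus the 1-based group end points tau*_1 < ... < tau*_m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(x * x)))
    INF = float("inf")
    # cost[g][j]: minimal SSE when x[:j] is split into g groups
    cost = [[INF] * (n + 1) for _ in range(m + 2)]
    back = [[0] * (n + 1) for _ in range(m + 2)]
    cost[0][0] = 0.0
    for g in range(1, m + 2):
        for j in range(g * min_size, n + 1):
            for i in range((g - 1) * min_size, j - min_size + 1):
                if cost[g - 1][i] < INF:
                    c = cost[g - 1][i] + _segment_sse(prefix, prefix_sq, i, j)
                    if c < cost[g][j]:
                        cost[g][j], back[g][j] = c, i
    taus, j = [], n
    for g in range(m + 1, 1, -1):   # walk back through the m cut points
        j = back[g][j]
        taus.append(j)
    return cost[m + 1][n], sorted(taus)
```

For example, optimal_partition([0, 0, 0, 5, 5, 5], 1) recovers the single change point at τ*_1 = 3 with S*_1 = 0.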


3. A normal case test

To construct a test of H0 which will be sensitive to multiple change points models as alternatives to H0, consider the behavior of the sequence {S*_m} when the μ_i's in fact follow a change points model with m* change points and substantially different successive group means θ_j, and suppose we fit an m change points model as in Section 2. For m < m* the fitted model does not allow enough change points, so that some of the optimal groups will contain undetected change points and therefore widely varying data values. Hence, S*_m will be large for m < m*, and more so if m is further away from m*. For m = m* the fitted model can cater for all change points and the minimization in (2.4) should detect them if they are prominent enough. Within groups the X_i's should be more homogeneous and S*_{m*} should be much smaller than S*_m with m < m*. For m > m* more than enough change points are allowed in the fitted model; some of the fitted change points should coincide with the actual change points, while the superfluous ones would be placed within the groups and merely serve to reduce the remaining smaller variation somewhat further. Thus, S*_m would continue to decrease as m increases beyond m*, but only slowly. We see that m* may be described as the index m where S*_m reaches a stable (slowly decreasing) level. Looking at {S*_m} backwards from large m values, we may describe m* as the last m preceding the jump in S*_m as m decreases.

We turn these considerations into a formal test of H0 with a guaranteed significance level as follows. Let M (< n) be an integer large enough so that we are sure on a priori grounds that the true number of change points is not more than M. Consider the ratios

W_1 = S*_0/S*_1,  W_2 = S*_1/S*_2,  ...,  W_M = S*_{M-1}/S*_M.   (3.1)

Since W_1, ..., W_M are both translation and scale invariant functions of X_1, ..., X_n, their joint distribution under H0 does not depend on the unknown values of θ_1 and σ² and is completely determined. Let F_m be the marginal distribution function of W_m under H0, i.e. F_m(x) = P(W_m ≤ x | H0). Then it would be possible to compute critical values for W_m from F_m and to compare the observed W_m's with these critical values to find the choice of m* described above. Such critical values will differ with m and require a commitment to levels of significance, which is inconvenient at this point. Therefore, introduce P_m = 1 − F_m(W_m), so that P_m is a p-value associated with W_m, m = 1, ..., M. Then P_m may be interpreted as a measure of how significant the jump is that occurs at m. For testing purposes P_m contains the same information as W_m, but expressed on a scale common to all m, namely the interval [0, 1]. To test whether any of the jumps are significant enough to warrant rejection of H0 we now propose the test statistic

C_M = min{P_m: 1 ≤ m ≤ M}.   (3.2)

Again the distribution of C_M under H0 does not depend on the unknown values of θ_1 and σ², and its quantiles can be calculated as indicated below. If c_M(α) is the αth quantile of the null distribution of C_M, we reject H0 if C_M ≤ c_M(α) and accept H0 otherwise. In the rejection case we find the largest index m* in the range


1 ≤ m* ≤ M such that P_{m*} ≤ c_M(α), and this is the indicated number of change points for the selected model.

This testing procedure involves two computational difficulties, namely the computation of the P_m's and the computation of the quantiles c_M(α). Calculation of the P_m's must be done by simulation. The process is as follows:
• Calculate the statistics S*_0, S*_1, ..., S*_M from the observed data X_1, ..., X_n using the algorithm given in Appendix A; also get the ratios W_1, ..., W_M.
• Repeatedly generate n-dimensional random vectors (Z_1^(b), ..., Z_n^(b)), b = 1, ..., B, with B large (say 10 000) and where Z_i^(b) is N(0, 1)-distributed, independently for i = 1, ..., n (so that (Z_1^(b), ..., Z_n^(b)) is distributed as (X_1, ..., X_n) under H0 with σ = 1). For each b compute the ratios (W_1^(b), ..., W_M^(b)) related to (Z_1^(b), ..., Z_n^(b)) as (W_1, ..., W_M) is related to (X_1, ..., X_n) via (3.1). (Here the algorithm in Appendix A is used B times; without such an algorithm this step is simply not feasible!)
• With I(A) denoting the indicator of the event A, P_m may be approximated by

P_m ≈ (1/B) Σ_{b=1}^{B} I(W_m^(b) > W_m)   (3.3)

for m = 1, ..., M, and this approximation may be made arbitrarily accurate by taking B large enough.

The null distribution of C_M must also be calculated by simulation as described in Appendix B. Table 9 gives a selection of quantile values for n = 30 and 100 and M = 1(2)9. For M = 1 we must have c_M(α) = α, so that an idea of the accuracy of the values in Table 9 can be obtained by comparing the entries of the α and M = 1 columns. Also note that the quantiles in Table 9 do not vary much with n; this is to be expected since we have constructed C_M in terms of p-values. A simple lower bound

c_M(α) ≥ α/M   (3.4)

follows from the Bonferroni argument, since

P(C_M ≤ α/M | H0) = P(P_m ≤ α/M for some 1 ≤ m ≤ M | H0) ≤ Σ_{m=1}^{M} P(P_m ≤ α/M | H0) = M(α/M) = α,

using the fact that each P_m is uniformly distributed over [0, 1] under H0. It is apparent from Table 9 that this bound is very sharp. For example, for α = 0.01 and M = 5 the lower bound is 0.01/5 = 0.0020, compared to the entry 0.0021 for both n = 30 and n = 100 in Table 9. It is not as sharp as this for large α, but testing with C_M using α/M as critical value makes the test only slightly conservative. Also, a conservative overall p-value associated with an observed C_M is given by M·C_M. These are very fortunate circumstances since they make the use of the proposed test a lot easier. The real work is in the calculation of the P_m's as detailed above, and the interpretation of the results follows by inspection.
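The simulation steps above can be sketched end to end. In this toy version a brute-force search over cut placements (our own stand-in, best_sse) replaces the Appendix A algorithm, B is far smaller than the 10 000 suggested above, and the conservative Bonferroni critical value α/M of (3.4) is used instead of simulated quantiles of C_M; it is only feasible for small n and M.

```python
import itertools
import numpy as np

def best_sse(x, m):
    """Brute-force S*_m of (2.4): minimal within groups SSE over all
    placements of m change points (only feasible for small n and m)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    best = float("inf")
    for taus in itertools.combinations(range(1, n), m):
        cuts = (0,) + taus + (n,)
        sse = 0.0
        for a, b in zip(cuts, cuts[1:]):
            seg = x[a:b]
            sse += float(np.sum((seg - seg.mean()) ** 2))
        best = min(best, sse)
    return best

def c_m_test(x, M, B=200, alpha=0.05, seed=0):
    """Monte Carlo version of the C_M test of Section 3.  Returns the
    p-values P_1..P_M, the statistic C_M and the selected number of
    change points m* (0 means H0 is accepted)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = [best_sse(x, m) for m in range(M + 1)]
    W = np.array([s[m - 1] / s[m] for m in range(1, M + 1)])
    exceed = np.zeros(M)
    for _ in range(B):                      # null replicates, sigma = 1
        z = rng.standard_normal(n)
        sz = [best_sse(z, m) for m in range(M + 1)]
        exceed += np.array([sz[m - 1] / sz[m] for m in range(1, M + 1)]) > W
    P = exceed / B                          # approximation (3.3)
    C = P.min()
    m_star = 0
    if C <= alpha / M:                      # reject H0 (Bonferroni bound 3.4)
        m_star = max(m + 1 for m in range(M) if P[m] <= alpha / M)
    return P, C, m_star
```

On a sequence with one large mean shift, P_1 comes out essentially zero and m* = 1 is selected, mirroring the Nile analysis below.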


Table 1
Normal test C_M applied to the Nile data

m    P_m       τ*_1   τ*_2   τ*_3   τ*_4   τ*_5
1    0.0000    28
2    0.9480    19     28
3    0.1428    28     83     95
4    0.1712    28     41     45     47
5    0.2296    28     37     40     45     47

An example to illustrate the process is provided by the problem of the Nile. The data concern the annual discharge of the Nile river at Aswan over the period 1871 to 1970. The most comprehensive discussion of this example is given by MacNeill et al. (1991). Although they indicate that serial correlation is present in this example, which would invalidate our model assumptions, many authors have analyzed these data under the independence assumption and we do so also for comparative purposes. Table 1 gives the values of P_m and τ*_j calculated as explained above with B = 25 000 repetitions. This took about 200 CPU seconds on an HP 712/60 workstation, but basically the same conclusions can be drawn by doing only B = 1000 repetitions, with a proportional reduction in CPU time. The correct choice of M rests on prior knowledge concerning the situation. In the present example we are predisposed towards a small choice of M in view of all the existing findings that only one change point occurred here. With M = 5 and a level α = 0.05 the critical value of C_5 is α/5 = 0.01. Clearly, C_5 = 0.0000, leading to rejection of H0. Also, inspecting the P_m's from m = 5 downwards, the first choice of m for which P_m is below 0.01 is at m = 1, leading to the choice m* = 1 for the number of change points. The end of the first group is at τ*_1 = 28. Clearly, the choice of M and α has little bearing on the outcome of this analysis. Moreover, it agrees with the findings of other authors. We return to this example below.

4. Non-parametric tests

The normal test introduced in Section 3 is not robust against incorrectness of the normality assumption, in the sense that its type I error rate is strongly affected by deviations from this assumption.
Table 2 shows the actual type I error rate of the C_M test with a 5% nominal level for selected values of n and M when the X_i's are generated from a contaminated normal distribution which is N(0, 1)-distributed with probability 0.9 and N(0, 9)-distributed with probability 0.1. Each entry was obtained using 250 000 simulation repetitions. Evidently, the test is very prone to falsely rejecting H0 when the actual error distribution is heavy tailed, especially with increasing n and M > 1. A little reflection reveals the cause of this phenomenon: observations with large error components will be interpreted as change points, implying rejection of H0.


Table 2
Actual type I error rate of the 5% C_M test on data from a contaminated normal distribution

n      M = 1    M = 5    M = 9
20     0.081    0.221    0.195
50     0.089    0.462    0.446
100    0.092    0.677    0.687
200    0.095    0.859    0.887

Non-parametric tests suggested by the normal test may be used to find change points if the normality assumption is in doubt or, indeed, more generally, if H0 states that the X_i's are iid according to some unspecified continuous distribution function, while an m change points alternative states that there are m + 1 consecutive groups of indices and within these groups identical distributions occur but between groups the distributions differ. Following e.g. Lombard (1987, 1988) such procedures are as follows. Let R_1, ..., R_n be the ranks of X_1, ..., X_n and let φ be a score function appropriate to the situation at hand. For example, the Wilcoxon score φ(x) = x or the Van der Waerden score φ(x) = Φ^{-1}(x), 0 < x < 1, with Φ^{-1} the inverse N(0, 1) distribution function, is appropriate in the case of near normality. For i = 1, ..., n set

Y_i = [φ(R_i/(n + 1)) − φ̄]/V,   (4.1)

where

φ̄ = (1/n) Σ_{i=1}^{n} φ(i/(n + 1)),   V² = (1/n) Σ_{i=1}^{n} [φ(i/(n + 1)) − φ̄]².   (4.2)

Let C*_M denote the C_M test statistic of Section 3 calculated by replacing X_1, ..., X_n by their scores Y_1, ..., Y_n. Thus, to get C*_M, we do the minimization in (2.4) using the algorithm in Appendix A applied to Y_1, ..., Y_n; then we get the ratios W_1, ..., W_M and the P_m-values using the three steps of Section 3 and finally C*_M = min{P_m: 1 ≤ m ≤ M}. Note that the second step may be simplified by generating a random permutation (R_1^(b), ..., R_n^(b)) of (1, ..., n) and taking Z_i^(b) = [φ(R_i^(b)/(n + 1)) − φ̄]/V for i = 1, ..., n and b = 1, ..., B. Denote the quantiles of C*_M by c*_M(α). It is again true that c*_M(α) ≥ α/M. We also calculated c*_M(α) for the two choices of φ mentioned above and found that α/M is again a sharp lower bound for c*_M(α), as in the normal case. Consequently, application and interpretation of these non-parametric tests may be done exactly as before.

To illustrate these tests consider again the Nile data. Tables 3 and 4 show the values of P_m and τ*_j obtained using the C*_M test with the two score functions mentioned above.
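The score transform (4.1)-(4.2) is straightforward to implement; a minimal sketch, assuming no ties among the observations (as under the continuity assumption):

```python
import numpy as np
from statistics import NormalDist

def rank_scores(x, phi=lambda u: u):
    """Standardized scores Y_i of (4.1)-(4.2).  phi(u) = u gives Wilcoxon
    scores; phi = NormalDist().inv_cdf gives Van der Waerden scores."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1              # ranks R_1..R_n in 1..n
    grid = np.array([phi(i / (n + 1)) for i in range(1, n + 1)])
    phi_bar = grid.mean()                              # (4.2)
    V = np.sqrt(np.mean((grid - phi_bar) ** 2))
    return (np.array([phi(r / (n + 1)) for r in ranks]) - phi_bar) / V
```

The C*_M test is then just the C_M machinery of Section 3 applied to rank_scores(x), with the null replicates generated from random permutations of (1, ..., n) as noted above.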


Table 3
Wilcoxon score C*_M test applied to the Nile data

m    P_m       τ*_1   τ*_2   τ*_3   τ*_4   τ*_5
1    0.0000    28
2    0.4988    28     97
3    0.0236    28     83     95
4    0.8628    28     68     75     97
5    0.0420    28     45     47     83     95

Table 4
Van der Waerden score C*_M test applied to the Nile data

m    P_m       τ*_1   τ*_2   τ*_3   τ*_4   τ*_5
1    0.0000    28
2    0.7730    28     97
3    0.0770    28     83     97
4    0.2000    28     69     71     97
5    0.7420    28     69     71     83     97

Table 5
Wilcoxon score C*_M test applied to the mill data

m    P_m       τ*_1   τ*_2   τ*_3   τ*_4   τ*_5
1    0.1310    76
2    0.0350    32     76
3    0.2290    32     76     92
4    0.3260    35     52     59     76
5    0.0820    35     52     59     76     92

With M = 5 and α = 0.05 the critical value is 0.01 and both tests reject H0 and suggest the same one change point model as the normal test. It is instructive to compare the entries of Tables 1, 3 and 4. Notice that if all three tables contain a small P_m for some m, then their placements of the τ*'s tend to be similar (see e.g. the m = 3 entries). This seems to indicate that if a particular model is relatively strongly suggested by the data, all methods will point towards that model.

As a second illustration, Table 5 shows the results of applying the Wilcoxon score C*_M test to the mill data also analyzed in Lombard (1987, 1988). With M = 5 (rather arbitrarily chosen here) we have C*_5 = P_2 = 0.035, with a corresponding p-value of 5 × 0.035 = 0.175. With M smaller this p-value would be smaller. Although not overwhelmingly small, there is some evidence here in favor of a two


change points model with the group endpoints at 32 and 76. This is in line with the conclusions of Lombard (1988). To conclude this section, note that the C*_M test with φ(x) = x is related to the testing procedure of Meelis et al. (1991). In particular, their L_m and our S*_m are closely related, but their A_j's are differences rather than the ratios W_j used here. However, they calculate individual critical values for their A_j's and use these to reach a decision. Compared to the proposal here, this seems to be a somewhat tedious way to proceed. Further comparisons are reported in Section 6.

5. Tests with prescribed minimum group size

The normal test of Section 3 can also be modified to be more robust with respect to type I error rate by placing a lower bound on the group size allowed. If the actual error distribution is heavy tailed, observations with large error components will tend to appear in short bursts, and if the minimum group size allowed is large enough, such outlying observations will no longer automatically signal change points and cause rejection of H0. More precisely, let C_{M,k} be the test statistic defined by (3.2) but with the restriction τ_j − τ_{j-1} ≥ k for j = 1, ..., m + 1 added in the minimization of (2.4). Appendix A also shows how the optimal partitioning algorithm can be extended to handle this minimization problem effectively. Table 6 shows what happens to the actual type I error rate of the C_{M,k} test with a nominal level α = 0.05 and n = 100 if the true error distribution is the contaminated normal used in the first paragraph of Section 4. Evidently, as k increases, the C_{M,k} test is increasingly more robust than the test based on C_M = C_{M,1}.

Table 6
Type I error rate of the 5% C_{M,k} test on data from a contaminated normal (n = 100)

k    M = 1    M = 5    M = 9
1    0.092    0.677    0.687
2    0.076    0.294    0.264
3    0.067    0.156    0.129
5    0.058    0.079    0.063

In some applications we may be interested in distinguishing only long lasting changes and then the use of C_{M,k} is particularly appealing, its greater robustness being an added bonus. In other applications, identification of short duration disturbances may also be important. A possible approach in such cases may be to determine long lasting groups first using C_{M,k} with a large k and then to look for short-term groups within each long lasting group using C_{M,1}. We will not explore such possibilities here. We also mention in passing that the non-parametric tests

Table 7
Normal test C_{5,5} applied to the Nile data

m    P_m       τ*_1   τ*_2   τ*_3   τ*_4   τ*_5
1    0.0000    28
2    0.8016    19     28
3    0.0624    28     83     95
4    0.4020    19     28     83     95
5    0.0212    10     19     28     83     95

C*_M of Section 4 can also be extended to tests with lower bounds k on the group sizes in the same manner. To conclude this section, Table 7 shows the results of applying the C_{M,k} test with M = 5 and k = 5 to the Nile data. With α = 0.05 the critical value is 0.05/5 = 0.01 as in Section 3 and the single change point conclusion follows as before. The smaller P_5 = 0.0212 makes this conclusion more dependent on the choice of α than was the case with C_{5,1} in Table 1. The optimal placements of group end points are the same in Tables 1 and 7 up to m = 3, as must be the case since the corresponding group sizes are at least 5. For m = 4 and 5, Table 1 contains group sizes less than 5 and here the placements of Table 7 differ.

6. Simulation results

We experimented widely with the procedures introduced here in order to gain some understanding of their behavior in different circumstances and also to compare them with other procedures. Unfortunately, existing procedures tend to be limited in scope, being restricted to 2 or 3 change points at most, or dealing with particular patterns of alternatives only, or being only asymptotic proposals. The procedure of Meelis et al. (1991) is the only one explicitly allowing alternatives with up to 3 change points with arbitrary patterns and accompanied by the necessary tables to allow implementation in finite sample cases. Therefore, this is the only procedure against which our procedures will be compared. Results of only a small selection from all the simulation runs we have done will be reported here.

To specify the μ_i's it is convenient to speak of an (m; τ_1, ..., τ_{m+1}; θ_1, ..., θ_{m+1}) configuration, meaning that
• there are m change points,
• the group end points are at indices τ_1, ..., τ_m, τ_{m+1} with τ_{m+1} = n the sample size,
• the common μ_i-level in the jth group is θ_j, j = 1, ..., m + 1.

In the simulation studies we may take θ_1 = 0 without loss of generality, since the procedures are translation invariant. We will consider θ_j-values of the form θ_j = s·δ_j with s ≥ 0, referred to as the 'spacing' of the configuration. If s = 0 all such configurations revert to that of H0, but as s increases the groups are more clearly distinguished and procedures should generally perform better. To make θ_1 = 0 we take δ_1 = 0, and by choosing δ_2, ..., δ_{m+1} appropriately various patterns of group levels can be obtained. We take the common error variance σ² = 1 throughout the simulation study. Procedures can be judged on many aspects of their performance, but only power, PCS, CPCS and CPCP (explained below) will be addressed here. Until further notice, we assume the error distribution to be normal.
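A configuration in this notation is easy to simulate; a minimal sketch (the function name and argument layout are our own):

```python
import numpy as np

def simulate_config(taus, deltas, s, sigma=1.0, seed=None):
    """Draw X_1..X_n from an (m; tau_1..tau_{m+1}; theta_1..theta_{m+1})
    configuration with theta_j = s * delta_j: group j covers indices
    tau_{j-1}+1..tau_j, and tau_{m+1} = n is the sample size."""
    rng = np.random.default_rng(seed)
    starts = (0,) + tuple(taus[:-1])
    mu = np.concatenate([np.full(t - t0, s * d)
                         for t0, t, d in zip(starts, taus, deltas)])
    return mu + sigma * rng.standard_normal(len(mu))
```

For instance, simulate_config((10, 20, 30), (0, 1, 2), s) draws a sample from the configuration (2; 10, 20, 30; 0, s, 2s).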

6.1. Power

Here we are concerned with the probability of rejecting H0. Since the Meelis test considers at most three change points we take M = 3 for our procedures as well. To begin with, suppose n = 30 and we use the 5% critical values of Table 2 of Meelis et al. (1991) in their test. A simulation run then showed that the actual type I error rate of their test is slightly above 0.10. To be comparable in this respect we chose α = 0.10 for our tests C_M, C*_M (henceforth we restrict attention to the Wilcoxon score only) and C_{M,k} (with k = 5 in the reported cases). We then determined the power curves of these tests as functions of the spacing s for many configurations, and Fig. 1 shows these curves for two 2 change points configurations (cases (a) and (b)) and two 3 change points configurations (cases (c) and (d)). The curves are based on 25 000 simulation repetitions at each s-value in the grid s = 0(0.25)3. For many configurations the C*_M, C_{M,k} and Meelis tests have similar power, somewhat higher than that of C_M (e.g. cases (a), (b) and (d)), but there are exceptions, as shown by case (c). It is notable that the non-parametric test C*_M often has higher power than the parametric test C_M. Our simulations show that C_{M,k} consistently either has the highest power or appears among those having the highest power. All tests eventually reject H0 with certainty if the true model is visible clearly enough (i.e. when s becomes large). These conclusions also hold for other sample sizes n, and generally the rate of increase of power as s increases becomes larger with increasing n.

6.2. PCS and CPCS

PCS refers to the probability of correct selection of the number of change points and CPCS refers to the conditional PCS given that H0 is rejected. By way of illustration, Fig. 2 shows the PCS curves of the four procedures for four configurations and Fig. 3 shows the CPCS curves for the same configurations. In both these figures the Meelis procedure is competitive in case (a), lags in cases (b) and (c) and fails in case (d), where its curves stay close to 0. We have seen failure of the Meelis procedure also in many other 3 change point configurations, which suggests that this procedure as it stands may be eliminated from further consideration. In cases (a) and (b) of Fig. 2 the PCS of C_M, C*_M and C_{M,k} do not rise to 1 as s becomes large. The reason for this phenomenon is that with M = 3, three change points models are also considered. When s is large the true two change points are clearly visible, but an occasional large error component within one of the groups may mislead the procedure into thinking that a third change point is also present. Thus, it happens that a model with more change points than the true number is selected with small


Fig. 1. Comparison of power (α = 0.10, n = 30). [Figure: power curves versus spacing s for C_M, C*_M, C_{M,k} and the Meelis test. Panels (a) and (b): the 2 change points configurations (2; 10, 20, 30; 0, 2s, s) and (2; 5, 15, 30; 0, s, 0); panels (c) and (d): the 3 change points configurations (3; 7, 15, 23, 30; 0, s, 0, s) and (3; 4, 11, 19, 30; 0, s, 2s, s).]

probability even if s is large, if the maximum number of change points considered, M, is greater than the true number in the configuration. As in the case of the power comparison, we also find that C_{M,k} consistently performs best or close to best in terms of PCS when configurations with true group sizes at least k are considered. Since CPCS = PCS/power and power tends to 1 as s increases, it is to be expected that CPCS and PCS will behave similarly as s increases, and this is evident when Figs. 2 and 3 are compared for large s. For small s behaviors differ depending


Fig. 2. Comparison of PCS (α = 0.10, n = 30). [Figure: PCS curves versus spacing s for C_M, C*_M, C_{M,k} and the Meelis test. Panels (a) and (b): the 2 change points configurations (2; 10, 20, 30; 0, s, 0) and (2; 10, 20, 30; 0, s, 2s); panels (c) and (d): the 3 change points configurations (3; 7, 15, 23, 30; 0, s, 0, s) and (3; 4, 11, 19, 30; 0, s, 3s, 4s).]

on the true configuration. With an up-down pattern in the configuration, such as those of cases (a) and (c), the CPCS curve is largely increasing in s, as one expects intuitively. However, with an up-up pattern, such as those of cases (b) and (d), there is a phase as s increases where the procedure is increasingly confused into selecting a 1 change point model rather than the correct 2 or 3 change points model before recovering at larger s (this explanation comes from additional inspection of the simulation data).


(a)

(b)

Config. (3; 7, 15, 23, 30; 0, s, 0, s)

==================================

/.,..,._::;:~." .........

0.8

u~ (O tl

Config. (2; 10, 20, 30; 0, s, 2s)

0.8

...i . "- i6 :'d

0.8

0.4

/

0.6

!

O O. O

0.4

:, i # /;'j

,\

0.2

0.2

I

i

1

3

\ i•

0

4

/

S

t

l

i

,

1

2 SPACING s

SPACING s

(c)

,s" /" ,4," ,," •

£o 0 o. (.3

0.6

/

p

/

Config. (3; 4, 11, 19, 30; 0, s, 3s, 4s

0.8

,..

j ,-" y .: .,-""

0.4 J /

d

u~ 0.6 £3 tl (9 0.4

o,j. ~°

0.2

3

(d)

Config. (3; 7, 15, 23, 30; 0, s, 0, s)

0.8

//

/

::

/;

sp

/,

...

f

0.2

j~s ~

i

I

i

1

2

3

i,. i %..

0" -~'

..," .... "'

i

0

1

SPACING s

....

-'-

CM, k

2 SPACING s

3

-- Meelis

I

4

Fig. 3. Comparison of CPCS (c~ = 0.10, n = 30).

6.3. CPCP

CPCPj refers to the conditional probability of correct placement of the jth change point given that the correct number of change points m has been selected, j = 1, ..., m. Measures relating to the placement of change points but conditioning on other aspects, or unconditional measures, may also be considered. Figs. 4(a) and (b) show the CPCP1 and CPCP2 curves for an illustrative 2 change points

Fig. 4. Comparison of CPCP (α = 0.10, n = 30). [Panels (a), (b): Config. (2; 5, 15, 30; 0, s, 3s); panels (c), (d), (e): Config. (3; 7, 15, 23, 30; 0, s, −s, 0).]

configuration, and Figs. 4(c), (d) and (e) show CPCP1, CPCP2 and CPCP3, respectively, for a 3 change points case. The Meelis procedure having been dropped, Fig. 4 relates only to the CM, C̃M and CM,k procedures. Considering first (a) and (b), notice that the curves for (b) rise more steeply than those for (a). The reason is that the jump in level at the second change point (from s to 3s) of this configuration is larger than the jump at the first change point (from 0 to s), and also the group sizes around the second change point are larger than those around the first. For (c), (d) and (e) the curves of CPCP1 and CPCP3 are similar, as is to be expected in view of the symmetry of the configuration, while the curves of CPCP2 rise more steeply, which again is due to the larger jump in level at the middle change point. An interesting difference between the two parametric procedures (CM and CM,k) and the non-parametric procedure C̃M is visible in Fig. 4: for C̃M both CPCP1 and CPCP2 do not tend to 1 as s becomes large in the 2 change points cases (a) and (b)


and similarly, both CPCP1 and CPCP3 do not tend to 1 in the 3 change points cases (c) and (e). This means that no matter how large the change of level at these change points, the non-parametric procedure does not estimate their positions correctly with certainty. Additional inspection of the simulation data and some further reflection reveal the reason: replacing the original observations by their ranks lessens the sharpness of the boundaries between the groups, and if the rank sequence across a boundary happens to be in the natural order the non-parametric procedure may fail to place the boundary correctly. This is a drawback not shared by the two parametric procedures. It is again notable that the group size limited procedure CM,k performs best or close to best on the CPCP measure also. A plausible reason for the good performance of CM,k when the true group sizes are all at least k is that limiting the group sizes incorporates more information on the parameter configuration into this procedure than into the other procedures. Of course, we cannot expect CM,k to do well if some of the true group sizes of the configuration are less than k. For example, if CM,k with k = 5 is applied when the actual configuration is (3; 4, 11, 19, 30; 0, s, 3s, 4s), say, then CPCP1 will be 0 for s > 0.

6.4. Effect of the value of M

Up to this point M was kept fixed at the value 3. The only qualification on the choice of M is that it must be at least equal to the true number of change points. It is to be expected that choosing M much larger than the true number of change points will harm the performance of the procedures. Fig. 5 provides an illustration of the effect of the value of M. Fig. 5(a) shows the power curves of the CM test for the three choices M = 3, 5 and 10 for a 3 change points configuration. Evidently, power increases less steeply as M increases, but the effect of a large choice of M is perhaps less serious than might have been expected. Fig. 5(b) shows the corresponding PCS curves and here the effect is more pronounced; note that the curves for M greater than 3 no longer rise to 1 as s becomes large, i.e. the effect already noticed in the PCS discussion above becomes more visible as M increases. Similar comments hold for Fig. 5(c), which illustrates the effect of M on the CPCS. However, as is illustrated in Fig. 5(d), the choice of M has virtually no effect on the CPCP, which is to be expected since this measure conditions on the correct model having been selected.

6.5. Effect of a non-normal error distribution

Up to this point the error distribution was normal. In this paragraph we briefly illustrate the effect on the various measures discussed above if all procedures use the normality assumption while the true error distribution is actually a double exponential. Fig. 6(a) shows the power curves of the three tests CM, C̃M and CM,k for M = 3 for the same configuration as Fig. 1(c). Evidently, the power of the CM test is higher at small s, but this is due to the lack of type I error rate robustness of this test already pointed out in Section 4. The powers of the other two tests are

Fig. 5. Effect of values of M (α = 0.10, n = 30). [All panels: Config. (3; 7, 15, 23, 30; 0, s, 0, s); curves for M = 3, 5, 10.]

affected only slightly. Fig. 6(b) shows the PCS curves of the three procedures for the same configuration as Fig. 2(b), and as for power the PCS of CM is somewhat improved at small s, but otherwise the effect is not too serious. Similar comments apply to Fig. 6(c), which may be compared with Fig. 3(b). Finally, Fig. 6(d) shows the CPCP1 curves; comparing these with those in Fig. 4(a), where the errors were normal, we see that the effect of non-normality on CPCP is very slight.

Fig. 6. Effect of non-normal errors (α = 0.10, n = 30).

6.6. Estimation of σ²

Many existing change point test statistics require an estimator of the common error variance σ² to begin with. Our approach does not require such an estimator. Nevertheless, estimation of σ² is still of interest. One possible estimator is the mean within groups error sum of squares associated with the selected model, i.e. σ̂² = σ̂²_{m*} with m* the selected number of change points and σ̂²_m given by (2.5). This estimator may be appropriate in the case of normality but may also be expected not to be robust against error distributions heavier tailed than the normal. Many alternatives may be proposed, but since estimation of σ² is of secondary importance in this paper, we indicate only one possibility briefly, namely an estimator based on mean deviations. For Z1, ..., Zk iid N(0, 1) define

h(k) = E Σ_{i=1}^{k} |Zi − Med{Z1, ..., Zk}|.

h(k) may easily be calculated numerically, and for data V1, ..., Vk which are iid N(μ, σ²) an unbiased estimator of σ is given by Σ_{i=1}^{k} |Vi − Med{V1, ..., Vk}| / h(k). Returning to the change point problem, once the number of change points m* and the group end points τ1*, ..., τ_{m*}* have been selected, the proposed estimator of σ is

σ̃ = Σ_{j=1}^{m*+1} Σ_{i=τ*_{j−1}+1}^{τ*_j} |Xi − Med(τ*_{j−1}, τ*_j)| / Σ_{j=1}^{m*+1} h(τ*_j − τ*_{j−1})

with Med(τ_{j−1}, τj) = Median{Xi : τ_{j−1} < i ≤ τj}. The corresponding estimator of σ² is σ̃². We computed the bias and MSE of σ̂² and σ̃² as part of our simulation study and Fig. 7 provides an illustration of typical results. Here the CM,k test (with M = 3, k = 5 and α = 0.10) was used to select the change points model; also n = 30, and the true configuration is shown in Fig. 7. In cases (a) and (c) the error distribution was normal and in cases (b) and (d) it was double exponential, the true value of σ² being 1 throughout. Figs. 7(a) and (b) compare the biases of the two estimators. Both are seen to be generally negatively biased, a likely effect of the minimization in the model fitting process. It is perhaps a bit surprising that σ̂² is somewhat more biased than σ̃² in the normal case and less biased in the double exponential case. Figs. 7(c) and (d) compare the MSEs of the two estimators. We multiplied the actual MSE by the factor n/2 (= 15) to get a value which is less dependent on n (since 2/n is the asymptotic MSE of the sample variance for a sample of size n from a N(μ, σ²) population). Here it is evident that σ̂² performs better than σ̃² in the normal case but poorer in the double exponential case, as may be expected. The shapes of these MSE curves occur quite generally. For spacings s in the region 0.5 to 2.0 the estimators perform rather poorer than elsewhere; this region corresponds to cases where the true model does not stand out very clearly and procedures may select wrongly, one result of which is relatively poor estimation of σ².

Fig. 7. Comparison of σ² estimators (α = 0.10, n = 30). [Config. (3; 7, 15, 23, 30; 0, s, 0, s); panels (a), (c): normal errors; panels (b), (d): double exponential errors.]

7. Concluding remarks

7.1. The optimization algorithm

The huge computational effort required to do the optimal partitioning in (2.4) without an algorithm such as that of Appendix A has been noted before (see e.g. Meelis et al., 1991, p. 134). Availability of this algorithm and good computing facilities open up many avenues of research on the change point problem. Apart from the extensions of the algorithm already pointed out, there are many further possibilities. Suppose, for example, that Xi is assumed N(μi, σi²)-distributed and both μi and σi² stay fixed within groups but both may change at change points. Then maximum likelihood estimators given the change points τ1, ..., τm are easily found, and the maximization over τ1, ..., τm given m can be done by an algorithm that is only a slight adaptation of that in Appendix A. Similar remarks apply to the exponential models discussed in Haccou and Meelis (1988). One may also base


partitioning on absolute deviation loss functions, e.g.

S(m, τ1, ..., τm) = Σ_{j=1}^{m+1} Σ_{i=τ_{j−1}+1}^{τ_j} |Xi − Med(τ_{j−1}, τj)|.   (7.1)
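Minimizing a loss of this form fits the same dynamic program as in Appendix A, with each group scored by its sum of absolute deviations from the group median and the recursion minimizing instead of maximizing. A sketch (ours, not the ACPA software; the function and variable names are ours):

```python
import numpy as np

def lad_change_points(x, m):
    # Minimise the absolute-deviation loss (7.1) over m change points.
    x = np.asarray(x, dtype=float)
    n = len(x)
    # cost[i][s]: sum of |X - median| over the group (i, s].
    cost = np.zeros((n + 1, n + 1))
    for i in range(n):
        for s in range(i + 1, n + 1):
            g = x[i:s]
            cost[i][s] = np.abs(g - np.median(g)).sum()
    S = np.full((m + 1, n + 1), np.inf)       # S[r][s]: best loss, r changes in (0, s]
    back = np.zeros((m + 1, n + 1), dtype=int)
    for s in range(1, n + 1):
        S[0][s] = cost[0][s]
        for r in range(1, min(s - 1, m) + 1):
            for i in range(r, s):             # last change point placed at i
                v = S[r - 1][i] + cost[i][s]
                if v < S[r][s]:
                    S[r][s], back[r][s] = v, i
    taus, s = [], n                            # back-track the change points
    for r in range(m, 0, -1):
        s = back[r][s]
        taus.append(s)
    return S[m][n], taus[::-1]
```

For a sequence with one clean level shift, the minimizing partition recovers the shift exactly with zero loss.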
Minimizing (7.1) over τ1, ..., τm may again be done by modifying the algorithm of Appendix A in the obvious way.

7.2. Implementation of tests

By defining the test statistics in terms of the p-values, whose computation is numerically intensive (but made feasible by the algorithm), we avoid the need to calculate critical values. In this respect much of the existing change point testing literature seems to provide only asymptotic distributional results which are often not all that helpful to the user. By contrast, the distributional aspects of our tests happen to be simple, making their implementation easy. It is true that the computational requirements needed to deal with large n (in the thousands) become onerous, but such cases can be dealt with by breaking the data up into manageable sections and treating these separately. Of course, this does not mean that the asymptotic theory of our tests is of no interest. Asymptotic theory could, e.g., provide insight into why the inequality (3.4) is sharp here and lead to deeper understanding of the operating characteristics of the procedures pointed out in Section 6. Some progress on such issues has been made.

7.3. The model selection approach

The hypothesis testing approach towards selecting the number of change points may be questionable in some applications. As an example, suppose we have noisy measurements on rock density made at regular intervals down a borehole. Since density changes abruptly between lithological layers, change point analysis should be helpful to determine the boundaries of successive layers. Typically, however, the existence of different layers is a foregone conclusion. What sense would it make to set up the change point analysis here in the form of testing the null hypothesis that there are no layers present? A model selection approach using criteria such as Cp or AIC would seem more reasonable. Some results using the Schwarz BIC criterion are provided by Yao (1988). Notice that the minimization required in (1) of Yao (1988) is the same as that of (2.4) of this paper, so the algorithm of Appendix A also comes into play with that approach. We are currently doing further research in that direction also.

7.4. ACPA

Software (called ACPA for Abrupt Change Points Analysis) implementing the procedures introduced here is available from the authors. It runs under WINDOWS 3.1, manages data via its own built-in spreadsheet and allows the user to select change points models interactively.


Appendix A

A.1. The basic optimization algorithm

Using the notation of Section 2 we have

S(m, τ1, ..., τm) = Σ_{i=1}^{n} Xi² − A(m, τ1, ..., τm),

where

A(m, τ1, ..., τm) = Σ_{j=1}^{m+1} (τj − τ_{j−1}) G(τ_{j−1}, τj)².   (A.1)

Hence, we may find τ1*, ..., τm* by maximizing the expression (A.1). Also let

A* = A(m, τ1*, ..., τm*) = Σ_{i=1}^{n} Xi² − S*.   (A.2)

The algorithm below computes A* and τ1*, ..., τm* by exploiting simple recursive relationships between these quantities for the full sequence X1, ..., Xn and parts of this sequence. Further notation is needed. Consider the initial section X1, ..., Xs of the full sequence, 1 ≤ s ≤ n. Put a(0, s) = s G(0, s)² and for 1 ≤ r ≤ min(s − 1, m) put

a(r, s) = max { Σ_{j=1}^{r+1} (τj − τ_{j−1}) G(τ_{j−1}, τj)² : 0 = τ0 < τ1 < ... < τr < τ_{r+1} = s }   (A.3)

with G as defined in (2.1). Also let (τ1(r, s), ..., τr(r, s)) denote the choice of (τ1, ..., τr) in (A.3) at which the maximum is achieved. In this notation we want to calculate A* = a(m, n) and τ1* = τ1(m, n), ..., τm* = τm(m, n). Considering the different possibilities of placement of the rth end point, we see that for 1 ≤ r ≤ min(s − 1, m) we have

a(r, s) = max { a(r − 1, i) + (s − i) G(i, s)² : r ≤ i ≤ s − 1 }.   (A.4)

Also, with i* the choice of i which maximizes the right-hand side of (A.4), we have

τ1(r, s) = τ1(r − 1, i*), ..., τ_{r−1}(r, s) = τ_{r−1}(r − 1, i*), τr(r, s) = i*.   (A.5)

We calculate a(r, s) and the τi(r, s)'s recursively by varying the index s in an outer loop between 1 and n and, for each s, varying r in an inner loop from 1 up to min(s − 1, m). At s = n and r = m we end up with the required a(m, n) = A* and the corresponding τi*'s. Before starting the recursion we calculate the partial sums P(i) = Σ_{j=1}^{i} Xj for i = 1, ..., n, from which the group means follow as G(τ_{j−1}, τj) = (P(τj) − P(τ_{j−1}))/(τj − τ_{j−1}). Table 8 shows the CPU time in seconds required to do this calculation on a HP712/60 workstation with a double precision Fortran routine. Evidently, the time required increases about linearly in m for fixed n and somewhat faster than quadratically in n for fixed m.


Table 8
CPU time (seconds) of the basic algorithm

n       m = 5    m = 10   m = 20   m = 40
100     0.008    0.015    0.027    0.049
200     0.036    0.063    0.115    0.228
400     0.158    0.274    0.509    1.040
800     0.700    1.236    2.498    4.850
1600    3.330    6.570    11.570   23.830
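To make the recursion concrete, the computation of A* and the τ*'s via (A.4) and (A.5) can be sketched as follows. This is our illustration, not the authors' double precision Fortran routine; the function and variable names are ours.

```python
import numpy as np

def optimal_change_points(x, m):
    # Maximise A(m, tau_1, ..., tau_m) of (A.1); return (A*, [tau_1*, ..., tau_m*]).
    n = len(x)
    P = np.concatenate(([0.0], np.cumsum(x)))  # partial sums P(i)
    def seg(i, s):                              # (s - i) * G(i, s)^2
        return (P[s] - P[i]) ** 2 / (s - i)
    # a[r][s]: the maximum in (A.3); back[r][s]: the maximising i* of (A.4)
    a = np.full((m + 1, n + 1), -np.inf)
    back = np.zeros((m + 1, n + 1), dtype=int)
    for s in range(1, n + 1):                   # outer loop over s
        a[0][s] = seg(0, s)                     # a(0, s) = s G(0, s)^2
        for r in range(1, min(s - 1, m) + 1):   # inner loop over r
            for i in range(r, s):               # recursion (A.4): r <= i <= s - 1
                v = a[r - 1][i] + seg(i, s)
                if v > a[r][s]:
                    a[r][s], back[r][s] = v, i
    taus, s = [], n                             # back-track via (A.5)
    for r in range(m, 0, -1):
        s = back[r][s]
        taus.append(s)
    return a[m][n], taus[::-1]
```

For a sequence with two clean level shifts the back-tracking recovers the true end points; the triple loop makes the quadratic-in-n, linear-in-m cost visible in the timings of Table 8.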

A.2. Restricted group size extension of the optimization algorithm

The basic algorithm can be extended to handle also the maximization of A(m, τ1, ..., τm) over τ1, ..., τm subject to the restriction that all group sizes are at least k (≥ 1), i.e. the restrictions on the τj's in (2.4) are replaced by τj − τ_{j−1} ≥ k for j = 1, ..., m + 1. To allow for m change points (implying m + 1 groups) each of size at least k we must have (m + 1)k ≤ n, or m ≤ [n/k] − 1, where [x] denotes the integer part of x. The basic recursive relation for the optimal partitioning given by (A.4) must now be replaced by the requirement that for 1 ≤ r ≤ min([s/k] − 1, m) we have

a(r, s) = max { a(r − 1, i) + (s − i) G(i, s)² : kr ≤ i ≤ s − k },   (A.6)

and this is implemented as before. Since fewer entries are evaluated, the algorithm becomes faster as k increases.
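The restriction changes only the loop bounds of the basic recursion. A self-contained sketch of (A.6) under the same assumptions as before (our illustration, not the ACPA code):

```python
import numpy as np

def restricted_change_points(x, m, k):
    # Maximise A(m, tau_1, ..., tau_m) subject to all group sizes >= k, via (A.6).
    n = len(x)
    assert (m + 1) * k <= n, "need m <= [n/k] - 1"
    P = np.concatenate(([0.0], np.cumsum(x)))   # partial sums P(i)
    seg = lambda i, s: (P[s] - P[i]) ** 2 / (s - i)
    a = np.full((m + 1, n + 1), -np.inf)
    back = np.zeros((m + 1, n + 1), dtype=int)
    for s in range(k, n + 1):
        a[0][s] = seg(0, s)
        for r in range(1, min(s // k - 1, m) + 1):
            for i in range(k * r, s - k + 1):   # (A.6): kr <= i <= s - k
                v = a[r - 1][i] + seg(i, s)
                if v > a[r][s]:
                    a[r][s], back[r][s] = v, i
    taus, s = [], n
    for r in range(m, 0, -1):
        s = back[r][s]
        taus.append(s)
    return a[m][n], taus[::-1]
```

The shrunken range of i in the inner loop is exactly why the algorithm gets faster as k increases.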

Appendix B

B.1. Calculation of the null distribution of CM

The quantiles of the null distribution of CM can be approximated by the sample quantiles of a large sample (of size N, say) of repeated realizations of CM generated under H0. To get these realizations we must generate realizations of (X1, ..., Xn) with Xi iid N(0, 1), then calculate (P1, ..., PM) and CM (via (3.2)) corresponding to each realization of (X1, ..., Xn). If the calculation of (P1, ..., PM) is done by the simulation method of the three steps of Section 3, a total of NB sets of n-vectors of iid N(0, 1) random variables will need to be generated. Since both N and B must be large for accuracy (both ≥ 10 000), this is not feasible. Instead, we first compute an accurate representation of all the distribution functions Fm in a simulation study involving 1 000 000 repetitions. Then the Pm's for the present simulation calculation of the null distribution of CM are computed from Pm = 1 − Fm(Wm). Doing N = 1 000 000 repetitions here also, a total of 2 000 000 sets of n-vectors of iid N(0, 1) random variables are involved, and this is practically feasible. Table 9 was calculated in this way. The random number generator DRNNOR of IMSL was used in the process. The technique outlined here was also used to speed up the simulation work reported in Section 6.
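The two-stage device can be sketched as follows. This is our illustration only: the statistics W1, ..., WM and the rule combining the Pm's into CM are defined in Section 3 and are not reproduced in this appendix, so `w_stats` below is a placeholder statistic and the combination CM = min over m of Pm is an assumption made purely for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def w_stats(x, M):
    # Placeholder standing in for the W_1, ..., W_M of Section 3.
    return np.array([np.abs(np.cumsum(x)[:m + 1]).max() for m in range(M)])

def null_quantile(n, M, alpha, N=4000):
    # Stage 1: one large simulation stores sorted samples of each W_m,
    # giving an empirical representation of the distribution functions F_m.
    F = np.sort([w_stats(rng.standard_normal(n), M) for _ in range(N)], axis=0)
    # Stage 2: fresh realizations under H0; P_m = 1 - F_m(W_m) becomes a
    # rank lookup, so the inner simulation loop of the three-step method
    # (N*B samples in total) is avoided.
    C = np.empty(N)
    for b in range(N):
        w = w_stats(rng.standard_normal(n), M)
        ranks = np.array([np.searchsorted(F[:, m], w[m]) for m in range(M)])
        C[b] = (1.0 - ranks / N).min()   # assumed combination of the P_m's
    return np.quantile(C, alpha)
```

For M = 1 the lookup returns an approximately uniform P1, so the α-quantile of C comes out close to α, matching the first column of Table 9.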


Table 9
Quantiles cM(α) of the null distribution of the test statistic CM

                       M
n      α       1        3        5        7        9
30     0.01    0.0100   0.0035   0.0021   0.0016   0.0012
       0.05    0.0499   0.0173   0.0105   0.0077   0.0060
       0.10    0.0999   0.0355   0.0220   0.0161   0.0127
       0.25    0.2495   0.0959   0.0614   0.0457   0.0365
100    0.01    0.0100   0.0034   0.0021   0.0016   0.0012
       0.05    0.0496   0.0176   0.0109   0.0080   0.0064
       0.10    0.0997   0.0362   0.0228   0.0170   0.0137
       0.25    0.2494   0.0969   0.0636   0.0487   0.0399

Acknowledgements

This research was supported by a grant from the Foundation for Research Development of South Africa to the first author. The second author received support from Stellenbosch University in the form of a HP712/60 workstation on which the computational work was done.

References

Antoch, J. and M. Hušková, Procedures for the detection of multiple changes in series of independent observations, Proc. 5th Prague Symp. on Asymptotic Statistics (Physica-Verlag, Würzburg, 1994) 3-20.
Box, G.E.P. and G.M. Jenkins, Time series analysis: Forecasting and control, 2nd edn. (Holden-Day, San Francisco, 1976).
Gombay, E., Testing for change points with rank and sign statistics, Statist. Probab. Lett., 20 (1994) 49-55.
Haccou, P. and E. Meelis, Testing for the number of change points in a sequence of exponential random variables, J. Statist. Comput. Simul., 30 (1988) 285-298.
Hušková, M., Miscellaneous procedures connected with the change point problem, in: Compstat 1994: Proc. Computational Statistics (Physica-Verlag, Würzburg, 1994) 443-457.
Hušková, M. and P.K. Sen, Nonparametric tests for shift and change in regression at unknown time points, in: P. Hackl (Ed.), Statistical analysis and forecasting of economic structural change (Springer, Berlin, 1989) 71-85.
Lombard, F., Rank tests for the change point problem, Biometrika, 74 (1987) 615-625.
Lombard, F., Detecting change points by Fourier analysis, Technometrics, 30 (1988) 305-310.
MacNeill, I.B., S.M. Tang and V.K. Jandhyala, A search for the source of the Nile's change points, Environmetrics, 2 (1991) 341-375.
Meelis, E., M. Bressers and P. Haccou, Non-parametric testing for the number of change points in a sequence of independent random variables, J. Statist. Comput. Simul., 39 (1991) 129-137.
Yao, Y.-C., Estimating the number of change-points via Schwarz' criterion, Statist. Probab. Lett., 6 (1988) 181-189.