BEHAVIOR THERAPY 17, 302-311 (1986)
Letters to the Editor

Clinical Significance Revisited

To the Editor.--In an attempt to make the reporting of research results relevant and informative to practitioners, Jacobson, Follette, and Revenstorf (1984) suggested a number of conventions for reporting clinical significance. Jacobson et al.'s proposed measures fell into two types. First, several measures were suggested that indicate whether or not a subject has, as a result of the therapeutic treatment, moved from the dysfunctional range to the functional range. Second, to ensure that this move was statistically reliable, several measures were suggested that assess whether the change from pretreatment to posttreatment was due to chance. We will review the former.

Jacobson et al. made a persuasive case that a clinically significant result is one for which the client moves from the dysfunctional to the functional range over the course of the treatment. Three measures that index this movement were suggested. Jacobson et al. (1984) operationalized these three measures, which will be referred to as Measures 1, 2, and 3, respectively, as follows:

1. Does the level of functioning at posttest fall outside the range of the dysfunctional population, where range is defined as extending to two standard deviations above (in the direction of functionality) the mean for that population?
2. Does the level of functioning at posttest fall within the range of the functional or normal population, where range is defined as beginning at two standard deviations below the mean for the normal population?
3. Does the level of functioning at posttest suggest that the subject is statistically more likely to be in the functional than in the dysfunctional population; that is, is the posttest score statistically more likely to be drawn from the functional than the dysfunctional distribution (p. 340)?

Of strategic importance is that each of these measures depends on the assumption that there are distinct, although overlapping, distributions for the dysfunctional and functional populations, as shown in Jacobson et al.'s Figure 1 (p. 341). To calculate two of the measures (Measures 2 and 3), the parameters of both distributions are needed (normality is assumed as well). When the parameters of the functional population are unknown, the other measure (Measure 1) is appropriate in that it depends only on the parameters of the dysfunctional distribution.

The assumption of two distinct distributions may be appropriate for some clinical problems, as will be discussed subsequently; nevertheless, a viable alternative assumption is available. The alternative assumption is that there is one
302 0005-7894/86/0302-0311 $ 1.00/0 Copyright 1986 by Association for Advancement of Behavior Therapy All rights of reproduction in any form reserved.
population, which contains the functional and dysfunctional individuals, and that the dysfunctional individuals are found in one of the tails of the distribution of scores for that population.

If the alternative assumption is accepted, then the measures proposed by Jacobson et al. are inappropriate. Clearly, the measures that rely on two sets of parameters (Measures 2 and 3) cannot be calculated. Clinical significance for the remaining measure (Measure 1) is determined by assessing whether the posttest score falls outside the range of the dysfunctional distribution, where the range is said to extend two standard deviations from the mean in the direction of the functional distribution. This measure is inappropriate as well in that the mean of the dysfunctional distribution is nonexistent because the distribution for the dysfunctional population is not distinct.

One other problem with the measures of clinical significance proposed by Jacobson et al. should be mentioned. Calculation of each of the measures depends on knowing the parameters of the distributions. In the calculations of the measures, estimates of the parameters are needed. Although not discussed in detail by Jacobson et al., it appears that the parameters would be estimated on the sample collected for the study. However, these estimates will be dependent on how the sample was defined and selected. For instance, in one study the dysfunctional subjects may be severely dysfunctional whereas in another study the dysfunctional subjects may be marginally distressed. Ironically, including a more dysfunctional
sample makes it easier to move from the dysfunctional to the functional range. Measure 3 in Jacobson et al.'s conceptualization determines whether functioning at posttest is more likely to be in the functional or dysfunctional range. The critical value for this measure, when variances are equal, is the midpoint of the means of the functional and dysfunctional distributions. However, as the dysfunctional distribution moves away from the functional distribution, the critical value also moves away from the functional distribution. Thus, the criterion for likelihood that the posttest score is in the functional range is less stringent. For example, suppose that a functional population had a mean of 120. A subject who scored 70 on the pretest would need a score of 100 on the posttest (the midpoint of 80 and 120) to say that her or his improvement was clinically significant if a moderately dysfunctional sample with a mean of 80 was used in the study. On the other hand, if a severely dysfunctional sample with a mean of 60 was used, the same subject would need only a posttest score of 90 (the midpoint of 60 and 120) to achieve a clinically significant result.

Whether clinical problems are conceptualized best as two distinct distributions, as Jacobson et al. contend, or as a single distribution is an empirical question. Nevertheless, there is evidence that most clinical conditions exist on a continuum from functional to dysfunctional. For example, depression appears to range from minor episodes of depression common to the normal population to more debilitating forms that are clearly dysfunctional. The Diagnostic and
Statistical Manual of Mental Disorders (DSM-III; American Psychiatric Association, 1980) lists major forms of depression (major episode) and less severe forms where all of the symptoms are not represented or are represented to lesser degree (e.g., dysthymic disorder or adjustment disorder with depressed mood). Commonly, researchers in the area of depression select dysfunctional populations based, in part, on the criterion that the subjects fall above a critical value on an instrument such as the Beck Depression Inventory (BDI). Certainly, if the entire population were sampled, the distribution of scores on the BDI would not form the bimodal distribution represented in Jacobson et al.'s Figure 1.

It is not difficult to find other examples for which the single distribution is applicable. Noncompliance is natural for most children at different developmental stages. In part, however, depending on the degree of oppositional behavior, children can also be diagnosed as having oppositional disorder, conduct disorder, or even antisocial personality disorder. Other examples of disorders that fall on a single distribution include marital dissatisfaction, habit disorders, and forms of anxiety disorders.

Often diagnoses are defined in terms of critical values. For example, two of the DSM-III (American Psychiatric Association, 1980) criteria for Anorexia Nervosa are that the patient (a) experience a weight loss of at least 25% of original body weight, and (b) refuse to maintain body weight over a minimal normal weight. These two criteria refer to a single distribution for which, if the patient falls below a critical value, he or she is classified as dysfunctional.

Some clinical conditions do seem, however, to form two separate distributions. Clearly, infantile-autistic and profoundly retarded individuals are different from normally functioning individuals. For these conditions, measurement of characteristics (e.g., symptoms, behaviors) may result in two distinct distributions. Conditions for which this model holds, however, are very stable over time and cures are rare if they occur at all. Moreover, for these conditions, Jacobson et al.'s measures of clinical significance are superfluous. The diagnosis of these conditions typically can be made by clinicians qualitatively and without resort to statistically based tests. For instance, in the area of autism, clinicians would feel uncomfortable relying on Measures 1, 2, or 3 to determine whether or not a child was autistic.

Jacobson et al. appear to have recognized the difficulty in discriminating between instances when one of the two assumptions was valid. They stated that dichotomous categorization ". . . forces the clinical researcher to think in terms of false dichotomies (e.g., one either has the problem or one does not). On what basis would we infer that agoraphobia has been 'eliminated'? How about depression?" (p. 339). However, after making this argument, Jacobson et al. advanced the assumption of separate, although overlapping, distributions. Either disorders are dichotomous, in which case clinical judgment should be sufficient to determine normality and nonnormality; or disorders are not dichotomous, in which case measurement of characteristics forms a single distribution on which dysfunction is found in a tail of the distribution.

Jacobson et al. have been the first to suggest measures of clinical significance that are objective and applicable to a variety of problems. Although we have demonstrated that the development of measures of clinical significance will be difficult, the development of useful measures of clinical significance is vital to research in psychotherapy. Regardless of whether or not Jacobson et al.'s measures are generally accepted, it should be recognized that they are pioneers in the development of statistical procedures to assess clinical significance.
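
The sample dependence of the cutoffs discussed in this letter can be illustrated with a minimal sketch. It reproduces the numerical example given earlier (a functional mean of 120 compared against dysfunctional sample means of 80 and 60) and assumes, as that example does, that higher scores indicate better functioning and that the two distributions have equal variances. The function names are hypothetical and are not taken from Jacobson et al. (1984).

```python
def measure1_cutoff(mean_dys, sd_dys):
    """Measure 1: two SDs above the dysfunctional mean.

    Assumes higher scores mean better functioning.
    """
    return mean_dys + 2 * sd_dys


def measure2_cutoff(mean_func, sd_func):
    """Measure 2: two SDs below the functional (normal) mean."""
    return mean_func - 2 * sd_func


def measure3_cutoff(mean_func, mean_dys):
    """Measure 3 under equal variances: midpoint of the two means."""
    return (mean_func + mean_dys) / 2


# Figures from the letter's example: functional mean of 120.
moderate = measure3_cutoff(120, 80)  # moderately dysfunctional sample
severe = measure3_cutoff(120, 60)    # severely dysfunctional sample

print(f"Cutoff with moderately dysfunctional sample: {moderate}")  # 100.0
print(f"Cutoff with severely dysfunctional sample: {severe}")      # 90.0
```

Under these assumptions, the required posttest score falls from 100 to 90 solely because the comparison sample was more severely dysfunctional, which is the point made in the example.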
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.

BRUCE E. WAMPOLD
University of Oregon
Eugene, Oregon 97403

WILLIAM R. JENSON
University of Utah
Salt Lake City, Utah 84112

A Method of Assessing Change in a Single Subject: An Alteration of the RC Index

To the Editor.--Jacobson, Follette, and Revenstorf (1984) have attempted to provide some guidelines for determining when the criterion of clinical significance has been achieved in psychotherapy outcome research. Briefly, these guidelines specify that clinical significance has been achieved when the treatment condition moves the client from a dysfunctional to a functional population and when improvement exceeds what would be expected by measurement error. In order to determine if the pretest to posttest change score exceeded that which would be expected on the basis of measurement error, Jacobson et al. proposed the use of a reliable change index where
$$RC = \frac{X_2 - X_1}{SE}$$

$RC$ = reliable change
$X_1$ = pretest score
$X_2$ = posttest score
$SE$ = standard error of measurement

This formula compares the difference in the pre- and posttest scores with that which would be expected on the basis of measurement error. According to Jacobson et al., if RC is equal to or greater than ±1.96, one could expect a change of that magnitude to occur by chance approximately 5 times in 100. However, there is a basic problem with the RC index as specified by Jacobson et al. The standard error of measurement is an index of the dispersion of an obtained score about a true score. However, the RC index makes use of two obtained scores. For the RC index to be an accurate measure of whether or not a change is greater than that