Reliability of the Lichtman Classification of Kienböck’s Disease

Kourosh Jafarnia, MD, Evan D. Collins, MD, Harold W. Kohl III, PhD, James B. Bennett, MD, Omer A. Ilahi, MD, Houston, TX

The purpose of this study was to establish the interobserver reliability and intraobserver reproducibility of the staging of Kienböck’s disease according to Lichtman’s classification. Posteroanterior and lateral wrist radiographs of 64 patients with a diagnosis of Kienböck’s disease and 10 control subjects were reviewed independently by 4 observers on 2 separate occasions. The reviewers included 3 hand fellowship-trained surgeons and 1 orthopedist who was not fellowship-trained in hand surgery. A stage was assigned to each set of radiographs according to the Lichtman classification. Paired comparisons for reliability among the 4 observers showed an average absolute percentage agreement of 74% and an average paired weighted kappa coefficient of 0.71. Furthermore, all the controls were correctly classified as stage I, which is in accordance with the Lichtman system. With regard to reproducibility, observers duplicated their initial readings 79% of the time with an average weighted kappa coefficient of 0.77. These results indicate substantial reliability and reproducibility of the Lichtman classification for Kienböck’s disease. (J Hand Surg 2000;25A:529–534. Copyright © 2000 by the American Society for Surgery of the Hand.)

Key words: Kienböck, Lichtman, lunate, reliability, radiographs.
From the Department of Orthopedic Surgery, Baylor College of Medicine, Houston, TX. Received for publication October 13, 1999; accepted in revised form February 14, 2000. No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. Reprint requests: Omer A. Ilahi, MD, 6560 Fannin, Suite 400, Houston, TX 77030. Copyright © 2000 by the American Society for Surgery of the Hand. 0363-5023/00/25A03-0026$3.00/0. doi: 10.1053/jhsu.2000.7377

In 1910, Kienböck1 reported an abnormality of the carpal bones that he termed lunatomalacia. Although its etiology is thought to involve both repetitive trauma and vascular compromise, this pathogenesis has not been conclusively proven. The clinical presentation and radiographic appearance, however, are well recognized. In the early phases of the disease, pathology is limited to the lunate, and radiographs may be negative (with changes evident only on magnetic resonance imaging or bone scanning) or may reveal increased density. In later phases, there is an alteration of the structure and kinematics of the carpus with lunate collapse resulting in proximal capitate migration, proximal row widening, and secondary rotatory instability of the scaphoid, eventually leading to diffuse degenerative changes.

In 1977, Lichtman et al2 modified Ståhl’s original radiographic classification from 1947 in an attempt to help guide the choice of treatment options. The modified system consists of 4 stages. In stage I, radiographs are normal or reveal a linear or compression fracture of the lunate. Stage II is characterized by increased density in the lunate relative to the other carpal bones. The size, shape, and anatomic relationship of the bones, however, are not altered. In stage III, the entire lunate appears collapsed in the frontal plane and elongated in the sagittal plane with the capitate migrating proximally. Scapholunate dissociation, flexion of the scaphoid (ring sign), or ulnar migration of the triquetrum may be seen on posteroanterior radiographs. Stage IV shows all the characteristics of stage III as well as generalized degenerative changes in the carpus.

Since Lichtman et al’s2 original report defining this classification, several studies have been published that based treatment and prognosis on these stages.3–5 For optimal treatment, a classification system needs to be reliable and reproducible. Interobserver reliability refers to the agreement between different observers, and intraobserver reproducibility refers to the agreement among observations made by an individual on 2 or more occasions. Recently, classification systems for fractures of the femoral neck, proximal humerus, and ankle and the King classification of scoliotic curves have demonstrated disappointing reliability and reproducibility.6–11

To our knowledge, there has been only 1 study reviewing the reliability of the Lichtman classification system. In the Scandinavian literature, Jensen et al12 reported moderate reliability and fair to substantial reproducibility of the Lichtman classification system. Our clinical impression is that the Lichtman classification is more reliable than reported by Jensen et al.12 Their study included only 1 hand surgeon and only 48 patients. We performed a larger study with more experienced participants to assess the degree of interobserver reliability and intraobserver reproducibility of the staging of Kienböck’s disease according to the Lichtman system using routine posteroanterior and lateral wrist radiographs.
Materials and Methods

Posteroanterior and lateral wrist radiographs obtained at the initial presentation of 64 patients diagnosed with Kienböck’s disease and from 10 controls without Kienböck’s disease were converted to standard-size 35-mm projection slides. All 64 patients were diagnosed, subsequently treated, and actively monitored by several hand surgeons. No patient’s radiographs were used more than once. The 74 sets of radiographs were reviewed individually and staged according to the Lichtman system by each of 4 reviewers. One month later each of the 4 reviewers again assessed the same films.

The treating surgeon originally diagnosed 4 patients with stage I, 23 with stage II, 32 with stage III, and 5 with stage IV Kienböck’s disease. Of the 4 cases diagnosed as stage I, 2 were confirmed by magnetic resonance imaging, 1 by bone scanning, and 1 by both imaging modalities. The 10 controls were included to increase the number of radiographs consistent with stage I Kienböck’s disease and also to assess the specificity of radiographic evaluation of Kienböck’s disease.

The reviewers in this study included 2 orthopedic surgeons who had fellowship training in hand surgery, a plastic surgeon with fellowship training in hand surgery, and an orthopedic surgeon with no hand fellowship training. They averaged 5.5 years of practice experience. Each reviewer assigned a classification stage to each set of radiographs. Those radiographs that were thought to be normal were to be classified as stage I. The reviewers were provided a description of each stage as described in Lichtman et al’s2 original report. No feedback was provided after the first viewing session and the radiographs were not made available between the first and second viewings. Furthermore, the observations made at the first session were not available during the second. In addition, during the second viewing the radiographs were presented in a different order than the first, minimizing the possibility of latent recall. Finally, no questions were allowed during either review and all identifying information from the radiographs was obscured.

Interobserver reliability was assessed by comparison of the readings of each set of radiographs among the 4 observers. Absolute percentage agreements between observers were calculated, as were paired weighted kappa coefficients to adjust for the proportion of agreement that could have occurred by chance. A kappa value of 1.0 indicates complete agreement; a value of 0.0 means that agreement is no better than that expected by chance alone.13 Both absolute percentage agreement and kappa values were reported because data with abnormal distributions or unusual patterns may skew kappa values. Kappa values can be compared with percentage agreement for correlation. If they are both of the same magnitude, one can be more certain that the kappa values are not artificially skewed. Reporting absolute percentage agreements also affords the reader a familiar reference. Intraobserver reproducibility was determined in a similar manner.
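As an illustration of the two statistics named above, the following minimal sketch computes absolute percentage agreement and the unweighted kappa of Cohen13 for one pair of observers. It is not the authors' analysis code; the function names and the example readings are hypothetical and are not data from this study.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of cases on which two observers assign the same stage."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: for each stage, multiply the two observers'
    # marginal proportions, then sum over all stages.
    p_e = sum(count_a[s] * count_b[s] for s in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical stage readings (1-4) for 8 sets of radiographs.
observer_1 = [1, 2, 2, 3, 3, 3, 4, 1]
observer_2 = [1, 2, 3, 3, 3, 2, 4, 1]

print("percentage agreement:", percent_agreement(observer_1, observer_2))
print("kappa:", cohen_kappa(observer_1, observer_2))
```

In this toy example kappa is lower than the raw percentage agreement because part of the observed agreement is attributed to chance, which is the adjustment the text describes.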
The guidelines of Landis and Koch14 for the interpretation of kappa values were used to categorize the coefficients: values of 0.00 to 0.20 indicate slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, excellent or almost perfect agreement. Weighting the kappa coefficients allows correction for the magnitude of the disagreement between the individual assessments.15 This is calculated using different weights in accordance with the agreement expressed between stages. Identical agreement between 2 observations was assigned the weight of 1. If the disagreement amounted to a 1-stage differential (eg, 1 observer classifying a set of radiographs as stage II and the other observer as stage III), the disagreement was assigned a two-thirds weighting. Disagreement differentials of 2 stages were assigned a one-third weighting, and differentials of 3 stages (the maximal differential in a 4-stage scheme) were assigned a weight of 0.

Results

All 4 observers agreed on the classification stage of 41 (55%) of the 74 sets of radiographs. At least 3 of the 4 observers agreed on 51 sets (69%) of radiographs. The data in Table 1 illustrate the agreement among observers compared with the treating surgeons’ assessment. Paired comparisons between the 4 observers showed an average absolute percentage agreement of 74% (range, 70% to 78%). Corrected for chance agreement, the mean weighted kappa coefficient was 0.71 (range, 0.66–0.74). Kappa values between observers showed statistical significance (p < .001).

Table 1. Agreement Among Observers Compared With the Treating Surgeon

Stage   ≥3 Reviewers in Agreement   No. Originally Diagnosed by the Treating Surgeon
I       5                           4
II      13                          23
III     31                          32
IV      2                           5

Figure 1 demonstrates radiographs of a patient whom all the reviewers classified as stage III. Figure 2 demonstrates radiographs of a patient who was classified as stage II by 2 reviewers and stage III by 2 reviewers.

Figure 1. Posteroanterior and lateral radiographs of a patient with lunate collapse and capitate migration. This patient was classified as stage III by all reviewers.

Figure 2. Posteroanterior and lateral radiographs of a patient with increased density of the lunate and other changes consistent with Kienböck’s disease. This patient was classified as stage II by 2 reviewers and stage III by 2 reviewers.

The surgeons with fellowship training in hand surgery had an overall agreement of 0.72 weighted kappa compared with each other versus 0.70 compared with the orthopedist with no fellowship training in hand surgery. This difference was not statistically significant. The greatest agreement was a weighted kappa of 0.75, which was between the plastic hand surgeon and one of the orthopedic hand surgeons.

For reproducibility within observers, the 4 individuals duplicated 79% of their initial classifications (range, 73% to 84%). The mean weighted kappa coefficient was 0.77 (range, 0.72–0.82) (Table 2). Kappa values within observers were again highly significant (p < .001).
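The weighted kappa coefficients reported here rest on the weighting scheme given in Materials and Methods: a weight of 1 for identical stages, two thirds for a 1-stage differential, one third for a 2-stage differential, and 0 for a 3-stage differential. The sketch below shows one way such a coefficient, and its average over all observer pairs, might be computed. It assumes the standard agreement-weight form of weighted kappa; the function names and the dictionary layout of the readings are hypothetical, and the code is not the authors' own.

```python
from collections import Counter
from itertools import combinations

STAGES = (1, 2, 3, 4)  # Lichtman stages I-IV


def weight(i, j):
    """Agreement weight: 1, 2/3, 1/3, 0 for stage differences of 0, 1, 2, 3."""
    return 1 - abs(i - j) / (len(STAGES) - 1)


def weighted_kappa(a, b):
    """Weighted kappa for two lists of stage readings of the same cases."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed weighted agreement across cases.
    p_o = sum(weight(x, y) for x, y in zip(a, b)) / n
    # Weighted agreement expected by chance from the marginal counts.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(
        weight(i, j) * count_a[i] * count_b[j] for i in STAGES for j in STAGES
    ) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(readings):
    """Average weighted kappa over all observer pairs (6 pairs for 4 observers).

    `readings` maps an observer label to that observer's list of stages.
    """
    pairs = list(combinations(readings, 2))
    return sum(weighted_kappa(readings[p], readings[q]) for p, q in pairs) / len(pairs)


# Hypothetical readings for 8 sets of radiographs, not data from this study.
example = {
    "observer_1": [1, 2, 2, 3, 3, 3, 4, 1],
    "observer_2": [1, 2, 3, 3, 3, 2, 4, 1],
}
print("weighted kappa:", weighted_kappa(example["observer_1"], example["observer_2"]))
```

Because near misses earn partial credit, the weighted coefficient for a pair of observers is typically higher than the unweighted one when most disagreements are 1-stage differentials.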
Four radiographs were deemed to be of mediocre quality. Removing these observations did not appreciably change the results of the analysis. In addition, all the controls were correctly classified as stage I, which is in accordance with the Lichtman system.
Table 2. Intraobserver Reproducibility

Observer‡                    Agreement (%)   Kappa Coefficient (weighted)
1 (Orthopedic surgeon)*      84              0.81
2 (Orthopedic surgeon)*      83              0.82
3 (Plastic surgeon)*         73              0.72
4 (Orthopedic surgeon)†      76              0.74
Mean                         79              0.77

* Fellowship training in hand surgery.
† No fellowship training in hand surgery.
‡ No significant difference in reproducibility among different observers.
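To relate the coefficients in Table 2 to the Landis and Koch14 categories used in this study, the following sketch maps a kappa value to its category and recomputes the column means from the rounded table entries. The function name and the band table are hypothetical helpers, and the sketch covers only kappa values between 0 and 1.

```python
# Upper bound of each band and its label, following the categories
# quoted in Materials and Methods.
LANDIS_KOCH_BANDS = (
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
)


def landis_koch(kappa):
    """Return the agreement category for a kappa coefficient in [0, 1]."""
    k = round(kappa, 2)  # the bands are defined to 2 decimal places
    if not 0.0 <= k <= 1.0:
        raise ValueError("this sketch only covers kappa values from 0 to 1")
    for upper, label in LANDIS_KOCH_BANDS:
        if k <= upper:
            return label


# Values as printed in Table 2 (observers 1 to 4).
table2_agreement = [84, 83, 73, 76]
table2_kappa = [0.81, 0.82, 0.72, 0.74]

mean_agreement = sum(table2_agreement) / len(table2_agreement)
mean_kappa = sum(table2_kappa) / len(table2_kappa)

for observer, kappa in enumerate(table2_kappa, start=1):
    print(f"observer {observer}: kappa {kappa:.2f} -> {landis_koch(kappa)}")
print(f"mean agreement {mean_agreement:.0f}%, mean kappa {mean_kappa:.4f}")
```

The mean of the four rounded coefficients comes to about 0.77, in line with the mean reported in the table, and falls in the substantial band.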
The Journal of Hand Surgery / Vol. 25A No. 3 May 2000 533
Discussion

Classification systems have been used in medicine as guidelines for treatment and prognosis as well as to compare the results of intervention. Such systems must prove to be reliable and reproducible if they are to serve this purpose. Recently, classification systems in orthopedics have been evaluated for interobserver reliability and intraobserver reproducibility with somewhat disappointing results. Thomsen et al11 reported only moderate reliability and reproducibility for both the Lauge-Hansen and Weber classifications of ankle fractures and concluded that there was poor precision of staging within the systems, which diminished their usefulness in daily practice. Frandsen et al7 and Thomsen et al10 found poor reliability for the Garden classification of femoral neck fractures. Sidor et al9 evaluated the Neer classification of proximal humerus fractures and found moderate reliability. Siebenrock and Gerber16 assessed both the Neer and the AO systems and concluded that neither was sufficiently reproducible or reliable to allow meaningful comparison of similarly classified fractures in different studies. This was also found to be the case for the King classification of scoliosis curves in 2 recent studies reported by Lenke et al8 and Cummings et al.6 Although these reports have not had any effect on the clinical application of these classification systems, they do bring to light the limitations of the systems and may help stimulate their revision, possibly with the incorporation of other imaging and diagnostic modalities.

The scheme of Lichtman et al2 is the most widely used system for the classification of Kienböck’s disease and is used to help determine both treatment options and prognosis.
The only report dealing with its interobserver reliability and intraobserver reproducibility appears in the Scandinavian literature in an article by Jensen et al.12 These investigators reported reliability kappa values in the range of 0.45 to 0.52 and reproducibility kappa values from 0.26 to 0.63. Their study had several limitations, however. There were only 48 patients with Kienböck’s disease and 3 reviewers, only one of whom was a hand surgeon. Several of the patients’ radiographs were used multiple times and no control radiographs were included.

The current study consists of a larger group of patients with Kienböck’s disease. Furthermore, the reviewers included both orthopedic and plastic surgeons with fellowship training in hand surgery and an orthopedist with no fellowship training in hand surgery to determine whether differences in training or background of the reviewer might significantly influence reliability or reproducibility. Although the orthopedic hand surgeons demonstrated higher reproducibility (84% and 83%) than both the plastic hand surgeon (73%) and the orthopedic surgeon without hand fellowship training (76%), the differences were not statistically significant.

The results of our study show that the Lichtman classification system for Kienböck’s disease has substantial agreement both in terms of interobserver reliability and intraobserver reproducibility. These results compare favorably with the aforementioned studies of other popular orthopedic classification systems based on radiographs. An excellent or almost perfect level of reliability (kappa ≥ 0.81), however, was never reached for any paired evaluation. Excellent or almost perfect reproducibility was obtained by 2 of the 4 observers.

We assessed the specificity of the system by incorporating wrist films from 10 normal patients. This also served to increase our proportion of radiographs that are consistent with stage I Kienböck’s disease. Radiographs of patients with stage I disease are difficult to find as these patients typically present when they have more advanced disease. Furthermore, stage I radiographs are often normal and necessitate a bone scan or magnetic resonance imaging for diagnosis. All the normal controls were correctly classified as stage I by all 4 observers.

There was substantial disagreement on several of the cases between the treating physicians and the majority of the reviewers (Table 1). One possible explanation is that the treating physicians had the benefit of a history and physical examination, which could contribute additional information, such as symptom duration, intensity of pain, and range of motion, that might influence the classifying process in a subtle manner.
The strengths of this study lie in its large number of patients for a relatively uncommon disease process, the study design, and the diverse training backgrounds represented by the 4 reviewers. Classification systems are intended to provide treatment guidelines and help compare results of different interventions. The results of this investigation show substantial reliability and reproducibility for the Lichtman classification of Kienböck’s disease among surgeons familiar with the system.

The authors thank Drs Gerard T. Gable and David T. Netscher for their generous participation in this project as reviewers and also for their contribution of cases. They also thank Drs David Lichtman, Thomas Mehlhoff, and Marcos Masson for contributing cases.
References

1. Kienböck R. Concerning traumatic malacia of the lunate and its consequences: degeneration and compression fractures. Clin Orthop 1980;149:4–8.
2. Lichtman DM, Mack GR, MacDonald RI, Gunther SF, Wilson JN. Kienböck’s disease: the role of silicone replacement arthroplasty. J Bone Joint Surg 1977;59A:899–908.
3. Bochud RC, Büchler U. Kienböck’s disease, early stage 3—height reconstruction and core vascularization of the lunate. J Hand Surg 1994;19B:466–478.
4. Lichtman DM, Degnan GG. Staging and its use in the determination of treatment modalities for Kienböck’s disease. Hand Clin 1993;9:409–416.
5. Condit DP, Idler RS, Fischer TJ, Hastings H II. Preoperative factors and outcome after lunate decompression for Kienböck’s disease. J Hand Surg 1993;18A:691–696.
6. Cummings RJ, Loveless EA, Campbell J, Samelson S, Mazur JM. Interobserver reliability and intraobserver reproducibility of the system of King et al for the classification of adolescent idiopathic scoliosis. J Bone Joint Surg 1998;80A:1107–1111.
7. Frandsen PA, Andersen E, Madsen F, Skjødt T. Garden’s classification of femoral neck fractures: an assessment of interobserver variation. J Bone Joint Surg 1988;70B:588–590.
8. Lenke LG, Betz RR, Bridwell KH, et al. Intraobserver and interobserver reliability of the classification of thoracic adolescent idiopathic scoliosis. J Bone Joint Surg 1998;80A:1097–1106.
9. Sidor ML, Zuckerman JD, Lyon T, Koval K, Cuomo F, Schoenberg N. The Neer classification system for proximal humeral fractures: an assessment of interobserver reliability and intraobserver reproducibility. J Bone Joint Surg 1993;75A:1745–1750.
10. Thomsen NOB, Jensen CM, Skovgaard N, et al. Observer variation in the radiographic classification of fractures of the neck of the femur using Garden’s system. Int Orthop 1996;20:326–329.
11. Thomsen NOB, Overgaard S, Olsen LH, Hansen H, Nielsen ST. Observer variation in the radiographic classification of ankle fractures. J Bone Joint Surg 1991;73B:676–678.
12. Jensen CH, Thomsen K, Holst-Nielsen F. Radiographic staging of Kienböck’s disease: poor reproducibility of Ståhl’s and Lichtman’s staging systems. Acta Orthop Scand 1996;67:274–276.
13. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measure 1960;20:37–46.
14. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
15. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons, 1981:223–225.
16. Siebenrock KA, Gerber C. The reproducibility of classification of fractures of the proximal end of the humerus. J Bone Joint Surg 1993;75A:1751–1755.