External quality control for embryology laboratories


Reproductive BioMedicine Online (2010) 20, 68–74

www.sciencedirect.com www.rbmonline.com

ARTICLE

Jose Antonio Castilla a,b,c,*, Rafael Ruiz de Assín a, Maria Carmen Gonzalvo a, Ana Clavero a, Juan Pablo Ramírez b,c, Francisco Vergara b,c, Luis Martínez a

a Human Reproduction Unit, Hospital Virgen de las Nieves, 18014 Granada, Spain
b Sperm Bank CEIFER, Granada, Spain
c Programa de Control de Calidad Externo para el Laboratorio de Reproducción de la Asociación para el Estudio de la Biología de la Reproducción (ASEBIR), Madrid, Spain

* Corresponding author. E-mail address: [email protected] (JA Castilla). Jose Castilla graduated in medicine in 1985, completed his PhD in 1986 (Granada University) and then specialised in clinical analyses. He has been Director of the Andrology and Embryology, Virgen de las Nieves, Granada, Spain since 1991 and founded the CEIFER sperm bank in 1993. He has been co-ordinator of the Spanish External Quality Control Programme for Semen Analysis since 1999 and for the Assisted Reproduction Laboratory since 2003. A founder member of the Spanish Association of Clinical Embryologists (ASEBIR), and on its executive committee from 1993 to 2000, he is currently on the executive committee of the Spanish Fertility Society and deputy co-ordinator of ESHRE’s Special Interest Group in Andrology.

Abstract

Participation in external quality control (EQC) programmes is recommended by various scientific societies. Results from an EQC programme for embryology laboratories are presented. This 5-year programme consisted of the annual delivery of (i) materials to test toxicity and (ii) a DVD/CD-ROM with images of zygotes and embryos on days 2 and 3, on the basis of which the participants were asked to judge the embryo quality and to take a clinical decision. A high degree of agreement was considered achieved when over 75% of the laboratories produced similar classifications. With respect to the materials analysed, the specificity was 68% and the sensitivity was 83%. Concerning embryo classification, the proportion of embryos on which a high degree of agreement was achieved increased during this period from 35% to 55%. No improvement was observed in the degree of agreement on the clinical decision to be taken. Day-3 embryos produced a higher degree of agreement (58%) than did day-2 embryos (32%) (P < 0.05). Participation in EQC increased the degree of inter-laboratory agreement on embryo classification, but not the corresponding agreement on clinical decision taking. It is necessary to introduce measures aimed at standardizing decision-taking procedures in embryology laboratories.

RBMOnline © 2009, Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.

KEYWORDS: embryology, embryo quality, quality control

1472-6483/$ - see front matter © 2009, Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.rbmo.2009.09.033

Introduction

Participation in external quality control (EQC) programmes is recommended by various scientific societies (Magli et al., 2008; the Practice Committee of the American Society for Reproductive Medicine [ASRM] and the Practice Committee of the Society for Assisted Reproductive Technology [SART], 2006) in view of its utility in improving laboratory performance. These programmes should be aimed both at tangible elements (staff, instrumentation, equipment and supplies) and at intangible elements (protocols and techniques) (Elder and Kastrop, 2003).

With respect to tangible elements, the laboratory products used in assisted reproduction that come into direct or indirect contact with zygotes and/or embryos should have no negative influence on their viability (Parinaud et al., 1987; Quinn, 2004). Therefore, all such products need to be tested to determine their degree of toxicity before clinical use (De Jonge et al., 2003; Elder and Kastrop, 2003; Go, 2000; Lane et al., 2008; Magli et al., 2008; the Practice Committee of the ASRM and the Practice Committee of the SART, 2006). In every laboratory, there should be a reliable bioassay for testing the toxicity of media and materials (Go, 2000). The most appropriate bioassay for this purpose is one that provides a sensitive system for approximating the real conditions under which zygotes and embryos are cultured in vitro (De Jonge et al., 2003). A wide variety of bioassays exists, but none provides an ideal system for testing for embryo toxins (Elder and Kastrop, 2003). Among the most commonly employed are the culture of 1- or 2-cell mouse embryos (Clarke et al., 1995; Gardner et al., 2005; Van den Bergh et al., 1996), the survival of human or hamster spermatozoa (Bavister and Andrews, 1988; Claassens et al., 2000; Elder and Kastrop, 2003; Go, 2000; Rinehart et al., 1988), the use of a small number of surplus oocytes from patients receiving assisted reproduction treatment (Elder and Kastrop, 2003), the culture of somatic cell lines (Elder and Kastrop, 2003; Go, 2000), the culture of multipronucleate embryos (Elder and Kastrop, 2003) and the use of mouse embryo stem cells (Genschow et al., 2000; Kim et al., 2005). As yet, there is no agreement as to which test is the most suitable for this type of assay, and so it has been suggested that it is preferable to perform several different test procedures simultaneously rather than just one (Gardner et al., 2005).
In order to ensure that the method used is indeed an appropriate one, the laboratory in question should participate in an EQC programme (De Jonge et al., 2003).

With respect to intangible elements such as protocols and techniques, the evaluation of embryo quality is a crucial laboratory task, as it affects the decision as to how many and which embryos should be transferred, which in turn is directly related to the effectiveness of an IVF cycle and to the probability of a multiple pregnancy. Many factors may influence the assessment of embryo quality, including the different systems by which embryos and zygotes are classified, and intra- and inter-observer differences (Arce et al., 2006; Baxter et al., 2006; Keck et al., 2004). It is important that all the members of a team should follow the same criteria in order to be able to work in unison and take coherent decisions; this implies that there should be a degree of standardization of systems for embryo evaluation and for the ongoing training of embryology staff (Arce et al., 2006; Go, 2000; Keck et al., 2004). Concerning inter-laboratory differences in embryo evaluations, differences have been inversely related to the degree of activity, with fewer differences reported among laboratories with high levels of activity (Baxter et al., 2006) and among experienced embryologists (Arce et al., 2006). Therefore, it is necessary to establish mechanisms to standardize embryo evaluation among laboratories. These factors, together with the absence of an EQC programme for human reproduction laboratories in Spain that includes bioassays and embryo evaluation, led us to design, develop and assess a programme with these characteristics.

Materials and methods

All the data utilized in the analysis were obtained from the Spanish EQC programme for human reproduction laboratories, organized by Centro de Estudio e Investigación de la Fertilidad (CEIFER, 2008) and under the auspices of the Spanish Association for the Study of Reproductive Biology (ASEBIR). Over 40 laboratories throughout Spain took part in the programme from 2003 to 2007. The programme examined the evaluation of embryo quality and of the toxicity of materials, using bioassays. An annual examination was made of various materials, from 2003 to 2007 (Table 1).

Table 1. Participation in the external quality control programme (bioassay testing and embryo evaluation) from 2003 to 2007.

                                     2003               2004              2005            2006     2007
Bioassay
  No. of laboratories participating  14                 21                15              13       14
  Newly joined laboratories (%)      100                57                13              15       29
  Material delivered                 Tip (a)            Yellow straw (a)  Petri dish (b)  Tip (a)  Tip
                                     Transfer catheter  Blue straw (a)    Petri dish      Tip (a)  Tip
                                     Pasteur pipette    Red straw         Petri dish      Tip      Tip
Embryo evaluation
  No. of laboratories participating  30                 22                18              19       16
  Newly joined laboratories (%)      100                41                6               5        13

All tips were 200 µl. Petri dishes were Falcon 1006. (a) Treated with Armil 1:100. (b) Batch unsuitable for embryo culture.

Some of these materials were treated with Armil (Bristol-Myers Squibb, USA) diluted with sterile PBS at a concentration of 1:100 for 5 min and subsequently dried at 37°C for 120 min and sterilized in an autoclave. Armil is a liquid disinfectant derived from quaternary ammonium (benzalkonium chloride). This concentration was determined from the human sperm motility assay (HSMA) (Bavister and Andrews, 1988; Claassens et al., 2000; Elder and Kastrop, 2003; Go, 2000; Rinehart et al., 1988) and the mouse embryo assay (MEA) (Clarke et al., 1995; Gardner et al., 2005; Van den Bergh et al., 1996), and was the lowest at which there was found to be a significant reduction in sperm motility and in mouse embryo development. The participating laboratories were unaware that this treatment had been applied. All the materials in question were previously tested by the programme administrators. The participating laboratories were asked to state whether the material analysed was toxic or not.

In 2005, three batches of Falcon 1006 culture plates (Falcon 1006–01711051, Falcon 1006–4066072, Falcon 1006–4185534) were sent to the laboratories. However, the suppliers then informed us that one batch (Falcon 1006–4185534) was not suitable for embryo culture; these plates were not treated or tested by the organization, and the laboratory results were not included in the analysis. In 2007, only non-toxic materials were sent.

In addition, each laboratory was sent a DVD/CD-ROM with videos of zygotes and of day-2 and day-3 embryos. The optics and magnification used for the recording were Hoffman and ×400, respectively. The recordings were made as part of the routine daily assessment of embryo quality. Each batch consisted of embryo images obtained from different couples. The videos on each disc were divided into five groups (four groups in 2007): the first group contained five videos of zygotes (this group was not sent in 2007), the next two groups contained five videos each of day-2 embryos, and the final two groups contained five videos each of day-3 embryos. The laboratories were asked to classify each zygote or embryo as optimal, moderate or poor quality.
They then had to decide which two zygotes were most suitable to remain in culture and which should be cryopreserved or discarded and, with respect to the embryos, to decide for each batch (i.e. day 2 or day 3) which two embryos should be transferred and, of those not transferred, which should be cryopreserved and which should be discarded. A high degree of agreement was considered to have been achieved when over 75% of the laboratories produced similar classifications of the embryo or zygote. The variables in the analysis were compared using the chi-squared test with a significance level of 5%. The analysis included the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the bioassays. For the purposes of this analysis, sensitivity was defined as the percentage of toxic materials detected as toxic by the bioassays, specificity as the percentage of non-toxic materials identified as such by the bioassays, PPV as the percentage of bioassays reporting toxicity in which the material was in fact toxic and NPV as the percentage of bioassays reporting no toxicity in which the material was in fact non-toxic. The total accuracy of the bioassay was defined as the number of bioassays that detected toxicity in a toxic material plus the number that detected no toxicity in a non-toxic material, expressed as a percentage of all bioassays.
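The agreement criterion described above (over 75% of laboratories giving the same classification) can be illustrated with a minimal Python sketch. The function name `agreement` and the example answers are hypothetical and are not taken from the programme's data or software; the sketch also assumes that "similar classifications" means the share of laboratories choosing the single most common category.

```python
from collections import Counter

# Threshold stated in the text: "over 75% of the laboratories".
AGREEMENT_THRESHOLD = 0.75


def agreement(classifications):
    """Return (most chosen category, its share, high-agreement flag).

    Assumption: agreement is measured as the share of laboratories
    that chose the most common category for one embryo or zygote.
    """
    counts = Counter(classifications)
    category, n = counts.most_common(1)[0]
    share = n / len(classifications)
    return category, share, share > AGREEMENT_THRESHOLD


# Hypothetical answers from 20 laboratories for one embryo (not study data).
example = ["poor"] * 16 + ["moderate"] * 3 + ["optimal"] * 1
category, share, high = agreement(example)
print(category, f"{share:.0%}", high)
```

Under this reading, an embryo on which exactly 75% of laboratories agree would not count as high agreement, because the text specifies "over 75%".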

Results

The number of laboratories participating during the study period, together with the percentage of new clinics joining the programme of bioassays and embryo evaluation, are shown in Table 1.

Bioassays

HSMA was used as the bioassay by 85.7% of the laboratories, MEA was used by 11.7% and both tests were used by 2.6%. The methodology used for HSMA varied considerably among laboratories (Table 2). The sperm concentration range was 0.1–108 × 10^6 sperm/ml. All laboratories carried out HSMA at 37°C.

Table 2. Characteristics of the human sperm motility assay, as carried out by the participating laboratories.

Protocol detail            No. of laboratories (%)
Well                       29 (58.0)
Microdrop                  16 (32.0)
Tube                       5 (10.0)
Oil overlay used           23 (46.0)
No oil overlay used        27 (54.0)
Directly with material     30 (63.8)
Medium-washed material     17 (36.2)
Total motility             10 (33.3)
Progressive motility       17 (56.7)
Both                       3 (10.0)
37°C                       50 (100)
Room temperature           0 (0.0)

The percentage of total accuracy achieved was similar during each year of the study period (Figure 1). The mean total accuracy in 2003, 2004, 2006 and 2007 was 74.6% of the tests carried out. When the material supplied was non-toxic, it was accepted by 68% (specificity) of the laboratories, and when it was toxic it was rejected by 83% (sensitivity) of the participants (P < 0.01) (Table 3). The PPV was 67% and the NPV was 84%.

Figure 1. Success rates in evaluation of materials. Sensitivity could not be calculated in 2007 as only non-toxic materials were sent.

Table 3. Classification of material by laboratory according to toxicity of materials.

                           Material delivered
Laboratory classification  Toxic  Non-toxic  Total
Toxic                      64     32         96
Non-toxic                  13     68         81
Total                      77     100        177

Sensitivity = 83% (64/77); specificity = 68% (68/100); positive predictive value = 67% (64/96); negative predictive value = 84% (68/81); total accuracy = 75% ((64 + 68)/177).

The laboratories that used HSMA decided whether or not to accept the material on the basis of the decrease in sperm motility recorded at 24 and/or 48 h. The mean reduction in sperm motility used by the laboratories to accept a given material was 33% at 24 h with respect to the motility at 0 h and 71% at 48 h with respect to the motility at 0 h. When the material was rejected, the mean value for the decrease in sperm motility was 81.0% at 24 h and 94.0% at 48 h (Figure 2). Only one of the laboratories, and in the case of a single bioassay (1/222), reported the occurrence of contamination of one of the cultures during the material evaluation process: this was assumed to have been rejected.

Figure 2. Mean percentage decrease in sperm motility and acceptance or rejection of materials (2003–2007). Bars = SD.
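The summary measures reported with Table 3 can be re-derived from its four cell counts. The following Python sketch is only an illustration; the variable names (`TP`, `FN`, `FP`, `TN`) and the helper `pct` are not from the original study, and "positive" is taken here to mean "classified as toxic".

```python
# Cell counts from Table 3 (laboratory classification vs. material delivered).
TP = 64  # toxic material classified as toxic
FN = 13  # toxic material classified as non-toxic
FP = 32  # non-toxic material classified as toxic
TN = 68  # non-toxic material classified as non-toxic


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


metrics = {
    "sensitivity": pct(TP, TP + FN),        # 64/77
    "specificity": pct(TN, TN + FP),        # 68/100
    "ppv": pct(TP, TP + FP),                # 64/96
    "npv": pct(TN, TN + FN),                # 68/81
    "total_accuracy": pct(TP + TN, TP + FN + FP + TN),  # 132/177
}

for name, value in metrics.items():
    print(f"{name}: {value}%")
```

To one decimal place this gives 83.1%, 68.0%, 66.7%, 84.0% and 74.6%, consistent with the rounded values in the Table 3 footnote and with the 74.6% mean total accuracy quoted in the text.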

Evaluation of embryos

The percentage of zygotes for which there was a high degree of agreement among laboratories tended to improve during the study period (Figure 3). The percentage of embryos (days 2 and 3) for which there was a high degree of agreement increased year on year, from 35% in 2003 to 55% in 2007 (Figure 3). A higher percentage of embryos with a high degree of agreement was observed when the embryo classification most commonly determined was ‘poor’ than when it was ‘moderate’ or ‘optimal’ (P < 0.05) (Table 4). There was a significantly higher percentage of embryos with a high degree of agreement when the evaluation was made of day-3 embryos (58%, 29/50) than of day-2 embryos (32%, 16/50) (P < 0.05) (Table 5).

Figure 3. Percentage of embryos (day 2 and day 3) and zygotes with a high degree of agreement.

With respect to the clinical decision, there were no significant differences between the years (Figure 4). There was a non-significant trend towards higher agreement rates when decisions were taken on day-3 embryos (46%, 23/50) than on day-2 embryos (42%, 21/50) or zygotes (35%, 7/20). The percentage of embryos for which a high degree of agreement was obtained was significantly greater when the most common clinical decision taken was to reject the embryo (66.7%, 18/27) than to cryopreserve it (30.8%, 8/26) or transfer it (42.9%, 18/42) (P < 0.05) (Table 6). In the case of zygotes, too, there was a trend for the highest degrees of agreement to be attained when the clinical decision most often taken was that of rejection (66.7%, 2/3), rather than maintaining them in culture (33.3%, 3/9) or cryopreserving them (25.0%, 2/8), although this difference was not significant. On examining the relationship between the classification made of embryos and the clinical decision taken, there was seen to be a certain degree of coherence, with the embryos that were classified by most laboratories as being of ‘optimal’ quality being transferred by most laboratories and the ‘poor’ quality ones being rejected (Table 7).
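As an illustration of the chi-squared comparison named in the methods, the day-3 versus day-2 result (29/50 versus 16/50 embryos with high agreement) can be checked with a short Python sketch. This is a sketch under stated assumptions, not the authors' analysis: it uses the uncorrected Pearson statistic for a 2 × 2 table, whereas the paper does not say whether a continuity correction was applied, and the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Day-3 embryos: 29 high agreement, 21 low; day-2 embryos: 16 high, 34 low.
statistic, p_value = chi2_2x2(29, 21, 16, 34)
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}")
```

The statistic exceeds the 5% critical value of 3.84 for one degree of freedom, which is consistent with the reported P < 0.05.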

Table 4. Variability among laboratories in the classification of day-2 and day-3 embryos.

Inter-laboratory   Most chosen quality
agreement rates    Optimal    Moderate   Poor       Total
High (>75%)        12 (41.4)  12 (28.6)  21 (72.4)  45 (45.0)
Low (<75%)         17 (58.6)  30 (71.4)  8 (27.6)   55 (55.0)
Total              29         42         29         100

Values are number (%). Poor quality versus optimal quality: P < 0.05. Poor quality versus moderate quality: P < 0.05.

Table 5. Variability among laboratories in the classification of zygotes and day-2 and day-3 embryos.

Inter-laboratory   Stage
agreement rates    Zygote     Day 2      Day 3
High (>75%)        8 (40.0)   16 (32.0)  29 (58.0)
Low (<75%)         12 (60.0)  34 (68.0)  21 (42.0)
Total              20         50         50

Values are number (%). Day 3 versus day 2: P < 0.05.

Figure 4. Clinical decision: percentage of zygotes (2003–2006) and day-2 and day-3 embryos (2003–2007) with a high degree of agreement.

Table 6. Variability among laboratories in the clinical decision on day-2 and day-3 embryos (2003–2007).

Inter-laboratory   Most chosen clinical decision on embryos
agreement rates    Transfer   Cryopreservation  Rejection  Total
High (>75%)        18 (42.9)  8 (30.8)          18 (66.7)  44 (46.3)
Low (<75%)         24 (57.1)  18 (69.2)         9 (33.3)   51 (53.7)
Total              42         26                27         95

Values are number (%). Rejection versus cryopreservation: P < 0.05.

Table 7. Relationship between the most chosen embryo classification and the most chosen clinical decision.

Most chosen clinical   Most chosen embryo quality
decision on embryos    Optimal  Moderate  Poor
Transfer               23       16        3
Cryopreservation       4        21        1
Rejection              0        3         24

P < 0.05.

Discussion

Among all Spanish embryology laboratories (187 in 2003; Andersen et al., 2007), an annual average of 15 (8.0%) took part in the bioassay programme, and 21 (11.2%) took part in the image assessment programme. Participation was lower than in EQC programmes for semen analysis (Alvarez et al., 2005), which might be due to the fact that some general laboratories participate in the latter programme. Be that as it may, the participation rate for the present study is low in view of the fact that various scientific societies recommend taking part in programmes of this kind (Magli et al., 2008; the Practice Committee of the ASRM and the Practice Committee of the SART, 2006) and the European Directive 2004/23/EC concerning the setting of standards of quality and safety for the donation, procurement, testing, processing, preservation, storage and distribution of human cells and tissues requires the presence of quality systems in embryology laboratories. However, inspections and accreditations following this directive by the competent authorities have not yet started in Spain and it is necessary to take steps to encourage the participation of embryology laboratories in this type of programme.

HSMA was used by 85.7% of the laboratories for the bioassay, which is a higher rate than that reported for similar programmes (39.4%) such as that of the American Association of Bioanalysts (AAB) (AAB, 2008). The high variability among laboratories observed in the present study with respect to the HSMA procedure is in line with the results reported for a similar programme (AAB, 2008). The interpretation of HSMA results for Spanish laboratories concurs with that reported by Claassens et al. (2000), who considered a medium or material to be toxic when it provoked a decrease in sperm motility exceeding 75% at 24 h. In this study, the laboratories rejected a material when it provoked a decrease of 81.0% at 24 h. These data conflict with those of the AAB, for which a medium was considered toxic when decreases of 37.1% were observed at 24 h, and of 66.9% at 48 h (AAB, 2008). These disparities may result from differences in the origin of the semen sample used, because if a laboratory utilizes known donors for the bioassay, then it will class as abnormal smaller reductions in motility than would be the case if each survival test were carried out with different donors or patients: in this latter circumstance, larger decreases in motility would be needed for a material to be considered non-optimum.

Spanish laboratories are more sensitive (83.1%) than specific (68.0%) with respect to the analysis of material toxicity. According to the data published by the AAB for 2007 and 2008, the sensitivity is 82.4% (AAB, 2007, 2008), a value that is very similar to ours, while specificity is 89.5% (AAB, 2007, 2008), which is well above that for the results presented here. This might be due to the above-mentioned differences concerning the interpretation of HSMA results. Although there are differences between the PPV found in the present study (66.7%) and that of the AAB programme (88.7%) (AAB, 2007, 2008), the corresponding NPV results were comparable, with 84.0% in this study versus 83.6% in the AAB programme (AAB, 2007, 2008). Hence, it seems that it is more acceptable to classify non-toxic material falsely as toxic (low PPV) than to classify toxic material as non-toxic (low NPV). This approach would prevent toxic material from being used in the routine practice of an embryo laboratory.

During the study period, no increase was observed in the total accuracy of the bioassays; the mean value in this respect was 74.6%, which is similar to the results reported by Genschow et al. (2000, 2002), with accuracy rates of 70–80%, although lower than the values published by the AAB for 2007 and 2008, with a combined accuracy rate of 85.9% (AAB, 2007, 2008). These differences might be due to the different materials used for the programmes (the AAB programme evaluated media while this study analysed materials), or to differences in bioassay protocols, which have considerable weight in bioassay results (Cai et al., 2006; De Jonge et al., 2003; Gardner et al., 2005; Go, 2000; Lane et al., 2008; Scott et al., 1993; Van den Bergh et al., 1996).
The existence of a high level of variability between laboratories with respect to criteria for the classification of embryos and zygotes, whether via systems based on embryo scoring (De Placido et al., 2002; Desai et al., 2000; Fisch et al., 2001; Holte et al., 2007; Sharpe-Timms and Zimmer, 2000) or on embryo grading (Sharpe-Timms and Zimmer, 2000), led us to opt for a three-category classification system (optimal, moderate and poor) on the basis of its greater simplicity and coincidence with the criteria of Baxter et al. (2006). The results of this study show that there is a higher degree of agreement when an embryo is classified as poor quality than when it is considered optimal or moderate, which is in agreement with the results of Arce et al. (2006). In the present study, there was seen to be a lower inter-individual variability in the evaluation of day-3 embryos than of day-2 embryos. These results do not agree with those of Arce et al. (2006), who reported the opposite. This discrepancy could be due to the fact that complete embryos were evaluated in the present study, whereas in that by Arce et al. (2006), individual embryo characteristics were evaluated, and obviously it is more difficult to evaluate the number of cells in day-3 embryos than in day-2 ones.

Over the study period, the level of agreement between laboratories concerning the classification of embryos and zygotes increased, an increase that might be accounted for simply by participation in the EQC programme. When different laboratories participate in training sessions featuring photos and common procedures, criteria tend to become more standardized (Arce et al., 2006; Keck et al., 2004). Thus, this study, by communicating the final results of the EQC programme to the participating laboratories, seems to have helped to increase the degree of consensus. Among the participating laboratories there was a considerable degree of agreement concerning the taking of a clinical decision on a zygote or an embryo: optimal embryos were transferred and poor zygotes were rejected. Nevertheless, this coherence does not guarantee low variability in the clinical practice of embryologists, as shown by Matson (1998) in a multicentre evaluation of embryo image assessment. In the present study, no increase was found over the years in the degree of agreement concerning the taking of clinical decisions. Nevertheless, greater agreement was observed when an embryo was to be rejected than when it was to be cryopreserved or transferred. The EQC considered in the present study presents various limitations. First, the use of video meant that recording time was limited, and also that the embryos were not rolled for observation from various angles; thus, the context was an artificial one, over which the embryologist had no control. However, Arce et al. (2006) have demonstrated the validity of a digital imaging system similar to ours for inter-embryologist comparisons. Second, it would have been preferable to be able to study the rate of implantation of the embryos evaluated, and thus to see whether the inter-individual variations in clinical decisions and in embryo quality were influential in the effectiveness of the IVF cycle.
This would also have enabled us to tell whether this study’s system of embryo evaluation, in three categories, was excessively simplistic, affecting the rates of implantation, and thus to derive a trade-off between the evaluation system and the pregnancy rate achieved. Carrying out such a study is clinically a very complicated task, as all the embryos evaluated would have to correspond to the same couple, and the latter should have no more embryos. Third, this study was set up under the limitation of transferring two embryos; this prevented us from studying the differences among laboratories with respect to the number of embryos transferred. Although in this study a very low number of laboratories was used, Matson (1998) observed large differences in the number of embryos to be transferred, in a similar multicentre study. Another limitation of this study is that it did not examine the question of intra-observer agreement. Nevertheless, various authors have shown there to be good intra-observer agreement in similar situations (Arce et al., 2006; Baxter et al., 2006).

In conclusion, this study has demonstrated the viability of a national EQC programme that involves detecting toxicity in various laboratory materials, by means of a bioassay and embryo evaluation using image analysis. The large inter-laboratory differences observed in bioassays, together with the discrepancies in embryo evaluation, make it necessary to implement further measures to consolidate and standardize EQC programmes.

References

Alvarez, C., Castilla, J.A., Ramírez, J.P., et al., 2005. External quality control program for semen analysis: Spanish experience. J. Assist. Reprod. Genet. 22, 379–387.
American Association of Bioanalysts 2007 Proficiency Testing Service. Available at http://www.aab-pts.org/Stats [accessed October 2009].
American Association of Bioanalysts 2008 Proficiency Testing Service. Available at http://www.aab-pts.org/Stats [accessed October 2009].
Andersen, A.N., Goossens, V., Gianaroli, L., et al., 2007. Assisted reproductive technology in Europe, 2003. Results generated from European registers by ESHRE. Hum. Reprod. 22, 1513–1525.
Arce, J.C., Ziebe, S., Lundin, K., et al., 2006. Interobserver agreement and intraobserver reproducibility of embryo quality assessments. Hum. Reprod. 21, 2141–2148.
Bavister, B.D., Andrews, J.C., 1988. A rapid sperm motility bioassay procedure for quality-control testing of water and culture media. J. In Vitro Fert. Embryo Transfer 5, 67–75.
Baxter, A.E., Mayer, J.F., Shipley, S.K., Catherino, W.H., 2006. Interobserver and intraobserver variation in day 3 embryo grading. Fertil. Steril. 86, 1608–1615.
Cai, X., Pomeroy, K.O., Mattox, J.H., 2006. Application study of human sperm motility bioassay in IVF laboratory quality control. Zhonghua Nan Ke Xue 12, 625–628.
Centro de Estudio e Investigación de la Fertilidad 2008. Available at http://www.ceifer.es/calidad/ [accessed 15 October 2009].
Claassens, O.E., Wehr, J.B., Harrison, K.L., 2000. Optimizing sensitivity of the human sperm motility assay for embryo toxicity testing. Hum. Reprod. 15, 1586–1591.
Clarke, R.N., Griffin, P.M., Biggers, J.D., 1995. Screening of maternal sera using a mouse embryo culture assay is not predictive of human embryo development or IVF outcome. J. Assist. Reprod. Genet. 12, 20–25.
De Jonge, C.J., Centola, G.C., Reed, M.L., et al., 2003. Human sperm survival assay as a bioassay for the assisted reproductive technologies laboratory. J. Androl. 24, 16–18.
De Placido, G., Wilding, M., Strina, I., et al., 2002. High outcome predictability after IVF using a combined score for zygote and embryo morphology and growth rate. Hum. Reprod. 17, 2402–2409.
Desai, N.N., Goldstein, J., Rowland, D.Y., Goldfarb, J.M., 2000. Morphological evaluation of human embryos and derivation of an embryo quality scoring system specific for day 3 embryos: a preliminary study. Hum. Reprod. 15, 2190–2196.
Elder, K.T., Kastrop, P., 2003. Control de calidad en laboratorios de fertilización in vitro. Reprod. Hum. 3 (1), 13–20.
Fisch, J.D., Rodriguez, H., Ross, R., et al., 2001. The graduated embryo score (GES) predicts blastocyst formation and pregnancy rate from cleavage-stage embryos. Hum. Reprod. 16, 1970–1975.
Gardner, D.K., Reed, L., Linck, D., et al., 2005. Quality control in human in vitro fertilization. Sem. Reprod. Med. 23, 319–324.
Genschow, E., Scholz, G., Brown, N., et al., 2000. Development of prediction models for three in vitro embryotoxicity tests in an ECVAM validation study. In Vitro Mol. Toxicol. 13, 51–66.
Genschow, E., Spielmann, H., Scholz, G., et al., 2002. The ECVAM international validation study on in vitro embryotoxicity tests: results of the definitive phase and evaluation of prediction models. European Centre for the Validation of Alternative Methods. Altern. Lab. Anim. 30, 151–176.
Go, K.J., 2000. Quality control: a framework for the ART laboratory. In: Kal, B.A., May, J.V., De Jonge, C.I. (Eds.), Handbook of the Assisted Reproduction Laboratory. CRC Press, Boca Raton, pp. 253–268.
Holte, J., Berglund, L., Milton, K., et al., 2007. Construction of an evidence-based integrated morphology cleavage embryo score for implantation potential of embryos scored and transferred on day 2 after oocyte retrieval. Hum. Reprod. 22, 548–557.
Keck, C., Fischer, R., Baukloh, V., Alper, M., 2004. Quality management in reproductive medicine. In: Gardner, D.K., Weissman, A., Howles, C.M., Shohan, Z. (Eds.), Textbook of Assisted Reproductive Techniques. Laboratory and Clinical Perspectives. Taylor and Francis, London and New York, pp. 477–494.
Kim, J.J., Patton, W.C., Corselli, J., et al., 2005. Mouse embryonic stem cells for quality control testing in assisted reproductive technology programs. J. Reprod. Med. 50, 533–538.
Lane, M., Mitchell, M., Cashman, K.S., et al., 2008. To QC or not to QC: the key to a consistent laboratory? Reprod. Fertil. Dev. 20, 23–32.
Magli, M.C., Van den Abbeel, E., Lundin, K., et al., 2008. Revised guidelines for good practice in IVF laboratories. Hum. Reprod. 23, 1253–1262.
Matson, P.L., 1998. Internal and external quality assurance in the IVF laboratory. Hum. Reprod. 13, 156–165.
Parinaud, J.M., Reme, J.M., Monrozies, X., et al., 1987. Mouse system quality control is necessary before the use of new material for in vitro fertilization and embryo transfer. J. In Vitro Fertil. Embryo Transfer 4, 56–58.
The Practice Committee of the American Society for Reproductive Medicine and the Practice Committee of the Society for Assisted Reproductive Technology, 2006. Revised guidelines for human embryology and andrology laboratories. Fertil. Steril. 86 (Suppl. 4), 57–72.
Quinn, P., 2004. The development and impact of culture media for assisted reproductive technologies. Fertil. Steril. 81, 27–29.
Rinehart, J.S., Bavister, B.D., Gerrity, M., 1988. Quality control in the in vitro fertilization laboratory: comparison of bioassay system for water quality. J. In Vitro Fertil. Embryo Transfer 5, 335–342.
Scott, L.F., Sundaram, S.G., Smith, S., 1993. The relevance and use of mouse embryo bioassays for quality control in an assisted reproductive technology program. Fertil. Steril. 60, 559–568.
Sharpe-Timms, K.L., Zimmer, R.L., 2000. Oocyte and pre-embryo classification. In: Kal, B.A., May, J.V., De Jonge, C.I. (Eds.), Handbook of the Assisted Reproduction Laboratory. CRC Press, Boca Raton, pp. 179–196.
Van den Bergh, M., Baszó, I., Biramane, J., et al., 1996. Quality control in IVF with mouse bioassays: a four years’ experience. J. Assist. Reprod. Genet. 13, 733–738.

Declaration: The authors report no financial or commercial conflicts of interest.

Received 28 July 2008; refereed 13 October 2008; accepted 10 September 2009.