Consistent naming in scientific writing: sound advice or shibboleth?

English for Specific Purposes 22 (2003) 113–130

Guy J. Norman*
Servicio de Traducción, Facultade de Farmacia, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, A Coruña, Spain
Received 1 January 2001; accepted 1 April 2002

Abstract

Scientific style manuals typically stress that scientific texts should use consistent terminology: in other words, a given entity or process should consistently be denominated by the same lexeme. But to what extent do native-English-speaker scientific writers actually follow this advice? To investigate this question, I analyzed anaphoric references in a sample of biomedical research abstracts. My results indicate that (1) straight repetition is indeed a common anaphoric strategy; (2) proforms are used infrequently; (3) where straight repetition would be inappropriate because of Given/New structure or other considerations, writers typically make use of reductive head-repetition (e.g. erythrocytes referring back to murine erythrocytes) or determiner-plus-hypernym structures (e.g. these cells); and (4) packaging devices (notably packaging nominalizations as defined by Halliday) have anaphoric function, and occur very frequently. The anaphoric use of reductive head-repetition forms part of a much wider system of taxonomy construction and manipulation, based on nominal groups with “general nouns” (such as protein) as head. In general, and despite the occasional use of synonyms, these findings suggest that the conventional style-manual exhortation to use consistent terminology is sensible advice grounded in native-English-speaker practice. Applications of these findings in the second-language academic writing classroom are briefly discussed.

© 2003 The American University. Published by Elsevier Science Ltd. All rights reserved.

Choose the right form of the right word and stick with it. In English courses, you were probably taught to vary words for the sake of variety. This is fine for creative writing but not for scientific writing. Keep in mind that such variation can be confusing. . . (Day, 1995, p. 22).

* Tel.: +34-98-156-3100x15094; fax: +34-98-159-4912. E-mail address: [email protected] (G.J. Norman).
0889-4906/03/$22.00 © 2003 The American University. Published by Elsevier Science Ltd. All rights reserved. PII: S0889-4906(02)00013-3


G.J. Norman / English for Specific Purposes 22 (2003) 113–130

Scientific style manuals like Day (1995) typically stress that a scientific text should use consistent terminology: in other words, a given entity or process should consistently be denominated by the same lexeme. And indeed, I would suggest that most people involved in the practice or teaching of scientific writing would agree: inconsistent naming is a rather common problem that often makes texts harder to read. The authentic example that follows (from a Spanish medical researcher) is a particularly dramatic illustration of the way in which many inexperienced writers strive to achieve “elegant variation”, with stylistically ludicrous results:

    In 62 cases it was the first convulsion, in two instances it was the second event and two patients had antecedents of 3 or more seizures.

More typically, the problem is less dramatic but has more insidious (and thus probably more serious) effects on readability: for example, when an author uses the nominal-group head “area” for one class of spatial region, and the near-synonym “zone” for another class, and then at some subsequent stage uses “this area” to refer to a region of the second class.

Nevertheless, the issue of intratextual consistent naming is clearly far from simple.¹ Native-English-speaker (NES) scientific texts clearly do not use absolutely consistent terminology: in other words, anaphoric reference is by no means always achieved by straight repetition. Here, my primary aim has been to investigate exactly how anaphoric reference is achieved in scientific texts, and to this end I have analyzed a sample of 10 biomedical research abstracts from US and British journals (see Appendix for a list of sources). I consider the extent to which my findings support the style-manual advice to use consistent naming, and briefly discuss pedagogic implications. It is worth noting that major current academic-writing texts for researchers (notably Weissberg & Buker, 1990, and Swales & Feak, 1994) do not consider this issue.

1. What is anaphoric reference?

An anaphor is a word or phrase that refers back to one or more words or phrases occurring earlier in the same text.² But what exactly does “refer back” mean? The classic source in the discourse analysis literature is Halliday and Hasan (1976), who list three types of coreferential anaphoric relation: identical (same referent, i.e. same thing or same class of things), inclusive (the anaphor is a set of which the antecedent is a member), and exclusive (the anaphor and the antecedent are explicitly different members of a single set).³ Subsequent work in discourse analysis (see Hoey, 1991, and references therein) has not introduced significant changes to this classification.

More detailed classifications have been published in the computational linguistics literature. In such classifications, basically derived from Clark (1977), endophoric relations are typically divided into two categories, identical (corresponding to Halliday & Hasan’s identical) and bridging (all other types of phoric relation). A particularly comprehensive classification of bridging relations is given in the DRAMA text-annotation scheme (Passonneau, 1996; see Davies & Poesio, 1998) (Table 1).

In view of the specific aims of the present study, I have used a five-category classification of types of referential relation, drawing upon both Halliday and Hasan (1976) and Passonneau (1996): three of these categories correspond directly to those of Halliday and Hasan (one-to-one, zoom-out and contrastive, respectively, for identical, inclusive and exclusive), but I additionally include another two categories, which I shall term zoom-in (the anaphor is a member of the set denominated by the antecedent) and packaging (the anaphor “packages” semantic content already presented). This classification is summarized in Table 1.

Table 1
Summary of the classification of anaphor–antecedent relations used in the present study, showing correspondences with the categories of referential relation included in the typology of Halliday and Hasan (1976) and in that of the DRAMA text-annotation scheme (Passonneau, 1996; see Davies & Poesio, 1998)

Present study   Halliday and Hasan (a)      DRAMA (b)
One-to-one      Identical (same referent)   Identity of coreference; Genitive/possessive pronouns
Zoom-in         Not included                Set/subset/member relationships; Part/whole relationships
Zoom-out        Inclusive                   Set/subset/member relationship; Part/whole relationships
Contrastive     Exclusive                   Set relationships in plural NPs; Implicit partitives and pseudo-partitives
Packaging       Not included                Not explicitly included

(a) Halliday and Hasan’s classification also includes the category “unrelated”, not considered in the present study as not relevant to the issue of consistent naming.
(b) The DRAMA scheme also includes three categories of inference-dependent referential relations that are likewise not considered in the present study, again because they are not relevant to the issue of consistent naming; these categories are (1) “causal inference” (an explosion → the noise), (2) “propositional inference” (It’s sunny → That’s a relief) and (3) “implicit arguments” (Look at the car! The wheel just fell off).

¹ This article deals solely with intratextual consistency, although the issue of intertextual consistency is clearly related: for example, an author’s use of inconsistent naming within a given text may be a response to terminological variation in the literature.

² Here I consider only nominal-group anaphors: many authors (e.g. Quirk, Greenbaum, Leech, & Svartvik, 1985) define anaphoric reference solely with regard to nominal groups, and indeed Hoey (1991) states that “Lexical items with the grammar of verbs or adjectives cannot [. . .] be conveniently said to refer at all”. I would dispute this latter assertion: there are clearly cases in which an adjective or verb is coreferential in a meaningful sense. Nevertheless, such cases are relatively rare, and of marginal importance in defining the meaning of coreference.

³ Halliday and Hasan’s classification is in fact not of referential relations in anaphoric reference as such, but of referential relations in lexical cohesion, an evidently wider context. They state (p. 282) that “It is not necessary for two lexical occurrences to have the same referent [. . .] in order for them to be cohesive”; and in accordance with this, their classification in fact includes a fourth category, “unrelated”. Anaphors falling into this category can be seen as “referring back” in a purely prosodic sense, and are thus of only minor importance in the present context.



My category zoom-in (e.g. mouse referring back to rodent) is not considered by Halliday and Hasan, and is implicitly rejected as a category of “repetition” by Hoey (1991, pp. 69–70), on the grounds that there is an information gain from antecedent to anaphor, and thus no strict identity of reference. However, I would argue that zoom-in reference is clearly coreferential, in the same way that zoom-out and exclusive reference are coreferential in a broad but meaningful sense (i.e. both anaphor and antecedent are referring to the same set or members thereof). This category is widely accepted in the computational linguistics literature.

My category packaging is not considered either by Halliday and Hasan (1976) or Hoey (1991), or by Passonneau (1996) and other computational linguistics researchers (see Davies & Poesio, 1998). However, Halliday must of course be fully credited with the “discovery” of this category, though in another context and without explicit recognition of its anaphoric function. Specifically, packaging nominalization is defined by Halliday (1988) in the context of Given/New structure, as a device for “1) packaging a complex phenomenon into a single semiotic entity, by making it one element of clause structure, so that 2) its rhetorical function – its place in the unfolding argument – is rendered fully explicit”. Rather lengthy examples are given by Halliday: a shorter example from one of the texts analyzed in the present study is as follows:

    1. Recombinant falcipain rapidly hydrolyzed both denatured and native hemoglobin. Hemoglobin hydrolysis was blocked by cysteine protease inhibitors. . . [Abstract 5]

Here, the nominal group hemoglobin hydrolysis “packages” the semantic content of the preceding sentence. As pointed out by Halliday (1988), such nominalization greatly facilitates Given/New structuring. In addition, though, packaging nominalization is clearly also a device for anaphoric reference.
Related to Halliday’s packaging nominalization is “encapsulation” or “retrospective labeling” as described by Francis (1994): “A retrospective label serves to encapsulate or package a stretch of discourse. [. . .] It is not a repetition or ‘synonym’ of any preceding element. Instead, it is presented as equivalent to the clause or clauses it replaces, while naming them for the first time”. Note that Hallidayan packaging is nominalization of semantic content (e.g. “hemoglobin hydrolysis”), whereas Francis’s encapsulation is basically metatextual in function (e.g. “this issue”). In the analysis performed for the present study, Francis’s encapsulation was annotated as “packaging reference”; in most cases, however, encapsulation is not directly relevant to the issue of consistent naming, since the writer does not have the option of repeating the antecedent (though note, for example, “Smith (1999) has suggested that. . .” followed by “This suggestion. . .”).

In the analysis, I have classified each anaphoric item not only in terms of category of referential relation (Table 1), but also, following Halliday and Hasan (1976), in terms of lexicogrammatical realization (see Table 2). The categories of lexicogrammatical realization listed in Table 2 are fairly self-explanatory, and will in any case be defined as they are used, in Section 3. More generally, however, note that correspondences between the different terms used by authors in this field are complex, and will not be discussed in detail here (see Hoey, 1991 and Davies & Poesio, 1998 for review). In particular, it should be stressed that “coreferential” is here used to refer to any of the five categories of referential relation listed in Table 1; this contrasts with the standard usage of this term in computational linguistics, where it means one-to-one reference only.

Table 2
Summary of the present study’s classification of lexicogrammatical realizations of anaphoric reference, showing broad correspondences with the categories of Halliday and Hasan’s (1976) classification of types of lexical cohesion (LC), and Hoey’s (1991) classification of types of “repetition”

Present study                      Halliday and Hasan             Hoey
Proformal (p)                      Reference                      Non-lexical repetition
Repetition (r)                     Same-word reiteration LC       Simple repetition
[Repetition]                       Same-word reiteration LC       Complex repetition
Abbreviation (a)                   Same-word reiteration LC?      Simple reiteration?
Reductive head-repetition (x)      Same-word reiteration LC?      Simple reiteration?
Expansive head-repetition (y)      Same-word reiteration LC?      Simple reiteration?
Substitutive head-repetition (z)   Same-word reiteration LC?      Simple reiteration?
Synonym (s)                        Synonym reiteration LC         Simple-paraphrase repetition
Hyponym (o)                        Not included                   Hyponymic repetition
Hypernym (e)                       Superordinate reiteration LC   Superordinate repetition
[Hypernym]                         General-word reiteration LC    Superordinate repetition
Packaging device (d)               Not included                   Not included
Not included                       Collocation LC                 Complex-paraphrase repetition

2. Some problems of analysis

In the course of the analysis (see Section 3), three particular problems arose, and are perhaps worth noting here.

First, how should an anaphor be delimited? The simplest situation is for a nominal group to refer back to another nominal group. As is well known, however, the nominal groups of scientific writing are often very complex, and a single group may contain several anaphoric references. For example, a premodifier of a nominal group X may refer back to another nominal group Y, without the head of X itself being anaphoric (as with “Plasmodium falciparum” in “a Plasmodium falciparum trophozoite cysteine proteinase”, Abstract 5). Similarly, a premodifier may refer back to one antecedent and the head of the group to another, while at the same time the group as a whole makes packaging reference to an entire sentence. Often, then, it is not possible to identify clearly delimited discrete anaphors. In the present analysis, I have tried to annotate all significant anaphoric references.

Second, is the antecedent the first mention of the word or phrase in question, the anaphor-preceding mention, or all mentions? This problem has been considered in some depth by Hoey (1991), but is of relatively minor importance in the short texts considered here.

Third, it is not always clear to which antecedent/s the author wishes to refer with a given anaphor. In such cases, we must consider three possibilities: (a) that the identity of the antecedent is unclear to the analyst, but would be clear to a target-community reader, (b) that the identity of the antecedent is genuinely unclear, but that this lack of clarity would have no significant effect on understanding of the overall message by a target-community reader, or (c) that the antecedent is genuinely unclear, with significant implications for understanding by a target-community reader. It might in fact be argued that possibility (c) can be excluded by definition, since these are published abstracts (i.e. texts that have demonstrably met the requirements of the discourse community); however, I would suggest that this is a simplistic position. In any case, discrimination between these three possibilities is complex, and will not be considered in detail here: suffice it to say that a number of apparently unclear anaphoric relations were identified in the sample (see Sections 5 and 6).

3. Anaphoric reference in 10 biomedical abstracts

The analysis considered 10 biomedical abstracts, all relating to malaria, a disease caused by blood parasites of the genus Plasmodium, notably Plasmodium falciparum. The abstracts were randomly sampled from the medical abstracts database MEDLINE (1995), by searching for the term “malaria”. This gave 693 hits: I then selected the first 10 complete abstracts (database entries for abstracts over 250 words are automatically truncated) that met the following criteria: (1) from a major American or British journal, (2) with an apparently NES main author (i.e. from a US or British institution, with no evident non-NES grammatical errors in the text), and (3) by authors not included in the sample up to that point. The abstracts selected are listed in the Appendix, and henceforth referred to as Abstracts 1–10.

All coreferential anaphoric items in each abstract were identified and classified by type of anaphor–antecedent relation (Table 1) and type of lexicogrammatical realization (Table 2). As summarized in Table 3, a total of 239 items were detected, of which 236 could be classified. In what follows, I summarize my findings within each category of lexicogrammatical realization. In examples given in the text, dotted underlining indicates the antecedent, and solid underlining the anaphor.

Table 3
Number of anaphors in each category detected in the malaria abstracts

      p    r    a    x    y    z    s    o    e    d   SUM  SUM%
S     3   66   13   17    3    0   19    2   21    –   144    61
I     0    0    0    4    2    1    0   14    0    –    21     9
O     0    0    0    2    0    0    0    0    0    –     2     1
C     3    0    0    7    0   12    1    0    4    –    27    11
P     0    0    0    0    0    0    0    0    0   42    42    18

Three items were unclassifiable. Types of referential relation (see Table 1): S, one-to-one; I, zoom-in; O, zoom-out; C, contrastive; P, packaging. Lexicogrammatical realizations (see Table 2): p, proformal; r, repetition; a, abbreviation; x, reductive head-repetition; y, expansive head-repetition; z, substitutive head-repetition; s, synonym; o, hyponym; e, hypernym; d, packaging device. The SUM% columns show the SUM values as a percentage of the total number of anaphors (236). [Note that no attempt has been made to categorize packaging references by lexicogrammatical realization, though in some cases such categorization would be possible (see text).]

3.1. Proformal

Proformal anaphoric references (including possessives) occurred a mere eight times in the malaria abstracts (8/236=3%), three times in simple one-to-one reference (its, their, proformal this), three times in contrastive reference (those [obtained from laboratory-maintained strains], [four] others, that), and twice as a packaging reference (their, them; both coded for Table 3 as “packaging device”). This marked scarcity of proforms is discussed in Section 4.

3.2. Repetition

I have considered as repetition (a) the use of the same lexeme, (b) the use of a lexeme that has undergone a minor form change with respect to the antecedent (e.g. “position” and “positions”), and (c) the special case of abbreviation of genus names after first use (e.g. “P. falciparum” for “Plasmodium falciparum”; standard practice in biomedical texts). As expected, repetition was widely used: it occurred at least once in all ten of the malaria abstracts, 66 times in total (66/236=28%).

3.3. Abbreviation

Anaphoric abbreviations (e.g. “MBP” referring back to “maltose-binding protein”) are perhaps best considered as a variant of repetition; here, however, I have listed them as a separate category, in view of their special relevance in scientific writing. Such abbreviations (i.e. definition-requiring abbreviations, versus standard abbreviations like DNA) occurred in 6 of the 10 abstracts, 13 times in total (13/236=6%).

3.4. Reductive head-repetition

Reductive head-repetition is here defined as repetition of the head of a nominal group, but with elimination of some or all of the modifiers present in the antecedent, as in the following example:

    2. Chinese hamster ovary (CHO) cells transfected with cDNA encoding the first 578 amino acid residues of human band 3 protein transiently expressed the protein efficiently. [Abstract 2]

In non-technical genres, we would perhaps expect to see “it” instead of “the protein” as the anaphor here. The use of reductive head-repetition instead of a less specific anaphoric device may reflect the low ambiguity tolerance of scientific writing (see Section 4).



Reductive head-repetition occurred frequently (17 times in one-to-one reference, seven times in contrastive reference, 30 times in total; 30/236=13%). Reductive head-repetition for one-to-one reference occurred both within and between sentences.

3.5. Expansive head-repetition

Expansive head-repetition is here defined as repetition of the head of a nominal group, but with inclusion of more modifiers than were present in the antecedent. It occurred only five times (5/236=2%) in the malaria abstracts. It might be expected to occur as a realization of zoom-in reference; however, only two such items were detected in the malaria abstracts. In fact, expansive head-repetition occurred more frequently (three times) in realization of one-to-one reference. Examples include the following:

    3. The developmental stages of malaria parasites that infect E are responsible for the morbidity and mortality associated with this disease. [. . .] The rodent malarial parasite Plasmodium yoelii yoelii (Py) has provided a model system. . . [zoom-in reference; Abstract 10; note that E here means erythrocytes]

    4. While the C-terminal region of MSP-1 from the two prototypic alleles of P. falciparum has been shown to be relatively conserved in laboratory-maintained strains, little data exist on sequence heterogeneity of this region in field isolates from diverse geographic areas. To address this question, DNA encoding the C-terminal, Cys-rich region of P. falciparum MSP-1 from field samples was analyzed. . . [one-to-one reference; Abstract 3]

3.6. Substitutive head-repetition

Substitutive head-repetition is here defined as repetition of the head of a nominal group, but with replacement of some or all of the modifiers present in the antecedent. It can therefore be expected to occur largely as a realization of contrastive reference, and this was indeed the case in my sample: of the 13 anaphors of this type detected (13/236=6%), one was a realization of zoom-in reference, but the remaining 12 were realizations of contrastive reference, as in the following example:

    5. However, only antisera against the four carboxy regions (C-F) of Pfs230 and not the two amino regions (A and B) recognized the 310-kDa form of Pfs230. . . [Abstract 1]
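The three head-repetition categories defined in Sections 3.4 to 3.6 differ only in how the modifiers of the anaphor relate to those of the antecedent. As a rough illustration (not the annotation procedure of the study, which was carried out by hand), the sketch below treats a nominal group as a string whose last word is the head and whose remaining words, determiners aside, are modifiers. The function names, the determiner list and the last-word-is-head rule are all assumptions made for this example, and the C-terminal pair is a simplified version of Example 4.

```python
from typing import FrozenSet, Tuple

# Assumed stop list: determiners are ignored when comparing modifiers.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those", "such"}


def split_group(nominal_group: str) -> Tuple[FrozenSet[str], str]:
    """Split a premodified nominal group into (modifiers, head).

    Simplifying assumption: the head is the last word, and every other
    word that is not a determiner is a modifier.
    """
    tokens = [t.lower() for t in nominal_group.split()]
    tokens = [t for t in tokens if t not in DETERMINERS]
    if not tokens:
        raise ValueError("nominal group has no head")
    return frozenset(tokens[:-1]), tokens[-1]


def classify(antecedent: str, anaphor: str) -> str:
    """Label the anaphor relative to its antecedent."""
    ante_mods, ante_head = split_group(antecedent)
    ana_mods, ana_head = split_group(anaphor)
    if ana_head != ante_head:
        return "not head-repetition"
    if ana_mods == ante_mods:
        return "repetition"
    if ana_mods < ante_mods:   # proper subset: modifiers were dropped
        return "reductive"
    if ana_mods > ante_mods:   # proper superset: modifiers were added
        return "expansive"
    return "substitutive"      # same head, some modifiers replaced


if __name__ == "__main__":
    pairs = [
        ("murine erythrocytes", "erythrocytes"),
        ("human band 3 protein", "the protein"),
        ("the C-terminal region", "the C-terminal Cys-rich region"),
        ("the four carboxy regions", "the two amino regions"),
        ("falcipain", "the enzyme"),
    ]
    for antecedent, anaphor in pairs:
        print(f"{antecedent!r} -> {anaphor!r}: {classify(antecedent, anaphor)}")
```

Because the sketch does no lemmatization, “position” and “positions” would count as different heads, and postmodified groups such as “the C-terminal region of MSP-1” would need a real parser before the last-word rule could apply.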

3.7. Synonyms

Synonym (and near-synonym) anaphors occurred 20 times in the malaria abstracts (20/236=8%) (for examples, see Section 4). As discussed in Section 4, this result is of interest in that it suggests that at least some NES scientific writers do use “elegant variation”.

3.8. Hyponyms

Although apparently not identified as a category of coreferential anaphoric reference by some previous authors, zoom-in anaphoric reference achieved with hyponyms certainly occurred in the malaria abstracts, as illustrated by the following example:

    6. C. fasciculata was shown to reduce the redox potential of the culture medium, as were other malaria growth enhancers including cysteine and glutathione. [Abstract 6]

Indeed, zoom-in anaphoric reference using hyponyms in this way was rather frequent (5 of the 10 abstracts; 14 occurrences in total, 14/236=6%). Often, such reference was realized with an appositive nominal group:

    7. Infection of human erythrocytes with the malaria parasite, Plasmodium falciparum, results in the exposure of amino acid residues 542–555 of the anion-exchange protein, band 3, in a conformation that enables the cell to adhere to C32 amelanotic melanoma cells. [Abstract 1]

Zoom-in anaphoric reference using a hyponym is of course not relevant to the issue of consistent naming, since by definition the anaphor cannot be the same term as the antecedent. Note, though, that other realizations of zoom-in anaphoric reference (notably the different types of head-repetition) may be relevant.

3.9. Hypernyms

The total number of hypernymic items in the sample was 25 (25/236=11%), of which 21 were one-to-one references and four contrastive references; note that this figure does not include single-lexeme encapsulatory items like “[These] results. . .”, which were included in the “packaging device” category (see later). Almost all of the hypernymic one-to-one references were preceded by an appropriate determiner (the, this or these), as in the following example:

    8. To further evaluate the biological role of falcipain, we expressed the enzyme in bacterial and viral expression systems. [Abstract 5]

As noted above for reductive head-repetition, in non-technical written genres we would perhaps expect to see “it” here instead of “[the] enzyme”: the use of a hypernym can again be seen as a reflection of the low ambiguity tolerance of scientific writing (see Section 4).
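The shares quoted in Sections 3.1 to 3.9 are simple ratios over the 236 classified items, and they can be re-derived as a quick consistency check. The sketch below uses the item counts exactly as stated in those sections; the variable and function names are illustrative assumptions, as is the choice of rounding to the nearest whole percent. It also pools the four forms that Section 4 treats as consistent naming.

```python
TOTAL_CLASSIFIED = 236  # classifiable anaphoric items in the sample

# Item counts as stated in Sections 3.1 to 3.9.
reported_counts = {
    "proformal": 8,
    "repetition": 66,
    "abbreviation": 13,
    "reductive head-repetition": 30,
    "expansive head-repetition": 5,
    "substitutive head-repetition": 13,
    "synonym": 20,
    "hyponym": 14,
    "hypernym": 25,
}


def share(count: int, total: int = TOTAL_CLASSIFIED) -> int:
    """Percentage of the classified items, rounded to a whole number."""
    return round(100 * count / total)


shares = {name: share(count) for name, count in reported_counts.items()}

# The four realizations that Section 4 pools as consistent naming.
CONSISTENT_FORMS = (
    "repetition",
    "abbreviation",
    "reductive head-repetition",
    "substitutive head-repetition",
)
consistent_total = sum(reported_counts[name] for name in CONSISTENT_FORMS)
consistent_share = share(consistent_total)

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {reported_counts[name]}/{TOTAL_CLASSIFIED} = {pct}%")
    print(f"consistent naming: {consistent_total}/{TOTAL_CLASSIFIED} = {consistent_share}%")
```

The rounded values agree with the percentages given in the text, including the pooled figure of 52% for consistent naming reported in Section 4.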



3.10. Packaging devices

This category covers all the various realizations of packaging reference. Packaging references occurred frequently (42 items; 42/236=18%) in the sample. Two of the 42 items were proforms. Twelve of the 42 items were single-lexeme packages, typically preceded by a determiner such as “that” or “these” (results in four cases, hypothesis in two cases; also data, approaches, effect, sequences, studies, resistance). Only three items ([Our] hypothesis, [this] hypothesis, [other] studies) can be considered encapsulatory in the metatextual sense used by Francis, and indeed only the second of these items directly replaces a single subordinate clause, as required for strict compliance with Francis’s criteria for encapsulation. Nevertheless, anaphors like “These results” are clearly encapsulatory in a closely related sense.

The remaining items are multi-lexeme packages of varying complexity and construction. As noted by Halliday (1988), a common pattern is for a verb-realized Process in the antecedent to be nominalized in the anaphor: for example, packaging anaphors often denominate the product of a previously described laboratory procedure, the procedure becoming a defining premodifier (e.g. pfalhesin-coated microspheres, in Example 9 below). However, and again as noted by Halliday and subsequent authors (e.g. Halliday, 1988; Martin, 1992; Ventola, 1996), the packaging function may of course be realized lexicogrammatically in various ways. In the present analysis, packaging items not readily classifiable as straightforward Process nominalizations included Examples 10 and 11 below.

    9. As a more efficient alternative to transgenic expression of the adhesin, microspheres with covalently bound peptides fashioned on band 3 sequences previously found to be adherent (residues 546–553 and 820–829 and called pfalhesin) were produced. The pfalhesin-coated microspheres specifically bound. . . [Abstract 1]

    10. We have reported previously that immunization with a bacterial recombinant protein containing the two epidermal growth factor (EGF)-like modules of Plasmodium yoelii Merozoite Surface Protein (MSP-1) protected mice against challenge with this malaria parasite. Bacterial plasmids containing sequences coding for the individual modules fused to glutathione S-transferase (GST) have now been made. The fusion protein containing the combined EGF-like modules was recognized. . . [Abstract 4]

    11. To further evaluate the role of falcipain, we expressed the enzyme in bacterial and viral expression systems. [. . .] Recombinant falcipain rapidly hydrolyzed. . . [Abstract 5; recombinant falcipain here refers to falcipain expressed in bacterial and viral expression systems]

4. Consistent naming in scientific writing

The above analysis has considered only 10 abstracts from a highly restricted field; I have not analyzed the main texts of the articles in question, and neither have I performed equivalent analyses of texts from other (non-scientific) genres. Nonetheless, I consider this sample to be sufficiently representative to allow a number of general conclusions to be drawn about the ways in which anaphoric reference is achieved in scientific writing. More specifically, to what extent do the authors of the texts included in my sample use consistent terminology in the sense recommended by style manuals?

Broadly speaking, my findings suggest that the conventional style-manual advice to use consistent terminology is valid. Straight repetition, anaphoric abbreviation, reductive head-repetition and substitutive head-repetition (as defined earlier) can all be considered forms of consistent naming, and together accounted for 52% of the anaphors detected; by contrast, synonyms accounted for only 8%. Nevertheless, a more complete picture of the ways in which writers realize anaphoric reference calls for a more detailed analysis, as follows.

First, proformal reference appears to be uncommon (only 3% of all anaphors). A similar conclusion was reached by Weissberg (1984) in his study of cohesive devices in a sample of Methods-section paragraphs (seven pronouns and three hypernyms, versus 54 occasions in which “inferential bridging” was required). When proformal anaphors do occur, they are usually in the same sentence as the antecedent, or in listlike series of simple SVO sentences.⁴ In situations in which straight repetition would be ugly or misleading (notably because of Given-New considerations), authors tend to use reductive head-repetition or a hypernym (usually with an anaphoric determiner, such as “the”, “this”, “these” or “such”).
These findings are entirely as expected if we accept that scientific writing has relatively low ambiguity tolerance; though authors like Myers (1996) and Ventola (1996) have pointed out that scientific writing is probably not as purely ideational and unambiguous as many style-manual writers would seem to think. Note also that a given ambiguity-reducing choice on the part of the writer (or editor ‘‘correcting’’ the text) need not necessarily reflect a deliberate effort to reduce ambiguity, but may rather be simply an ‘‘unthinking’’ style choice; though of course the stylistic rule in question may originally have evolved as an ambiguity-reducing strategy. Second, my analysis suggests that the different variants of head repetition (reductive, expansive and substitutive) are fundamentally important in scientific writing. As discussed for example by Halliday (1989), scientific language is characterized by complex taxonomies. The present analysis points to the importance of ‘‘modular’’ terminological systems in which the permanent or temporary taxonomies required are created by repeated modification of a head (typically a general noun, like cell or protein or disease). In terms of cohesion, anaphoric reference using head-repetition constitutes a precise and economical device for expressing long-distance ties. At the same time, and as pointed out above, reductive head-repetition in particular permits the reduction of ‘‘weight’’ necessary for effective Given/New structuring, while maintaining high anaphoric specificity. Furthermore, skilled use of head-repetition


[4] This conclusion is based on a quick-and-dirty survey of anaphoric occurrences of “it” in 50 biomedical abstracts.



devices allows the implicit definition of classes of entity or process, and the creation of sophisticated contrastive parallel structures that convey information with extreme economy and precision, as illustrated by the following example:

12. Fourteen of 17 malaria parasite isolates in one study, and 12 of 12 isolates in a second study, were successfully adapted to continuous culture in the presence of C. fasciculata, while only 5 of 17 parallel control isolates in the first study, and 2 of 12 isolates in the second study, were adapted in the absence of any feeder cells. [Abstract 6]

Third, the analysis has shown that packaging devices (notably packaging nominalizations as defined by Halliday) typically have anaphoric function, and occur very frequently in scientific texts. Often, consistent naming (i.e. repetition or head-repetition) forms part of the “package”.

As noted, synonyms accounted for only a small proportion of the total number of anaphors, but this proportion is certainly non-negligible (20 items: 8% of all anaphoric items, versus 8% for hypernyms and 7% for reductive head-repetition). This finding is of particular interest in that such items do not obey the style-manual exhortation to use consistent naming. Detailed consideration of these 20 items suggests that they can be classified into four groups (I–IV), as follows:

(I) Appositive synonyms, as in Example 13: three items. Items in this group are clearly not relevant to the issue of consistent naming.

13. “Pfs230” in “Six regions of malaria transmission-blocking target antigen, Pfs230, [...] were expressed...” [Abstract 1]

(II) Clearly useful comprehension-enhancing paraphrase (Examples 14–16 below) or compaction (Example 17), five items:

14. “the adhesin” referring back to the “adhesive form” (of a protein) [Abstract 2]; “-in” is a standard suffix for proteins, so that the term “adhesin” will be readily understood by target-community readers to mean “adhesive protein”.
15. “bacterial recombinant protein” referring back to “Bacterial plasmids containing sequences coding...” [Abstract 4]
16. “the fusion protein” referring back to “a new fusion construct” [Abstract 10]
17. “mariner transposons”, and subsequently “mariners”, referring back to “Transposable elements of the mariner family” [Abstract 7]

(III) Apparently harmless “elegant variation”: nine items, examples including:

18. “microspheres with covalently coupled peptides” referring back to “microspheres with covalently bound peptides” [Abstract 2]



19. “field samples” following “field isolates” [Abstract 3]
20. (antigen) “reacted ... with” (antibody) following (antigen) “was recognized by” (antibody) [Abstract 4]
21. “the natural antibodies” referring back to “anti-parasite [...] antibodies” [Abstract 4]
22. “domains” referring back to “modules” [Abstract 4]
23. “the MSP-1 protein” referring back to “MSP-1” [Abstract 10]

(IV) Apparently problematic “elegant variation”: two items, both in Abstract 3:

24. In 15 isolates from Africa, Asia and Latin America, only a few nucleotide changes were found leading to amino-acid alterations at four positions out of 102 residues. All the variations corresponded to the predicted amino-acid sequence of the other prototype, suggesting that these changes were possibly due to allelic recombinations. The four changes were E→Q at position 1644 and TSR→KNG, or KNG→TSR at positions 1691, 1700 and 1701. Thus, only three patterns of the C-terminal, Cys-rich region of MSP-1, E-TSR, Q-KNG and Q-TSR, were detected.

Detailed understanding of this semantically complex extract requires study of the abstract and consultation of the full text of the article. Briefly, the first problematic anaphor (variations) appears to be referring back to the nominal group “amino acid alterations at four positions out of 102”: in other words, “variations” is being used as a synonym of “alterations”. Alternatively, it might be interpreted as a “vague” anaphor referring back to both the “nucleotide changes” and the resulting “amino-acid alterations”. The second problematic anaphor (changes) appears to be referring back to the previously mentioned “alterations” and/or “variations”, not to the “nucleotide changes”; at least, this is what is suggested by the next sentence, which uses “changes” to refer to changes in amino acid sequence. It is interesting to note that both of these problematic uses involve “general nouns”.
What then does this more detailed analysis of synonym use suggest? First, some synonym uses in scientific writing have a genuine comprehension-facilitating function, although most are simply elegant variation. Second, within the category of elegant variation, most uses are harmless, with no negative effect on readability. Third, some uses do however clearly reduce readability. Fourth, it is worth pointing out that the relative rarity of readability-reducing uses in no way constitutes an argument in favor of synonym use: only a small proportion of episodes of blind overtaking actually cause accidents!

Finally, and as noted above, this study has only considered abstracts. It would certainly be of interest to perform analogous studies of complete articles, which clearly differ from abstracts in a number of relevant respects: most notably, they are



Table 4 Four sample tasks illustrating possible approaches to the teaching of consistent-naming strategies (all texts are constructed)



longer and more complex, and will thus tend to contain longer and more complex chains (or networks) of coreference. To the best of my knowledge, no studies of this type have been performed to date, though as noted Weissberg (1984) has analyzed cohesive devices in 20 research-report Methods paragraphs. In addition, and from a rather different perspective, Gledhill (2000) has investigated terminology development within biomedical texts, and among related groups of such texts, focusing on the ongoing negotiation and reformulation of meaning.

5. Pedagogic implications

Briefly stated, the principal pedagogic implications of this study are (1) that the standard style-manual advice to use consistent terminology is broadly valid, and relevant to the non-NES scientific writing classroom, though with awareness that this is a complex issue; (2) that teachers of scientific writing should be wary of ideas about cohesion and anaphoric reference that derive from studies of non-technical texts (see for example Hoey, 1991, Section 9.1); and (3) that three particular classes of anaphoric reference (reductive/substitutive head repetition, determiner+hypernym, and packaging reference) perhaps warrant special attention. Table 4 shows some sample tasks using constructed texts, designed to draw students’ attention to consistent-naming issues; though of course different teachers in different contexts might choose to approach these same issues in entirely different ways.

My personal experience (with doctoral and postdoctoral scientific researchers at a Spanish university) is that consistent-naming strategies are readily taught, and attractive to students in that they provide a straightforward way of improving readability that is basically independent of conventionally defined lexicogrammatical level. Again in my experience in Spain, students are often very surprised, and indeed sometimes disbelieving at first, when it is suggested that elegant variation is not generally appropriate in scientific writing. Another level-independent strategy closely related to consistent-naming is “parallel-structuring” (e.g. all items in a list should have parallel grammatical structure): in my classes, I often consider parallel-structuring immediately after consistent-naming, and ask students to rewrite a constructed text showing severe problems of both types (see Sample Task 4, second part).
Consistent naming also ties in readily with questions including article use and anaphoric “such”, ellipsis (see Sample Task 4, first part), and packaging and encapsulation.

This study has investigated how NES scientific writers realize anaphoric reference. In addition, my personal view is that we should not be afraid to try to distinguish between good and bad NES writing. Certainly, some of the anaphoric items detected in the present analysis were evidently reader-unfriendly, as in Example 24, and the following example from Abstract 6:

25. Neither this system nor another system, murine peritoneal macrophages, had any effect on the cysteine content of the culture medium.



Here, the use of the indefinite determiner another clearly suggests that murine peritoneal macrophages are being mentioned for the first time; in fact, though, this is an anaphoric reference to murine peritoneal wash cells earlier in the abstract. The problem is compounded by the use of macrophages as a synonym of wash cells; straight repetition of wash cells would almost certainly make it easier for the reader to overcome the confusion introduced by another. Examples of this type suggest that teachers of scientific writing should perhaps pay attention (a) to the misleading use of synonyms (notably ‘‘general noun’’ synonyms), and (b) to the misleading use of substitutive or expansive head repetition for simple one-to-one reference, leading to possible ‘‘False New’’ problems (cf. Bloor & Bloor, 1992): in other words, causing the reader to think that something new is being introduced, when in fact it has been mentioned already. Again, exactly how these issues should be approached in the classroom is likely to vary greatly depending on context and on the individual teacher’s methodological preferences.
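The head-repetition categories discussed above can be made concrete with a small Python sketch. This is only an illustration, not a procedure used in the study: the function name `classify_head_repetition`, the determiner list, and the assumption that the head of a nominal group is simply its last word are all hypothetical simplifications (the last-word assumption fails for post-modified groups such as "microspheres with covalently bound peptides"). The category definitions are also assumed from the terms themselves: reductive means same head with fewer modifiers, expansive means same head with more modifiers, and substitutive means same head with different modifiers.

```python
# Hypothetical helper: naive classification of an anaphor against its antecedent.
# Assumption: the head of a nominal group is its final word (pre-modified groups only).
DETERMINERS = {"the", "this", "these", "such", "a", "an"}


def _content_words(nominal_group: str) -> list[str]:
    """Lower-case the group, split on whitespace, and drop determiners."""
    return [w for w in nominal_group.lower().split() if w not in DETERMINERS]


def classify_head_repetition(antecedent: str, anaphor: str) -> str:
    """Label the formal relation between an anaphor and its antecedent."""
    ante = _content_words(antecedent)
    ana = _content_words(anaphor)
    if not ante or not ana:
        raise ValueError("both nominal groups must contain a content word")
    if ana == ante:
        return "straight repetition"
    if ana[-1] != ante[-1]:
        # Different head: synonym, hypernym, etc.; not handled by this sketch.
        return "no head repetition"
    ante_mods, ana_mods = set(ante[:-1]), set(ana[:-1])
    if ana_mods < ante_mods:
        return "reductive head repetition"
    if ana_mods > ante_mods:
        return "expansive head repetition"
    return "substitutive head repetition"


if __name__ == "__main__":
    pairs = [
        ("murine erythrocytes", "erythrocytes"),
        ("isolates", "parallel control isolates"),
        ("malaria parasite isolates", "parallel control isolates"),
        ("field isolates", "field samples"),
    ]
    for antecedent, anaphor in pairs:
        print(f"{anaphor!r} after {antecedent!r}: "
              f"{classify_head_repetition(antecedent, anaphor)}")
```

A sketch of this kind compares surface forms only. It cannot tell whether two nominal groups are in fact coreferent, so it would flag the contrastive "parallel control isolates" of Example 12 in the same way as a genuinely misleading "False New" reference; that judgment still has to be made by the reader or teacher.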

Acknowledgements

I am grateful to Sue Wharton (Aston University, UK) and three anonymous reviewers for helpful comments during the preparation of this manuscript.

Appendix. Abstract sources

1. Williamson, K.C., Keister, D.B., Muratova, O., & Kaslow, D.C. (1995). Recombinant Pfs230, a Plasmodium falciparum gametocyte protein, induces antisera that reduce the infectivity of Plasmodium falciparum to mosquitoes. Molecular and Biochemical Parasitology 75 (1), 33–42.
2. Guthrie, N., Bird, D.M., Crandall, I., & Sherman, I.W. (1995). Plasmodium falciparum: the adherence of erythrocytes infected with human malaria can be mimicked using pfalhesin-coated microspheres. Cell Adhesion Communications 3 (5), 407–417.
3. Kang, Y., & Long, C.A. (1995). Sequence heterogeneity of the C-terminal, Cys-rich region of the merozoite surface protein-1 (MSP-1) in field samples of Plasmodium falciparum. Molecular and Biochemical Parasitology 73 (1–2), 103–110.
4. Ling, I.T., Ogun, S.A., & Holder, A.A. (1995). The combined epidermal growth factor-like modules of Plasmodium yoelii Merozoite Surface Protein-1 are required for a protective immune response to the parasite. Parasite Immunology 17 (8), 425–433.
5. Salas, F., Fichmann, J., Lee, G.K., Scott, M.D., & Rosenthal, P.J. (1995). Functional expression of falcipain, a Plasmodium falciparum cysteine proteinase, supports its role as a malarial hemoglobinase. Infection and Immunity 63 (6), 2120–2125.



6. Awadelkariem, F.M., Hunter, K.J., Kirby, G.C., & Warhurst, D.C. (1995). Crithidia fasciculata as feeder cells for malaria parasites. Experimental Parasitology 80 (1), 98–106.
7. Robertson, H.M., & Lampe, D.J. (1995). Recent horizontal transfer of a mariner transposable element among and between Diptera and Neuroptera. Molecular Biology and Evolution 12 (5), 850–862.
8. Paul, R.E.L., Packer, M.J., Walmsley, M., Lagog, M., Ranford-Cartwright, L.C., Paru, R., & Day, K.P. (1995). Mating patterns in malaria parasite populations of Papua New Guinea. Science 269 (5231), 1709–1711.
9. Shai, S., Blackman, M.J., & Holder, A.A. (1995). Epitopes in the 19kDa fragment of the Plasmodium falciparum major merozoite surface protein-1 (PfMSP-1(19)) recognized by human antibodies. Parasite Immunology 17 (5), 269–275.
10. Daly, T.M. (1995). Humoral response to a carboxyl-terminal region of the merozoite surface protein-1 plays a predominant role in controlling blood-stage infection in rodent malaria. Journal of Immunology 155 (1), 236–243.

References

Bloor, M., & Bloor, T. (1992). Given and new information in the thematic organisation of text: an application to the teaching of academic writing. Occasional Papers in Systemic Linguistics, 6, 33–44.
Clark, H. H. (1977). Bridging. In P. N. Johnson-Laird, & P. C. Wason (Eds.), Thinking: readings in cognitive science (pp. 411–420). Cambridge: Cambridge University Press.
Davies, S., & Poesio, M. (1998). Coding schemes for co-reference. In M. Klein (Ed.), MATE Deliverable D1.1: supported coding schemes. Available:
Day, R. A. (1995). Scientific English: a guide for scientists and other professionals (2nd ed.). Phoenix, Arizona: Oryx Press.
Francis, G. (1994). Labelling discourse: an aspect of nominal-group lexical cohesion. In M. Coulthard (Ed.), Advances in written text analysis (pp. 83–101). London: Routledge.
Gledhill, C. (2000). Collocations in science writing. Tuebingen: Gunter Narr Verlag.
Halliday, M. A. K. (1988). On the language of physical science. Reprinted in M. A. K. Halliday & J. R. Martin (1993), Writing science: literacy and discursive power (pp. 54–68). Pittsburgh: University of Pittsburgh Press.
Halliday, M. A. K. (1989). Some grammatical problems in scientific English. Reprinted in M. A. K. Halliday & J. R. Martin (1993), Writing science: literacy and discursive power (pp. 69–85). Pittsburgh: University of Pittsburgh Press.
Halliday, M., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hoey, M. (1991). Patterns of lexis in text. Oxford: Oxford University Press.
Martin, J. R. (1992). English text. Amsterdam: John Benjamins Publishing Company.
Myers, G. (1996). Strategic vagueness in academic writing. In E. Ventola, & A. Mauranen (Eds.), Academic writing: intercultural and textual issues (pp. 3–17). Amsterdam: John Benjamins Publishing Company.
Passonneau, R. J. (1996). Instructions for applying Discourse Reference Annotation for Multiple Applications (DRAMA). Internal report, Columbia University: Department of Computer Science.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.
Swales, J. M., & Feak, C. B. (1994). Academic writing for graduate students: a course for non-native speakers of English. Ann Arbor: The University of Michigan Press.



Ventola, E. (1996). Packing and unpacking of information. In E. Ventola, & A. Mauranen (Eds.), Academic writing: intercultural and textual issues (pp. 153–194). Amsterdam: John Benjamins Publishing Company.
Weissberg, R. (1984). Given and new: paragraph development models for scientific English. TESOL Quarterly, 18, 485–500.
Weissberg, R., & Buker, S. (1990). Writing up research: experimental research report writing for students of English. Englewood Cliffs, New Jersey: Prentice Hall Regents.

Guy Norman is a one-time biologist, long-time technical translator and small-time teacher of English for Scientific Research at the University of Santiago de Compostela in Galicia, Spain. He is the author of Cómo escribir un artículo científico en inglés (Editorial Hélice, 1999).