

© Copyright Michael Stubbs 1986.



Michael Stubbs

Readers are welcome to print individual copies of this paper for private study.

Reference should always be made to the original places of publication which are:

Kevin Durkin, ed (1986) Language Development in the School Years. Croom Helm.


A slightly revised version is published in: M Stubbs (1986) Educational Linguistics. Blackwell.

(This version was converted from an ancient computer file to an HTML file and further converted to this pdf-file. It therefore may not correspond exactly to the published versions.)

A few additional notes have been added to the text here, like this:

NOTE added February 2008. The term used in the title, "nuclear vocabulary", followed work by Dixon, Hale, and Stein and Quirk (which is referred to in the article), but has not been much used in later work. The term has however been retained here. The term "core vocabulary" is probably the most current nowadays.

This article uses the following conventions:

• italics for linguistic forms

• "double quotes" for meanings

• asterisk * for ill-formed strings

When people think of a language, they think almost inevitably of words: vocabulary.

And when they think of language development, they also tend to think of vocabulary enlargement. There are obviously many other aspects of language development, and there is the danger that an attempt to 'increase someone's word power' leads to a quiz mentality. Nevertheless, the notion of extending someone's vocabulary is a perfectly plausible one in itself. It rests on a powerful, though sometimes hazy, intuition that some words are simpler, more important or more basic than others. It underlies the commonsense fury against much bureaucratic gobbledegook; and the often repeated observation that children's everyday vocabulary does not prepare them for reading the unfamiliar academic vocabulary in school textbooks (Perera, 1980).

This article sets out in detail several criteria for defining basic or nuclear vocabulary, and discusses some of the implications of the concept: for theoretical linguistic studies of lexis, for psycholinguistic studies of children's language development, and for practical educational concerns.

In some form the idea of basic vocabulary must underlie all vocabulary teaching. It certainly underlies vocabulary lists of various kinds including: Ogden's (1930) Basic English; Thorndike's (1921) and Thorndike and Lorge's (1944) Teacher's Wordbook; West's (1953) General Service List; Kucera and Francis' (1967) computational analysis of American English; Carroll et al's (1971) American frequency list; Hornby's (1974) Advanced Learner's Dictionary; Hindmarsh's (1980) English Lexicon; the Keyword scheme in Ladybird readers; and, in fact, lists of vocabulary in any language textbook.

Historically, the distinction between basic and non-basic expressions can be traced back to seventeenth century speculations on the possibility of a logical universal language. This work exerted a powerful influence on Roget's (1852) attempt at a Thesaurus which logically classifies the whole vocabulary of English. It also influenced Ogden's (1930) Basic English, intended as an international auxiliary language (Lyons, 1981: 64).

Such lists have very different purposes, including: teaching English as a foreign language to different groups; facilitating international communication, given the position of English as a world language; and making prescriptions about the educational level expected of native English-speaking schoolchildren of different ages. Underlying some such lists is therefore a concept of the 'usefulness' or 'communicative adequacy' of different words. A clear statement of the fundamental intuitive notion involved is given by Jeffery in the Foreword to West (1953: v):

A language is so complex that selection from it is always one of the first and most difficult problems of anyone who wishes to teach it systematically.... To find the minimum number of words that could operate together in constructions capable of entering into the greatest variety of contexts has therefore been the chief aim of those trying to simplify English for the learner.

The widespread use of such a large number of lists in teaching of different kinds illustrates how important vocabulary development is felt to be, sometimes as an end in itself, and sometimes as a way of facilitating cognitive development.

There remain problems, however. For example, later lists have generally been constructed on the basis of earlier lists, which themselves have built-in biases in their sampling. The Thorndike list, often used by later scholars, was based on a corpus of 4.5 million words, of which 3 million are from 'the Bible and the English classics', including Boswell's Life of Johnson and Gibbon's Decline and Fall of the Roman Empire.

Earlier work is of course generally reinterpreted, but via a 'teacher's discretion' (Hindmarsh, 1980: ix); and some lists (eg Van Ek and Alexander, 1977) are set up with no indication at all of how they were constructed.

Frequency counts are obviously inadequate on their own, although basic frequency data cannot be entirely ignored. And totally inexplicit use of intuition is also inadequate: apart from any other reasons, intuitions about lexical frequency are often wildly inaccurate. This article therefore aims to provide a more precise concept of what might be meant by 'basic' versus 'non-basic' vocabulary, by returning to first principles and using published lists only at a later stage. I do not aim to provide a review of empirical research on vocabulary development, though some work is referred to. The aim is rather to discuss the systematic linguistic basis for a distinction which has far-reaching implications for linguists, child language researchers and teachers.
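The point about sampling bias can be made concrete with a minimal sketch. It assumes nothing about how any published list was actually compiled: the two miniature samples, and the names relative_frequencies, CLASSICS and EVERYDAY, are invented here purely for illustration. Only the figures of 3 million and 4.5 million words come from the discussion of the Thorndike corpus above.

```python
from collections import Counter


def relative_frequencies(texts):
    """Map each word to its share of all running words in `texts`."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical miniature samples, invented purely for illustration.
CLASSICS = ["thou shalt not covet", "thou art wise"]
EVERYDAY = ["the bus is late", "the shop is shut"]

# One corpus weighted towards older literary texts, and one without them.
literary_heavy = relative_frequencies(CLASSICS + CLASSICS + EVERYDAY)
everyday_only = relative_frequencies(EVERYDAY)

# Proportion of the Thorndike corpus drawn from 'the Bible and the English
# classics', using the figures quoted in the text.
classics_share = 3_000_000 / 4_500_000
```

In the literary-heavy sample, thou comes out as more frequent than the, although it does not occur at all in the everyday sample. A frequency count is only as representative as the corpus behind it, which is one reason why such counts cannot stand on their own.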

Words are idiosyncratic

It is regularly pointed out that words are idiosyncratic. Every individual word is unique in its etymology, and in its meaning and behaviour, including its collocations.

Furthermore any individual speaker's vocabulary is unique: an idiosyncratic network of personal connections which do not appear to concern linguistic competence as this is usually understood.

Phonological and grammatical competence are essentially different from lexical competence in this respect. Any adult native speaker of any dialect of English (or any other language) has basically the same phonological competence, involving intuitive knowledge of the phonemes of the language, their allophonic variants, their possible phonotactic constraints, and so on. This competence is acquired by the age of around seven years: after that there is simply no more to learn. The same is true of much of the grammar of the language: in most of its main features this is learned by the age of five or six years, though some of the more complex syntactic structures may be learned later and some stylistically formal syntactic structures, largely restricted to written language, may be learned in adulthood, if at all. Lexical competence simply never approaches this kind of completeness. The learning of new vocabulary is clearly very rapid in early childhood, and then slows down. But a person's vocabulary may nevertheless keep growing throughout their whole life. New meanings can be learned for old words, and new relations between words can be formed.

Relational lexical semantics

Despite this apparent inherently idiosyncratic aspect of lexical competence, there are, of course, systematic ways of studying vocabulary. One set of approaches could be called relational lexical semantics, and comprises: semantic field theory (especially Roget, 1852, and Trier, 1931; but also other work by Humboldt in the 1800s, and by Meyer and Weisgerber between 1900 and 1930); structural semantics (Lyons, 1963, 1968); and componential analysis (Nida, 1975; Lehrer, 1974). The basic concept is that meaning is a relational property of language systems: words have no absolute value or meaning, but are defined in relation to other words. The sense relations involved include synonymy, antonymy and hyponymy, and these can be given formal definitions in terms of logical entailment and contradiction. Such approaches are well known and well reviewed in many standard textbooks (see especially Lyons, 1977, vol 1). I will therefore not discuss them here except in so far as they can help to support a rather different way of discussing relations between words: a distinction between nuclear and non-nuclear vocabulary. Nor will I explicitly discuss the question of how children acquire such semantic relations. There are detailed analyses of children's acquisition of the hierarchical organization of vocabulary, their initial overextensions and later narrowing of word meaning, and the structure of their concepts, by Clark (1973), Livingston (1982), Nelson (1982), Palermo (1982) and Rosch (1973).

The common core

An important part of native speakers' linguistic competence is the ability to recognize that some words are 'ordinary' English words, in some sense, whilst others are rare, exotic, foreign, specialist, regional and so on. Such intuitions are by no means always accurate: for example, regional words are often not recognized as such. As a speaker of standard Scottish English, I realized only recently that skelf ("splinter of wood stuck in a finger") is regionally restricted to Scotland and some other northern areas of Britain.

This intuitive notion that part of the vocabulary is more basic than the rest underlies the definition of the vocabulary of a language which is discussed in detail in the introduction to the Oxford English Dictionary. It is argued there that the vocabulary of English is 'not a fixed quantity circumscribed by definite limits', but rather a nebulous mass with 'a clear and unmistakeable nucleus (which) shades off on all sides... to a marginal film that seems to end nowhere'. The introduction also provides a helpful diagram which neatly sums up this concept:

[Diagram from the OED introduction not reproduced here.]

Dialectal and diatypic variation

The concept of 'core' evident in the position adopted by the OED is, however, not quite the concept which we require here. Comprehensive dictionaries and grammars wish to define the whole of what is 'unquestionably' English. What we require is a considerably more restricted subset of this core. In addition, the 'core' in the sense already discussed occurs naturally as the intersection of many different varieties. We require also to build in the concept of a deliberate and planned selection within this core. Stein (1978) and Quirk (1981) call such a reduced and planned English 'nuclear English', with reference to the lexical and syntactic characteristics of a restricted variety of international English. Hale (1971) and Dixon (1971, 1973) also use the term 'nuclear' in a relevant sense.
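The difference between a core which 'occurs naturally as the intersection of many different varieties' and a planned selection within it can be sketched in set terms. This is an illustration only, under stated assumptions: the word lists for the three varieties are invented, the function names common_core and nuclear_selection are hypothetical, and the keep argument is merely a placeholder for whatever explicit criteria are eventually adopted.

```python
# Hypothetical word lists for three varieties, invented for illustration.
VARIETIES = {
    "scottish": {"walk", "give", "splinter", "skelf"},
    "legal": {"walk", "give", "splinter", "tort"},
    "technical": {"walk", "give", "splinter", "phoneme"},
}


def common_core(varieties):
    """Words shared by every variety: the naturally occurring intersection."""
    return set.intersection(*varieties.values())


def nuclear_selection(core, keep):
    """A deliberate, planned subset of the core.

    `keep` is a placeholder predicate standing in for explicit criteria.
    """
    return {word for word in core if keep(word)}


core = common_core(VARIETIES)

# Placeholder criterion: an arbitrary planned choice, not a real test.
nuclear = nuclear_selection(core, lambda word: word in {"walk", "give"})
```

The core here falls out of the data, whereas the nuclear set depends entirely on the criterion supplied. The sketch therefore shows only the logical relation between the two notions (the nuclear vocabulary is a subset of the core), not how the selection ought to be made.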

Blum and Levenston (1978) point to a related aspect of lexical competence which is closer to our requirements. An important part of native speakers' linguistic competence is the ability to do with less than their full vocabulary when required to do so. Speakers have an intuitive sense of which words to avoid when, for example, talking to younger or older children or to foreigners (cf Snow and Ferguson, eds, 1977; Bohannon and Marquis, 1977); or, conversely, which words ought to be taught first to foreign learners or used in simplified reading books for children, and, in general, which words are of maximum utility (Rosch, 1975; Cruse, 1977; Blewitt, 1983; Shipley et al, 1983).

Speakers have many strategies for avoiding words if they require to. One strategy is to use a paraphrase or circumlocution: instead of waddle, they might talk of a clumsy walk, and such paraphrases are constructed in systematic ways (see below). However, such intuitions have limits, hence the debates over which words should be taught in foreign language textbooks, and hence the need for criteria which are not purely intuitive.

In order to develop this sense of nuclear vocabulary, I require to develop the concepts in the OED diagram cited above. It is usual to distinguish between: regional or geographical dialects (eg Scottish versus Anglo English); social dialects (eg working class versus middle class); temporal dialects (eg Old English versus Middle English); and individual dialects (usually called idiolects). There are exceptions, but many individual speakers have full native competence in only one dialect, defined geographically, socially and temporally, and fixed in adolescence. On the other hand, any individual uses many different diatypes, according to the field of discourse (the activity going on at the time), the tenor of discourse (the social relations between the speakers), and the mode of discourse (predominantly speech versus writing). My formulation here is a Hallidayan one (see Halliday, 1978; or Gregory and Carroll, 1978, for a very simple account).

