Terminological Data Banks: a model for a British Linguistic Data Bank (LDB)

John McNaught

Centre for Computational Linguistics, UMIST

Paper presented at the Aslib Technical Translation Group conference and exhibition, London, 20 November 1980

A description of a model linguistic data bank (LDB) for a British market will be given, based on the results of a continuing feasibility study. A LDB represents an economical and highly efficient way of organizing Britain’s efforts in the field of terminology, both with respect to English and the many foreign languages through which contact is maintained with non-English speaking countries. The institutional and organizational structure will be outlined. Emphasis will be placed on services to be provided to various groups, and in particular to translators, and on the important role these groups will play in assuring the continued viability and relevance of the LDB, not only as users, but as contributors and advisers. Data acquisition policy and financial aspects will be considered.

A multilingual, multidisciplinary British LDB will provide translators with a valuable service, whose applications are many, whose products are varied to cater for a wide range of needs, whose terminology is continually revised and updated and whose modes of consultation are several.

This paper is based on results obtained from a continuing feasibility study of the establishment of a terminological data bank in the United Kingdom, a study being carried out at UMIST under the auspices of the British Library.

I shall use the term Linguistic Data Bank (or LDB) in preference to Terminological Data Bank, as many of the banks we investigated in the course of this study do not restrict themselves to handling terminological data alone. Thus LDB represents a more accurate designation of the types of information systems we will be discussing.

I shall concentrate primarily on work being done in this country towards the establishment of a British LDB, but shall make reference to other LDBs abroad by way of exemplification and illustration. Indeed, I would urge you to keep in mind during this talk that, when I describe possible features of a British LDB, these features already exist in other LDBs. I am not describing services or facilities or search methods that could exist. In our proposals for a model of a British LDB, we have translated the assumedly best features of LDBs abroad to the context of a British market. Where Britain may hope to achieve a measure of innovation in LDB operation is in the use of the most up-to-date technology and software, exploiting information networks and the move towards office and home computers, etc, and in reaping the benefits of recent terminological research. There are significant advantages to be gained by being a late-comer in this field, not the least of which is to be able to study the reaction of users to existing LDBs, and so to be able to design a LDB which will suit users’ needs.

297 ASLIB PROCEEDINGS VOL. 33, NO. 7/8

There are three sections to this paper: Part I deals with the reasons behind the feasibility study; Part II is a description of the phases of the study; and Part III is a presentation of a model for a British Linguistic Data Bank.

I. Reasons

The reasons and considerations behind the feasibility study are several; I shall mention only the most important:

Special language communication. This involves the constant creation of terms to designate concepts, objects, measurements, products, etc. These designations (terms) differ from the words of general language, in that they refer more specifically than words, in that they are mainly used by specialists, in that they are often created according to established patterns and precedents, in that they are susceptible to standardization and in that they may be relatively short-lived and changed in the light of discoveries and developments.

Efficient communication. This depends on common agreement, and can only be achieved by widespread knowledge of terms (in our case) or by easy access to terminological information. The problems of efficient communication apply with even greater force across language boundaries.

Efficient special language communication. There are many different groups involved in the use and creation of terminology; all groups must have access to terminologies, both their own, and those of other disciplines.

‘Information explosion’. The immense upsurge in technological innovation and the concomitant upsurge in new terminology, together with the great increase in multilingual communication needs, mean that the work of collecting, storing, sorting and disseminating terminology cannot be carried out efficiently by dispersed methods, especially when contact must be maintained with LDBs abroad housing foreign language data.

Lack of single authoritative organization in the UK. There is no single organization in the UK able to provide authoritative guidance on English usage of specialized terminology. Note that I do not say standardized terminology: the BSI do a laudable job in this area. Specialized terminology, however, is another matter, in that both standardized and non-standardized terms are present. One is dealing with the special languages of different disciplines, with the grey areas where the terminologies of disciplines meet, with in-house usage vis-a-vis wider usage, etc. There is no national centre for terminology, no centre which has close links with other bodies concerned with the production and regularization of usage of specialized terminology. There is also a distinct lack of links with foreign LDBs—no central body capable of negotiating the exchange of data with a foreign LDB, for example.

Existence of other LDBs. In recent years, major industrial countries and international organizations have established LDBs. LDBs in multilingual form exist in (nos. of main LDBs in brackets) Canada (2), at the Commission of the European Communities, in France (1), the Federal Republic of Germany (4), the German Democratic Republic (1), Sweden (1) and the USSR (2). In Denmark, plans are well advanced for the establishment of DANTERM. The UN plans to establish its own LDB, as does UNI, the Italian Standards Institution. In Spain, HISPANOTERM is of recent creation.

Further information on these LDBs may be gained from Sager & McNaught¹. Great Britain is the only major industrial nation without such a service facility, that is, a centre for the processing of all kinds of terminological data.

There is a substantial amount of work being done in Britain, however, related to thesauri for indexing and retrieval purposes. One of the most important contributions Britain has made in this field is towards the development of the ISONET thesaurus, which is a computerized, controlled vocabulary of some 11,500 descriptors and 5,500 non-descriptors used for the selection of descriptors for indexing and searching standards and technical regulations on ISONET databases. The thesaurus consists of a classified subject display and an alphabetical list (the index to the display) and, though developed at the moment only as a bilingual English-French version, is designed to be both multilingual and multidisciplinary. The BSI team responsible for the development of the English part of the thesaurus has helped to produce not only an excellent indexing and information retrieval tool, but also a database that holds a valuable store of terminological information.

English terminology. All the foreign LDBs mentioned contain, or will contain, substantial amounts of English terminology, at least as translation equivalents, and such vocabulary may be misleading. The impact of LDBs on the usage of English terminology outside the UK will increase, and may, without British involvement, introduce usage unacceptable or even incomprehensible to this country.

There is a serious danger that the international role of English as a means of communication may be impaired if a single, national British centre for terminology does not exist. Moreover, as many languages create new terms on the basis of English, uncontrolled elaboration of English terminology in a number of different centres has far-reaching consequences for effective communication in other languages and between these languages and English.

Nairobi Recommendation of UNESCO. Paragraph 12 of this document, on the legal protection of translators and translations, reads:

‘12. Member states should consider organizing terminology centres which might be encouraged to undertake the following activities:

(a) communicating to translators current information concerning terminology required by them in the general course of their work;

(b) collaborating closely with terminology centres and developing the internationalization of scientific and technical terminology so as to facilitate the task of translators.’

Aslib 1978 conference on ‘Translating and the Computer’. The audience of this conference expressed a strong interest in LDBs, and many of the organizations we have contacted during the course of this study were represented at this conference.

II. Phases of the Feasibility Study

On the basis of the above reasons and considerations, the project seeks to establish the following:

In phase one:

- the use made of LDBs in other countries
- the cost and financing of other LDBs
- the institutional and organizational framework of other LDBs
- the availability and quality of data for a British LDB

In phase two:

- the possible uses of a LDB in the UK
- the possible structure of a British LDB

The study itself was split into three phases:

Phase One: LDBs and data

1. The state of LDBs

1.1. Information gathering

1.1.1. Scrutiny of available documentation

1.1.2. Formulation of further enquiries to be made

1.1.3. Follow-up enquiries by questionnaire or visits to selected LDBs

1.2. Report on selected LDBs: their use, cost, financing, organization and institutional framework

Phase Two: Preliminary enquiry among potential users

1. Preliminary technical specification of a British LDB
- scope of holdings, acquisition policy
- format of holdings
- modes of operation, user facilities
- maintenance and development

2. Discussion of this model with potential contributors and users
- government departments
- relevant institutions
- industry
- translators
- information and documentation centres
- publishers

3. Evaluation of responses

Phase Three: Feasibility report on a model of a LDB

1. Modification of the technical specification

2. Organizational specification

3. Recommendations

Phase One, the material for which was gathered with the aid of a British Library Overseas Study Visit Grant, was concluded in June with a report entitled ‘Survey of Five Linguistic Data Banks’². Some of the main points and conclusions of this report are detailed below:

Three main types of LDB exist:

(a) those conceived primarily as translation aids, including EURODICAUTOM (CEC) and LEXIS (Bundessprachenamt)

(b) those used primarily as language planning aids, for example BTQ (Banque de Terminologie du Québec) and TERMIUM (University of Montreal)

(c) those used as aids to standardization, including NORMATERM (Association Française de Normalisation); TEAM (Siemens AG), in collaboration with DIN (Deutsches Institut für Normung), which now has its own LDB and document retrieval system (DITR); and TERMDOK (Tekniska nomenklaturcentralen, Stockholm), which collaborates very closely with SIS, the Swedish Standards Institution.

Two main methodological approaches to LDB data organization exist, exemplified by EURODICAUTOM on the one hand, which stores keywords and their contexts, in the belief that translators are best served by supplying them with terms in context, and LEXIS on the other, which records terms in isolation, preferring to work from concepts.
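
The difference between the two approaches can be made concrete with a small sketch. The Python below is illustrative only: the record layouts, field names, function names and sample entries are assumptions made for this example, and they do not reproduce the actual record formats or search methods of EURODICAUTOM or LEXIS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextRecord:
    """Context-based approach: a keyword stored with the passage it occurred in.
    (Hypothetical layout, not the real EURODICAUTOM format.)"""
    keyword: str
    language: str
    context: str


@dataclass
class ConceptRecord:
    """Concept-based approach: one concept, one isolated term per language.
    (Hypothetical layout, not the real LEXIS format.)"""
    concept_id: str
    terms: Dict[str, str] = field(default_factory=dict)  # language code -> term


def find_in_context(records: List[ContextRecord], keyword: str, language: str) -> List[str]:
    """Return every stored context in which the keyword is recorded."""
    wanted = keyword.lower()
    return [r.context for r in records
            if r.language == language and r.keyword.lower() == wanted]


def find_equivalent(records: List[ConceptRecord], term: str, source: str, target: str) -> List[str]:
    """Return the target-language term of every concept whose source-language term matches."""
    wanted = term.lower()
    return [r.terms[target] for r in records
            if r.terms.get(source, "").lower() == wanted and target in r.terms]


# Invented sample entries, for illustration only.
context_bank = [
    ContextRecord("data bank", "en", "The data bank is consulted from a terminal."),
]
concept_bank = [
    ConceptRecord("C001", {"en": "data bank", "fr": "banque de données", "de": "Datenbank"}),
]
```

With these sample entries, `find_in_context(context_bank, "data bank", "en")` returns the stored sentence, so that the translator sees the term in use, whereas `find_equivalent(concept_bank, "data bank", "en", "fr")` returns the isolated equivalent `banque de données`, reached through the concept the two terms share.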

The facilities, services, institutional and organizational structure of these major European banks were investigated, as was the functioning of other major LDBs in Europe and elsewhere, through consultation of the literature and via correspondence.

Of great interest to us were the various systems used by LDBs to finance their operations, and to establish links with their users. Here we investigated the partnership systems set up by TEAM and TERMDOK, where partners contribute terminology in return for services, and subscriber systems such as the one operated by NORMATERM. Links with users, and methods of elaborating terminology, were studied especially in relation to TEAM, TERMDOK and DANTERM. DANTERM has a policy of sending terminologists into the field to develop and research terminology on the spot. TERMDOK has a smoothly-running system of committees which elaborate new terminology in conjunction with industry, etc, and has wide user links in many sectors.

TEAM provides a good example of how a partnership system may operate to the benefit of all members. This particular partnership system unites many different groups and organizations, both in West Germany and in other countries, eg Philips and the Dutch Foreign Ministry. These groups all contribute terminology to TEAM and have access to all TEAM terminology free of charge, payment being asked only for actual processing time.

