Published in the Proceedings of IMIA WG6, Geneva, May 1994
A Terminology Server for Medical Language and Medical Information Systems
AL Rector¹, WD Solomon¹, WA Nowlan², TW Rush²
¹ Medical Informatics Group, Department of Computer Science, University of Manchester, Manchester, M13 9PL, UK
² Medical Products Group, Hewlett-Packard Ltd, Bristol, BS12 6QZ, UK
GALEN is developing a Terminology Server to support the development and integration of clinical systems through a range of key terminological services, built around a language-independent, re-usable, shared system of concepts - the CORE model. The focus is on supporting applications for medical records, clinical user interfaces and clinical information systems, but also includes systems for natural language understanding, clinical decision support, management of coding and classification schemes, and bibliographic retrieval. The Terminology Server integrates three modules: the Concept Module, which implements the GRAIL formalism and manages the internal representation of concept entities; the Multilingual Module, which manages the mapping of concept entities to natural language; and the Code Conversion Module, which manages the mapping of concept entities to and from existing coding and classification schemes. The Terminology Server also provides external referencing to concept entities and coercion between data types, and makes its services available through a uniform applications programming interface. Taken together, these services represent a new approach to the development of clinical systems and the sharing of medical knowledge.
Keywords: Terminology server, Natural language processing, Coding and classification schemes, Electronic medical records, Knowledge representation
1. Introduction: The Idea of a ‘Terminology Server’
Clinical practice centres on the care of patients by doctors, nurses, and other clinicians. Medical information should centre on the record of that care. There is a world-wide move towards ‘patient-centred’ information systems in which clinical information gathered by health care professionals during the process of patient care is both used to further that care and re-used to serve other functions within the health care systems.
If clinical information is to be re-used and shared, the basic concepts used to describe that care must be shared. Different specialised systems may organise those basic concepts differently for their own purposes, but the fundamental concepts must be common to all applications. In terms of classic data-modelling, we can imagine many different data models, but the meaning of the entities in those models (the meaning of ‘the information that goes in the boxes on the modelling diagram’) must be shared. Such shared systems of concepts are increasingly known as ‘ontologies’ in the database and artificial intelligence communities.
The GALEN¹ project is funded by the European Commission as part of the AIM programme. GALEN’s goal is to develop a ‘Terminology Server’ to manage language-independent shared systems of concepts for clinical applications. The Terminology Server will be a new type of integrating service for heterogeneous information systems. GALEN aims to demonstrate the feasibility and usefulness of such a Terminology Server:
• To provide infrastructure support for the development and integration of clinical systems.
• To provide a flexible, extensible basis for achieving ‘coherence without uniformity’ amongst the many different clinical information services required.
• To serve as an accessible repository of language-independent medical conceptual knowledge, and to map this repository to potentially many different natural languages.
• To convert between existing representations and coding schemes.
• To provide dynamically generated local nomenclatures or ‘coding schemes’ which are more comprehensive and thoroughly organised than can be held as a static structure or managed manually.
¹ General Architecture for Languages, Encyclopædias and Nomenclatures in Medicine. The members of the GALEN consortium are: University of Manchester (UK, Coordinator), Hewlett-Packard Ltd (UK), Hôpital Cantonal Universitaire de Genève (Switzerland), Consiglio Nazionale delle Ricerche (Italy), University of Liverpool (UK), Katholieke Universiteit Nijmegen (Netherlands), University of Linköping (Sweden), The Association of Finnish Local Authorities (Finland), The Finnish Technical Research Centre (Finland), GSF-Medis Institut (Germany), Conser Sistemi Avanzati (Italy).
If computer systems are to play a significant role in clinical care, then formal ontologies which can be manipulated by computer systems are essential. Manual ‘coding systems’ or ‘controlled vocabularies’ interpreted by human users (largely on the basis of the natural language rubrics attached to the symbolic codes) are no longer sufficient. The difficulties of using even such massive efforts as the Unified Medical Language System, SNOMED-III and the Read Codes are all too apparent. Such systems are becoming too large to manage, but remain too small to contain the detail required to meet clinical requirements. Their organisation remains too limited to support acceptable clinical interfaces, and too rigid to support the variety and rapid evolution of clinical care.
To capture more detail and achieve greater organisation, the meaning of the concepts must be captured not just in the rubrics but in the symbolic structure itself, so that it can be manipulated computationally. Mechanisms are needed to encapsulate the resulting intrinsically variable descriptions into the fixed formats used by relational databases. These requirements have been extensively discussed elsewhere and we shall not review them further here [4-8].
Medicine is not alone in perceiving the need for shared terminology. Sharing and re-use of ‘ontologies’ is now a major growth area across information and knowledge-based systems development [9-13]. However, medicine may be unique in its scale, in its large and diverse body of professional users and sublanguages, and in its common international effort to share knowledge based on extensive shared understanding of the domain. If there is not already a shared model of clinical medicine and disease, there is a vigorous international effort to create one, an effort largely motivated by clinical goals. GALEN is one response to the special needs of supporting these clinical efforts to share knowledge and practice. Others include [14-16].
Because of medicine’s distinctive situation, GALEN takes a distinctive approach to knowledge sharing. We shall return at the end of this paper to the relationship between our concept of a medical Terminology Server and other knowledge sharing efforts. In the next section we discuss GALEN’s approach to meeting the needs of the clinical community; Section three provides a functional description of the GALEN Terminology Server. Sections four and five discuss the architecture of the Terminology Server and the special features of the GALEN modelling formalism (the GRAIL Kernel), which derive from the special clinical requirements for re-use and information sharing. The final section provides an overall discussion including questions of evaluation and maintenance.
2.1 Fundamental proposition
The fundamental proposition of the GALEN project is that there is a terminological (or, more properly, a conceptual) component of clinical language which can be usefully separated from other aspects of medical natural language processing, information modelling, knowledge-based systems, and user interface design. GALEN contends that this conceptual component can be made largely independent of surface natural language characteristics. We suggest that this model is sufficiently strongly shared across clinical and linguistic groups to permit the development of an ‘interlingua’ based on a single coherent COncept REference (CORE) model of medical concepts. We believe that such a CORE Model of medical concepts is the appropriate reference point for developing coherent collections of clinical applications which work together successfully and build on each other’s achievements.
Because access to the CORE Model and related information is a common and pervasive requirement for many applications, GALEN aims to encapsulate access to the CORE Model and related functions in a server: the ‘Terminology Server’. In a network environment the Terminology Server will both mediate amongst existing systems and act as a repository for terminology to facilitate developing new systems. We do not claim that such a ‘Terminology Server’ will solve all problems of mediation amongst existing systems or of building new systems. Indeed, one of the primary aims of GALEN is to modularise the overall task of building clinical systems. The goal of the Terminology Server is to relieve individual applications of technically difficult operations involving terminology, or conceptual knowledge of the domain. Our image is of groups of applications, developers and sites co-operating to develop and maintain one or more CORE Models which they all share and which support their joint efforts.
2.2 GRAIL
The modelling formalism in which the CORE Model is built is known as the GRAIL (GALEN Representation And Integration Language) Kernel. GRAIL is a compositional formalism: rather than having to enumerate all and only those clinical concepts that are available, the GRAIL modeller specifies elementary concept entities, and relations that may be used to combine them into 'complex' concept entities. This process can be recursive, thus providing for indefinitely complex concept entities.
GRAIL is generative and GRAIL models are sparse. A GRAIL model contains only the minimum information necessary to sanction the generation of all sensible concept entities. An indefinitely large number of concept entities can be inferred from the sanctions in the model and generated as needed without having to store them explicitly.
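The idea of a sparse model that sanctions, rather than enumerates, composite concept entities can be illustrated with a minimal sketch. It is not GRAIL syntax or the GALEN implementation: the concept names, the single hasLocation sanction, and the functions compose and generate_all are all hypothetical, and recursion over composite values is left out for brevity.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical elementary concept entities, stored as child -> parent.
PARENT = {
    "Fracture": "Lesion",
    "Inflammation": "Lesion",
    "Heart": "BodyPart",
    "Femur": "BodyPart",
    "Lesion": None,
    "BodyPart": None,
}

# One sanction (an assumption for illustration): any Lesion may be
# combined with any BodyPart through the relation hasLocation.
SANCTIONS = {("Lesion", "hasLocation", "BodyPart")}


def is_kind_of(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in PARENT."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


@dataclass(frozen=True)
class Composite:
    base: str
    relation: str
    value: str


def is_sanctioned(base, relation, value):
    """Check whether some sanction covers this combination."""
    return any(
        relation == s_rel and is_kind_of(base, s_base) and is_kind_of(value, s_val)
        for s_base, s_rel, s_val in SANCTIONS
    )


def compose(base, relation, value):
    """Build a composite entity, refusing combinations no sanction permits."""
    if not is_sanctioned(base, relation, value):
        raise ValueError(f"not sanctioned: {base} {relation} {value}")
    return Composite(base, relation, value)


def generate_all(relation):
    """Enumerate on demand every composite the sanctions allow for a relation."""
    return [
        Composite(b, relation, v)
        for b, v in product(PARENT, PARENT)
        if is_sanctioned(b, relation, v)
    ]
```

In this sketch a single stored sanction is enough to accept compose("Fracture", "hasLocation", "Femur") and to reject compose("Heart", "hasLocation", "Fracture"); none of the composites is stored in advance.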
GRAIL classifies composite concepts automatically on the basis both of their definition and of indefeasible statements which are conceptually necessary to a concept. Hence there is no need for maintaining multiple classifications manually or even for specifying them in advance. Concepts such as “congenital heart disease” can be classified automatically under both congenital diseases and heart diseases without manual intervention.
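Automatic classification of this kind can be sketched as a structural subsumption test. The sketch below is a deliberate simplification and not GRAIL's actual classifier: a concept is a base plus a set of (relation, value) criteria, and the names Concept, subsumes, classify and all concept labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hierarchy of elementary entities (child -> parent).
PARENT = {"Heart": "Organ", "Lung": "Organ"}


def is_kind_of(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in PARENT."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


@dataclass(frozen=True)
class Concept:
    base: str
    criteria: frozenset = frozenset()  # set of (relation, value) pairs


def subsumes(general, specific):
    """`general` subsumes `specific` if its base and every one of its
    criteria are matched by something at least as specific."""
    if not is_kind_of(specific.base, general.base):
        return False
    return all(
        any(rel == r2 and is_kind_of(v2, val) for r2, v2 in specific.criteria)
        for rel, val in general.criteria
    )


def classify(new, named):
    """Return the names of all defined concepts that subsume `new`."""
    return sorted(name for name, c in named.items() if subsumes(c, new))


NAMED = {
    "Disease": Concept("Disease"),
    "OrganDisease": Concept("Disease", frozenset({("hasLocation", "Organ")})),
    "HeartDisease": Concept("Disease", frozenset({("hasLocation", "Heart")})),
    "LungDisease": Concept("Disease", frozenset({("hasLocation", "Lung")})),
    "CongenitalDisease": Concept("Disease", frozenset({("hasOnset", "Congenital")})),
}

congenital_heart_disease = Concept(
    "Disease",
    frozenset({("hasLocation", "Heart"), ("hasOnset", "Congenital")}),
)
```

Calling classify(congenital_heart_disease, NAMED) places the new concept under both HeartDisease and CongenitalDisease (and their more general subsumers) purely from its definition, with no manually maintained parent links.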
GRAIL also provides a facility for attaching ‘extrinsic’ information to concept entities. Extrinsic information is information which does not affect an entity’s classification. For example, the statement
    Aspirin extrinsically mayBeBoughtIn 100mgTablets
is a representation of additional ‘real-world’ knowledge beyond what is necessarily true about Aspirin conceptually. The well-structured taxonomies in GRAIL models are often useful and compact ways to organise other extrinsic knowledge.
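One way to picture extrinsic information is as a store kept beside the taxonomy and never consulted when classifying. The following sketch assumes a hypothetical drug hierarchy, a hypothetical listedIn statement, and, as a further assumption not stated in the text, that extrinsic statements attached to a general concept are also visible from its descendants.

```python
# Hypothetical taxonomy (child -> parent); classification uses only this.
PARENT = {"Aspirin": "Analgesic", "Analgesic": "Drug", "Drug": None}

# Extrinsic statements live in a separate store: concept -> [(relation, value)].
EXTRINSIC = {}


def ancestors(concept):
    """Return the chain of parents above `concept`, nearest first."""
    chain = []
    concept = PARENT.get(concept)
    while concept is not None:
        chain.append(concept)
        concept = PARENT.get(concept)
    return chain


def add_extrinsic(concept, relation, value):
    """Attach an extrinsic statement; the taxonomy is left untouched."""
    EXTRINSIC.setdefault(concept, []).append((relation, value))


def extrinsic_for(concept):
    """Collect statements on `concept` and (by assumption) on its ancestors."""
    facts = []
    for c in [concept, *ancestors(concept)]:
        facts.extend(EXTRINSIC.get(c, []))
    return facts
```

After add_extrinsic("Aspirin", "mayBeBoughtIn", "100mgTablets"), the result of ancestors("Aspirin") is the same as before, which is the sense in which the statement is extrinsic.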
2.3 The Terminology Server
The Terminology Server provides an encapsulation of, and a networked applications programming interface to, the CORE Model, the facilities provided by the GRAIL formalism, and linguistic and coding functionality. It provides means of referring to concept entities, asking questions of them, and transforming them into other representations, such as natural language. Individual modules within the Terminology Server handle different aspects of the overall task. The Terminology Server provides a uniform interface to the services provided by each of these modules, as well as combining multiple services into those useful for external applications. There are five major tasks that the Terminology Server as a whole performs:
• managing external references to concept entities ('Reference management') and coercion between data types;
• implementing the GRAIL formalism, and managing the internal representation of concept entities (implemented by the Concept Module);
• managing the data and functionality required to map concept entities to natural language and, potentially, the inverse (handled by the Multilingual Module);
• managing the data and functionality required to map concept entities to and from existing coding and classification schemes (handled by the Code Conversion Module);
• providing the functionality and management to handle extrinsic information (the Extrinsic Information Module).
Section 3 provides a functional description of the Terminology Server; Section 4 provides an overview of its architecture.
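The division of labour described in this section can be pictured as a facade over separate modules. The sketch below is an illustration only and does not reproduce the GALEN programming interface: every class, method, reference format, rubric and the scheme name "DemoScheme" with its code "D-001" are invented for the example, and the Extrinsic Information Module is omitted.

```python
class ConceptModule:
    """Holds the internal representation of concept entities (toy data)."""

    def __init__(self):
        self._entities = {"FemurFracture": ("Fracture", "hasLocation", "Femur")}

    def exists(self, entity):
        return entity in self._entities


class MultilingualModule:
    """Maps concept entities to natural language rubrics (toy data)."""

    _rubrics = {
        ("FemurFracture", "en"): "fracture of the femur",
        ("FemurFracture", "fr"): "fracture du fémur",
    }

    def rubric(self, entity, language):
        return self._rubrics[(entity, language)]


class CodeConversionModule:
    """Maps concept entities to and from codes in an external scheme."""

    _codes = {("FemurFracture", "DemoScheme"): "D-001"}

    def to_code(self, entity, scheme):
        return self._codes[(entity, scheme)]

    def from_code(self, code, scheme):
        for (entity, s), c in self._codes.items():
            if s == scheme and c == code:
                return entity
        raise KeyError((code, scheme))


class TerminologyServer:
    """Uniform interface: reference management plus combined services."""

    def __init__(self):
        self.concepts = ConceptModule()
        self.language = MultilingualModule()
        self.codes = CodeConversionModule()
        self._ref_by_entity = {}
        self._entity_by_ref = {}

    def reference(self, entity):
        """Issue (or reuse) an opaque external reference for an entity."""
        if not self.concepts.exists(entity):
            raise KeyError(entity)
        if entity not in self._ref_by_entity:
            ref = f"ref:{len(self._ref_by_entity) + 1}"
            self._ref_by_entity[entity] = ref
            self._entity_by_ref[ref] = entity
        return self._ref_by_entity[entity]

    def describe(self, ref, language, scheme):
        """Combine two module services behind one call."""
        entity = self._entity_by_ref[ref]
        return {
            "rubric": self.language.rubric(entity, language),
            "code": self.codes.to_code(entity, scheme),
        }

    def resolve_code(self, code, scheme):
        """Convert an external code into a reference to a concept entity."""
        return self.reference(self.codes.from_code(code, scheme))
```

An application in this sketch holds only opaque references and calls describe or resolve_code; it never touches the internal representation, which is the encapsulation the section describes.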
2.4 Expected Applications
The test of the Terminology Server will be whether it supports applications successfully. To be successful it must be shown to support applications both individually and, more importantly, within an environment of heterogeneous interworking clinical information systems. Our goal is not a ‘pure’ representation of the essence of medical thought; rather, the goal is a practical tool for developers of clinical information systems. Experience suggests that, within limits, ‘cleaner’, more formal representations lead to systems which are more flexible and extensible. However, the ultimate criterion is use in practical applications; compromises are therefore inevitable.
Applications should benefit from the Terminology Server in at least four ways:
• Operations involving terminology can be delegated outside of individual applications;
• Development should be easier because it is based on existing ontologies and, increasingly, re-uses other work which uses those ontologies.
• Communication with other applications using the shared ontology should be possible.