From Term Dynamics to Concept Dynamics:
Term Variation and Multidimensionality in the
Pilar León-Araúz, Arianne Reimerink
Department of Translation and Interpreting, University of Granada
Medical terminology is one of the most dynamic terminological domains, and the choice of one term
instead of the other is not random, but the result of different perspectives towards reality. VariMed is
a research project on medical term variants and its overall objective is to generate a multifunctional resource on the medical domain for linguistic research, translation and technical writing. In this paper, we propose a systematic way of extracting term variants from large corpora within the subdomain of Psychiatry and how to represent them according to cognitive and communicative parameters. Our aim is to discover if different conceptualizations, or different conceptually motivated term variants, of the same concept are preferred in expert or semi-specialized communication. A corpus on Psychiatry was compiled and classified according to user types. A grammar was designed in NooJ (Silberztein, 2003) in order to extract term variants based on the usual lexico-syntactic patterns accompanying synonyms (also known as; commonly referred to as, etc.). Corpus analysis results indicate that, from a cognitive perspective, term variants reflect the prototypical dimensions in which psychiatric disorders may be classified. From a communicative perspective, terms and dimensions can also be associated with user-based parameters.
Keywords: term variation; cognitive motivation; communicative motivation
Proceedings of the XVI EURALEX International Congress: The User in Focus
1 Introduction
The medical domain has over 25 centuries of history and involves numerous disciplines which affect all human beings to some extent. Therefore, medical terminology is one of the most dynamic terminological domains, and the use of one term instead of the other implies perceiving and conceptualizing aspects of reality from different perspectives (Prieto Velasco et al 2013: 168). VariMed is a research project on medical term variants and its overall objective is to generate a multifunctional resource on the medical domain for linguistic research, translation and technical writing. In this paper, we propose a systematic way of extracting term variants from large corpora within the subdomain of Psychiatry and how to represent them according to cognitive and communicative parameters. The relationship of specialized communications between a terminological resource and its user implies a prototypical discursive positioning (Harré and Langenhove 1999), which is reflected in a specialized text sender and receivers with a different background knowledge level. Terminological resources should provide adequate terminological units and an adequate knowledge load (Tarp 2005: 8-9), always according to their potential users’ continuum of general-specific language and knowledge purposes (León-Araúz et al 2013: 33).
This is in consonance with the Functional Theory of Lexicography (FTL; Bergenholtz and Tarp 1995, 2003). According to the FTL, there are two main types of lexicographic functions that cover use situations and different user needs (Wiegand 1989). These functions are cognition-oriented and communication-oriented (Bergenholtz and Tarp 2003; Bergenholtz and Nielsen 2006). In cognition-oriented situations, users seek additional information to widen their knowledge about the conceptual structure of a particular subject-field (psychiatry, neurology, oncology, etc.). Bergenholtz and Nielsen (2006: 286) explain that in these situations, the only communicative act taking place is between the terminographer and the users of the resource. The users want knowledge and the lexicographers provide it at a cognitive level, nothing more. The most difficult task for the terminographer is then to decide how much information is to be included and how to represent its underlying structure so that the dictionary meets users’ needs. On the other hand, in communication-oriented situations, two or more persons are engaged in producing or receiving a piece of language. This is the case of a translator who receives and must subsequently produce a text, as well as scientific writers, proofreaders, etc. Here the terminographer acts as a kind of mediator who helps to solve communication problems. We believe any terminological resource should satisfy both functions (León-Araúz et al 2013: 34).
In section 2, we give a brief overview of term variation. In section 3, we present how term variants are extracted from a specialized corpus on Psychiatry with a pattern-based grammar in NooJ, an NLP application (Silberztein, 2003). In section 4, a selection of the extracted variants is classified according to dimensional features and the results are compared across three subcorpora in order to see whether certain dimensions are preferred in one type of discourse or another. Finally, section 5 covers the conclusions and further research.
2 Term Variation
Although specialized language initially aspired to having one linguistic designation for each concept for greater precision, it is true that the same concept can often have many different types of linguistic designations. In the same way as in general language, there is terminological variation based on user-based parameters of geographic, temporal or social variation or usage-based parameters of tenor, field, and mode (Gregory and Carroll 1978). However, terminological variation also occurs for reasons that are often considerably more complex and difficult to explain. Freixa (2006: 52) classifies the causes for terminological variation in the following categories: (1) dialectal, caused by different origins of the authors; (2) functional, caused by different communicative registers; (3) discursive, caused by different stylistic and expressive needs of the authors; (4) interlinguistic, caused by contact between languages; (5) cognitive, caused by different conceptualizations and motivations. According to Freixa (2002), certain term variants are not only formally different, but also semantically diverse, as they give a particular vision of the concept. In this sense, Fernández-Silva et al (2011) describe this phenomenon as the linguistic reflection of conceptual multidimensionality. Multidimensionality has been defined by many authors (Bowker 1997, Kageura 1997, Wright 1997, Rogers 2004) as the phenomenon in which certain concepts can be classified according to different points of view or conceptual facets.
Lexicography for Specialised Languages, Technology and Terminography
This has important consequences for how domains are categorized and modelled. According to Picht and Draskau (1985: 48 apud Rogers 2004: 219), multidimensionality depends on who the classifier is, as well as on the different knowledge sources that may reflect different criteria when organizing the same domain or knowledge node. For example, botanists would classify roses differently from rose growers.
However, multidimensionality also has an impact on term variation, since a concept can be designated in more than one way based on the different characteristics that it possesses (Fernández-Silva et al 2011). Thus term variation should not be regarded as a linguistic phenomenon isolated from conceptual representations, since it is one of the manifestations of the dynamicity of categorization and expression of specialized knowledge (Fernández Silva et al in press).
Fernández-Silva (2010: 60-71) classifies the cognitive factors involved in term variation, based on numerous authors, according to two criteria. First, the main division depends on whether the cognitive factor refers to the conceptual organization or to its usage. Second, within the usage category, factors are divided according to whether they concern how reality is conceptualized by certain groups of people or individuals, or how it is conceptualized in the specific context in which the concept is used (Table 1).
As can be inferred from Table 1, all of Freixa’s causes for term variation can be approached from a cognitive perspective according to Fernández-Silva. This study combines Freixa’s second (functional) and fifth (cognitive) causes for term variation with Fernández-Silva’s perspective, since it analyses the multidimensionality of the conceptual system and how the different conceptual dimensions correlate with the adaptation to the level of expertise of the receiver. Our aim is to discover if different conceptualizations, or different conceptually motivated term variants, of the same concept are preferred in expert communication or semi-specialized communication.
3 Extracting Term Variants
A specialized corpus of more than 10 million tokens was compiled on the Psychiatry domain and divided according to user and genre types: expert, semi-specialized and encyclopaedic. The expert corpus contains specialized books and journal papers written by experts for experts, such as the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013). The semi-specialized corpus consists of web pages and brochures written by experts from Medline Plus and the National Institute of Mental Health, which combine basic and clinical research with information for patients suffering from any kind of mental disorder, or for their relatives. Finally, the encyclopaedic corpus consists of a Wikipedia dump which was automatically collected through categories such as Psychiatry, Syndromes, Disorders, etc. We considered that Wikipedia should form a subcorpus of its own because, being an encyclopaedic resource, it usually contains metalinguistic information on synonyms and variants that could be useful in our research.
Once the corpus was compiled and classified, a NooJ local grammar was designed in order to extract term variants (Figure 1). The grammar is based on the usual lexico-syntactic patterns accompanying synonyms (also known as; commonly referred to as, etc.) combined with specialized terms, namely syndromes, disorders and diseases.
Local grammars in NooJ work in conjunction with dictionary-like resources that act like a parser. In this case we used the default general language dictionary in NooJ as a POS tagger. However, since the dictionary does not include highly specialized terms, we had to include the UNK code in order to locate the terms that are unknown to the system. Thus, the grammar in Figure 1 identifies different sequences where a noun (N) or an unknown word (UNK), optionally preceded by an adjective (A) or another unknown word, is followed by the patterns in the variant1 sub-graph together with another similar structure. This helps us locate specialized terms on both sides of term variance structures and store them in variables from which we can generate the following output: x is a variant of y (Figure 3). This output lets us build our own dictionary, where variants are different entries but are linked to the same concept.
Figure 3: Term variant extraction output.
Furthermore, this grammar has a recursive path in order to identify the cases where different variants are enumerated, as in …dermatillomania (also known as neurotic excoriation, pathologic skin picking or compulsive picking). In this case, three different outputs are produced (neurotic excoriation is a variant of dermatillomania; pathologic skin picking is a variant of dermatillomania; compulsive picking is a variant of dermatillomania). By using the statistical module in NooJ (based on the standard score), we can conclude that, not surprisingly, term variance structures are most often lexicalized in the Wikipedia corpus, then in the semi-specialized corpus, and finally in the expert corpus (Figure 4).
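The extraction logic described above can be approximated outside NooJ. The following Python sketch is only an illustration under stated assumptions and is not the NooJ grammar itself: it replaces POS tags with a crude regular expression (a head term of one or two words), uses only the two markers named in the text, and strips a leading function word with a small hypothetical stop list. The names MARKERS, STOPWORDS and extract_variants are invented for this example.

```python
import re

# Only the two markers named in the text; the real grammar covers more ("etc.").
MARKERS = ["also known as", "commonly referred to as"]
# Hypothetical stop list, a stand-in for the POS filtering done by the grammar.
STOPWORDS = {"a", "an", "the", "of", "with", "in", "and", "or", "is", "as"}

WORD = r"[A-Za-z][A-Za-z'-]*"
MARKER_ALT = "|".join(re.escape(m) for m in MARKERS)
# Head term (one or two words), optional comma or bracket, marker, variant list.
PATTERN = re.compile(
    rf"\b(?P<head>{WORD}(?:\s+{WORD})?)\s*[,(]?\s*(?:{MARKER_ALT})\s+"
    rf"(?P<variants>[^().;]+)",
    re.IGNORECASE,
)
# Enumerations are split on commas and on a coordinating "or"/"and".
SPLIT = re.compile(r"\s*,\s*(?:(?:or|and)\s+)?|\s+(?:or|and)\s+")


def extract_variants(text):
    """Return (variant, head_term) pairs for every marker found in text."""
    pairs = []
    for match in PATTERN.finditer(text):
        head_words = match.group("head").split()
        # Drop a leading function word such as "with" from a two-word head.
        if len(head_words) > 1 and head_words[0].lower() in STOPWORDS:
            head_words = head_words[1:]
        head = " ".join(head_words)
        for variant in SPLIT.split(match.group("variants")):
            variant = variant.strip()
            if variant:
                pairs.append((variant, head))
    return pairs


if __name__ == "__main__":
    # Invented carrier sentence around the dermatillomania example.
    sample = ("Patients with dermatillomania (also known as neurotic excoriation, "
              "pathologic skin picking or compulsive picking) need support.")
    for variant, head in extract_variants(sample):
        print(f"{variant} is a variant of {head}")
```

Splitting on "or" and "and" is a simplification: a variant that itself contains a conjunction would be cut in two, and a variant list followed by running text without punctuation would be over-captured. A POS-aware grammar such as the one in Figure 1 avoids both problems.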
4 Representing Term Variants
In terminological resources, users are often confronted with a vast array of variants with no other information on how term variation arises and how their use may be constrained. However, they need to know when to use each of the variants and the conceptual connotations they imply, since this will affect the receiver’s interpretation of the message.
In our study, we have found many different types of term variants for the same concept. Some of them were just acronyms, dialectal, orthographic or morphological variants, which, of course, need to be stored in any terminological resource, but their impact on communication is obvious, and their use does not usually need any further explanation. In this paper, however, we focus on dimensional variant types, which need more in-depth study, since they affect both cognitive and communicative situations. Dimensional variants show different conceptualizations of the same concept according to different facets and are usually conveyed by multi-word terms. For instance, Ganser syndrome, nonsense syndrome and prison psychosis are all variants of the same concept, but the first one highlights a discoverer dimension (Sigbert Ganser was the first to describe the syndrome), the second one focuses on the symptom dimension (saying nonsense is one of them) and the third one on the location dimension (it often takes place in prisons, since it affects inmates). We collected from the corpus all the concepts that showed more than one variant type and classified their corresponding variants according to the dimension conveyed. In Table 2, we show the dimensions we found with an illustrative example.
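One possible way to store such a classification is sketched below in Python. It is an assumption about the data model, not a description of the VariMed resource: the Variant record, the use of one variant as the concept key, and the two helper functions are hypothetical. Only the Ganser syndrome example and its three dimensions come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    term: str
    concept: str    # key shared by all variants of one concept (assumption)
    dimension: str  # facet highlighted by this variant


VARIANTS = [
    Variant("Ganser syndrome", "Ganser syndrome", "Discoverer"),
    Variant("nonsense syndrome", "Ganser syndrome", "Symptom"),
    Variant("prison psychosis", "Ganser syndrome", "Location"),
]


def group_by_dimension(variants):
    """Map each concept to {dimension: [terms]}."""
    index = defaultdict(lambda: defaultdict(list))
    for v in variants:
        index[v.concept][v.dimension].append(v.term)
    return {concept: dict(dims) for concept, dims in index.items()}


def multidimensional_concepts(variants):
    """Return the concepts whose variants convey more than one dimension."""
    grouped = group_by_dimension(variants)
    return sorted(c for c, dims in grouped.items() if len(dims) > 1)
```

Keeping the dimension on each variant entry, rather than on the concept, lets the same concept carry several facets at once.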
Of course, there are variants that may show different dimensions at the same time, such as chronic [+Time] traumatic [+Cause] brain [+Body_part] injury associated with boxing [+Cause] or alcohol-induced [+Cause] amnestic [+Symptom] disorder; and different variants for the same concept that highlight the same dimension, such as Alice in Wonderland [+Symptom] syndrome and Lilliputian hallucination [+Symptom], which refer to the same symptom of the disorder, or nonsense [+Symptom] syndrome and syndrome of approximate answers [+Symptom], which convey the same dimension but refer to two different symptoms.
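The bracketed notation used in these examples is regular enough to be read automatically. The short Python sketch below assumes that every tag has the form [+Name] and follows the word or phrase it annotates. The function names parse_annotated and shared_dimensions are hypothetical and not part of any tool mentioned in this paper.

```python
import re

# A dimension tag such as "[+Cause]", with any whitespace that precedes it.
TAG = re.compile(r"\s*\[\+(\w+)\]")


def parse_annotated(annotated):
    """Split an annotated variant into (plain term, list of dimension tags)."""
    dimensions = TAG.findall(annotated)
    term = re.sub(r"\s+", " ", TAG.sub("", annotated)).strip()
    return term, dimensions


def shared_dimensions(annotated_a, annotated_b):
    """Return the set of dimensions that two annotated variants have in common."""
    _, dims_a = parse_annotated(annotated_a)
    _, dims_b = parse_annotated(annotated_b)
    return set(dims_a) & set(dims_b)
```

The tag list keeps repeated dimensions in order, so a variant that mentions two causes is still distinguishable from one that mentions a single cause. Note that the sketch compares dimensions only: it cannot tell whether two [+Symptom] variants refer to the same symptom or to different ones.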