
From Terminologies to Classifications – the Challenge of Information Reduction

Hans Rudolf STRAUB (a), Maurus DUELLI (a), Norbert FREI (b), Hugo MOSIMANN (a) and Annette ULRICH (a)

(a) Semfinder AG, Kreuzlingen, Switzerland
(b) Interstate University of Applied Sciences of Technology NTB, Buchs, Switzerland


Abstract. A description of a medical case can, like any statement about reality, contain more or less information. The aim of a classification is to express as much as possible with a minimum of words (classes). For this purpose the information contained in a terminology must be reduced. Is such a reduction an obvious process? In this paper we examine this question by considering practical aspects arising from the task of "teaching" computers to assign ICD-10 codes automatically to diagnoses in text form.

We first assess the extent of information reduction and then discuss the path along which this reduction takes place. The role and conditions of a true hierarchical structure are discussed, as well as the questions that stem from reduction of the many semantic dimensions to the single dimension of a formal hierarchy. Special attention is given to the sum/summands problem, a major challenge for automated classification in practice.

Are medical classifications necessary at all? Precisely because extracting class information from terminological data is not self-evident, the classification holds information that is not otherwise available.

1. Introduction

The information available about a medical case, a patient, is always less than the information that could theoretically be obtained at the moment the real case is observed. The language that we use to describe the patient can be differentiated according to several characteristics; the divide between ontological and epistemological viewpoints, for instance, has recently been discussed [1,2], and the discussion looks set to continue.

In this paper we do not emphasize this distinction, but we do look more closely at the question of granularity, i.e. of the information content of a language.

"Language" is used in a broad sense in this context and includes "free" natural languages, standardized and structured terminologies like SNOMED CT (with a fine granularity and a large information content) and classifications like ICD-10 (with a coarse granularity and a small information content). Of course, nobody believes that it would be possible to recover the information of a fine granular language (a terminology) from the terms or codes of a classification. But is it possible to go in the other direction and assign a case to a class with the aid of the terms of the terminology alone? At first glance this seems self-evident. A clo

–  –  –

Figure 1: From Reality to Terminologies to Classifications

2. Information reduction

2.1. From reality to observation

In reality, every hair of a patient can be counted. But this information is not what the physician wants to know. Nor does he want to know the condition of every single red blood cell. It is sufficient for him to know that most appear to be normal and that their numbers lie within a normal range. If anaemia is present, it is not necessary to know all the details of the single cells; it is sufficient to know their condition and numbers in general terms. Obviously only a very small part of the information relating to the real case is observed by physicians, nurses and laboratories. Yet this is not a shortcoming but a desirable outcome, since we do not need every single piece of information to cure the patient. Too much information would confuse the observers: they would not be able to see the wood for the trees.

The fact that we look only so closely implies a reduction of information, but closeness is not the only aspect. The direction in which we look also selects what can be observed, and this selection is intended, too. The patient’s complaints direct the attention of the medical professionals: when a patient complains of acute abdominal pain, the doctor will most probably not order a CT scan of the head.

All in all, the reduction of information content from reality to observation is obviously huge.

2.2. From observation to medical records

Not every observation is worth recording and of course only a small part of the information in doctors’ and nurses’ heads finds its way into medical records.

Information in the records can take the form of pictures, numbers (quantitative) or words (qualitative). For the purposes of this paper we confine ourselves to words, which carry the qualitative information that is the main concern of terminologies and medical classifications.

2.3. From medical records to diagnoses

The diagnoses are usually a small part of the information in the medical record.

2.4. From diagnoses to codes and DRGs

Again there is a reduction of the amount of information, and again this reduction is intended. The fewer codes or DRGs (diagnosis related groups) there are, the easier it is to compare cases statistically in groups.

2.5. Estimation of the information content on each layer of granularity

In Figure 1 the information content of the layers of granularity is estimated roughly.

The number of permitted instances in a layer provides an estimate of the information content of a selected instance (selective information content according to Shannon [10] and MacKay [6]).

DRG systems usually comprise several hundred groups, generally fewer than a thousand. ICD-10 has roughly 15,000 codes, depending on the version in question. SNOMED CT contains more than 1 million terms. Compared with these still modest numbers, the information content of a real medical case is impossible to quantify. In Figure 1 it is shown as a cloud, which represents the huge amount of information as well as its lack of form at this stage of interpretation.
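The selective information content of a layer can be sketched as the number of bits needed to single out one of its permitted instances, log2(N). This assumes all instances are equally likely, which the text does not state explicitly; the counts are the rough figures quoted above, and taking 1,000 for the DRG layer is our own upper-bound assumption.

```python
import math

# Rough numbers of permitted instances per layer, taken from the text.
# 1,000 DRGs is an assumed upper bound ("several hundred, fewer than a thousand").
LAYERS: dict[str, int] = {
    "DRG": 1_000,
    "ICD-10 code": 15_000,
    "SNOMED CT term": 1_000_000,
}


def selective_information(n_instances: int) -> float:
    """Bits needed to single out one of n equally likely instances: log2(n)."""
    if n_instances < 1:
        raise ValueError("a layer needs at least one permitted instance")
    return math.log2(n_instances)


for layer, n in LAYERS.items():
    print(f"{layer:<15} {n:>9,} instances  {selective_information(n):5.1f} bits")
```

The bit counts grow only logarithmically: the thousandfold increase in instances from the DRG layer to SNOMED CT terms adds about 10 bits.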

The number in brackets (and the points in the three quadrilaterals representing the interpretation layers) in Figure 1 reflects the fact that, although there are several ICD-10 codes for one case, there is by definition just one DRG for the same case. The information content of the single ICD-10 codes is multiplied: counted as the number of possible code combinations, the information content of the whole is the product of the numbers for the single codes (measured in bits, the contents add up). In Figure 1 we assume that each DRG has two codes. This is of course a rough estimate: not every combination of codes is possible, but usually there are more than two diagnostic and therapeutic codes per case.
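Under the same equal-likelihood assumption, the product of instance counts becomes a sum in bits. A sketch for the two-codes-per-case assumption of Figure 1, again taking 1,000 DRGs as an assumed upper bound; the number of ordered code pairs is itself only an upper bound, since not every combination is possible:

```python
import math

N_ICD10 = 15_000      # approximate number of ICD-10 codes (from the text)
N_DRG = 1_000         # assumed upper bound on the number of DRGs
CODES_PER_CASE = 2    # the simplifying assumption used in Figure 1

# Distinct ordered code tuples per case: an upper bound, many combinations never occur.
code_combinations = N_ICD10 ** CODES_PER_CASE

# In bits, the product of counts becomes a sum: log2(a * b) = log2(a) + log2(b).
bits_codes = math.log2(code_combinations)
bits_drg = math.log2(N_DRG)

print(f"{code_combinations:,} code combinations = {bits_codes:.1f} bits")
print(f"{N_DRG:,} DRGs = {bits_drg:.1f} bits")
print(f"reduction when grouping into one DRG: about {bits_codes - bits_drg:.1f} bits")
```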

What is true for the codes is true for the terms. Many terms combine to give one code. Not every term is used for ICD-10 coding. Therefore not only is the information content of one SNOMED clinical term reduced to one ICD-10 code, but several terms in the medical record lead to just one code.

2.6. Amount of the information reduction

As can be seen from Figure 1, the amount of information explodes when we go from the bottom (DRGs) to the top (free text in the medical record). The information in the real case (cloud) is again much richer than the information in the medical record (we shan’t offer a quantitative estimate at this point). In the other direction, from the real case to the codes and the DRGs, the information content of the medical case is radically reduced.

2.7. The coding process

Our group develops programs for automated ICD coding. The installations are built around an inference engine, which reads the free text (noun phrases) and produces ICD-10 codes. If the input is not precise enough, the program requests the missing information in the form of a context-specific multiple-choice question. As an internal representation language we use concept molecules [12,14], which permit precise and structured modelling of the descriptive [6] information content of the words in the physicians’ natural language as well as of the information contained in the ICD-10 codes.
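The question-asking behaviour can be illustrated with a deliberately minimal sketch. It is not the authors' inference engine and does not use concept molecules: each code is described by a flat set of concepts in a hypothetical, heavily simplified rule base `RULES`, and the hypothetical function `code_diagnosis` returns a code when the input singles one out, or a multiple-choice `Question` listing the concepts that would still distinguish the remaining candidates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Question:
    """A context-specific multiple-choice question asking for missing information."""
    text: str
    options: list[str]


# Hypothetical, heavily simplified rule base (not the authors' concept molecules):
# each ICD-10 category lists the concepts a diagnosis must contain to receive it.
RULES: dict[str, frozenset[str]] = {
    "I10": frozenset({"hypertension", "essential"}),
    "I11": frozenset({"hypertension", "heart involvement"}),
    "I12": frozenset({"hypertension", "renal involvement"}),
}


def code_diagnosis(concepts: set[str]) -> str | Question | None:
    """Return a code if the input is precise enough, a Question if it is not,
    or None if no code in the rule base is compatible with the input."""
    # Compatible codes: every concept in the input appears in the code's description.
    candidates = [code for code, required in RULES.items() if concepts <= required]
    if not candidates:
        return None
    # A code is fully determined when the input supplies all of its required concepts.
    complete = [code for code in candidates if RULES[code] <= concepts]
    if len(complete) == 1:
        return complete[0]
    # Otherwise ask for the concepts that still distinguish the candidates.
    missing = sorted({c for code in candidates for c in RULES[code] - concepts})
    return Question("Please specify the diagnosis:", missing)


print(code_diagnosis({"hypertension"}))
print(code_diagnosis({"hypertension", "renal involvement"}))
```

With the bare input "hypertension" the sketch returns a question offering "essential", "heart involvement" and "renal involvement"; once "renal involvement" is supplied it returns I12.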

3. Is the result of the coding process naturally deducible?

–  –  –

3.1. Deduction in a hierarchical tree

A hierarchy (Figure 2) must satisfy two conditions: disjunctivity and unidirectionality.

Disjunctivity means that the siblings on each level are mutually exclusive. If a mammal is a dog, it cannot be a cat at the same time.

Unidirectionality in a hierarchy means that the branchings go in only one direction: mammals can be differentiated as dogs, cats, cows, elephants, etc. However, this differentiation cannot apply in the other direction: elephants are mammals and can never be fish. If a hierarchy were not strictly unidirectional, it would contain ring structures and would not be a hierarchy, but a net.
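The structural side of the two conditions can be checked mechanically. The sketch below takes (parent, child) edges and reports nodes filed under more than one parent (excluded in a tree, because the parent classes would then overlap) and ring structures, found by following parent links upward. Whether siblings are truly mutually exclusive is a semantic question that such a check cannot settle; the function name `check_hierarchy` and the edge format are our own assumptions.

```python
from collections import defaultdict


def check_hierarchy(edges: list[tuple[str, str]]) -> list[str]:
    """Return the violations found in (parent, child) edges; empty means a tree or forest."""
    parents: dict[str, set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    problems: list[str] = []
    # Disjunctivity (structural part): each node is filed under at most one parent class.
    for node, ps in parents.items():
        if len(ps) > 1:
            problems.append(f"{node} has several parents: {sorted(ps)}")

    # Unidirectionality: walking upward from a node must never lead back to it.
    for start in list(parents):
        seen = {start}
        frontier = list(parents[start])
        while frontier:
            node = frontier.pop()
            if node == start:
                problems.append(f"ring structure through {start}")
                break
            if node not in seen:
                seen.add(node)
                frontier.extend(parents.get(node, ()))
    return problems


tree = [("vertebrate", "mammal"), ("mammal", "dog"), ("mammal", "cat"), ("mammal", "elephant")]
net = tree + [("elephant", "mammal")]  # an edge back up closes a ring
print(check_hierarchy(tree))
print(check_hierarchy(net))
```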

If the two conditions apply, we have a true hierarchy, which means that we can easily draw conclusions from the leaves of the hierarchical tree back to its branches: if we know that the subject is an elephant, we can conclude that it is a mammal and that it is a vertebrate. Furthermore, we can pass the properties of the elements in the upper layers on to those in the lower layers. The elephant inherits all the properties of mammals as well as those of vertebrates.

This is a stroke of luck for knowledge representation: we don’t need to show all the information about elephants, dogs, cats etc. again for each species, as it is sufficient to show the common information just once at the upper level. This saves space in the representation and makes maintenance easier and more transparent.

A hierarchical tree is therefore ideal for knowledge representation purposes.

Properties are passed from the root to the leaves, from coarse granular to fine granular levels. Class information, however, is deduced in the opposite direction, from fine granular to coarse granular levels (elephant → mammal). This deduction is self-evident in a hierarchical tree, but it depends on the two conditions explained above.
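Both directions can be sketched on a toy version of the zoological tree (the property names are illustrative only): class information is found by walking from a leaf up to the root, while properties stored once at a higher level are collected for the leaf on the way. The sketch relies on each node having a single parent and on the absence of rings.

```python
# Toy tree as child -> parent links; one parent per node, no rings.
PARENT: dict[str, str] = {
    "mammal": "vertebrate",
    "dog": "mammal",
    "cat": "mammal",
    "elephant": "mammal",
}

# Each property is stored once, at the most general level where it holds.
PROPERTIES: dict[str, set[str]] = {
    "vertebrate": {"has a spine"},
    "mammal": {"suckles its young"},
    "elephant": {"has a trunk"},
}


def ancestors(node: str) -> list[str]:
    """Class deduction, fine to coarse: follow parent links up to the root."""
    chain: list[str] = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def inherited_properties(node: str) -> set[str]:
    """Property inheritance, coarse to fine: a node's own properties plus its ancestors'."""
    props = set(PROPERTIES.get(node, set()))
    for cls in ancestors(node):
        props |= PROPERTIES.get(cls, set())
    return props


print(ancestors("elephant"))  # ['mammal', 'vertebrate']
print(sorted(inherited_properties("elephant")))
```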

A natural deduction of this kind from fine to coarse granular levels would be exactly what we are striving for in the coding process described in Section 2.7. If the information reduction "funnel" in Figure 1 could be designed as a hierarchy, we could easily deduce the identity of a medical case on the coarse level from its description on the fine granular level. In other words, we could safely deduce the ICD-10 code from the description of the case in medical terms without external assistance.

Is this possible?

3.2. Difference between the zoological system and the system of diseases

Unfortunately, the system of diseases cannot be arranged naturally in a hierarchical tree. The reason for this is linked to the two conditions required for a hierarchy: disjunctivity and unidirectionality both apply naturally in the case of animals and plants but are absent in the case of diseases.

In zoology the disjunctivity condition is naturally guaranteed by the fact that two species cannot mix (species barrier). Because cats and dogs cannot have offspring together, the two species are definitively disjunct.

The unidirectionality condition is based on the history of the evolution of species. Since species have evolved along the unidirectional time line, this evolution cannot be reversed. Elephants cannot evolve into fish in the future.

The evolution of zoological species is, however, a special case in nature. The fact that this system is in the form of a perfect hierarchical tree is due to the history behind its evolution.

Such a history is absent in the development of diagnoses. Diseases do not evolve from other diseases as zoological species evolve over time from more ancient species.

Certainly diseases are related to each other. One disease can lead to another. But these relationships are much more complicated than the ones in zoology.

Because the two conditions, disjunctivity and unidirectionality, are not present in the system of diseases, this system does not occur naturally in the form of a hierarchy.

If we want to make it into a hierarchy for practical reasons – and there are good reasons for this! – we have to create it artificially. As soon as a structure is artificial, however, its shape can be altered and becomes arbitrary.

Statistical methods (variance reduction with regard to a target variable) can be used to refine such a system, as Fetter did in creating the first DRG systems [5]. Such statistically created systems are designed to serve specific needs (economic ones in the case of DRGs); they are artificial and lack natural, impassable boundaries like the species barrier in zoology described above.
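Variance reduction with regard to a target variable can be sketched as a single greedy grouping step, in the spirit of regression trees. This is a generic illustration, not a reconstruction of Fetter's procedure; the attribute names and the four toy cases (length of stay in days as the target) are invented purely to exercise the code.

```python
from statistics import pvariance


def within_group_ss(groups: list[list[float]]) -> float:
    """Sum of squared deviations from each group's own mean (n * population variance)."""
    return sum(len(g) * pvariance(g) for g in groups if g)


def best_attribute(cases: list[dict], attributes: list[str], target: str) -> tuple[str, float]:
    """Return the attribute whose grouping removes the most variation in the target."""
    total = within_group_ss([[case[target] for case in cases]])
    best = None
    for attr in attributes:
        groups: dict[object, list[float]] = {}
        for case in cases:
            groups.setdefault(case[attr], []).append(case[target])
        reduction = total - within_group_ss(list(groups.values()))
        if best is None or reduction > best[1]:
            best = (attr, reduction)
    if best is None:
        raise ValueError("no attributes to split on")
    return best


# Invented toy cases; "los" is the length of stay in days.
cases = [
    {"surgery": True, "over_65": False, "los": 8},
    {"surgery": True, "over_65": True, "los": 10},
    {"surgery": False, "over_65": False, "los": 2},
    {"surgery": False, "over_65": True, "los": 4},
]
print(best_attribute(cases, ["surgery", "over_65"], "los"))
```

Each grouping step is chosen only by how well it predicts the target, which is why the resulting boundaries are artificial: a different target variable could favour a different grouping.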

The ICD-10 classification is designed as a hierarchy. This does offer many advantages, but we have to remember that its structure is arbitrary – however well designed it may be. If we want to assign ICD-10 codes to diagnoses, we must reduce the complex information of the real case diagnoses until it fits into the artificial hierarchical tree. What information gets lost? Additional rules – inclusiva et exclusiva – are necessary for this task.

If we try to obtain ICD-10 codes automatically from natural language diagnoses, we can see how more complex structured information is arranged in a hierarchical tree.

In the next section we show how this is done.

4. ICD-10 coding of arterial hypertension

4.1. Semantic dimensions (degrees of freedom)

If we want to code a diagnosis, we first have to analyse the characteristics used in the target coding system. The terms used to describe the codes are best arranged in groups of the same semantic "flavour".

Terms of the same "flavour" represent tokens of the same type. Usually, for each "flavour", just one token can be assigned independently to a diagnosis, so that the diagnosis has as many tokens assigned to it as there are "flavours". The "flavours" can be seen as semantic dimensions, as axes in a semantic space or as degrees of freedom, the latter expressing the independence of each dimension. They are related, but not identical, to the partitions and features (qualities) of the semantic web [9]. The exact differences between partitioning in the semantic web and the semantic dimensions described here, and the consequences of these differences, will be the subject of a separate paper.
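A diagnosis described along semantic dimensions can be sketched as a point with one token per axis. The two dimensions and their tokens below are hypothetical, and the projection onto the ICD-10 categories I10, I11, I12, I13 and I15 is deliberately simplified; it is neither the authors' model nor a complete statement of the ICD-10 rules. It only shows how several independent dimensions collapse onto a single list of codes.

```python
from itertools import product

# Hypothetical semantic dimensions ("flavours") and their tokens, for illustration only.
DIMENSIONS: dict[str, tuple[str, ...]] = {
    "origin": ("essential", "secondary"),
    "organ involvement": ("none", "heart", "kidney", "heart and kidney"),
}

# Simplified lookup for essential hypertension by organ involvement.
ESSENTIAL_CODES = {"none": "I10", "heart": "I11", "kidney": "I12", "heart and kidney": "I13"}


def validate(diagnosis: dict[str, str]) -> None:
    """Require exactly one known token for every dimension."""
    if set(diagnosis) != set(DIMENSIONS):
        raise ValueError(f"expected one token for each of {sorted(DIMENSIONS)}")
    for dim, token in diagnosis.items():
        if token not in DIMENSIONS[dim]:
            raise ValueError(f"{token!r} is not a token of dimension {dim!r}")


def to_icd10(diagnosis: dict[str, str]) -> str:
    """Project a point of the two-dimensional semantic space onto one category."""
    validate(diagnosis)
    if diagnosis["origin"] == "secondary":
        return "I15"  # organ involvement is not distinguished here: information is lost
    return ESSENTIAL_CODES[diagnosis["organ involvement"]]


# Enumerate every cell of the semantic space and the codes the cells collapse onto.
cells = [dict(zip(DIMENSIONS, combo)) for combo in product(*DIMENSIONS.values())]
codes = {to_icd10(cell) for cell in cells}
print(len(cells), "cells ->", len(codes), "codes")  # 8 cells -> 5 codes
```

All four cells with origin "secondary" land on I15, so the organ-involvement token cannot be recovered from that code.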
