
INTRODUCTION

Controlled vocabularies are crucial to almost all healthcare applications. Clinical systems collect patient-care data that require controlled terms, both for mundane functions such as billing and for more sophisticated ones such as electronic medical records. General information systems need controlled vocabularies to index articles from journals, books, and proceedings so that information retrieval is possible. Medical expert systems use controlled vocabularies to map patients' data to their knowledge sources in search of a solution to a patient care scenario.

It is surprising that such a critical area of medical informatics, natural language processing (NLP) and controlled vocabularies, has been neglected for so long. Researchers have only recently realized that advances in other areas, such as computer-based patient record systems and knowledge-based systems, cannot proceed further without a standardized nomenclature and classification of medical terms. Several modern nomenclature and classification systems now exist, among them:

    SNOMED: Systematized Nomenclature of Human and Veterinary Medicine
    UMLS: Unified Medical Language System
    CPT: Current Procedural Terminology
    ICD: International Classification of Diseases

In this article, these nomenclature and classification systems are compared and contrasted along the following axes: history of their development, organizing principles, structure, current usage, and requirements for clinical vocabularies.

SNOMED

SNOMED is the Systematized Nomenclature of Human and Veterinary Medicine. SNOMED International was introduced in September 1993 and traces its roots to the Systematized Nomenclature of Pathology of the early 1960s. SNOMED International is a comprehensive, multiaxial nomenclature and classification created for indexing the entire medical record, including signs and symptoms, diagnoses, and procedures. Its design is intended to allow full integration of all medical information in the electronic medical record into a single data structure. The most recent version of SNOMED International (ver. 3.4) contains more than 150,000 terms and term codes in 11 separate modules.

The 11 modules of the SNOMED International classification system are listed below; the number in parentheses is the number of records contained in each module.

    Topography: A functional anatomy for human and veterinary medicine (12,803 records)
    Morphology: Terms used to name and describe structural changes in disease and abnormal development (5,672 records)
    Function: Terms used to describe the physiology and pathophysiology of disease processes (18,027 records)
    Living Organisms: Living organisms of etiological significance in human and animal disease (24,480 records)
    Chemicals, Drugs, and Biological Products: Including pharmaceutical manufacturers (14,275 records)
    Physical Agents, Activities, and Forces: A compilation of physical activities, physical hazards, and the forces of nature (1,410 records)
    Occupations: Developed by, and used with permission from, the International Labour Office in Geneva, Switzerland (1,947 records)
    Social Context: Social conditions and relationships of importance to medicine (845 records)
    Diseases/Diagnoses: A classification of the recognized clinical conditions encountered in human and veterinary medicine (34,377 records)
    Procedures: A classification of health care procedures (28,685 records)
    General Linkages/Modifiers: Linkages, descriptors, and qualifiers to link or modify terms from each module (1,373 records)

SNOMED International is rapidly being accepted worldwide as the standard for indexing medical record information. The American Veterinary Medical Association and the American Dental Association have recognized SNOMED's virtues and have adopted or endorsed it for their use. In addition, SNOMED is specified as the controlled terminology and message standard for interchange of biomedical images and image-related information in the DICOM (Digital Imaging and Communications in Medicine) standards.

An example of SNOMED as a nomenclature and as a classification system is shown below:

                    Topography       + Morphology       + Etiology + Function   = Disease
    Nomenclature    Crystalline lens + Cataract, Mature + Acquired + Low vision = Disease of lens
    Classification  T-XX700          + M-51120          + E-0024   + F-X0050    = D-X080
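
The multiaxial composition in the example above can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical and not part of any SNOMED tooling, and the axis prefixes are simply read off the codes in the example.

```python
from dataclasses import dataclass

# Axis prefixes read off the example codes; an assumption for this sketch.
AXIS_PREFIXES = {
    "Topography": "T",
    "Morphology": "M",
    "Etiology": "E",
    "Function": "F",
    "Disease": "D",
}


@dataclass(frozen=True)
class AxisTerm:
    axis: str  # e.g. "Topography"
    name: str  # nomenclature: the human-readable term
    code: str  # classification: the term code, e.g. "T-XX700"

    def __post_init__(self):
        # Reject a code whose prefix does not match its axis.
        prefix = AXIS_PREFIXES[self.axis]
        if not self.code.startswith(prefix + "-"):
            raise ValueError(f"{self.code!r} is not a {self.axis} code")


def compose(parts, result, use_codes=False):
    """Render 'part + part + ... = result' using names or codes."""
    pick = (lambda t: t.code) if use_codes else (lambda t: t.name)
    return " + ".join(pick(p) for p in parts) + " = " + pick(result)


parts = [
    AxisTerm("Topography", "Crystalline lens", "T-XX700"),
    AxisTerm("Morphology", "Cataract, Mature", "M-51120"),
    AxisTerm("Etiology", "Acquired", "E-0024"),
    AxisTerm("Function", "Low vision", "F-X0050"),
]
disease = AxisTerm("Disease", "Disease of lens", "D-X080")

print(compose(parts, disease))                  # nomenclature view
print(compose(parts, disease, use_codes=True))  # classification view
```

The same four axis terms yield both rows of the example: one rendered with names, the other with codes.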

UMLS

In 1986, the National Library of Medicine (NLM) began a long-term research and development project to build the Unified Medical Language System (UMLS®). The purpose of the UMLS is to aid the development of systems that help health professionals and researchers retrieve and integrate electronic biomedical information from a variety of sources. The UMLS approach involves developing machine-readable Knowledge Sources that a wide variety of application programs can use for three purposes: to compensate for differences in the way concepts are expressed in different machine-readable sources and by different users, to identify the information sources most relevant to a user inquiry, and to negotiate the telecommunications and search procedures necessary to retrieve information from these sources. The goal is to make it easy for users to link disparate information systems, including computer-based patient records, bibliographic databases, factual databases, and expert systems.

There are four UMLS Knowledge Sources: the Metathesaurus®, the SPECIALIST™ Lexicon, a Semantic Network, and an Information Sources Map. The Metathesaurus, the most heavily used to date, provides a uniform, integrated distribution format for more than 30 biomedical vocabularies and classifications, linking many different names for the same concepts. The Lexicon contains syntactic information for many Metathesaurus terms, component words, and English words, including verbs that do not appear in the Metathesaurus. The Semantic Network contains information about the types or categories (e.g., "Disease or Syndrome," "Virus") to which all Metathesaurus concepts have been assigned and the permissible relationships among these types (e.g., "Virus" causes "Disease or Syndrome"). The Information Sources Map, or directory, contains both human-readable and machine-processable information about the scope, location, vocabulary, syntax rules, and access conditions of biomedical databases of all kinds.
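
As a rough illustration of how the Metathesaurus and the Semantic Network fit together, the sketch below links several names to one concept and checks a relationship against the permitted type-level relationships. It is a toy model under stated assumptions: the concept identifiers, synonym lists, and function names are invented for illustration and are not the format or content of the actual UMLS Knowledge Sources.

```python
# Toy data only: identifiers and synonym lists are invented for
# illustration and are not taken from the UMLS Knowledge Sources.
SYNONYM_TO_CONCEPT = {
    "influenza": "CONCEPT-1",
    "flu": "CONCEPT-1",
    "grippe": "CONCEPT-1",
    "influenza virus": "CONCEPT-2",
}

# Each concept is assigned a semantic type.
CONCEPT_TYPE = {
    "CONCEPT-1": "Disease or Syndrome",
    "CONCEPT-2": "Virus",
}

# Permissible relationships between semantic types; the single entry
# mirrors the "Virus" causes "Disease or Syndrome" example in the text.
PERMITTED = {("Virus", "causes", "Disease or Syndrome")}


def lookup(term):
    """Map any known name to its concept identifier, or None."""
    return SYNONYM_TO_CONCEPT.get(term.strip().lower())


def relation_allowed(subject, relation, obj):
    """Check a concept-level relation against the type-level network."""
    triple = (CONCEPT_TYPE[subject], relation, CONCEPT_TYPE[obj])
    return triple in PERMITTED


# Different names resolve to the same concept.
print(lookup("Flu") == lookup("influenza"))
# A virus may cause a disease, but not the other way around.
print(relation_allowed("CONCEPT-2", "causes", "CONCEPT-1"))
print(relation_allowed("CONCEPT-1", "causes", "CONCEPT-2"))
```

Keeping the permitted relationships at the level of types rather than individual concepts is what lets a small network constrain a very large set of concepts.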

The UMLS Knowledge Sources were designed as multi-purpose tools, to facilitate the development of more effective biomedical information systems. As intended, they have been applied in a wide variety of research and development environments to many different tasks, including vocabulary development, knowledge representation, clinical data capture, linking patient data to knowledge sources, curriculum analysis, natural language processing, automated indexing, and information retrieval.

Particularly in its early years, but also more recently, the UMLS project commissioned exploratory and ancillary studies on such topics as user information needs, methods of organizing and merging vocabulary information, and information retrieval techniques and also developed specialized tools for use in the research effort.

CPT-4

Physicians' Current Procedural Terminology, 4th Edition (CPT-4) is a listing of descriptive terms and identifying codes for reporting medical services and procedures performed by physicians. The purpose of the terminology is to provide a uniform language that will accurately describe medical, surgical, and diagnostic services, and will thereby provide an effective means for reliable nationwide communication among physicians, patients, and third parties. CPT first appeared in 1966.

Each procedure or service is identified with a five-digit code. The main body of the material is listed in six sections. Within each section are subsections with anatomic, procedural, condition, or descriptor subheadings. The procedures and services with their identifying codes are presented in numeric order with one exception: the entire Evaluation and Management section (99201-99499) has been placed at the beginning of the listed procedures.
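
The ordering rule just described can be expressed as a small sort key. The following is a minimal sketch, assuming codes are handled as five-digit strings; the function names are hypothetical, and the sample codes outside the Evaluation and Management range are arbitrary five-digit strings rather than real procedure codes.

```python
import re

# Range quoted in the text for the Evaluation and Management section.
EM_FIRST, EM_LAST = 99201, 99499


def is_five_digit_code(code):
    """True if the code is exactly five digits, as the text describes."""
    return re.fullmatch(r"[0-9]{5}", code) is not None


def is_evaluation_and_management(code):
    """True if the code falls in the Evaluation and Management range."""
    return is_five_digit_code(code) and EM_FIRST <= int(code) <= EM_LAST


def listing_order(codes):
    """Numeric order, except that E/M codes are moved to the front."""
    for code in codes:
        if not is_five_digit_code(code):
            raise ValueError(f"not a five-digit code: {code!r}")
    return sorted(
        codes,
        key=lambda c: (not is_evaluation_and_management(c), int(c)),
    )


# Arbitrary sample strings; only the two 99xxx values are in the E/M range.
print(listing_order(["10000", "99201", "70000", "99499"]))
```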

A physician using CPT terminology and coding selects the name of the procedure or service that most accurately identifies the service performed. The physician may then list any additional procedures performed or pertinent special services. When necessary, the physician also lists any modifying or extenuating circumstances. Any service or procedure should be adequately documented in the medical record. Any procedure or service in any section of the CPT book may be used to designate the services rendered by any qualified physician.

Specific "Guidelines" are presented at the beginning of each of the six sections. These Guidelines define items that are necessary to appropriately interpret and report the procedures and services contained in that section.

A star (*) following the procedure code number identifies certain surgical procedures to which the usual "package" concept for surgical services cannot be applied.

A modifier provides the means by which the reporting physician can indicate that a service or procedure that has been performed has been altered by specific circumstances but not changed in its definition or code.

ICD-9-CM

ICD-9-CM stands for International Classification of Diseases, 9th Revision, Clinical Modification. The ICD has been published under different names since 1900. ICD-9-CM is a statistical classification system that arranges diseases and injuries into groups according to established criteria. Most ICD-9-CM codes are numeric and consist of three, four, or five digits and a description. The codes are revised approximately every 10 years by the World Health Organization (WHO), and annual updates are published by HCFA (the Health Care Financing Administration). ICD-9-CM is based on the official WHO version of the 9th Revision, International Classification of Diseases (ICD-9).

ICD-9-CM was originally published as a three-volume set (2nd edition). Newer versions of ICD-9-CM are available as two separate books (Volume 1 and Volume 2) or as a single book containing Volumes 1 and 2, or Volumes 1, 2, and 3, depending on the publisher.

The Tabular List (Volume 1) is a numeric listing of diagnosis codes and descriptions consisting of 17 chapters that classify diseases and injuries, two sections containing supplementary codes (V codes and E codes), and six appendices.
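
A rough sketch of how the three kinds of diagnosis code might be told apart is given below. The patterns are assumptions that go beyond the text: they follow the commonly printed form of the codes, with a decimal point after the category and, for the supplementary codes, a leading letter V or E. The function name is hypothetical.

```python
import re

# Assumed patterns (not stated in the text): a decimal point separates
# the category from any further digits.
NUMERIC = re.compile(r"[0-9]{3}(\.[0-9]{1,2})?")  # three to five digits
V_CODE = re.compile(r"V[0-9]{2}(\.[0-9]{1,2})?")  # supplementary V codes
E_CODE = re.compile(r"E[0-9]{3}(\.[0-9])?")       # supplementary E codes


def classify_diagnosis_code(code):
    """Return 'numeric', 'V', 'E', or None for an unrecognized string."""
    if NUMERIC.fullmatch(code):
        return "numeric"
    if V_CODE.fullmatch(code):
        return "V"
    if E_CODE.fullmatch(code):
        return "E"
    return None


for sample in ["250", "250.0", "250.00", "V30.00", "E880.9", "2500"]:
    print(sample, classify_diagnosis_code(sample))
```

A check like this validates only the shape of a code; whether the code actually exists in a given year's Tabular List still has to be looked up.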

The Alphabetical Index (Volume 2) of ICD-9-CM consists of an alphabetic list of terms and codes, two supplementary sections following the alphabetic listing, and three special tables found within the alphabetic listing.

The Procedures: Tabular and Alphabetic Index (Volume 3) consists of two sections of codes, a Tabular List and an Alphabetic Index, that define procedures instead of diagnoses. Codes from Volume 3 are intended only for use by hospitals, although health care professionals frequently use them incorrectly. The ICD-9-CM Procedure Classification published in this volume is a modification of WHO's Fascicle V, Surgical Procedures. Approximately 90% of the rubrics refer to surgical procedures, with the remaining 10% accounting for other investigative and therapeutic procedures.

REQUIREMENTS FOR CLINICAL VOCABULARIES

Cimino et al. (1989) have defined six attributes as criteria for building and evaluating clinical vocabularies. Evans et al. (1991) have defined three additional features essential for concepts in clinical vocabularies:

  1. Domain completeness: coverage of all possible terms that lie within a vocabulary's domain.
  2. Unambiguity: the same term cannot refer to more than one concept.
  3. Nonredundancy: each concept must be represented by one unique identifier.
  4. Synonymy: multiple ways of expressing a word or concept must be allowed.
  5. Multiple classification: concepts must be allowed to be classified in multiple hierarchies.
  6. Consistency of views: concepts must have the same relationship in all views.
  7. Explicit relationships: all relationships must be explicitly labeled.
  8. Lexical decomposition: each concept must be lexically decomposable so that different attributes can be assigned.
  9. Semantic typology: each concept allows for restriction of allowable modifiers and grounds for synonyms.
  10. Extensible composition: certified terms can be allowed to generate new concepts.
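
Some of these criteria can be checked mechanically. The sketch below tests a small vocabulary for unambiguity (criterion 2) and nonredundancy (criterion 3) while still allowing synonymy (criterion 4). It is only an illustration: the function names, the pair-based data layout, and the sample entries are assumptions made here, not a method proposed by the cited authors.

```python
from collections import defaultdict


def find_ambiguous_terms(term_concept_pairs):
    """Terms that refer to more than one concept (violates criterion 2)."""
    concepts_by_term = defaultdict(set)
    for term, concept in term_concept_pairs:
        concepts_by_term[term.lower()].add(concept)
    return {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}


def find_redundant_concepts(identifier_concept_pairs):
    """Concepts carrying more than one identifier (violates criterion 3)."""
    ids_by_concept = defaultdict(set)
    for identifier, concept in identifier_concept_pairs:
        ids_by_concept[concept].add(identifier)
    return {c: sorted(i) for c, i in ids_by_concept.items() if len(i) > 1}


def synonyms_of(concept, term_concept_pairs):
    """All terms that express one concept (criterion 4 allows several)."""
    return sorted(t for t, c in term_concept_pairs if c == concept)


# Invented sample vocabulary: (term, concept) and (identifier, concept).
terms = [
    ("cold", "common cold"),
    ("cold", "low temperature"),
    ("coryza", "common cold"),
    ("MI", "myocardial infarction"),
    ("heart attack", "myocardial infarction"),
]
identifiers = [
    ("ID-1", "common cold"),
    ("ID-2", "common cold"),
    ("ID-3", "myocardial infarction"),
]

print(find_ambiguous_terms(terms))
print(find_redundant_concepts(identifiers))
print(synonyms_of("myocardial infarction", terms))
```

Note the asymmetry the criteria require: many terms may point to one concept, but one term may not point to many concepts, and one concept may not carry many identifiers.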

CONCLUSION

None of the current nomenclature and classification systems meets all the criteria for clinical vocabularies proposed by Cimino et al. and Evans et al. CPT-4 and ICD-9-CM are used mostly for billing and other financial purposes, while SNOMED and UMLS hold more promise for clinical applications.


Contact thuynh@uic.edu with comments or questions regarding this site. © Copyright, Tai Huynh, MD. All rights reserved. Last modified November 29, 1999