Development of a case-based system for grouping diagnoses in general practice

https://doi.org/10.1016/j.ijmedinf.2007.08.002

Abstract

Introduction

This article describes the development of EPICON, an application that groups ICPC-coded diagnoses from electronic medical records in general practice into episodes of care. These episodes can be used to estimate prevalence and incidence rates.

Methods

We used data from 89 practices that participated in the Dutch National Survey of General Practice. Additionally, we held interviews with seven experts, and studied documentation to establish the requirements of the application and to develop the design. We then performed a formative evaluation by assessing incorrectly grouped diagnoses.

Results

EPICON is based on a combination of logical expressions, a decision table, and information extracted from individual cases by case-based reasoning. EPICON is able to group all diagnoses in the selected 89 practices, and groups 95% correctly.

Conclusion

The results cautiously indicate that EPICON's performance will probably be adequate for the purpose of estimating morbidity rates in general practice.

Introduction

General practitioners increasingly register patient data in electronic medical records (EMRs). These data could be a valuable source for epidemiologic research. Diagnoses that are accurately recorded and coded during consultations can be an especially useful source for estimating prevalence and incidence rates of diseases encountered in general practice. These rates are important for making probability diagnoses, monitoring diseases in the population, conducting scientific research, and evaluating health care policy.

A diagnosis in general practice can refer to a symptom or a complaint (symptom diagnosis), a syndrome (nosological diagnosis) or a disease (pathological/pathophysiological diagnosis) [1]. In this article, we use the umbrella term diagnosis to refer to any of these categories.

Diagnoses in general practice are not directly suitable for estimating prevalence and incidence rates. This would require that all diagnoses of a patient that refer to the same health problem are grouped. For instance, a patient visits the general practitioner for a cough (diagnosis a), which develops into pneumonia (diagnosis b) several days later. This health problem should be counted only once when estimating occurrences of diseases, namely as a case of pneumonia. To avoid double counting, diagnoses a and b have to be grouped. Diagnoses referring to the same health problem can be grouped into an episode of care, i.e., “all encounters for the management of a specific health problem” [1]. An episode of care is usually named after the last diagnosis, which can be used to estimate the numerator of the epidemiological fraction.
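
The cough and pneumonia example can be sketched in a few lines of Python. This is only an illustration, not EPICON itself: the record layout, the patient and episode identifiers, and the function names are assumptions, and the episode numbers are taken as already assigned. The ICPC codes R05 (cough) and R81 (pneumonia) are used as sample values.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, episode_id, consultation_date, icpc_code).
# Episode ids are assumed to be given; assigning them is the grouping problem itself.
diagnoses = [
    ("p1", 1, "2001-03-01", "R05"),  # cough
    ("p1", 1, "2001-03-05", "R81"),  # same health problem, now pneumonia
    ("p1", 2, "2001-06-10", "R05"),  # a later, unrelated cough
    ("p2", 1, "2001-04-02", "R81"),
]

def name_episodes(records):
    """Map each (patient, episode) to the code of its last diagnosis."""
    episodes = defaultdict(list)
    for patient, episode, date, code in records:
        episodes[(patient, episode)].append((date, code))
    # ISO dates sort chronologically as strings, so max() picks the last one.
    return {key: max(items)[1] for key, items in episodes.items()}

def count_episodes(records):
    """Count episodes per naming diagnosis (a numerator for a rate)."""
    counts = defaultdict(int)
    for code in name_episodes(records).values():
        counts[code] += 1
    return dict(counts)

counts = count_episodes(diagnoses)
```

Counting raw diagnoses here would yield two coughs and two pneumonias. Counting episodes named after their last diagnosis yields two pneumonias and one cough, because the first cough is absorbed into the pneumonia episode.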

Generally, two approaches for constructing episodes can be used. In the first approach, the general practitioner groups diagnoses directly into a problem-oriented or episode-oriented medical record. Lawrence Weed introduced the problem-oriented medical record (POMR) in 1968. The POMR is centered around problems in a problem list [2]. Diagnoses that refer to the same health problem receive the same problem number, which can be used to estimate morbidity rates [3]. A disadvantage of using this method for epidemiologic research is that problem lists are frequently not kept up to date [4]. The new generation of Dutch primary care information systems is episode-oriented: all patient information is recorded in episodes of care [5]. Data from these episode-oriented systems are probably very well suited for epidemiologic research. However, these systems are still in an implementation phase. In the second approach, diagnoses are grouped afterwards, through manual review or a computerised method. This approach is useful if the general practitioner has not constructed episodes or has constructed them inadequately. In this article, we describe the development of EPIsode CONstructor (EPICON), an application for grouping diagnoses afterwards into episodes. EPICON makes it possible to use data from EMRs in general practice for estimating prevalence and incidence rates.

Our project builds upon the second Dutch National Survey of General Practice (DNSGP-2), which has been described elsewhere [6], [7]. In the DNSGP-2, diagnoses were grouped afterwards into episodes for 89 general practices. A semi-computerised method was used in which ‘easy to group’ diagnoses were grouped automatically (80% of all diagnoses), and ‘difficult to group’ diagnoses were grouped manually (20% of all diagnoses). An example of a difficult case is a patient who is diagnosed with tiredness, and who has also been diagnosed with hypothyroidism and general deterioration. This case is complicated, because tiredness is a very non-specific symptom that is observed in many medical conditions. In other words, there are no explicit rules to decide whether tiredness should be grouped with hypothyroidism, with general deterioration, or as a separate episode. In general, complicated cases involve a multi-class classification task and an absence of clear-cut classification rules. The DNSGP-2 dataset, in particular the manually grouped diagnoses, contains implicit knowledge of the signs, symptoms, and the course of diseases, which could be used to solve the problem of grouping diagnoses.

Basic problem-solving approaches in the field of artificial intelligence are rule-based reasoning (based on if… then… rules), model-based reasoning (based on a causal or functional model), and case-based reasoning (based on examples) [8]. We selected case-based reasoning (CBR) because the domain knowledge needed to group diagnoses into episodes is implicit knowledge, which lends itself more to reasoning by analogy than to formulating domain rules or constructing a model. In addition, ample cases were available, because the DNSGP-2 dataset provided an extensive case library.

CBR is a problem-solving paradigm based on psychological theories of human cognition, and it provides a method for constructing intelligent systems. It focuses on analogy as a strategy for solving real-world problems. Human experts differ from novices in their ability to relate problems to previous ones, to reason based on analogies between current and old problems, and to use solutions from earlier experiences. A case-based reasoner solves a new problem by remembering a previous similar situation and reusing information and knowledge from that situation. The following four processes describe a general CBR cycle:

  1. Retrieve: Given a new problem, former similar cases are retrieved.
  2. Reuse: Information and knowledge in the retrieved cases are used to solve the problem.
  3. Revise: The solution is tested for success, and repaired if it fails.
  4. Retain: A successful solution is incorporated into the case base for future use.
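
The four processes can be illustrated with a minimal Python sketch. It is a generic toy reasoner, not the EPICON design: the `Case` and `CaseBase` names, the Jaccard similarity measure, the use of an externally confirmed answer in the revise step, and the example labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    features: frozenset  # problem description, e.g. a set of diagnosis labels
    solution: str        # the decision stored for that problem

@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, features):
        """Retrieve: return the stored case most similar to the new problem."""
        def jaccard(case):
            union = case.features | features
            return len(case.features & features) / len(union) if union else 0.0
        return max(self.cases, key=jaccard, default=None)

    def reuse(self, case):
        """Reuse: propose the retrieved case's solution."""
        return case.solution if case else None

    def revise(self, proposed, confirmed):
        """Revise: keep the proposal if it is confirmed, else take the correction."""
        return proposed if proposed == confirmed else confirmed

    def retain(self, features, solution):
        """Retain: store the solved problem for future use."""
        self.cases.append(Case(frozenset(features), solution))

# Hypothetical usage with made-up labels and decisions.
base = CaseBase()
base.retain({"tiredness", "hypothyroidism"}, "group with hypothyroidism")
base.retain({"tiredness", "general deterioration"}, "group with general deterioration")

problem = frozenset({"tiredness", "hypothyroidism", "hypertension"})
nearest = base.retrieve(problem)
proposal = base.reuse(nearest)
final = base.revise(proposal, confirmed="group with hypothyroidism")
base.retain(problem, final)
```

In this sketch the revise step relies on an outside judgement passed in as `confirmed`. A system without such feedback would stop after `reuse`.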

Many case-based systems are so-called retrieval-only systems or act primarily as retrieval and reuse systems. They merely perform the retrieval or the retrieval-and-reuse task [9], [10], [11], [12], [13], [14].

The objective of our research was to develop a fully computerised method for the construction of care episodes. The project was divided into a development phase and an evaluation phase. In the development phase, we assessed the requirements and then designed and built the system. In the evaluation phase, we performed a formative evaluation. We formulated the following questions:

  • Development phase:
    1. How were diagnoses grouped in the semi-computerised method?
    2. (a) What are the requirements and (b) what is the case-based design of the fully computerised method?
  • Evaluation phase:
    3. How many, and which diagnoses are misclassified by EPICON?

The aim of this project is to determine whether a computerised grouping method can make data from EMRs in general practice accessible for epidemiologic research.

Dataset

The dataset used in this research is a longitudinal set of patient records provided by a Dutch network of computerised general practices (LINH) [15], [16]. The general practitioners within this network record longitudinal data on consultations, including diagnoses, prescriptions, and referrals.

Within the framework of the DNSGP-2, episodes were constructed for LINH-data of 1 year (2001), which were used to estimate prevalence and incidence rates of diseases in general practice [17].

We selected

Semi-computerised method

The results of our inventory of the semi-computerised method used in the DNSGP-2 are presented as a flowchart. Fig. 1 shows this method, which consisted of five steps (shown in parentheses).

  • Step 1. The first consultation diagnosis of a patient in a 1-year registration period was grouped into a separate episode. ‘Create separate episode’ means that a separate episode number was assigned to a diagnosis. Operationally, an episode is a row of diagnoses with the same episode

Conclusions and discussion

To our knowledge, this is the first study into the development of a case-based system for grouping diagnoses in general practice. Previous research in constructing episodes of care focused primarily on grouping insurance claims records and did not use a case-based approach [22], [23], [24].

EPICON groups diagnoses into episodes, based on a combination of logical expressions, a decision table, and information extracted from individual cases by CBR. This application is able to group all diagnoses

References (25)

  • G.F. Luger et al. Knowledge-intensive problem solving. Artificial Intelligence: Structures and Strategies for Complex Problem Solving (1998)
  • A. Aamodt et al. Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun. (1994)