Developing a large electronic primary care database (Doctors’ Independent Network) for research

https://doi.org/10.1016/j.ijmedinf.2004.02.002

Abstract

Background and objectives: Primary care databases form a unique source of population-based clinical information on the prevalence and management of diagnosed disorders. Historically such databases have lacked individual level socio-economic markers. We describe the development of the Doctors’ Independent Network (DIN) database for epidemiological and health services research. DIN includes a socio-economic marker (ACORN) based on postcode linkage at individual patient level. The validity of DIN is assessed against the General Practice Research Database (GPRD). Methods: External validity is assessed by comparing the demographic structure and prevalence rates for treated ischaemic heart disease (IHD) and treated hay fever with those from the GPRD. We assess the utility of a socio-economic measure (ACORN) based on postcode linkage at individual patient level by examining the trend in prevalence rates of IHD and hay fever by ACORN index. Results: 142 practices providing high quality data were selected, with 1,827,361 fully registered patients contributing data between 1992 and 2001, representing an identical age–sex structure to that for England & Wales and GPRD. Regionally adjusted prevalence of treated IHD (7.29 and 5.37%, respectively, for men and women aged 35+ in 1998) in DIN was highly comparable to GPRD (7.27 and 5.42%). In DIN, the odds ratio of IHD was 1.37 (95% CI 1.30–1.44) in subjects living in “striving” compared to “thriving” areas. The prevalence of treated hay fever was similar across databases, with inverse associations seen with ACORN in DIN (higher rates in “thriving” areas). Conclusions: DIN provides comparable period prevalence rates to GPRD for two common conditions, with social trends as expected. Primary care databases such as these have the potential to replace the decennial national morbidity surveys carried out in UK general practices, with DIN having the important advantage of including a socio-economic index.
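As a quick consistency check on the reported odds ratio, the sketch below backs out the implied point estimate and standard error from the published interval (1.30–1.44). It assumes the interval is a symmetric Wald-type interval on the log scale with z = 1.96, which the abstract does not state; the function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the log-scale midpoint and standard error from CI limits.

    Assumes (not confirmed by the source) a symmetric interval on the
    log scale: log(estimate) +/- z * SE.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    # The midpoint on the log scale is the geometric mean of the limits.
    midpoint = math.exp((log_lower + log_upper) / 2)
    # The interval spans 2 * z standard errors on the log scale.
    se_log = (log_upper - log_lower) / (2 * z)
    return midpoint, se_log


midpoint, se_log = log_scale_summary(1.30, 1.44)
print(f"implied point estimate: {midpoint:.2f}")
print(f"implied SE of log(OR):  {se_log:.3f}")
```

Under that assumption the implied midpoint rounds to the reported 1.37, so the estimate and its interval are mutually consistent.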

Introduction

Computers have been widely used in general practice in the UK for over 10 years. Databases pooling records from several hundred practices covering several million patients have been set up based on two of the most widely used systems [1]. Of these, the General Practice Research Database (GPRD) is widely known and used whilst the Doctors’ Independent Network (DIN) database is less well known. These databases form a unique source of population-based clinical information on health service use, and the prevalence and management of diagnosed disorders. Such databases have the potential to replace the decennial national morbidity surveys in general practice which were carried out in England & Wales between 1955–1956 and 1991–1992 [2]. In particular, general practice databases have the advantage of longitudinal records and detailed prescribing data. Historically an important limitation has been the lack of any socio-economic indicator.

In order to have confidence in the results of research using these databases, it is important to understand their underlying methodologies and to validate the data contained within them. In the case of the GPRD, the basic methodology has been described [3], [4], [5] and several studies have examined the completeness of its diagnostic and prescription data [3]. There is little information available on the validation of the accuracy of the patient registers.

DIN is an on-going anonymized, computerised database from over 300 general practices that use Torex (formerly MEDITEL) software, covering over 3 million patients from 1989 onwards. There is no overlap between the practices included in DIN and GPRD. To date the main use of DIN has been by pharmaceutical companies to look at trends in prescribing. DIN is unique in having a socio-economic indicator (the “ACORN” index) linked to each patient record prior to downloading from the practice.

Our overall aim was to develop DIN as a database suitable for use in epidemiological and health service studies. In Section 2 of this paper, we describe the DIN database, the process by which data are accumulated, the selection of practices with good quality data and the linkage of a socio-economic marker based on postcode. We then validate the database by: (i) comparing the DIN population with that for England & Wales and with that of the GPRD; (ii) validating DIN against GPRD by comparing period prevalence rates for two marker conditions, hay fever and ischaemic heart disease (IHD); and (iii) validating the use of the ACORN index within DIN by looking at the period prevalence rates of hay fever and IHD in relation to this socio-economic marker within practices. A priori, we expected IHD to be higher in subjects from deprived areas, while hay fever was expected to be higher in more prosperous areas.
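The comparison in step (iii) can be illustrated with a minimal sketch of period prevalence and a crude odds ratio between two ACORN groups. All counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not DIN data. The crude 2×2 calculation is a stand-in, since the snippet shown here does not say how the published estimates were adjusted.

```python
import math


def period_prevalence(cases, population):
    """Percentage of the registered population treated during the period."""
    return 100.0 * cases / population


def crude_odds_ratio(cases_a, noncases_a, cases_b, noncases_b, z=1.96):
    """Crude odds ratio of group A versus group B with a Wald interval."""
    odds_ratio = (cases_a * noncases_b) / (noncases_a * cases_b)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / cases_a + 1 / noncases_a + 1 / cases_b + 1 / noncases_b)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the paper): treated cases / non-cases.
striving = {"cases": 20, "noncases": 80}
thriving = {"cases": 10, "noncases": 90}

prev_striving = period_prevalence(striving["cases"], striving["cases"] + striving["noncases"])
prev_thriving = period_prevalence(thriving["cases"], thriving["cases"] + thriving["noncases"])
odds_ratio, lower, upper = crude_odds_ratio(
    striving["cases"], striving["noncases"], thriving["cases"], thriving["noncases"]
)

print(f"prevalence: striving {prev_striving:.1f}%, thriving {prev_thriving:.1f}%")
print(f"crude OR (striving vs thriving): {odds_ratio:.2f} ({lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 for "striving" versus "thriving" areas would match the a priori expectation for IHD, and a value below 1 would match the expectation for hay fever.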

Methods

This report considers data collected from practices using Torex System 5 software, which was used by the majority of practices up to the end of 2001. However, in the final 3 years (1999–2001), several practices switched to the newer System 6000. Only System 5 data are considered in this report. The process of data download is described along with the criteria of practice and patient inclusion. The method used to correct the registration data is described and initial validation studies are then

Results

Of 326 practices that had ever contributed data to DIN, 118 had no registration data (Fig. 1). A further 49 had consistently poor linkage and of the remainder 17 practices never met our criteria for acceptable data quality, leaving 142 high quality practices. Of these, the number of practices contributing rose from 94 in 1992 to a maximum of 142 in 1998, before falling to 84 in 2001. All but one practice had at least 5 years of continuous data recording.
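The practice selection described above can be checked with a short tally. The sketch below uses only the counts given in the text; the function name `apply_exclusions` and the shortened reason labels are illustrative.

```python
def apply_exclusions(start, steps):
    """Apply each exclusion in order and record the running total."""
    remaining = start
    flow = []
    for reason, removed in steps:
        remaining -= removed
        flow.append((reason, removed, remaining))
    return flow


ever_contributed = 326  # practices that had ever contributed data to DIN
exclusions = [
    ("no registration data", 118),
    ("consistently poor linkage", 49),
    ("never met criteria for acceptable data quality", 17),
]

flow = apply_exclusions(ever_contributed, exclusions)
for reason, removed, remaining in flow:
    print(f"excluded {removed:>3} ({reason}); {remaining} remain")

high_quality = flow[-1][2]
print(f"high quality practices: {high_quality}")
```

The running total ends at 142, matching the number of high quality practices reported.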

1,827,361 patients were fully registered

Discussion

This report has given an overview of the methods we used to clean the DIN database to enable it to be used for epidemiological research. For the 142 practices that passed our quality control, the levels of linkage of therapies to problem codes and of recording more specific diagnoses (third level of the Read hierarchy or lower) appear to be considerably higher than recently reported rates in the MEDIPLUS database, which is also based on Torex practices [6]. This demonstrates the value of

Acknowledgements

Wellcome Trust Grant 065177.

References (16)

  • S.S Jick et al., Pregnancies and terminations after 1995 warning about third generation oral contraceptives, Lancet (1998)
  • R.D Farmer et al., Population-based study of risk of venous thromboembolism associated with various oral contraceptives, Lancet (1997)
  • R Lawrenson et al., Clinical information for research the use of general practice databases, J. Pub. Health Med. (1999)
  • A. McCormick, D. Fleming, J. Charlton, Morbidity statistics from general practice, Fourth National Study 1991–92, HMSO,...
  • J Hollowell, The General Practice Research Database: quality of morbidity data, Popul. Trends (1997)
  • H Jick et al., Validation of information recorded on general practitioner based computerised data resource in the United Kingdom, BMJ (1991)
  • D.H Lawson et al., The General Practice Research Database. Scientific and Ethical Advisory Group, QJM (1998)
  • S De Lusignan et al., Does feedback improve the quality of computerized medical records in primary care?, J. Am. Med. Inform. Assoc. (2002)
There are more references available in the full text version of this article.
