Papers validating a diagnosis or patient characteristic
A total of 40 papers validated a diagnosis or patient characteristic coded in the GPRD. Most of these validation exercises involved either sending a questionnaire to the patients' GPs (n = 19) or independently verifying diagnoses against hospital letters or medical records held in the practice (n = 16). Some studies did both (n = 5). The positive predictive values (PPVs) of GPRD clinical codes from these validation exercises are summarised in Figure 2. The majority of papers reported PPVs over 50% and simply required patients' GPs to confirm the diagnosis recorded in the GPRD. Five of the seven papers reporting a PPV under 50% considered acute outcomes, including drug-induced liver injury, pancreatitis, or renal failure.11–15 The studies considering acute conditions all used strict diagnostic criteria to validate cases, including confirmation of the diagnosis by biochemical tests or by a specialist.
Figure 2 Forest plot reporting positive predictive values of diagnoses in the General Practice Research Database.
Studies in this review reported high PPVs (over 90%) for the recording of anorexia, bulimia,16 cataract,17 congenital heart defects (including ventricular septal defects, tetralogy of Fallot, and coarctation of the aorta),18 inflammatory bowel disease,19 cerebrovascular disease, diabetes, respiratory tract infection,19 Paget's disease,20 hip fracture,21 upper gastrointestinal bleeding,22 non-affective and non-organic psychosis,23 venous leg ulcer,24 and pressure ulcer.25
Recording of psoriasis,26 venous thromboembolism,27 schizophrenia,23 dementia, and Alzheimer's disease28 was relatively accurate, with PPVs between 80% and 90%. Other diagnoses, including cardiovascular events and thromboembolic disease,10,29 irritable bowel syndrome,30 chronic obstructive pulmonary disease,31 chronic atrial fibrillation,32 and cardiac arrhythmia,33 were not as accurately recorded, with reported PPVs lower than 80%.
Several papers validated the same diagnosis. Two papers considered the recording of autism in the GPRD.8,34 Both studies validated the diagnosis directly against hospital and patient medical records, and report that autism is well recorded in the GPRD. However, the electronic record was not detailed enough to provide sufficient differentiation between subtypes of pervasive developmental disorders. Acute myocardial infarction was also well recorded in the GPRD, with three studies reporting PPVs above 80%.9,35,36 There was good agreement in two papers that validated the recording of incident multiple sclerosis against hospital and medical records; however, both papers reported relatively low PPVs of around 60%.37,38
Two papers considering the validity of coding for ventricular arrhythmia reported markedly different PPVs.39,40 Using a GP questionnaire as the standard for validation resulted in a PPV of 93% (95% CI = 78 to 99%) for cases of sudden death or ventricular arrhythmia, whereas more stringent criteria requiring objective evidence of ventricular arrhythmia from specialist clinics and absence of recent angina or myocardial infarction resulted in a PPV of only 20.9% (95% CI = 13 to 31%). However, the investigators using a GP questionnaire to validate ventricular arrhythmia also report that only 23% (95% CI = 10 to 42%) of these diagnoses originated from outpatient events, which was their main outcome of interest. Two papers consider the validity of coding for rheumatoid arthritis; however, one study reported four validation categories (valid, invalid, possible, and unclassifiable) from which a PPV could not be derived.10,41
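As a minimal sketch of how a PPV and its 95% confidence interval can be computed from validation counts, the Python below uses the Wilson score interval. This is an illustrative choice: the reviewed papers do not state here which interval method they used, and the counts in the example (90 confirmed of 100 coded cases) are hypothetical, not taken from any study.

```python
from math import sqrt
from statistics import NormalDist


def ppv_with_wilson_ci(confirmed, total, level=0.95):
    """Return (PPV, lower, upper) using the Wilson score interval.

    confirmed: coded cases confirmed by the validation standard
    total:     all coded cases that were assessed
    """
    if total <= 0 or not 0 <= confirmed <= total:
        raise ValueError("need 0 <= confirmed <= total and total > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = confirmed / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half


# Hypothetical counts, for illustration only.
ppv, lower, upper = ppv_with_wilson_ci(confirmed=90, total=100)
print(f"PPV = {ppv:.1%} (95% CI = {lower:.1%} to {upper:.1%})")
```

With the small validation samples typical of these studies, the interval is wide, which is one reason the two ventricular arrhythmia estimates should be compared by their validation criteria rather than by point estimates alone.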
Two studies assessed the completeness of GP recording of diagnoses made by hospital consultants. In these studies, diagnoses were transcribed from the hospital discharge letter onto patients' electronic medical files in a high proportion (∼90%) of cases.42,43
Devine et al conducted a validation study using an algorithm to identify children with neural tube defects. They reported an overall PPV of 71% (95% CI = 63 to 78%); however, the PPV varied considerably according to the specific neural tube defect diagnosis.44 While anencephaly and cephalocele were generally well recorded, spina bifida was not. The Read Code algorithm used by the authors located spina bifida in the mother and not the child in 37% of cases.
One paper considered the recording of smoking status in the GPRD.45 Although current smoking is generally well recorded, former smoking is less so. Appendectomy is also under-recorded in the GPRD: in a study of ulcerative colitis, 13% of a random sample of patients self-reported an appendectomy, whereas the GPRD-coded rate in the same study was only 3.5%.46
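The size of the appendectomy under-recording can be expressed as a simple ratio of the coded rate to the self-reported rate. The sketch below assumes the self-reported rate is treated as the reference, which is an assumption made for illustration; the function name is hypothetical.

```python
def capture_ratio(coded_rate, reference_rate):
    """Fraction of the reference rate that appears in the coded data."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return coded_rate / reference_rate


# Rates reported in the ulcerative colitis study (percent of patients).
ratio = capture_ratio(coded_rate=3.5, reference_rate=13.0)
print(f"Coded rate is {ratio:.0%} of the self-reported rate")
```

On these figures the coded rate is roughly a quarter of the self-reported rate, assuming self-report is itself accurate.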
Three papers reported results relating to accuracy of the date of diagnoses. There were discrepancies in date recording in 45/95 (47%) of dementia cases.28 The differences were generally small, with an interquartile range (IQR) of −7 to 0 weeks. In a study of the validity of inflammatory bowel disease recording, the median difference in the first date reported by the GP and the first inflammatory bowel disease diagnosis in the electronic record was −8 days (IQR = −81 to 0 days). However, for 33 of the 53 patients included in the study, the first recorded diagnosis of inflammatory bowel disease in the electronic record was within 30 days of the date reported by the GP.19 Recording of the date of acute myocardial infarction was also generally reliable; only 31/201 (15%) of confirmed cases had a GP-reported date that was inconsistent with the electronic record. The differences in dates were generally small; 28/31 (90%) of the GP-reported dates were within 15 days of the date in the electronic record.35
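The proportions quoted above can be recomputed directly from the reported counts. The short sketch below does so; the dictionary labels are descriptive names chosen here, while the counts are those given in the text.

```python
# (numerator, denominator) pairs as reported in the text.
counts = {
    "dementia: date discrepancy": (45, 95),
    "IBD: first record within 30 days": (33, 53),
    "AMI: GP date inconsistent": (31, 201),
    "AMI: inconsistent but within 15 days": (28, 31),
}

# Percentages rounded to the nearest whole number.
percentages = {
    label: round(100 * num / den) for label, (num, den) in counts.items()
}

for label, pct in percentages.items():
    num, den = counts[label]
    print(f"{label}: {num}/{den} = {pct}%")
```

The inflammatory bowel disease figure, which the text gives only as a count, corresponds to about 62% of patients.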
The GPRD compared with other databases or statistics
In total, 12 papers compared prevalence or consultation rates in the GPRD with those in other primary care databases or national statistics registers. All compared the GPRD with UK data, except for one comparison with a US-based database.47
Three papers compared GPRD consultation or prevalence rates against the Morbidity Statistics from General Practice 1991–92 (MSGP4), a UK-wide survey of consultation patterns in primary care.48–50 The MSGP4 itself has been evaluated;48 96% of all consultations in the GP surgery were recorded, suggesting it contains good-quality data on consultation patterns in the UK. There was good agreement between the GPRD and MSGP4 for 11 common respiratory conditions. However, consultation rates and prevalence of diabetes and musculoskeletal conditions were underestimated in the GPRD compared with the MSGP4.49,50
Three studies compared the GPRD with the Doctors' Independent Network (DIN), a UK-based primary care database that has been collecting routine data from over 300 practices distinct from the GPRD since 1989.51–53 Generally, there was good agreement between the two databases for common childhood conditions, hay fever, ischaemic heart disease, and prescribing for skin emollients.
The six remaining papers compared the GPRD with a variety of other primary and secondary care databases. Three papers reported similar rates of disease in the GPRD and other databases. The UK-based MediPlus primary care database, which covers about 150 practices across the UK, provided similar crude incidence rates to the GPRD for venous thromboembolic disease.54 Derby et al compared the GPRD with a US-based database held by the Group Health Cooperative, and reported that the overall rate of suicide among users of antidepressants was similar in the two databases.47 A comparison of the GPRD with the Hospital Episode Statistics demonstrated comparable overall and age-specific incidence of Guillain-Barré syndrome.55
There were some differences between disease coding in the GPRD and other datasets. A comparison of the GPRD and the Living in Britain National Household Survey from 1996 suggests that current smoking rates in the GPRD are 79% of the expected rate. The rates for ex-smokers were substantially underestimated; the GPRD rate for ex-smoking was 29% of what was expected according to the National Household Survey.45 Frischer et al described under-reporting of drug misuse in the West Midlands Regional Drug Misuse Database compared with the GPRD.56 The prevalence of congenital heart defects was higher in the GPRD than in the National Congenital Anomaly System.57 The same authors also validated heart defect diagnoses using a GP questionnaire, and reported an overall PPV of 93.5%, suggesting that the GPRD is a good source of information on congenital heart defects.18