The article by Khan and colleagues [1] highlights the strength of the General Practice Research Database (GPRD) as a research-quality database providing accurate diagnostic data to researchers on a wide range of conditions, and for millions of patients. While the search strategy for this study was broad and inclusive of prescription data, procedures, and smoking in addition to diagnoses, the authors did not identify as many articles as expected.
We published a similar systematic review of the validity of diagnoses in the GPRD [2] and found over 200 relevant publications, compared with the 49 articles identified in this study. There are two explanations for this difference. First, many validations were not mentioned in the title, abstract, or keywords of the articles and we therefore broadened our search to all studies using GPRD data. Second, our review included studies that validated diagnoses using algorithms, manual review of electronic records, and sensitivity analysis in addition to those methods included by Khan et al. Despite these differences in scope, our results were broadly similar and showed high validity of GPRD diagnoses, with a median positive predictive value across diagnoses of 89% (range 24–100%).
While our study was larger, Khan and colleagues assessed one important aspect of validity that we did not: the accuracy in timing of diagnoses. For some research questions this may be of little interest, but in studies investigating triggers of acute conditions or assessing direct toxic effects of pharmacological agents, timing is important and inconsistencies in the accuracy of an event date could be cause for concern.
In acute conditions with definite event dates (for example, myocardial infarction), differences between the electronic record and the GP's own notes can be interpreted straightforwardly as simple errors in recording the date. However, when validating the timing of non-acute conditions, the authors of validation studies should state whether the GP was asked to provide the date when the index of suspicion was first raised or the date of a definite diagnosis, to enable interpretation of any differences.
The relative lack of data on this aspect of validation and the resulting uncertainty in the timing of acute events highlight the benefits of linkage of the GPRD with other datasets. As discussed in both of our papers, linkage to disease registries could bring additional information with which to validate the diagnosis and its timing. For some conditions this may negate the need to obtain additional information from GPs, a process which, as we both point out, is expensive and limits the number of patients validated to a selected and potentially unrepresentative group.
© British Journal of General Practice, January 2011