We would like to thank Djasmo, Echteld, and Spee for their well-founded and insightful comments on our report on the external validation of the MHS.1
Regarding the reference standard, Djasmo et al point out that the results of our study may be biased because the expert panel establishing the reference diagnosis was not blinded to the results of the MHS; they assume that blinding without loss of data would have been possible; and they argue that a kappa of 0.62 does not indicate substantial agreement. On the last point, several authors have suggested that a kappa between 0.6 and 0.8 does indicate substantial agreement.2,3 However, it is not our primary intention to discuss the appropriateness of such threshold recommendations. The main message, in our view, is that the agreement was not perfect, which indicates a difference between the blinded and the unblinded reference panels. It is important to note, however, that the lack of total agreement does not necessarily mean that the blinded reference panel made the more accurate decision. A reference panel blinded to the items of the MHS would have had to make a decision without knowledge of the patient's sex, age, or history of CHD, and without knowing whether the pain depended on effort or was reproducible by palpation. We considered it reasonable to assume that, especially in cases in which only data from the telephone follow-up were available, the lack of these data could result in a less accurate decision and a misclassification bias. In the end we had to weigh the risk of bias introduced by a lack of blinding against the risk of misclassification bias. Based on our practical experience with this kind of reference standard, we judged the latter risk to be higher, but we acknowledge this limitation.
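As an illustrative aside (not part of the original analysis), the chance-corrected agreement statistic under discussion is Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each panel's marginal frequencies. A minimal sketch, with entirely hypothetical panel ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases where the raters concur.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    pe = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (po - pe) / (1 - pe)

# Hypothetical data: two panels classify 10 patients as CHD (1) or not (0).
panel_blinded   = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
panel_unblinded = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(panel_blinded, panel_unblinded), 2))  # → 0.58
```

Note that with 80% raw agreement, kappa here is only 0.58, because both panels classify most patients as non-CHD and much of the agreement is expected by chance.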
Regarding the accuracy of the MHS, Djasmo et al imply that missing one in 50 patients with CHD may be too high, and state explicitly that missing four out of 21 patients with acute coronary syndrome (ACS) is too high. On the first point, we consider the predictive values to be the most informative measures from a clinical point of view, since they account for the prevalence of the target disease in the respective setting. Increasing the sensitivity would substantially decrease the positive predictive value, especially in a low-prevalence setting. However, we must state that the accuracy of the MHS for the diagnostic outcome ACS is lower than for the outcome myocardial ischaemia, and we agree that this fact deserves more attention. Diagnosis of ACS remains a major challenge in primary care, since patients often present at an early stage and specific tests (for example, biomarkers) lack sensitivity.4–6 Different, parallel approaches may be necessary and may already be used by GPs in clinical practice. In one approach, the GP could ask in a first step whether chest pain is caused by myocardial ischaemia and, if the answer is yes, decide in a second step whether the situation should be classified as ‘acute’ or ‘stable’. In another approach, GPs may ask of every patient with chest pain whether there are any red flags indicative of ACS or of other conditions requiring urgent admission to hospital. While the MHS aims to support the first approach, it does not substitute for the second.
Lastly, Djasmo et al state that our study does not prove that the MHS performs better than GPs’ own judgment based on common practice. We agree that such a comparison would be an important step in the evaluation of the MHS. Even more interesting would be an impact study investigating the effect of using the MHS on outcomes relevant to patients, such as mortality. However, the primary aim of our study was to test the robustness of the MHS. We are currently working on an analysis comparing different diagnostic strategies based on the MHS, GPs’ assessments, and combinations of both. Since this will be a secondary analysis, and since the study was not powered to answer these questions, the results will be exploratory. A major concern in future studies will be the sample size. Let us assume that the sensitivity of GPs’ assessments is 85% and that an increase in sensitivity of 5% would be judged clinically relevant. Based on these assumptions, and on the prevalence of CHD in primary care, a sample size of about 6000 patients would be necessary to compare these two diagnostic tests in an adequately powered study using a paired design.7 Necessary sample sizes for impact studies using outcomes relevant to patients may be even higher. We are not sure whether these studies will ever be conducted, and assume that recommendations must be based on the limited evidence we have so far.
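As an illustrative aside, a sample size of this order can be reproduced with a standard formula for paired comparisons of proportions (for example, Connor's 1987 formula based on the McNemar test). The exact assumptions behind the figure of 6000 are given in reference 7; the discordance fraction and prevalence below are hypothetical inputs chosen only to show the shape of the calculation, not the authors' actual values:

```python
from math import sqrt

def mcnemar_n_pairs(delta, psi, z_alpha=1.96, z_beta=0.8416):
    """Diseased patients needed to detect a difference `delta` in paired
    sensitivities when a fraction `psi` of diseased patients is classified
    discordantly by the two tests (two-sided alpha = 0.05, power = 0.80)."""
    return ((z_alpha * sqrt(psi) + z_beta * sqrt(psi - delta ** 2)) ** 2
            / delta ** 2)

# Hypothetical inputs (NOT the authors' actual calculation):
delta = 0.05       # clinically relevant gain in sensitivity (85% -> 90%)
psi = 0.20         # assumed discordant fraction among CHD patients
prevalence = 0.10  # assumed CHD prevalence in primary care

n_diseased = mcnemar_n_pairs(delta, psi)   # ~626 patients with CHD
n_total = n_diseased / prevalence          # ~6300 patients overall
print(round(n_diseased), round(n_total))
```

The key point the calculation makes visible is the division by prevalence: because only about one in ten chest-pain patients has CHD, every additional diseased patient needed for power multiplies the total recruitment roughly tenfold.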
© British Journal of General Practice 2012