Abstract
Background In 2003 the National Institute for Clinical Excellence published guidelines recommending the use of brain natriuretic peptide (BNP) and the electrocardiogram (ECG) as part of the diagnostic work up of individuals with heart failure. However, the guideline did not address whether one test was superior to the other or whether performing both tests was superior to performing a single test.
Aim To investigate the relative test accuracy of the ECG, BNP, N terminal-pro brain natriuretic peptide (NT-proBNP) and combinations of two or more tests in the diagnosis of left ventricular systolic dysfunction (LVSD) in the primary care setting.
Design of study Cohort studies making within-subject comparisons of intervention diagnostic test(s) with reference standard results.
Method Standard systematic review methodology was followed.
Results Thirty-two primary studies met the review inclusion criteria. Studies were of variable quality and highly clinically heterogeneous, therefore restricting the use of meta-analysis. Within these limitations BNP, NT-proBNP and the ECG all had similar test sensitivity (>80% in the majority of studies). Specificity of the three tests was not as good. Three studies directly comparing BNP and the ECG found no difference in sensitivity and limited support for improved specificity of BNP. Two studies found no difference in sensitivity and limited evidence for an improvement in specificity for the combination of the ECG and BNP compared to single tests.
Conclusion On the basis of existing evidence, the ECG, BNP and NT-proBNP are useful in excluding a diagnosis of LVSD (good sensitivity). However, use of abnormal test results to select individuals for echocardiography may overwhelm services. There is currently no evidence to justify the use of one test over another or the use of tests in combination. The additional cost of BNP is not self-evidently justified by improved test accuracy. Further research is needed to directly compare the diagnostic performance of these tests in homogeneous, representative primary care populations.
INTRODUCTION
The crude prevalence of chronic heart failure is estimated to be between 0.4% and 3.2%.1–10 The incidence and prevalence of heart failure in the population is rising mainly as a result of an ageing population and improved survival from the main aetiological cause, coronary heart disease.1 The direct cost of health care for heart failure patients in the UK has recently been estimated at £716 million (1.83% of total NHS expenditure); the majority of costs are a result of hospitalisations.11 Strategies to reduce hospitalisations (and particularly repeat hospitalisations) are best placed in primary care where the majority of the diagnosis and day-to-day management of heart failure occurs.
Left ventricular systolic dysfunction (LVSD) is one of the major underlying mechanisms causing chronic heart failure. However, recent evidence suggests that the drugs effective in reducing mortality and morbidity in this patient group continue to be under-prescribed and prescribed in sub-optimal doses.12,13 Diagnostic uncertainty in the primary care setting is argued to be a major cause of this inappropriate prescribing.14–17 Measurement of ventricular function is considered the reference standard for diagnosing LVSD18 due to the disparity that exists between ventricular function and associated signs and symptoms. However, primary care access to echocardiography is currently limited. The electrocardiogram (ECG), brain natriuretic peptide (BNP) and N terminal-pro brain natriuretic peptide (NT-proBNP) are being promoted as tests that may be used when echocardiography is not available, or in order to pre-select individuals with a suspected diagnosis of LVSD for further investigation with echocardiography.19–30
Both the ECG and natriuretic peptides have a relatively high sensitivity (a ‘normal’ test result is good at ruling out a diagnosis of LVSD), but comparatively poor specificity, with the potential to lead to considerable over-investigation of abnormal results. The National Institute for Clinical Excellence (NICE) published a guideline about the diagnosis and management of heart failure in 2003.25 The NICE Guideline is ambiguous concerning whether natriuretic peptide testing alone, the ECG alone or a combination of natriuretic peptide testing and the ECG should be performed. Although this may reflect an inability to discriminate between the diagnostic accuracy of the three tests from the literature reviewed at the time of the Guideline's development, it has the potential to encourage indiscriminate and inefficient testing strategies.
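The concern about over-investigation can be illustrated with a small calculation. The following is a minimal Python sketch, assuming a hypothetical test with 90% sensitivity and 60% specificity applied to 1000 patients with an LVSD prevalence of 10%. The function name `referral_burden` and all input values are illustrative assumptions and are not taken from any included study.

```python
def referral_burden(sensitivity, specificity, prevalence, n_tested=1000):
    """Estimate how many tested patients would be referred for echocardiography.

    All arguments are proportions between 0 and 1, except n_tested.
    Every abnormal (positive) test result is assumed to trigger a referral.
    """
    true_pos = sensitivity * prevalence * n_tested
    false_pos = (1 - specificity) * (1 - prevalence) * n_tested
    referred = true_pos + false_pos
    return {
        "referred": referred,
        "false_positives": false_pos,
        # Positive predictive value: share of referrals that truly have LVSD.
        "ppv": true_pos / referred,
    }


# Hypothetical inputs chosen for illustration only.
result = referral_burden(sensitivity=0.90, specificity=0.60, prevalence=0.10)
print(result)
```

Under these assumed values about 450 of 1000 patients would be referred, of whom about 360 would not have LVSD, giving a positive predictive value of about 20%.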
METHOD
Comprehensive ascertainment was achieved through: (1) Electronic databases (1980–March 2004): MEDLINE, EMBASE and the Cochrane Library (2003, Issue 4). Terms included a range of text words and MeSH terms concerning the condition of interest (suspected LVSD), the diagnostic tests being compared (BNP, NT-proBNP and the ECG) and the process (diagnosis). (2) Conference abstracts/hand-searching: The proceedings of the British Cardiac Society Annual Conferences 1980–March 2004 and the proceedings of the British Society for Heart Failure 1998 (inception)–March 2004 were hand-searched. The American Journal of Cardiology and the Journal of the American College of Cardiology (1980–2004) (the top two cardiology journals ranked according to the frequency with which diagnostic evaluation studies are published)31 were hand-searched for relevant articles. (3) Citation searches of identified included studies and reviews.
Study selection
Explicit, pre-determined inclusion and exclusion criteria were applied to abstracts or full articles of potential relevance to the review topic. Inclusion criteria were as follows:
How this fits in
The clinical diagnosis of left ventricular systolic dysfunction (LVSD) is difficult, and misdiagnosis results in inappropriate and sub-optimal treatment of patients. The National Institute for Clinical Excellence has recently published a diagnostic algorithm for the diagnosis of LVSD in primary care. However, the algorithm does not clarify whether brain natriuretic peptide (BNP), N terminal-pro brain natriuretic peptide (NT-proBNP), the electrocardiogram or a combination of these three tests should be used routinely in the diagnosis of LVSD. This review provides a comprehensive and up-to-date synthesis of the literature concerning the role of the natriuretic peptides and the electrocardiogram in the diagnosis of LVSD.
Population. Adults suspected of having LVSD with or without comorbid conditions.
Intervention. One or more of the ECG, or the natriuretic peptides BNP and NT-proBNP.
Comparator. ‘Gold’ or reference standard for the diagnosis of LVSD (nuclear cardiology investigative techniques or 2-D echocardiography18,32–34 defining LVSD using a quantitative or qualitative measure of ejection fraction).
Study design. Cohort studies making within-subject comparisons of test results (interventions) with comparators in the same individuals.
Outcome measure. Derivation of a 2×2 diagnostic table in order to calculate test accuracy measures.
Inclusion and exclusion. See Figure 1. Abstracts and titles were initially scanned by one reviewer. Potentially relevant articles were further reviewed by at least two independent reviewers with disagreements resolved by a third reviewer according to the following exclusion criteria: studies with insufficient data to construct a 2×2 diagnostic table; studies concerned with the diagnosis of acute decompensated heart failure; studies concerned with the diagnosis of ventricular diastolic dysfunction alone; studies in which the majority of the target population had been on long-term treatment with ACE inhibitors and/or diuretics for presumed heart failure. Full details of the characteristics of included studies are presented in Supplementary Table 1.
Figure 1. Summary of the study inclusion and exclusion process.
Assessment of study quality and data extraction
Existing quality checklists35–37 were adapted to reflect the topic area and a pro-forma was used by two independent reviewers with disagreements resolved by a third reviewer. The criteria used for the quality assessment of included studies are outlined in Supplementary Table 2 and encompass the domains of selection bias, verification bias, measurement bias, and treatment paradox and disease progression bias. Data extraction was undertaken by two independent reviewers with uncertainty resolved by a third reviewer.
Data synthesis
Analysis was conducted using the Meta-DiSc software.38 Data concerning disease spectrum (prevalence of LVSD), the reference standard used and test accuracy measures (true positives, false positives, false negatives and true negatives) were initially extracted into a spreadsheet.
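The test accuracy measures derived from each 2×2 diagnostic table can be sketched as follows. This is a minimal Python illustration and not the Meta-DiSc implementation; the class name `TwoByTwo`, the 0.5 continuity correction for zero cells and the example counts are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study's 2x2 diagnostic table (hypothetical helper)."""
    tp: int  # index test abnormal, LVSD present on the reference standard
    fp: int  # index test abnormal, LVSD absent
    fn: int  # index test normal, LVSD present
    tn: int  # index test normal, LVSD absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def diagnostic_odds_ratio(self, correction: float = 0.5) -> float:
        """DOR = (TP * TN) / (FP * FN).

        Assumption: `correction` is added to every cell only when a cell is zero.
        """
        cells = [self.tp, self.fp, self.fn, self.tn]
        if 0 in cells:
            cells = [c + correction for c in cells]
        tp, fp, fn, tn = cells
        return (tp * tn) / (fp * fn)


# Hypothetical counts, not taken from any included study.
example = TwoByTwo(tp=45, fp=60, fn=5, tn=90)
print(example.sensitivity(), example.specificity(), example.diagnostic_odds_ratio())
```

With these made-up counts the sensitivity is 0.90, the specificity is 0.60, the prevalence is 0.25 and the DOR is 13.5.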
Where multiple thresholds were provided within a study, the worst and best estimates of the diagnostic odds ratio (DOR) for the individual study were used to investigate heterogeneity and to derive pooled estimates of test accuracy where appropriate. This approach was used in the absence of an agreed method for pooling studies comprising a mix of single and multiple thresholds. Analysis proceeded with an investigation for the presence of a diagnostic threshold effect39 and subsequently an investigation for other sources of heterogeneity. The a priori hypotheses were that heterogeneity in observed test accuracy was likely to arise from differences in the prevalence of LVSD (representing the probability of having LVSD prior to testing with either the ECG or the natriuretic peptides) and from differences in the reference standard employed, encompassing the test used (echocardiography or nuclear cardiology), whether symptoms of heart failure were required to be present and whether measurement of ejection fraction was qualitative or quantitative.40 Heterogeneity was assessed using both χ2 and I2 statistics in order to take into account the low power of the χ2 test.41,42 Where P>0.01 and I2 was ≤50%, studies were considered sufficiently homogeneous to proceed with pooling to derive a summary estimate of test accuracy. In the absence of both a diagnostic threshold effect and other sources of heterogeneity, pooled sensitivities and specificities were calculated according to the DerSimonian-Laird random effects model.41 In the presence of a diagnostic threshold effect, but an apparent absence of other sources of heterogeneity, summary sensitivity and specificity43 were derived from summary ROC curves according to the Moses-Shapiro-Littenberg method.44 Where significant heterogeneity persisted despite sub-group analysis, the range of sensitivities and specificities across included studies is presented as an indication of test accuracy.
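The heterogeneity statistics and random effects pooling described above can be sketched in a few lines. The following Python example is an illustration only, applied to the natural log of the DOR; it is not the Meta-DiSc code. The function names, the choice to add 0.5 to every cell and the use of ln(DOR) as the pooled effect are assumptions made for this sketch. Cochran's Q is the χ2 heterogeneity statistic, which would be compared against a χ2 distribution with k − 1 degrees of freedom for k studies.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn, correction=0.5):
    """Return ln(DOR) and its approximate variance.

    Assumption: `correction` is added to every cell of every table.
    """
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random effects estimator.

    Returns (pooled effect, Cochran's Q, I-squared in per cent, tau-squared).
    """
    w = [1 / v for v in variances]  # inverse-variance (fixed effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I-squared: proportion of variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, q, i_squared, tau2


# Hypothetical 2x2 tables (tp, fp, fn, tn), for illustration only.
tables = [(45, 60, 5, 90), (30, 20, 10, 140), (18, 55, 2, 75)]
effects, variances = zip(*(log_dor_and_variance(*t) for t in tables))
pooled, q, i_squared, tau2 = dersimonian_laird(effects, variances)
print(math.exp(pooled), q, i_squared, tau2)
```

Under the rule stated above, pooling would proceed only if the P value for Q and the I2 value both fell on the homogeneous side of their cut-offs.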
RESULTS
Number of studies
Of 4625 potentially relevant citations and abstracts, 115 needed detailed scrutiny of their whole text to make an inclusion decision. Seventy studies were excluded on the basis of full publications: 27 studies not concerned with the accuracy of the natriuretic peptides or the ECG; five not using an appropriate reference standard; 24 not concerned with a population suspected of having chronic LVSD; one consisting solely of patients on long-term treatment (diuretics and ACE inhibitors); six not including sufficient data to construct a 2×2 diagnostic table and seven erroneously using linear regression to measure agreement between a test and the reference standard.45
Twenty-nine primary studies and three posters reported in conference proceedings were included in the analysis46–77 (n = 32), of which 14 investigated the diagnostic accuracy of the ECG; 16 BNP; seven NT-proBNP, two BNP and the ECG combined and three directly comparing the ECG and BNP in the same study population. In some instances a single study investigated the diagnostic accuracy of more than one test.
Characteristics of included studies
See Supplementary Table 1. All studies were of cross-sectional design. Fourteen studies were conducted in the UK, eight in Europe and nine elsewhere.
Spectrum of study participants. The mean age of study participants ranged from 53 to 79 years. In 16 out of 32 studies mean age was not reported.
Eleven studies were conducted on primary care populations, eight studies on secondary care populations, five studies on patients post myocardial infarction (MI) and eight on populations constructed by a mixture of primary and secondary care physicians.
Forty-three per cent of ECG studies either employed exclusion criteria that would affect diagnostic accuracy or lacked sufficient detail on which to base a judgement. The corresponding figures were 69% for BNP, 71% for NT-proBNP and 100% for BNP and the ECG combined. As an illustration, in four out of 16 BNP studies, and one out of seven NT-proBNP studies, patients were excluded if they exhibited characteristics or morbidity common in patients with LVSD but which independently increase BNP levels (for example renal dysfunction, diuretic treatment or increasing age). This degree of selection of study samples is likely to alter test accuracy to a degree that precludes transferability to the primary care setting in practice and is reflected in part by the observed range of prevalence rates across included studies.
LVSD prevalence ranged between 4.5% and 83% in primary care settings, between 23% and 55% in secondary care settings, between 4.5% and 52% in populations constructed by a mixture of primary and secondary care physicians and between 36% and 48% in post-MI patients. The prevalence of LVSD in studies investigating the diagnostic accuracy of the ECG ranged between 4.5% and 83% (median = 24.5%); BNP 7% and 52% (median = 31.5%); NT-proBNP 6.7% and 48% (median = 27%) and ECG and BNP combined 10–52%. In two studies, LVSD prevalence was not reported.
Definition of an ‘abnormal’ test result (diagnostic threshold). ECG: five of the 14 studies investigating the diagnostic accuracy of the ECG alone or in combination with BNP relied on ECG interpretation by secondary care physicians, in four out of 14 studies ECG reporting was automated, and in one study ECG reporting was performed independently by primary and secondary care physicians. In four out of 14 studies details of ECG interpretation were not reported. The definition of an ‘abnormal’ ECG varied across studies although the majority (71%) used six or more ECG abnormalities to define ‘abnormal’.
Natriuretic peptides: the definition of ‘abnormal’ varied widely in both BNP and NT-proBNP studies and ranged between 5 pmol/L and 49 pmol/L (median 16 pmol/L) for BNP studies and 5 pmol/L and 250 pmol/L (median 31 pmol/L) for NT-proBNP studies. This variation is likely to reflect a desire on behalf of researchers to optimise test performance in each study population.
Reference (gold) standard: a range of reference standards were used for each of the ECG, BNP and NT-proBNP. These included nuclear cardiology, nuclear cardiology or echocardiography, echocardiography and symptoms of chronic heart failure, and echocardiography alone. In addition, studies using echocardiography employed a range of methods of measurement (quantitative and qualitative), and where ejection fraction was quantified studies employed a range of definitions of abnormal varying between 30% and 50%.
In summary, included studies varied widely in terms of the spectrum of patients being considered, the application of the index test and the reference test employed.
Quality assessment
Quality assessment of included studies is presented in Supplementary Table 2. Where an aspect of quality was not clear or not reported a conservative approach was taken and that quality component was assumed to be absent.
Sample size. No studies reported sample size calculations. Sample size of studies ranged between 83 and 14507 (median = 287).
Thirty-six per cent of ECG studies; 38% of BNP studies; 57% of NT-proBNP studies and one of two (50%) studies investigating the diagnostic accuracy of BNP and the ECG combined were of relatively poor quality as indicated by a total score of three out of a total of six criteria.
Selection bias. In this review selection bias is most likely to operate for the ECG (a test routinely available in contrast to the natriuretic peptides) where preferential forward referral of positive and indeterminate ECGs as opposed to normal ECG results will result in an over-estimation of sensitivity. Eighty-six per cent of the ECG studies and both of the studies investigating the accuracy of the ECG and BNP combined were judged as likely to have been affected by selection bias.
Availability of reference test result (verification bias). Overall, in 29% of ECG studies, individuals were excluded from the analysis of test accuracy because reference test results were unavailable (indeterminate or lost). Corresponding figures for BNP were 38%, NT-proBNP 43%, and for studies investigating the diagnostic accuracy of BNP and the ECG combined, 50%.
Outcome measurement. Blind measurement of outcomes (intervention and reference test results) is important to avoid overestimation of test accuracy. In 64% of ECG studies, outcome measurement was blind. The corresponding figure for BNP studies was 69%; NT-proBNP studies 71%; and for studies investigating the diagnostic accuracy of BNP and the ECG combined, 50%.
Repeatability. Seventy-nine per cent of ECG studies had methods that were judged to be repeatable. The corresponding figure for BNP was 81%, NT-proBNP 29%, and for studies investigating the diagnostic accuracy of BNP and the ECG combined, 100%.
Reliability. Twenty-one per cent of ECG studies provided estimates of reliability for either the index test and/or the reference test. The corresponding figure for BNP was 25%, NT-proBNP 43%, and for studies investigating the diagnostic accuracy of BNP and the ECG combined, 0%.
Timing of diagnostic test under investigation. Left ventricular function may change over time as a result of the natural progression of disease and fluctuations in disease severity (disease progression bias) and as a result of changes in treatment (treatment paradox). For 21% of ECG studies the time interval between application of the diagnostic test under evaluation and the reference standard was greater than 1 day or unknown. The corresponding figure for BNP was 63%, for NT-proBNP 71%, and for studies investigating the diagnostic accuracy of BNP and the ECG combined, 0%.
In summary, the quality of included studies was variable. Verification bias in the natriuretic peptide studies in particular is likely to result in an overestimation of test accuracy while selection bias in the ECG studies in particular is likely to result in an overestimation of test sensitivity.
Diagnostic test accuracy
See Table 1, Supplementary Table 3, and Supplementary Figures 1–6.
Table 1. Within-study comparisons of the test accuracy of the ECG and BNP.
Investigation for diagnostic threshold. Unsurprisingly, given the wide variation in the thresholds employed across BNP and particularly NT-proBNP studies, there was evidence of a significant threshold effect for the group of BNP studies (P<0.03) and for the group of NT-proBNP studies (P<0.001). By contrast, the group of ECG studies did not demonstrate a significant threshold effect (P<0.639), which was surprising given the range of methods of interpretation employed across included studies.
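One common way to look for a diagnostic threshold effect is to check whether sensitivity and the false positive rate rise together across studies. The following Python sketch computes the Spearman rank correlation between logit(sensitivity) and logit(1 − specificity). It is an assumption that this resembles the check used in this review, since the text does not describe the procedure; the function names, the 0.5 correction and the example tables are hypothetical, and no P value is calculated.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def threshold_correlation(tables, correction=0.5):
    """tables: iterable of (tp, fp, fn, tn), one tuple per study.

    Returns Spearman's rho between logit(TPR) and logit(FPR).
    Assumption: `correction` is added to every cell to avoid logit(0) or logit(1).
    """
    logit_tpr, logit_fpr = [], []
    for tp, fp, fn, tn in tables:
        tp, fp, fn, tn = (c + correction for c in (tp, fp, fn, tn))
        logit_tpr.append(logit(tp / (tp + fn)))
        logit_fpr.append(logit(fp / (fp + tn)))
    return spearman(logit_tpr, logit_fpr)


# Hypothetical studies in which lower cut-offs raise both TPR and FPR.
tables = [(30, 10, 20, 90), (40, 30, 10, 70), (48, 60, 2, 40)]
rho = threshold_correlation(tables)
print(rho)
```

A strong positive correlation would be consistent with studies trading sensitivity against specificity by using different cut-offs, which is the situation in which a summary ROC curve is preferred to simple pooling.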
Indirect comparison of the ECG, BNP and NT-proBNP. See Supplementary Table 3 and Supplementary Figures 1–6. The sensitivity of the ECG in this review ranged from 41.5% (26.3–57.9%)62 with a corresponding specificity of 87% (78.3–93.4%)62 to 98.4% (94.2%–99.8%)72 with a corresponding specificity of 66.1% (58.6–73.0%).72 Supplementary Figure 1 shows that the majority of ECG studies demonstrate a point estimate of sensitivity >80%, while Supplementary Figure 2 indicates that the estimates of specificity appear less good and more heterogeneous, with the majority of studies demonstrating a specificity of <80%. Those studies with sensitivity estimates <80% were not remarkable with respect to any study characteristics presented in Supplementary Table 1 or in terms of study quality.
The pooled DOR for all ECG studies was highly heterogeneous (P<0.001). This heterogeneity did not decrease when sub-grouping studies according to variations in disease spectrum or variation in reference standards employed (see Supplementary Table 3). It was, therefore, not possible to derive a pooled summary estimate of sensitivity and specificity for the ECG or sub-groups of ECG studies.
The sensitivity of BNP in this review ranged from 20% (13.3–45.5%)65 with a corresponding specificity of 89% (80–93.6%)65 to 100% (86.8–100%)71 with a corresponding specificity of 47% (34–61%).71 Supplementary Figures 3 and 4 show a similar pattern of test accuracy for BNP as for the ECG with the majority of BNP studies demonstrating a point estimate of sensitivity >80% with more heterogeneous and poorer estimates of specificity. Those studies with sensitivity estimates of <80% were performed on patients post MI but were not remarkable in terms of study quality.
The pooled DOR for all BNP studies was highly heterogeneous (P<0.001). Heterogeneity decreased when sub-grouping studies according to the reference standard employed (see Supplementary Table 3 for summary estimates of sensitivity and specificity). Sub-grouping studies according to disease spectrum (LVSD prevalence) did not consistently reduce heterogeneity and summary estimates are only available for an LVSD prevalence of <20%.
NT-proBNP. The sensitivity of NT-proBNP in this review ranged from 24.5% (13.8–38.3%)46 with a corresponding specificity of 95% (92.2–97%)46 to 98.1% (90.1–100%)46 with a corresponding specificity of 23% (18.7–27.7%).46 Supplementary Figures 5 and 6 show a similar pattern of test accuracy for NT-proBNP as for BNP and the ECG, with the majority of NT-proBNP studies demonstrating a point estimate of sensitivity >80% with poorer and more heterogeneous estimates of specificity. Those studies with poorer sensitivity estimates were all studies with an NT-proBNP cut-off ≥100 pmol/L but were not remarkable in terms of any other study characteristic or in terms of study quality.
The pooled DOR for all NT-proBNP studies was highly heterogeneous (P<0.006). Heterogeneity was decreased when sub-grouping studies according to the reference standard used (Supplementary Table 3) but not according to LVSD prevalence.
Heterogeneity precluded indirect comparisons of test accuracy for the ECG, BNP and NT-proBNP. Although heterogeneity was reduced in BNP and NT-proBNP studies when grouped according to the reference standard test used, the small size of the sub-groups did not allow calculation of confidence intervals and thus it was not possible to determine if there was a significant difference in test accuracy between the two natriuretic peptide tests.
Direct comparison of the ECG with BNP (same patient population). Three studies allow the direct comparison of the ECG with one of the natriuretic peptides, BNP58,61,62 (Table 1). This eliminates sources of heterogeneity as comparisons between tests are within-study. However, each study has to be considered in isolation. Table 1 illustrates that in all three studies sensitivity did not differ between the ECG and BNP. Two of three studies directly comparing BNP and the ECG demonstrated an improved specificity with BNP and one of the studies showed no difference in specificity.
ECG combined with BNP compared to BNP alone and the ECG alone. Two studies allow comparison of the performance of the ECG and BNP combined compared to individual tests.58,61 The pooled DOR for the two studies was heterogeneous (P>0.01) and so it was not possible to derive a summary estimate of the test accuracy of the ECG and BNP combined. Both studies demonstrate an improvement in specificity for the combination of the ECG and BNP compared to the use of the ECG alone, but no improvement in sensitivity.
DISCUSSION
Clinical significance of findings
The NICE guidelines concerning the management of chronic heart failure25 advise that the ECG or one of the natriuretic peptides (BNP or NT-proBNP) or a combination of natriuretic peptide testing and ECG testing should be employed as part of the diagnostic work up for individuals with suspected chronic heart failure.
Use of single tests. This review demonstrates that the ECG, BNP and NT-proBNP have good sensitivity but that it is not possible to distinguish between them using either direct or indirect comparisons on this measure of test performance. Estimates of specificity for the three tests are more heterogeneous but there is limited evidence that BNP may have superior specificity to the ECG. However, the clinical significance of any improvement in terms of reducing referrals for echocardiography has not been explored.
Use of a combination of tests. This review demonstrates that a combination of the ECG and BNP does not improve sensitivity. There is very limited evidence (two studies) for an improvement in specificity with a combination of the two tests, but again the clinical significance of this improvement has not been explored.
On the basis of existing evidence, it is, therefore, recommended that either a natriuretic peptide test (BNP or NT-proBNP) or the ECG should be used as part of the diagnostic work up of individuals with suspected chronic heart failure and that there is no evidence to justify the use of both tests. The choice of employing ECG or a natriuretic peptide will be affected by issues such as relative cost and availability. The estimated cost of an ECG is approximately £10 (personal communication, Finance Department, Northern General Hospital, Birmingham, August 2003) compared to approximately £20 for a natriuretic peptide test (personal communication, Roche Diagnostics and Bayer Diagnostics, August 2003).
Strengths and limitations of this review
This review represents an up-to-date and comprehensive review of primary research investigating the diagnostic accuracy of the natriuretic peptides (BNP and NT-proBNP) and the ECG in patients with suspected chronic heart failure. In addition, this review clearly sets out the clinical context in which existing studies of diagnostic test accuracy for these tests have been conducted and the associated problems when trying to synthesise primary research in this area as a result of clinical and methodological heterogeneity. Our chosen methods for synthesising this group of studies are based on the premise that heterogeneity in meta-analysis is inevitable and should be quantified.78 Our conclusion is that the degree of heterogeneity present in all but a few small sub-groups of our included studies would make pooling and the production of summary test accuracy estimates inappropriate and misleading.
Implications for further research
A substantial and recent body of work exists around the use of the natriuretic peptides and, to a lesser extent, the ECG in the diagnostic work up of individuals with chronic heart failure. However, the research is characterised by substantial heterogeneity and an absence of direct test comparisons that reduces its use in clinical practice. In addition, there is an absence of work investigating the test performance of current normal diagnostic practice for diagnosing LVSD in the primary care setting. It is, therefore, not possible to conclude whether the addition of new testing strategies, including use of the natriuretic peptides, would lead to an improvement in referral practices. Any further work with existing primary research would require the use of individual patient data. Variations in test accuracy with changes in clinical characteristics of patients or changes in the application of the diagnostic tests, such as the operator, the reference standard employed or the threshold used to define abnormality, could then be investigated.
Alternatively, further primary research is needed to directly compare the diagnostic performance of these tests in representative primary care populations and to establish whether their use results in improved test accuracy compared to current diagnostic practice. It could be argued that there is a need for more diagnostic test studies to be performed in the clinical setting in which the tests are to be used (or are being used) and to incorporate health outcomes, impact on referral practices and cost-effectiveness. Assessing diagnostic tests in isolation and out of clinical context has limited use past the assessment of a test's potential usefulness in phase I and II diagnostic study designs.79
Acknowledgments
Thank you to Sue Bayliss (information specialist) for help with devising the search strategy.
Notes
Further reading
Since this paper was written a further piece of related work has been completed by NHS Quality Improvement Scotland (NHS QIS). See: Craig J, Bradbury I, Cummins E, et al. Health Technology Assessment report 6. The use of B-type natriuretic peptides (BNP and NT-proBNP) in the investigation of patients with suspected heart failure. Edinburgh: NHS QIS, May 2005. It can be accessed at: http://www.nhshealthquality.org/nhsqis/qis_display_findings.jsp?pContentID=2456 (accessed 7 Dec 2005).
Supplementary information
Additional information accompanies this article at http://www.rcgp.org.uk/journal/supp/index.asp
Competing interests
The authors have stated that there are none.
- Received October 22, 2003.
- Revision received March 2, 2004.
- Accepted April 13, 2005.
- © British Journal of General Practice, 2006.