Abstract
Background There are international differences in the epidemiology of depression and the performance of primary care physicians but the factors underlying these national differences are uncertain.
Aim To examine the international variability in diagnostic performance of primary care physicians when diagnosing depression in primary care.
Design of study A meta-analysis of unassisted clinical diagnoses against semi-structured interviews.
Method A systematic literature search, critical appraisal, and pooled analysis were conducted and 25 international studies were identified involving 8917 individuals. A minimum of three independent studies per country were required to aid extrapolation.
Results Clinicians in the Netherlands performed best at case finding (the ability to rule in cases of depression with minimal false positives) (AUC+ 0.735) and this was statistically significantly better than the ability of clinicians in Australia (AUC+ 0.622) and the US (AUC+ 0.653), who were the worst performers. Clinicians from Italy had intermediate case-finding abilities. Regarding screening (the ability to rule out cases of no depression with minimal false negatives) there were no strong differences. Looking at overall accuracy, primary care physicians in Italy and the Netherlands were most successful in their diagnoses and physicians from the US and Australia least successful (83.5%, 81.9%, 74.3%, and 67.0%, respectively). GPs in the UK appeared to have the lowest ability to detect depression, as a proportion of all cases of depression (45.6%; 95% CI = 27.7% to 64.2%). Several factors influenced detection accuracy including: collecting data on clinical outcomes; routinely comparing the clinical performance of staff; working in small practices; and having long waits to see a specialist.
Conclusion Assuming these differences are representative, there appear to be international variations in the ability of primary care physicians to diagnose depression, but little difference in screening success. These might be explained by organisational factors.
INTRODUCTION
Depression is one of the most common mental-health problems worldwide.1–3 The study on Psychological Problems in General Health Care (PPGHC), conducted across 14 countries, found that 14% of primary care attendees suffered from major depression;4 a more recent study in six European countries found a lower rate of 8.5% in men.5 Across all settings, best estimates for major depressive disorder are a 1-year prevalence rate of 4.1% (95% confidence interval [CI] = 2.4% to 6.2%), a lifetime risk of 6.7% (95%CI = 4.2% to 10.1%), and an incidence rate of about 9% over 12 months.6
However, it should be noted that there are international differences in the epidemiology of depression, especially when comparing developing (low-income) with developed (high-income) societies.2,7–11 There are also cultural differences in the expression of mental disorders, regardless of national boundaries.12,13 Such differences may impact on the phenomenology, detection, and treatment of depression.14
Regarding symptoms, in some cultures conventional concepts of depression taken from the International Classification of Diseases and the Diagnostic and Statistical Manual of Mental Disorders may not hold.15,16 For example, low mood may not be a universal core feature of the disorder;17 similarly, rates of somatic symptoms as a presenting complaint of depression vary considerably.18,19
How this fits in
It is known that there are international differences in the epidemiology and phenomenology of depression. This study shows that there are also international differences in the case-finding performance of primary care physicians and that these might be partly explained by organisational factors. These appear to include collecting routine outcome data, comparing the clinical performance of staff and working autonomously.
Regarding treatment differences, a number of large international studies reveal low treatment rates in less-developed countries (between 3% and 23%), although this is dependent on prior diagnostic success.10,20 In developed countries about one-quarter to one-third of those with mental illness receive no treatment.21
Regarding diagnostic differences, typical diagnostic sensitivity and specificity among primary care physicians have been previously reported but Mitchell et al did not fully examine national differences.22 However, there is reason to suspect that variations exist because the World Health Organization (WHO) PPGHC study found considerable cross-national differences. Rates of identification (diagnostic sensitivity) ranged from a low of 19.3% in Nagasaki, Japan to a high of 74.0% in Santiago de Chile, Chile.23,24 In a related report from the WHO study, Munitz et al reported on diagnostic rates of depression in a subset of 1199 patients in six countries.25 They found substantial differences and suggested that these might result from difficulties conceptualising depression by the physician and not necessarily differences in clinical presentation. For example, many clinicians in other countries may not consider depression to be categorical and may prefer a continuum of mood change.26
One unresolved question is whether such differences are amenable to change; there is also speculation regarding whether scales or tools contribute to improved identification of depression in primary care. Although the validity of depression scales has been extensively studied in high-income countries,27 it has been investigated less in low and middle-income countries, with some data from India,28,29 Ethiopia,30 Burkina Faso,31 Chile,32 and Brazil.33 No research to date has shown high uptake of such tools in routine care. Further, well-designed studies demonstrating beneficial patient outcomes are lacking (with no data in low and middle-income countries) even though research supports application by non-physicians and community health workers.34,35
Despite the strengths of the WHO's PPGHC study, its major limitation is that only diagnostic sensitivity was reported and no information on specificity or overall accuracy was recorded. Therefore, this study's aim was to summarise international rates of recognition of depression by pooling smaller-scale studies that incorporated both sensitivity and specificity. This was limited to high-income countries as there were no qualifying studies in low or middle-income settings.
The second aim was to examine service-level predictors of accuracy that might explain such variations. For the purposes of this review, case identification was defined as the application of a tool to identify (rule in) individuals with the index disorder. Screening was defined as the systematic application of a tool to rule out individuals without the index disorder.
METHOD
Inclusion and exclusion criteria
Diagnoses of depression by clinicians from high-income countries were examined as there were no available studies reporting specificity in low or middle-income settings. In 2006, the World Bank considered 60 countries – including the UK, Australia, Italy, the Netherlands, and the US – to have high-income economies (defined as a gross national income per capita in 2006 of ≥US$11 116).36
The principal inclusion criteria were studies that examined the diagnostic accuracy of primary care physicians' clinical ability to detect depression, defined by semi-structured interview. Studies relying on mood questionnaires alone were excluded as such methods are not accepted as an adequate criterion standard. In order to attempt to gain a representative sample, a minimum of three methodologically similar independent studies were required from any individual country to enter the meta-analysis. In all cases it was required that case ascertainment was conducted by contemporaneous interview or questionnaire. Studies employing case ascertainment by casenote method (chart review) were excluded. Studies were not restricted on the basis of age of the recruited patients.
Search and critical appraisal
This study's methods have been previously reported.22 In brief, a systematic literature search, critical appraisal, and pooled analysis were conducted. The abstract databases of Medline/Pubmed, PsycINFO, and Embase were searched from inception to September 2009. In the full-text collections of Science Direct, Ingenta Select, Ovid Full text, and Wiley-Blackwell Interscience, the same search terms were used but as a full-text search and citation search. The abstract databases SCOPUS and Web of Knowledge (4.1, Institute for Scientific Information) were searched using key papers in a reverse citation search. Data were extracted using a standardised spreadsheet by two authors and re-examined by a further author independently.
Meta-analysis and meta-regression
In order to account for sample-size variations a meta-analytic weighted rate was calculated for sensitivity and specificity.37 Where heterogeneity was moderate to high, a random effects meta-analysis was performed using StatsDirect (version 2.6.2). A Bayesian plot of conditional probabilities that converts hypothetical sensitivity and specificity into interpretable conditional post-test probabilities from all pre-test probabilities was also constructed.38,39 The area under the Bayesian positive curve (AUC+) allows statistical comparison of rule-in success and the area above the negative curve (AUC–) allows statistical comparison of rule-out success, without interference from prevalence variations. These can be calculated simply using Microsoft Excel®.40
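The areas described above can also be approximated outside a spreadsheet. The following is a minimal sketch, assuming that the positive curve is the post-test probability of depression after a positive clinical diagnosis and the negative curve is the post-test probability after a negative one, each plotted against every pre-test probability from 0 to 1. The function names and the trapezoid integration are illustrative choices and are not taken from the cited method.

```python
def post_test_positive(sens, spec, prev):
    """Probability of depression after a positive clinical diagnosis."""
    true_pos = sens * prev
    false_pos = (1.0 - spec) * (1.0 - prev)
    denom = true_pos + false_pos
    return true_pos / denom if denom > 0 else 0.0


def post_test_negative(sens, spec, prev):
    """Probability of depression after a negative clinical diagnosis."""
    false_neg = (1.0 - sens) * prev
    true_neg = spec * (1.0 - prev)
    denom = false_neg + true_neg
    return false_neg / denom if denom > 0 else 0.0


def trapezoid_area(curve, steps=10000):
    """Area under curve(p) for p in [0, 1] by the trapezoid rule."""
    width = 1.0 / steps
    total = 0.5 * (curve(0.0) + curve(1.0))
    for i in range(1, steps):
        total += curve(i * width)
    return total * width


def auc_plus(sens, spec):
    """Area under the positive curve (rule-in success)."""
    return trapezoid_area(lambda p: post_test_positive(sens, spec, p))


def auc_minus(sens, spec):
    """Area above the negative curve (rule-out success)."""
    return 1.0 - trapezoid_area(lambda p: post_test_negative(sens, spec, p))


if __name__ == "__main__":
    # Pooled Netherlands sensitivity and specificity as reported in the Results.
    print(round(auc_plus(0.525, 0.885), 3))
    print(round(auc_minus(0.525, 0.885), 3))
```

Under these assumptions, the pooled Netherlands sensitivity (52.5%) and specificity (88.5%) give values of approximately 0.735 and 0.602, in line with the AUC+ and AUC– reported for that country. A test with no discriminating power (sensitivity and specificity both 0.5) gives 0.5 for each area.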
A meta-regression that examines the significance of predictor variables that might explain various types of diagnostic accuracy was performed. Predictor variables (Appendix 1) were chosen from a large 2009 survey of primary care practices across 11 countries and involving 10 320 primary care physicians.41
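As an illustration of what a B coefficient represents in a meta-regression of this kind, the sketch below fits a single-predictor weighted least-squares line, with each study weighted by its sample size. This is an assumption-laden simplification: the weighting scheme, the single-predictor form, and all data values are hypothetical, and it is not the procedure run for this analysis.

```python
def weighted_fit(x, y, w):
    """Weighted least-squares intercept and slope (B) for y = a + B * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical illustration only: the proportion of practices reporting a
# service-level factor (x), study-level specificity (y), and study sample
# sizes used as weights (w). The y values lie exactly on y = 0.66 + 0.2 * x.
predictor = [0.2, 0.4, 0.6, 0.8]
specificity = [0.70, 0.74, 0.78, 0.82]
weights = [120, 300, 250, 90]

intercept, slope = weighted_fit(predictor, specificity, weights)
```

A positive slope would indicate that studies from settings where the factor is more common tend to report higher specificity; the associated P value requires a standard error, which this sketch does not compute.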
RESULTS
Study description and methods
Twenty-five international studies were identified from five high-income countries, involving 8917 individuals. There were four studies from Australia,42–45 four from Italy,46–49 six from the Netherlands,50–55 three from the UK,56–58 and eight from the US,59–66 all reporting diagnostic sensitivity (Table 1).
Prevalence
As expected, prevalence varied by country with the highest rates of depression occurring in Italy (27.4%; 95% CI = 17.5% to 38.5%) and the Netherlands (22.7%; 95% CI = 12.5% to 34.9%). Lower rates were recorded in the UK (15.6%; 95% CI = 3.2% to 34.9%) and the US (12.5%; 95% CI = 7.4% to 18.7%), with the lowest rate in Australia (10.9%; 95% CI = 6.4% to 16.3%).
Diagnostic sensitivity and specificity
The highest rate of clinicians' diagnostic sensitivity was seen in Italy (64.0%; 95% CI = 43.6% to 82.1%) but this was not statistically significantly greater than the rates of other countries. Moderate sensitivity was found in Australia (59.1%; 95% CI = 42.4% to 74.7%) and the Netherlands (52.5%; 95% CI = 36.2% to 68.6%). Lowest sensitivity was recorded by clinicians in the US (49.2%; 95% CI = 37.6% to 60.7%) and the UK (45.6%; 95% CI = 27.7% to 64.2%).
Regarding the ability to reassure those without depression, the highest accuracy came from clinicians in the Netherlands (88.5%; 95% CI = 81.6% to 93.9%) but this was not statistically significantly greater than that of other countries. Fair diagnostic specificity was also recorded in studies from the US (81.1%; 95% CI = 75.6% to 86.0%) and Italy (79.3%; 95% CI = 55.5% to 95.5%). Lowest accuracy was present in Australia (71.9%; 95% CI = 57.3% to 84.4%). No results were reported from the UK.
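To show why weighting by sample size matters when rates such as these are pooled, the sketch below compares a sample-size-weighted rate with a simple unweighted mean of study rates. It is a fixed-effect simplification with hypothetical counts, and it does not reproduce the random effects procedure described in the Method.

```python
def weighted_rate(studies):
    """Sample-size-weighted rate from (detected, total) pairs per study."""
    detected = sum(d for d, _ in studies)
    total = sum(n for _, n in studies)
    return detected / total


def unweighted_rate(studies):
    """Plain mean of the per-study rates, ignoring study size."""
    return sum(d / n for d, n in studies) / len(studies)


# Hypothetical counts: (depressed patients detected, depressed patients seen).
example_studies = [(30, 50), (10, 40), (45, 60)]

pooled = weighted_rate(example_studies)    # 85 / 150
naive = unweighted_rate(example_studies)   # mean of 0.60, 0.25, 0.75
```

In this hypothetical case the weighted rate is about 0.567 and the unweighted mean about 0.533, because the largest study also has the highest rate.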
Bayesian cross-national comparison of accuracy
After converting sensitivity and specificity into rule-in and rule-out accuracy, clinicians in the Netherlands performed best at case finding (AUC+ 0.735; 95% CI = 0.698 to 0.772). This was statistically superior to clinicians from both Australia (AUC+ 0.622; 95% CI = 0.569 to 0.674) and US (AUC+ 0.653; 95% CI = 0.624 to 0.682). Clinicians from Italy had intermediate case-finding abilities (AUC+ 0.678; 95% CI = 0.655 to 0.702) (Figure 1).
Figure 1 Plot of conditional probabilities for cross-national differences in rule-in and rule-out accuracy of depression by GPs.
Regarding screening (the ability to rule out cases of no depression with minimal false negatives), there were no strong differences between clinicians, although those from Italy performed best (AUC– 0.628; 95% CI = 0.604 to 0.653) and those from the US worst (AUC– 0.577; 95% CI = 0.547 to 0.606). Clinicians from the Netherlands (AUC– 0.602; 95% CI = 0.561 to 0.642) and Australia (AUC– 0.593; 95% CI = 0.540 to 0.645) occupied a middle ground. No results were reported from the UK. Combining both rule-in and rule-out accuracy (combined AUC) showed that clinicians from the Netherlands and Italy were most accurate in their diagnoses.
Comparison of overall true positive and true negative performance (fraction correct statistic) confirmed that significantly higher performance was seen in Italy (83.5%; 95% CI = 82.0% to 84.9%) and the Netherlands (81.9%; 95% CI = 80.2% to 83.5%) as compared with the US (74.3%; 95% CI = 70.9% to 77.4%) and Australia (67.0%; 95% CI = 64.7% to 69.3%). No results were reported from the UK. Data from a single study in an adolescent sample56 hinted at low overall accuracy in the UK (Table 1).
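The fraction correct statistic is the proportion of all patients who are classified correctly, whether as true positives or true negatives. A minimal sketch with hypothetical counts follows; the function and variable names are illustrative only.

```python
def accuracy_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, prevalence and fraction correct from a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "fraction_correct": (tp + tn) / total,
    }


# Hypothetical counts: 60 depressed patients (40 detected, 20 missed) and
# 240 non-depressed patients (210 correctly reassured, 30 misidentified).
summary = accuracy_summary(tp=40, fn=20, fp=30, tn=210)

# Within a single 2x2 table, fraction correct is the prevalence-weighted
# mix of sensitivity and specificity.
mix = (summary["sensitivity"] * summary["prevalence"]
       + summary["specificity"] * (1 - summary["prevalence"]))
```

In this hypothetical table the fraction correct is 250/300, about 83.3%. Because non-depressed patients usually outnumber depressed patients in primary care, the statistic is driven more by specificity than by sensitivity.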
Table 1 Studies of primary care physicians' ability to diagnose depression against psychiatric interviews.
Predictors of diagnostic accuracy
Sensitivity and specificity
There were no statistically significant predictors of diagnostic sensitivity, although long waiting times to see a specialist were linked to sensitivity with a B coefficient of 0.31 and a trend of P = 0.061. Two variables influenced detection specificity. These were ‘clinical outcomes’ (B coefficient of 0.35, P = 0.003) and ‘clinicians compared’ (B = 0.16, P = 0.02).
Rule-in and rule-out accuracy
Two variables were linked with rule-in performance (AUC+): ‘difficulty ordering tests’ (B = –0.10, P = 0.07) and ‘small practices’ (B = 0.24, P = 0.02). Two variables were linked with rule-out performance (AUC–) namely ‘difficulty ordering tests’ (B = –0.12, P = 0.06) and ‘long waits’ (B = 0.18, P = 0.03).
DISCUSSION
Summary of main findings
This research employed studies with similar case-ascertainment and criterion standards to enhance comparability. After examining diagnostic sensitivity, a substantial difference was found in that clinicians in Italy identified almost two-thirds of depressed individuals correctly (64.0%; 95% CI = 43.6% to 82.1%) whereas those in the UK and the US identified less than half of depressed individuals presenting in primary care. However, due to limitations in sample size (particularly in Italy and UK-based studies) this was not statistically significant. It is possible that these differences in diagnoses related to variations in prevalence, as rates of depression were also highest in Italy and relatively low in the UK and the US. However, after weighting for sample size and adjusting for prevalence it was found that clinicians in the Netherlands performed best at depression case finding (the ability to rule in depressed cases with minimal false positives) and this was significantly better than the ability of clinicians in Australia and the US. Clinicians from Italy had intermediate case-finding abilities. Regarding screening (the ability to rule out non-depressed cases with minimal false negatives) there were no strong differences between clinicians as each national group performed similarly in this regard. Taking all correct classifications together, the overall accuracy of primary care physicians in the Netherlands and Italy was superior to the accuracy of clinicians from the US and Australia.
Limitations of the study
This analysis has several important limitations. It is difficult to ensure the diagnostic rates are representative of practices across an entire country. To address this, a minimum of three methodologically similar studies were required from any individual country to enter the meta-analysis; however, there is no guarantee that practice from these studies is representative of the country as a whole. In fact, looking at national differences, there was considerable variability even within individual countries; for example, a sensitivity of between 26.1% and 78.8% in the Netherlands (unadjusted national differences without allowance for prevalence variations).
It was required that each study had to share the same method of case ascertainment, that is prospective physician opinion by interview or questionnaire, but clinicians were asked subtly different questions about their clinical opinion and this could influence results. Studies based on casenote methods, which often generate substantially different results,67 were excluded as were studies with a significant training component. In fact, only one casenote study was found that had a sufficient sample and, thus, no cross-national comparisons based on casenotes could be made.
A further limitation was the inclusion of studies based on patients of any age. In studies from Australia three out of four involved older people, whereas in other settings these were in the minority. Also, inter-rater reliability statistics were not calculated.
A final limitation was that it was not possible to sufficiently examine all possible predictors of recognition such as healthcare organisation, physician payment system, clinician workload, or catchment area. This was disappointing but these factors had not been adequately recorded.
Comparison with existing literature
Previously, one international study found significant differences in recognition according to country of study. High rates of diagnostic sensitivity (≥50%) were found in Manchester, Paris, Santiago, Seattle, and Verona, and low rates (≤20%) in Ankara, Athens, Ibadan, Nagasaki, and Shanghai.23 The current study used an entirely different approach to the large interview-based study reported by Simon et al.23 Data were pooled from multiple studies of physician practices in five countries (four in the case of specificity). Studies with similar case-ascertainment and criterion standards were used to enhance comparability.
What factors might positively influence low detection rates in some countries? This meta-regression compared 28 nationally-representative service-related factors that might influence diagnostic sensitivity and specificity. It was found that, if a practice routinely reviewed data on clinical outcomes and routinely compared the clinical performance of staff with other practices, diagnostic specificity appeared to be higher. This is consistent with a model that suggests that increased performance monitoring may improve clinical performance.
It was found that working in small practices (of less than two full-time equivalents) influenced case-finding ability as did ease of ordering specialised diagnostic tests. Ease of ordering tests also influenced screening accuracy as did GPs reporting long waits to see a specialist. The combination of small practices and poor access to specialist services is interesting as it may force practitioners to be more self-reliant and perhaps improve continuity of care. One potentially important factor – whether practices routinely used written guidance for depression – was not statistically significant.
There have been few direct observation studies on the accuracy of clinicians stratified by national or cultural groups. Leo et al found few differences when comparing diagnoses for Caucasian and African-American patients.68 Yet there are acknowledged differences in phenomenology between these groups.69–71 It is possible that depression presents differently in some countries. For example, it is known that in some cultures Westernised concepts from ICD and DSM may not apply.15,16 It has been reported that somatic symptoms are the commonest presenting features of depression in high-income countries,72,73 but against this Chang et al found that Koreans were more likely to express symptoms such as ‘low energy’ and ‘concentration difficulty’, and less likely to express ‘depressed mood’ and ‘thoughts of death’, during an episode of major depressive disorder compared with a US population.74 Similarly there are also variations in the classic psychological symptoms of depression.75–77
Beyond diagnosis, previous studies have found important variations in delivery of care. Many authors have commented on national differences in management of depression.20,78–80 In Europe and the US, 74% and 67% of those with mental illness, respectively, receive no adequate treatment.21,24 In the European Study of Epidemiology of Mental Disorders project conducted in Belgium, France, Germany, Italy, the Netherlands and Spain,24 6% of the sample was defined as being in need of mental health care but 48% of these participants reported no formal healthcare use. After initiating treatment, between 70% (Germany) and 95% (Italy) receive some kind of follow-up care.81 Recently these inequalities have been recognised and considerable effort has been made in the UK, Australia, and Canada to increase the efficiency of mental health care in primary care settings.82–84
Implications for future research
This study shows that diagnostic sensitivity across clinicians in high-income countries varies but variation is less than previously recorded between high-income and low-income settings. Interestingly, there were no appreciable differences in diagnostic specificity. Primary care physicians in the Netherlands and Italy were most successful in their diagnoses and those in the US, Australia (and perhaps the UK) were least successful. Factors that enhanced detection included: access to better healthcare resources but poorer access to hospital specialists; working in small practices; having routine review of clinical outcomes; and routine review of clinical performance of staff. Further investigation might reveal whether these organisational factors can be exported into countries where clinicians are less successful at identifying depression in primary care.