Study characteristics
The PRISMA flow diagram detailing the selection process for the 60 articles included in the systematic review25–84 is given in Figure 1. Thirty-one of the 60 (51.7%) identified studies met the threshold of ‘high quality’.31,33,35,38–40,42,43,45,47–49,51–56,58–63,65,70,72–75,83 Of these studies, 74.2% (n = 23/31) reported the number of GPs that had high or moderate burnout along ≥1 of the burnout subcomponents (emotional exhaustion, depersonalisation, and personal accomplishment) and overall burnout; 58.1% (n = 18/31) reported mean and standard deviation estimates for ≥1 of the burnout subcomponents (data not shown).
Figure 1. PRISMA flow diagram on identification and selection of articles. ᵃ n-values are greater than 31 because studies can report burnout as both a dichotomous and continuous variable. MBI = Maslach Burnout Inventory-Human Services Survey.
Supplementary Appendix S6 provides a description of selected demographic data extracted from the 60 included studies in this review; burnout cut-offs, mean, and proportion estimates are provided in Supplementary Appendices S7 and S8. Estimates are provided separately for male and female GPs if they are reported in the respective study.
Study time periods ranged from 1987 to 2020, comprising data from 22 177 GPs across 29 countries spanning five continents. The majority of these studies (70.0%, n = 42/60) were conducted in Europe, 18.3% (n = 11/60) were conducted in Asia, with the remaining studies conducted in the following three continents: Africa 1.7% (n = 1/60), North America 3.3% (n = 2/60), and Oceania 6.7% (n = 4/60). Where a study was conducted over different time periods, data for the earliest period were extracted (Supplementary Appendix S6). Most of the studies (70.0%, n = 42/60) used the 22-item version of the MBI-HSS (Supplementary Appendix S6).
The studies predominantly used the following standard cut-offs19 to denote high burnout for the three burnout subscales: emotional exhaustion ≥27 (38.3%, n = 23/60), depersonalisation ≥10 (30.0%, n = 18/60), and personal accomplishment ≤33 (28.3%, n = 17/60) (Supplementary Appendix S7). As for high overall burnout, the studies (28.3%, n = 17/60) generally used the following criteria: high emotional exhaustion and depersonalisation, and low personal accomplishment.
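As an illustration of how these cut-offs translate into a classification rule, the following minimal sketch flags high burnout on each subscale and on the overall criterion. The thresholds are the ones quoted above; the function and variable names are hypothetical and do not come from any of the included studies.

```python
# Cut-offs quoted in the text; all names below are hypothetical.
HIGH_EE_MIN = 27  # emotional exhaustion: high if score >= 27
HIGH_DP_MIN = 10  # depersonalisation: high if score >= 10
LOW_PA_MAX = 33   # personal accomplishment: low if score <= 33


def classify_high_burnout(ee, dp, pa):
    """Return per-subscale flags and the overall high-burnout flag."""
    high_ee = ee >= HIGH_EE_MIN
    high_dp = dp >= HIGH_DP_MIN
    low_pa = pa <= LOW_PA_MAX
    return {
        "high_ee": high_ee,
        "high_dp": high_dp,
        "low_pa": low_pa,
        # Overall high burnout: all three criteria met at once.
        "high_overall": high_ee and high_dp and low_pa,
    }
```

Because the overall criterion requires all three conditions at once, it is stricter than any single-subscale criterion, which is consistent with overall high burnout being the least prevalent category.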
The reported findings collectively show that there is wide variation in the demographic data, as well as burnout cut-offs and estimates, extracted from the studies included in the review. Selected demographic characteristics reported in the 31 high-quality studies are provided in Supplementary Appendix S9. The heterogeneity in demographic and burnout data observed for the 60 included studies remained for the higher-quality 31 studies included in the meta-analysis. However, the ranges of the burnout estimates reported in these studies are considerably narrower than those reported for all 60 studies.
Pooled results
Figure 2 reports the pooled random-effects mean estimates using continuous data based on the scores obtained for the different burnout subscales: 16.43 (95% confidence interval [CI] = 13.57 to 19.29; I² = 100.0%; P≤0.001) for emotional exhaustion; 6.74 (95% CI = 5.29 to 8.18; I² = 99.8%; P≤0.001) for depersonalisation; and 29.28 (95% CI = 23.61 to 34.96; I² = 100.0%; P≤0.001) for personal accomplishment.
Figure 2. Meta-analysis of GP burnout using continuous data: a) emotional exhaustion; b) depersonalisation; and c) personal accomplishment. Weights are from random-effects analysis.ᵃ
ᵃ Additionally, while the 31 studies comprised the total set of studies on which the meta-analysis was conducted across all dimensions of burnout, some types of estimates were not reported in some studies. Some studies reported only proportions and/or percentages whereas others reported only mean estimates, and yet others reported both proportions and mean estimates. The total number of studies is 31, which would be reflected by all the studies captured in Figure 2 and also Supplementary Appendix S11. ES = mean score.
These estimates denote moderate levels of burnout for emotional exhaustion and depersonalisation, and a high level of burnout for personal accomplishment, based on standard burnout cut-offs for these subscales, indicating significant levels of burnout among GPs. As evident in the high I² (>99%), there is considerable heterogeneity across studies. Supplementary Appendix S10 shows that the mean burnout estimates for the different burnout subscales varied depending on the country’s geographical region (P-value for heterogeneity ≤0.001). Meta-regression results showed that the continent in which studies were conducted had no effect on variation in mean burnout estimates across studies. There were insufficient observations within subgroups to conduct meta-regressions for country. Overall, there was no evidence that the geographical region influenced variation in mean burnout estimates across studies.
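To make the pooled means and the I² statistic reported for Figure 2 more concrete, the sketch below pools study-level means under a random-effects model. The text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator is an assumption here, as are the function name and the normal-approximation 95% interval.

```python
import math


def pool_random_effects(means, ses):
    """Pool study means with DerSimonian-Laird random effects (assumed estimator)."""
    k = len(means)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, means)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, means))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, means)) / sw_re
    se_pooled = math.sqrt(1.0 / sw_re)
    # I^2: share of total variation attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Made-up inputs for illustration only, not data from the review.
example = pool_random_effects([10.0, 20.0], [1.0, 1.0])
```

When study means differ by far more than their standard errors would predict, Q is large relative to its degrees of freedom and I² approaches 100%, which is the pattern seen in the pooled subscale estimates.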
Studies reported the following pooled prevalence estimates for GPs that exceeded the threshold for high or moderate burnout (Supplementary Appendix S11): high emotional exhaustion 32% (95% CI = 26 to 39; I² = 97.95%; P≤0.001); high depersonalisation 31% (95% CI = 19 to 43; I² = 99.49%; P≤0.001); low personal accomplishment 27% (95% CI = 22 to 32; I² = 96.86%; P≤0.001); high overall burnout 6% (95% CI = 4 to 9; I² = 95.42%; P≤0.001); moderate emotional exhaustion 28% (95% CI = 22 to 35; I² = 95.79%; P≤0.001); moderate depersonalisation 23% (95% CI = 15 to 31; I² = 97.55%; P≤0.001); moderate personal accomplishment 33% (95% CI = 22 to 44; I² = 98.51%; P≤0.001); and moderate overall burnout 32% (95% CI = 19 to 44; I² = 99.40%; P≤0.001).
As evident in the high I² (>95%), there is considerable heterogeneity across studies. The results (in Supplementary Appendix S10) of subgroup analyses conducted with at least 10 studies to investigate this heterogeneity show that the prevalence of burnout dimensions varied depending on the country’s geographical region and cut-off for moderate overall burnout (P-value for heterogeneity ≤0.001). Although some covariates were dropped because of collinearity, meta-regressions conducted using the metareg command showed that the continent in which the studies were conducted was generally not an important determinant of high or moderate burnout (P>0.20). However, high depersonalisation was significantly lower in Europe (regression coefficient −0.565; 95% CI = −0.768 to −0.362; P≤0.001) and North America (regression coefficient −0.354; 95% CI = −0.646 to −0.063; P≤0.001) compared with Asia, and moderate overall burnout was significantly lower in Europe (regression coefficient −0.424; 95% CI = −0.803 to −0.046; P = 0.03) compared with Asia.
Taken together, the findings indicate that, although the continent in which the studies were conducted is not a robust determinant of GP burnout across studies, there is some evidence that GP burnout is lower in Europe and higher in Asia.
The subgroup analysis by country revealed that the country the study was conducted in did not influence high emotional exhaustion; high depersonalisation was significantly higher in China (regression coefficient 0.543; 95% CI = 0.386 to 0.700; P≤0.001) than in the other countries included in the meta regression; low personal accomplishment was significantly higher in China (regression coefficient 0.213; 95% CI = 0.088 to 0.339; P = 0.01), Denmark (regression coefficient 0.220; 95% CI = 0.117 to 0.324; P≤0.001), and England (regression coefficient 0.211; 95% CI = 0.080 to 0.341; P = 0.01) than in other countries. Overall, there is some evidence that GPs from China experienced higher depersonalisation than GPs from other countries (Supplementary Appendix S10).
In addition, there was high residual heterogeneity overall for high burnout (≥95% for continent and ≥70% for country) and moderate burnout (≥84% for continent). There was no residual heterogeneity (0.00%) and high explained between-study variance for the cut-off for moderate overall burnout (adjusted R² = 99.93%), indicating that this cut-off may be an important determinant of heterogeneity in moderate overall burnout estimates across studies. The findings also reveal that less restrictive burnout criteria used in the studies are associated with higher GP burnout prevalence. For example, the more restrictive criterion for moderate overall burnout used in the studies, high emotional exhaustion and/or high depersonalisation, has a smaller regression coefficient (0.170) than the less restrictive criterion of high emotional exhaustion and/or high depersonalisation and/or low personal accomplishment, which has a regression coefficient of 0.355 (Supplementary Appendix S10).
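The adjusted R² quoted above can be read as the proportion of between-study variance that the covariate accounts for. A minimal sketch of one common definition is given below, comparing the between-study variance estimated without covariates to that remaining after the covariate is added. Whether the review computed it in exactly this way is an assumption, and the function and parameter names are hypothetical.

```python
def explained_between_study_variance(tau2_null, tau2_model):
    """Percentage of between-study variance explained by the covariates.

    tau2_null: between-study variance from the model with no covariates.
    tau2_model: residual between-study variance once covariates are included.
    """
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    # Not clamped: the value is negative if the covariates leave more
    # residual variance than the null model.
    return (tau2_null - tau2_model) / tau2_null * 100.0
```

Under this definition, a value close to 100% goes together with residual heterogeneity close to zero, which matches the pattern reported for the moderate overall burnout cut-off.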
Tests of publication bias via funnel plots85 and Egger tests86 were conducted, and the results are provided in Supplementary Appendix S12. The results provide no evidence of publication bias using the dichotomous data. Visual inspection of the funnel plots showed no asymmetry in any of the distributions for burnout studies. Furthermore, the Egger tests did not show significant results and thus suggested no evidence of publication bias among the studies on burnout proportions. However, Egger tests on studies using the continuous data showed some evidence of possible small-study effects, with significant results (P≤0.001) for mean emotional exhaustion, mean depersonalisation, and mean personal accomplishment.
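For readers unfamiliar with the Egger test, the sketch below shows its usual regression form: each study's standardised effect (estimate divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests small-study effects. This is a simplified illustration using only the Python standard library, not the procedure run for the review. The function name is hypothetical, and the P-value, which requires the t distribution with k − 2 degrees of freedom, is left out.

```python
import math


def egger_regression(effects, ses):
    """Ordinary least squares of effect/se on 1/se, as in Egger's test."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in ses]                 # precision
    y = [e / se for e, se in zip(effects, ses)]  # standardised effect
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx  # the bias term tested by Egger's test
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)           # residual variance
    se_int = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    # t statistic for the intercept; undefined when the fit is exact.
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_int, "t": t_stat, "df": k - 2}
```

If every study estimated the same effect regardless of its size, the points would lie on a line through the origin and the intercept would be close to zero.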
As a further sensitivity test, the meta-analysis was repeated with the inclusion of lower-quality studies (rated ≤6 on the JBI), which were more susceptible to risk of bias. The results (Supplementary Appendix S13) showed that the burnout estimates for all studies, including those of lower quality, were similar to those for the higher-quality studies alone and still displayed significant heterogeneity.