British Journal of General Practice
Intended for Healthcare Professionals
Research

Measuring depression severity in general practice: discriminatory performance of the PHQ-9, HADS-D, and BDI-II

Isobel M Cameron, Amanda Cardy, John R Crawford, Schalk W du Toit, Steven Hay, Kenneth Lawton, Kenneth Mitchell, Sumit Sharma, Shilpa Shivaprasad, Sally Winning and Ian C Reid
British Journal of General Practice 2011; 61 (588): e419-e426. DOI: https://doi.org/10.3399/bjgp11X583209
Author roles:
  • Isobel M Cameron: lecturer
  • Amanda Cardy: research fellow
  • John R Crawford: professor of psychology
  • Schalk W du Toit: specialist registrar in psychiatry
  • Steven Hay: staff grade psychiatrist
  • Kenneth Lawton: senior clinical lecturer
  • Kenneth Mitchell
  • Sumit Sharma: specialist registrar in psychiatry
  • Shilpa Shivaprasad: specialist registrar in psychiatry
  • Sally Winning: staff grade psychiatrist
  • Ian C Reid: professor of psychiatry

Abstract

Background The UK Quality and Outcomes Framework (QOF) rewards practices for measuring symptom severity in patients with depression, but the endorsed scales have not been comprehensively validated for this purpose.

Aim To assess the discriminatory performance of the QOF depression severity measures.

Design and setting Psychometric assessment in nine Scottish general practices.

Method Adult primary care patients diagnosed with depression were invited to participate. The HADS-D, PHQ-9, and BDI-II were assessed against the HRSD-17 interview. Discriminatory performance was determined relative to the HRSD-17 cut-offs for symptoms of at least moderate severity, as per criteria set by the American Psychiatric Association (APA) and NICE. Receiver operating characteristic curves were plotted and area under the curve (AUC), sensitivity, specificity, and likelihood ratios (LRs) calculated.

Results A total of 267 were recruited per protocol, mean age = 49.8 years (standard deviation [SD] = 14.1), 70% female, mean HRSD-17=12.6 (SD = 7.62, range = 0–34). For APA criteria, AUCs were: HADS-D = 0.84; PHQ-9 = 0.90; and BDI-II = 0.86. Optimal sensitivity and specificity were reached where HADS-D ≥9 (74%, 76%); PHQ-9 ≥12 (77%, 79%), and BDI-II ≥23 (74%, 75%). For NICE criteria: HADS-D AUC = 0.89; PHQ-9 AUC = 0.93; and BDI-II AUC = 0.90. Optimal sensitivity and specificity were reached where HADS-D ≥10 (82%, 75%), PHQ-9 ≥15 (89%, 83%), and BDI-II ≥28 (83%, 80%). LRs did not provide evidence of sufficient accuracy for clinical use.

Conclusion As selecting treatment according to depression severity is informed by an evidence base derived from trials using HRSD-17, and none of the measures tested aligned adequately with that tool, they are inappropriate for use.

  • depression
  • primary care
  • sensitivity
  • severity
  • specificity

INTRODUCTION

UK GPs are funded through the Quality and Outcomes Framework (QOF) for assessing the severity of symptoms of depression1 with one of the following:

  • the Patient Health Questionnaire 9 (PHQ-9);2

  • the Hospital Anxiety and Depression Scale (HADS), Depression Subscale (HADS-D);3 or

  • the Beck Depression Inventory, Second Edition (BDI-II).4

This initiative accords with guidelines in which different treatment options are advocated according to severity.5 Further, it has been suggested that using such tools is more reliable than relying on GPs' perceptions alone.6 Unfortunately, the widely quoted evidence base supporting differential treatment selection on the basis of severity5,7 is founded on studies using the clinician-rated, 17-item Hamilton Rating Scale for Depression (HRSD-17),8 not the scales recommended by the QOF. It is, therefore, important to determine the extent to which the QOF-endorsed scales agree with the measure used to generate the original evidence that supports using severity assessments to plan treatment, such as deciding whether or not to prescribe an antidepressant drug.

The PHQ-9 and HADS-D differ significantly in categorising depressive severity in UK,9 Swedish,10 and Australian11 studies. It comes as no surprise, therefore, to discover that practices using the PHQ-9 record a different prevalence of moderate and severe symptoms of depression than practices that use HADS-D.12 It is not known, however, whether either of the scales can categorise patients appropriately; it is known only that the scales are not equivalent and, as such, cannot both be valid in this regard.

The severity cut-offs of the PHQ-9 were pragmatically derived to be ‘simple for clinicians to remember and apply’, following which they were verified in terms of their relative associations with variables expected to increase with severity.2 As such, the scores were never derived in reference to a standard measure of severity. The validity of the PHQ-9 has tended to be considered in terms of its diagnostic accuracy, rather than the differentiation of severity.13–16

The severity cut-offs of the BDI-II were empirically derived in reference to the structured clinical interview for the Diagnostic and Statistical Manual of Mental Disorders, third edition, revised (DSM-III-R) (SCID)17 categories of mild, moderate, and severe.18 However, this study by Beck et al was conducted on a sample sought entirely from an outpatient site at the University of Pennsylvania, so the findings may not generalise effectively to a primary care population in the UK.

How this fits in

The UK Quality and Outcomes Framework (QOF) rewards practices for measuring the severity of symptoms of depression. Although demonstrably robust as case-finding tools, the QOF-endorsed scales have not been comprehensively validated for severity measurement. The severity categories of the QOF-endorsed depression measures do not adequately align with the severity categories of the 17-item Hamilton Rating Scale for Depression (HRSD-17) interview. Optimal cut-offs yielded likelihood ratios that were not adequate for clinical practice. As treatment is determined according to an evidence base derived from trials using HRSD-17, these QOF-endorsed scales are invalid.

The HADS-D severity cut-offs were derived from a sample recruited from general medical outpatient clinics.3 In terms of severity assessment, the scale was assessed against an unreferenced five-point scale administered by the researchers. As with the PHQ-9, examinations of the HADS-D have not tended to validate accuracy regarding the measurement of the severity of symptoms of depression.16,19–21

At present, there is an absence of objective psychometric comparison between the endorsed measures that would enable GPs to choose or reject a severity assessment tool on the basis of clinical relevance or validity. The aim of this study was to assess the discriminatory performance of the QOF-endorsed measures in categorising the severity of symptoms of depression against the HRSD-17 in primary care patients with a GP-generated diagnosis of depression.

METHOD

Participants

Patients were recruited from nine general practices in Grampian, Scotland, which were selected to yield participants with a mixed socioeconomic and urban/rural demographic. To be included in the study, participants had to be primary care patients aged ≥16 years with a new or existing GP diagnosis of depression. This reflects current QOF arrangements, whereby GPs use their clinical judgement to identify depression prior to assessing severity. Individuals without the necessary spoken or written language skills to complete the questionnaires and interview were excluded.

Depression severity measures

Demographic factors were recorded and participants completed the following self-complete depression severity questionnaires.

HADS-D. The HADS consists of 14 items, each rated 0–3 according to the severity of difficulties experienced. Subscales for depression (HADS-D) and anxiety can be totalled, with a possible range for each of 0–21. The scores can then be interpreted as follows: mild (8–10), moderate (11–14), or severe (≥15) difficulties.

PHQ-9. The PHQ-9 consists of nine questions, each rated 0–3 according to the increased frequency of difficulty experienced in each area covered. Scores, with a possible range of 0–27, are summed and can then be interpreted as follows: no depression (0), minimal (1–4), mild (5–9), moderate (10–14), moderately severe (15–19), or severe (≥20) depression.

BDI-II. The BDI-II is a depression severity questionnaire consisting of 21 items, each rated 0–3 according to severity of difficulties experienced. Scores, with a possible range of 0–63, are summed; depression can then be interpreted as minimal (0–13), mild (14–19), moderate (20–28), or severe (≥29).

Reference standard. The HRSD-17 was devised for use with patients with an existing diagnosis of depression and is intended to quantify the results of an interview assessing symptom severity. Consisting of 17 items, it was used as the ‘standard’ for depression severity measurement due to its wide use in intervention studies that have taken depression severity into account.5 The American Psychiatric Association (APA)22 and the National Institute for Health and Clinical Excellence (NICE)23 have published different severity bandings for the HRSD-17. For APA these are: none (0–7), mild (8–13), moderate (14–18), severe (19–22), and very severe (≥23); for NICE these are: none (0–7), sub-threshold (8–13), mild (14–18), moderate (19–22), severe (≥23).
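The two sets of HRSD-17 bandings quoted above can be sketched as a simple lookup, which makes the practical consequence visible: the same interview score can fall into different categories depending on which criteria are applied. This is a minimal illustration only. The names `hrsd17_band` and `at_least_moderate` are hypothetical and are not part of the study's methods; the thresholds of 14 (APA) and 19 (NICE) for at least moderate severity follow from the bandings above.

```python
# Lower bound of each HRSD-17 band, taken from the bandings quoted in the text.
BANDS = {
    "APA": [(0, "none"), (8, "mild"), (14, "moderate"),
            (19, "severe"), (23, "very severe")],
    "NICE": [(0, "none"), (8, "sub-threshold"), (14, "mild"),
             (19, "moderate"), (23, "severe")],
}

# First score counted as "at least moderate" under each set of criteria.
MODERATE_THRESHOLD = {"APA": 14, "NICE": 19}


def hrsd17_band(score: int, criteria: str) -> str:
    """Return the severity label for an HRSD-17 total under 'APA' or 'NICE'."""
    if score < 0:
        raise ValueError("HRSD-17 totals cannot be negative")
    label = BANDS[criteria][0][1]
    for lower_bound, name in BANDS[criteria]:
        if score >= lower_bound:
            label = name  # keep the highest band whose lower bound is reached
    return label


def at_least_moderate(score: int, criteria: str) -> bool:
    """True when the score meets the 'at least moderate' threshold."""
    return score >= MODERATE_THRESHOLD[criteria]


if __name__ == "__main__":
    # The same score lands in different bands under the two criteria.
    for criteria in ("APA", "NICE"):
        print(criteria, hrsd17_band(16, criteria), at_least_moderate(16, criteria))
```

A score of 16, for example, is banded as moderate under the APA criteria but as mild under the NICE criteria, so it counts as "at least moderate" under one reference and not the other.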

NICE's Clinical Guideline 91 offers no evidence to support the new cut-offs they propose but states that the change was necessary, with the updated guideline now including sub-threshold depression due to changes to the diagnosis based on the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders.23

Procedure

The HADS-D, PHQ-9, and BDI-II were assessed against both the APA and NICE severity criteria. The GRID-Hamilton Depression Rating Scale (GRID-HAMD) schedule was used as administration and scoring are standardised in this method, which helps maximise inter-rater reliability without altering the measure's original intent.

The recruitment process is outlined in Figure 1. After giving informed consent, participants completed the QOF-endorsed questionnaires and took part in a clinical interview that was conducted by one of six psychiatrists in either a doctor's surgery or a community hospital. Prospective participants were given the choice to take part in a telephone interview if preferred. In such cases the GRID-HAMD was still used, but the psychiatrists were given published instructions for the validated telephone version of HRSD-17.24

Figure 1. The recruitment process.

Six psychiatrists conducted the interviews; they were blind to the questionnaire responses. Order of administration of the interview and questionnaires was randomised, stratified by practice, to reduce any confounding by order of completion. For those randomised to receive the questionnaires first, participants were encouraged to complete them on the same day before the interview, or the day before the interview. For those randomised to receive the questionnaires after the interview, participants were encouraged to complete them on the same day after the interview, or the following day. The questionnaires were put into a booklet; their order within the booklet was systematically varied in blocks of 20.

Inter-rater reliability of HRSD-17

A random sub-sample of participants consented to have their assessment audio-recorded. Over a 12-month period commencing May 2008, five psychiatrists made a recording on five occasions and one on three occasions. After this, each psychiatrist was given five recordings, one from each of the other raters, to listen to and make their own ratings (blind to the original ratings). The psychiatrists were instructed not to attempt to rate from the recordings the two items in the HRSD-17 requiring visual observation (retardation and agitation).

Statistical analyses

Data were assessed for normality using the one-sample Kolmogorov–Smirnov test of goodness of fit. Any confounding related to the order of administration was assessed with t-tests. The HADS-D, PHQ-9, and BDI-II were assessed for convergence with the HRSD-17. Data were only included where the HRSD-17 and the self-complete measures were done within 3 days of one another and where data were complete. It was considered that, with a maximum time difference of 3 days, there would be sufficient overlap in reference points.

Convergent validity was examined using Pearson correlation coefficients of each QOF-endorsed scale with the HRSD-17. Convergence of the scales' severity bandings was also investigated using the Wilcoxon signed-rank test for related samples. The established HADS-D, PHQ-9, and BDI-II severity cut-off bands for at least moderately severe symptoms of depression were assessed relative to the moderate cut-off of the APA and the NICE criteria of the HRSD-17 using receiver operating characteristic (ROC) curves.25 ROC curve analysis was then used to determine each questionnaire's optimal cut-off for at least moderately severe symptoms of depression. This cut-off was considered the most relevant focus as it is the point at which guidelines advocate the use of antidepressant medication.5,7
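The ROC step described above can be sketched in a few lines. This is not the authors' procedure (the analyses were run in SPSS); it is a minimal illustration under stated assumptions. The text does not say which criterion defined the "optimal" cut-off, so the sketch assumes Youden's J (sensitivity + specificity − 1). The function names and the ten data points are hypothetical and are not study data.

```python
def sens_spec(scores, reference_positive, cutoff):
    """Sensitivity and specificity of 'score >= cutoff' against the reference."""
    pairs = list(zip(scores, reference_positive))
    tp = sum(1 for s, r in pairs if s >= cutoff and r)
    fn = sum(1 for s, r in pairs if s < cutoff and r)
    tn = sum(1 for s, r in pairs if s < cutoff and not r)
    fp = sum(1 for s, r in pairs if s >= cutoff and not r)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, reference_positive):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen reference-positive case outscores a reference-negative one
    (ties count as half)."""
    pos = [s for s, r in zip(scores, reference_positive) if r]
    neg = [s for s, r in zip(scores, reference_positive) if not r]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, reference_positive):
    """Cut-off maximising Youden's J (an assumed criterion, see lead-in)."""
    def youden(cutoff):
        sens, spec = sens_spec(scores, reference_positive, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Hypothetical questionnaire totals and HRSD-17 totals for ten patients.
questionnaire = [3, 5, 8, 9, 11, 12, 14, 16, 18, 21]
hrsd17 = [4, 6, 9, 15, 10, 13, 17, 14, 20, 25]
reference = [h >= 14 for h in hrsd17]  # APA 'at least moderate' threshold

if __name__ == "__main__":
    cutoff = best_cutoff(questionnaire, reference)
    print("AUC:", auc(questionnaire, reference))
    print("cut-off:", cutoff, sens_spec(questionnaire, reference, cutoff))
```

Swapping the reference line to `h >= 19` would repeat the same analysis against the NICE threshold.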

Sensitivity and specificity of the scales at detecting symptoms of at least moderate severity were calculated with accompanying confidence intervals (CIs),26 as were positive and negative predictive values (PPVs and NPVs, respectively) and likelihood ratios (LRs).27,28 A positive LR of >10 and a negative LR of <0.1 were considered to provide sufficient evidence for use in clinical practice.27 Analyses were conducted using SPSS (version 17).
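The relationship between sensitivity, specificity, and the likelihood ratios can be made concrete with a short worked example. The sketch below uses the standard definitions (positive LR = sensitivity / (1 − specificity); negative LR = (1 − sensitivity) / specificity) and, as an illustration, the rounded figures reported in the abstract for PHQ-9 ≥15 against the NICE criteria (89%, 83%). Because the inputs are rounded, the results may differ slightly from the values in Table 3. The function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    Both inputs are proportions strictly between 0 and 1; a specificity
    of exactly 1 or 0 would make one of the ratios undefined.
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


def sufficient_for_clinical_use(positive_lr, negative_lr):
    """Apply the thresholds stated in the text: LR+ > 10 and LR- < 0.1."""
    return positive_lr > 10 and negative_lr < 0.1


# Rounded abstract values for PHQ-9 >= 15 against the NICE HRSD-17 criteria.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.83)

if __name__ == "__main__":
    print(round(lr_pos, 2), round(lr_neg, 2))
    print(sufficient_for_clinical_use(lr_pos, lr_neg))
```

With these inputs the positive LR is roughly 5.2 and the negative LR roughly 0.13, so neither threshold is met.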

To assess inter-rater reliability on the HRSD-17, the 15 items rated by both a primary and secondary rater were summed. Paired t-tests assessed consistent differences; intraclass correlation (ICC) was calculated to express the between-pair variance as a proportion of the total variance.
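The two reliability statistics named above can be sketched as follows. The paired t statistic is the mean of the primary-minus-secondary differences divided by its standard error. For the ICC, the text does not state which model was used, so the sketch assumes a one-way random-effects, single-rating ICC; other variants would give somewhat different values. The function names and the five pairs of 15-item totals are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(primary, secondary):
    """Paired t statistic and degrees of freedom for primary minus secondary."""
    diffs = [p - s for p, s in zip(primary, secondary)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def icc_one_way(primary, secondary):
    """One-way random-effects, single-rating ICC for two ratings per subject
    (an assumed variant; the source does not specify the model)."""
    pairs = list(zip(primary, secondary))
    n, k = len(pairs), 2
    grand_mean = mean([x for pair in pairs for x in pair])
    # Between-subject mean square: spread of the subject means.
    ms_between = k * sum((mean(pair) - grand_mean) ** 2 for pair in pairs) / (n - 1)
    # Within-subject mean square: disagreement between the two ratings.
    ms_within = sum((x - mean(pair)) ** 2 for pair in pairs for x in pair) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical 15-item HRSD totals from the original and the audio ratings.
primary_totals = [10, 14, 20, 8, 25]
secondary_totals = [11, 13, 21, 8, 24]

if __name__ == "__main__":
    t_stat, df = paired_t(primary_totals, secondary_totals)
    print("t =", t_stat, "df =", df)
    print("ICC =", round(icc_one_way(primary_totals, secondary_totals), 3))
```

A t statistic near zero indicates no systematic shift between raters, while an ICC near 1 indicates that most of the variance lies between patients and not between raters.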

RESULTS

Between October 2007 and April 2009, 1134 patients were invited to participate in the study; 286 (25%) participated. Of those, 137 were randomised to complete the questionnaires before an HRSD-17 interview and 131 were randomised to complete the questionnaires after the interview. Eighteen participants were not randomised and a further participant was interviewed with the non-GRID version of the HRSD-17, so these data were excluded. The mean age of responders was 49.8 years (standard deviation [SD] 14.1), 184 (70%) were female, and 244 (99%) were of white ethnicity.

Some 222 (84%) interviews were conducted face to face and 41 (16%) by telephone. Four of the participants who returned their questionnaires did not attend the interview. Assessment results are shown in Table 1. None of the scale scores differed significantly from normal distribution and no adverse events were reported.

Table 1. Results of measurement tool assessments.

Analysis was restricted to the 233 participants who fully completed the questionnaires and participated in the HRSD-17 interview within 3 days.

Convergent validity

The questionnaires correlated moderately with HRSD-17: HADS-D and HRSD-17, r = 0.68; PHQ-9 and HRSD-17, r = 0.79; BDI-II and HRSD-17, r = 0.75.

Convergence of severity banding

When compared with APA's HRSD-17 depression severity cut-offs, the HADS-D tended to categorise participants in a milder category (P<0.001), whereas the PHQ-9 and BDI-II tended to categorise participants in a more severe category (P<0.01 and P<0.001 respectively). Compared with NICE's HRSD-17 depression severity cut-offs, the HADS-D did not converge (P<0.01). The tendency for the PHQ-9 and BDI-II to categorise participants in a more severe category than the HRSD-17 was further pronounced when using the NICE cut-offs (P<0.001 for PHQ-9 and BDI-II). Figures 2a and 2b demonstrate the lack of alignment with the APA and NICE cut-offs respectively (that is, only where the label indicates ‘= HRSD–17’ was there alignment in the categorisation of depression severity).

Figure 2a. Percentage convergence of severity categories (HRSD-17 American Psychiatric Association Handbook of Psychiatric Measures cut-offs versus HADS-D, PHQ-9 and BDI-II).

Figure 2b. Percentage convergence of severity categories (HRSD-17 National Institute for Clinical Excellence (NICE CG91) cut-offs versus HADS-D, PHQ-9 and BDI-II).

Empirically derived cut-offs for symptoms of depression of moderate severity

The area under the ROC curve of each self-complete depression measure for the APA and NICE cut-offs for symptoms of depression of at least moderate severity is shown in Table 2. The ROC curves are presented in Figures 3a and 3b. All three measures were shown to perform significantly better than chance at discriminating between those above and below the APA (P<0.001) and NICE (P<0.001) thresholds for moderate symptoms of depression.

Table 2. Area under the ROC curve of HADS-D, PHQ-9, and BDI-II depression severity measures, relative to HRSD-17.

Figure 3a. ROC curve of HADS-D, PHQ-9, and BDI-II depression severity measures against HRSD-17 ≥14 (APA criteria of at least moderate severity).

Figure 3b. Receiver operating characteristics (ROC) curve of HADS-D, PHQ-9 and BDI-II depression severity measures against HRSD-17≥19 (NICE criteria of at least moderate severity).

For each self-complete depression severity measure, Table 3 shows the discriminatory properties at the moderate cut-off defined by the scales' developers, as well as at the optimal cut-off, which was informed by the ROC analysis relative to the moderate HRSD-17 criteria of the APA and NICE. Respective PPVs and NPVs are also presented. The best sensitivity and specificity are found using the NICE criteria and cut-offs derived from the ROC curve analysis. However, the positive LR for all the measures is <10 and the negative LRs are >0.1 (with the exception of that for the PHQ-9 ≥10 cut-off against the NICE HRSD-17 criteria), indicating that the scales are not sufficiently robust to rule in or out the presence of symptoms of depression of at least moderate severity.27

Table 3. HADS-D, PHQ-9, and BDI-II depression severity measures: discriminatory performance of detecting at least moderate depression severity relative to HRSD-17.

Inter-rater reliability

The difference between the ratings of the original assessment (primary value) and the ratings made from the audio recordings (secondary value) of the 15-item summed Hamilton scale (which excludes the two items that required visual observation) was normally distributed. The test statistic for the paired t-test was 0.09 (degrees of freedom = 29), P = 0.93, indicating that there was no evidence of any systematic difference between the primary and secondary ratings. The ICC for the summed scale was 0.95 (95% CI = 0.90 to 0.98), demonstrating acceptable agreement.

DISCUSSION

Summary

The HADS-D, PHQ-9, and BDI-II correlated moderately with the HRSD-17. All of the scales differed significantly in how they categorised depression severity relative to the HRSD-17 cut-offs, as determined by both APA and NICE criteria. Efforts to derive optimal cut-offs did not yield values with LRs that were adequate to inform clinical practice.27

Strengths and limitations

To the authors' knowledge, this is the first study to assess the three QOF-endorsed depression severity measurement tools in terms of their ability to measure the severity of symptoms of depression.

The HRSD-17 interview is not diagnostic and some have argued that the QOF-endorsed measures ought to be assessed against an interview such as the SCID.12 However, the aim of using these scales in UK treatment of depression is not as case-finding tools, but as assessors of severity of depression that has already been diagnosed by a GP, in order to identify appropriate evidence-based treatment options. Although the SCID generates diagnostic categories with a crude severity dimension, the evidence base relies on the HRSD-17 in clinical trials, comparing variations in severity with outcome.29,30 The fact that different cut-offs exist for HRSD-17 (APA and NICE) highlights ongoing uncertainty in this area of rating scales and their validity. Furthermore, NICE guidance states that its new classification should not be taken as clear cut-offs;23 this raises the question ‘What should they be taken as?’.

The HRSD-17 scores in this sample were broad in their range, but only a quarter of patients who were invited to participate did so. The priority for this sample, sought for psychometric assessment, was that, as well as covering a broad socioeconomic and urban/rural demographic, it had a distribution of patients with symptoms of depression of differing severity. The authors are satisfied that this study's sample included a wide range of primary care patients, from those in remission to those with very severe symptoms. The sample was similar, in terms of sex, to patients consulting GPs for depression throughout Scotland in 2007/2008.31

The sample does not represent the ethnic diversity of some parts of the UK and this may impact on the generalisability of the findings.

Comparisons with existing literature

Several studies have raised concerns regarding the validity of the HADS-D and PHQ-9 in terms of their assessment of depression severity.9–11 The current study is able to conclude that both the HADS-D and the PHQ-9 categorise the severity of depression inaccurately when compared with HRSD-17. The HADS-D tends to place participants in a milder category of depression than the HRSD-17 and the PHQ-9 tends to place individuals in a more severe category. This latter tendency is also true of the BDI-II. These findings suggest the scales are not all measuring the same aspects of depression.

The assessment of the PHQ-9 found the measure to have better psychometric properties, in terms of severity assessment, when compared with another European study.32 The current study had a closer time restriction between administrations of the HRSD-17 and completion of the questionnaires and should, therefore, more accurately assess concurrent measurement of mood.

The rationale for introducing depression severity measures into the QOF was partly informed by a study in which it was observed that GPs were inaccurate in their categorisation of depression severity.6 However, the standard by which GPs were assessed in that study was the HADS-D. The present study has demonstrated that the HADS-D is inaccurate in categorising the severity of symptoms of depression and it is, therefore, questionable whether the clinical judgement of GPs in assessing depression severity in patients whom they suspect to be depressed is any better or worse. Given the present findings, GPs' intuition regarding the benefit of history-taking over applying such measures33–35 may be well founded.

It has been emphasised by NICE that the interpretation of scores alone should not be relied upon when assessing an individual with possible depression, but that other factors — including functional impairment, history, family history, and presence of other comorbid conditions — should also be considered.5 Nonetheless, to be of value, the scales must meet acceptable standards of accuracy.

Implications for research and practice

Although this study proposes new cut-offs for possible use in assessing depression severity, the likelihood ratios indicate they are still insufficiently precise to recommend for clinical use. As such, the time and effort expended in recording the use of the scales within the QOF mechanism seems to be something of a poor investment. More work is required to determine the rational selection of treatment strategies for depression in primary care — there is a danger that the setting of the QOF standard examined here has lent an unjustified veneer of confidence to the management of the condition, obscuring the paucity of basic research.

Acknowledgments

We would like to thank the patients and staff of the nine practices in Grampian who kindly participated in this study and, in particular: Dr Martin McCrone, for facilitating the practice recruitment; Ms Kirsty Sykes and Ms Laura McQueen, for preparing research materials and assisting with data checking; Professor Richard Morriss, for drawing our attention to the GRID version of HRSD-17; Professor Ian Anderson, for advice regarding severity cut-offs of HRSD-17; the Scottish Primary Care Research Network, for support with recruitment; and NHS Quality Improvement Scotland, for funding.

Notes

Funding

Funding was provided by NHS Quality Improvement Scotland (QIS), which had no further role in study design; the collection, analysis and interpretation of data; the writing of the report; or in the decision to submit the paper for publication.

Ethical approval

This research was conducted with the approval of the North of Scotland Research Ethics Committee (reference number: 07/S0802/40).

Provenance

Freely submitted; externally peer reviewed.

Competing interests

Isobel M Cameron, Ian C Reid, John R Crawford and Kenneth Lawton were grantholders of funding from NHS Quality Improvement Scotland, which covered IMC's salary for work carried out on this paper.

Discuss this article

Contribute and read comments about this article on the Discussion Forum: http://www.rcgp.org.uk/bjgp-discuss

  • Received March 2, 2011.
  • Revision received March 28, 2011.
  • Accepted May 17, 2011.
  • © British Journal of General Practice, July 2011

REFERENCES

  1. ↵
    1. NHS Employers and the General Practitioners' Committee
    (2009) Quality and Outcome Frameworks: guidance for GMS contract 2009/10, 1–162.
  2. ↵
    1. Kroenke K,
    2. Spitzer RL,
    3. Williams JB
    (2001) The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 16(9):606–613.
    OpenUrlCrossRefPubMed
  3. ↵
    1. Zigmond AS,
    2. Snaith P
    (1983) The Hospital Anxiety and Depression Scale (HAD). Acta Psychiatrica Scandinavica 67(6):361–370.
    OpenUrlCrossRefPubMed
  4. ↵
    1. Beck AT,
    2. Steer RA,
    3. Ball R,
    4. Ranieri WF
    (1996) Comparison of Beck Depression Inventories-IA and -II in psychiatric outpatients. J Pers Assess 67(3):588–597.
    OpenUrlCrossRefPubMed
  5. ↵
    1. NICE
    (2009) Depression: the treatment and management in adults (update) (NHS National Institute for Clinical Excellence, London), CG90, pp 1–585.
  6. ↵
    1. Kendrick T,
    2. King F,
    3. Albertella L,
    4. Smith P
    (2005) GP treatment decisions for patients with depression. Br J Gen Pract 55(513):280–286.
    OpenUrlAbstract/FREE Full Text
  7. ↵
    1. Anderson IM,
    2. Baldwin RC,
    3. Cowen PJ,
    4. et al.
    (2008) Evidence-based guidelines for treating depressive disorders with antidepressants: a revision of the 2000 British Association for Psychopharmacology guidelines. J Psychopharmacol 22(4):343–396.
    OpenUrlCrossRefPubMed
  8. ↵
    1. Hamilton M
    (1960) A rating scale for depression. J Neurol Neurosurg Psychiatry 23:56–62.
    OpenUrlFREE Full Text
  9. ↵
    1. Cameron IM,
    2. Crawford JR,
    3. Lawton K,
    4. Reid IC
    (2008) Psychometric comparison of the PHQ-9 and HADS for measuring depression severity in primary care. Br J Gen Pract 58(546):32–36.
    OpenUrlAbstract/FREE Full Text
  10. ↵
    1. Hansson M,
    2. Chotai J,
    3. Nordstom A,
    4. Bodlund O
    (2009) Comparison of two self-rating scales to detect depression: HADS and PHQ-9. Br J Gen Pract 59(566):283–288.
    OpenUrlAbstract/FREE Full Text
  11. ↵
    1. Reddy P,
    2. Dunbar J,
    3. Ford D,
    4. Philpot B
    (2010) Identification of depression in diabetes: the utility of the PHQ-9 and HADS-D. Br J Gen Pract 60(575):239–245.
    OpenUrlFREE Full Text
  12. ↵
    1. Kendrick T,
    2. Dowrick C,
    3. McBride A,
    4. et al.
    (2009) Management of depression in UK general practice in relation to scores on depression severity questionnaires: analysis of medical record data. BMJ 338:b750.
    OpenUrlAbstract/FREE Full Text
  13. ↵
    1. Gilbody S,
    2. Richards D,
    3. Brealey S,
    4. Hewitt C
    (2007) Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. J Gen Intern Med 22(11):1596–1602.
    OpenUrlCrossRefPubMed
    1. Wittkampf KA,
    2. Naeije L,
    3. Schene AH,
    4. et al.
    (2007) Diagnostic accuracy of the mood module of the Patient Health Questionnaire: a systematic review. Gen Hosp Psychiatry 29(5):388–395.
    OpenUrlCrossRefPubMed
    1. Kroenke K,
    2. Spitzer RL,
    3. Williams JBW,
    4. Lowe B
    (2010) The Patient Health Questionnaire Somatic, Anxiety, and Depressive Scales: a systematic review. Gen Hosp Psychiatry 32(4):345–359.
    OpenUrlCrossRefPubMed
  14. ↵
    1. Lowe B,
    2. Spitzer RL,
    3. Kerstin G,
    4. et al.
    (2004) Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J Affect Disord 78(2):131–140.
    OpenUrlCrossRefPubMed
  15. ↵
    1. Spitzer RL,
    2. Williams JBW,
    3. Gibbon M,
    4. First MB
    (1990) User's guide for the Structured Clinical Interview for DSM-III-R (American Psychiatric Press, Washington DC, US).
  16.
    1. Beck AT,
    2. Steer RA,
    3. Brown GK
    (1996) Manual for Beck Depression Inventory-II (Psychological Corporation, San Antonio, TX).
  17.
    1. Herrmann C
    (1997) International experiences with the Hospital Anxiety and Depression Scale — A review of validation data and clinical results. J Psychosom Res 42(1):17–41.
    1. Bjelland I,
    2. Dahl AA,
    3. Haug TT,
    4. Neckelmann D
    (2002) The validity of the Hospital Anxiety and Depression Scale: an updated literature review. J Psychosom Res 52(2):69–77.
  18.
    1. Crawford JR,
    2. Henry JD,
    3. Crombie C,
    4. Taylor EP
    (2001) Normative data for the HADS from a large non-clinical sample. Br J Clin Psychol 40(Pt 4):429–434.
  19.
    1. American Psychiatric Association Task Force for the Handbook of Psychiatric Measures, editor
    (2000) Handbook of psychiatric measures (American Psychiatric Association, Washington DC, US).
  20.
    1. NICE
    (2009) Depression with a chronic physical health problem. NHS National Institute for Clinical Excellence, CG91.
  21.
    1. Potts MK,
    2. Daniels M,
    3. Burnam MA,
    4. Wells KB
    (1990) A structured interview version of the Hamilton Depression Rating Scale: evidence of reliability and versatility of administration. J Psychiat Res 24(4):335–350.
  22.
    1. Murphy JM,
    2. Berwick DM,
    3. Weinstein MC,
    4. et al.
    (1987) Performance of screening and diagnostic tests. Application of receiver operating characteristic analysis. Arch Gen Psychiatry 44(6):550–555.
  23.
    1. Harper R,
    2. Reeves B
    (1999) Reporting of precision of estimates for diagnostic accuracy: a review. BMJ 318(7194):1322–1323.
  24.
    1. Deeks JJ,
    2. Altman DG
    (2004) Diagnostic tests 4: likelihood ratios. BMJ 329(7458):168–169.
  25.
    1. Altman DG,
    2. Machin D,
    3. Bryant TN,
    4. Gardner MJ
    (2000) Statistics with confidence (BMJ Books, London), 2nd edn.
  26.
    1. Kirsch I,
    2. Deacon BJ,
    3. Huedo-Medina TB,
    4. et al.
    (2008) Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 5(2):e45.
  27.
    1. Fournier JC,
    2. DeRubeis RJ,
    3. Hollon SD,
    4. et al.
    (2010) Antidepressant drug effects and depression severity: a patient-level meta-analysis. JAMA 303(1):47–53.
  28.
    1. NHS National Services Scotland
    General Practice — Practice Team Information (PTI.). http://www.isdscotland.org/isd/3711.html (accessed 1 Jun 2011).
  29.
    1. Wittkampf K,
    2. van Ravesteijn H,
    3. Baas K,
    4. et al.
    (2009) The accuracy of Patient Health Questionnaire-9 in detecting depression and measuring depression severity in high-risk groups in primary care. Gen Hosp Psychiatry 31(5):451–459.
  30.
    1. Leydon GM,
    2. Dowrick CF,
    3. McBride AS,
    4. et al.
    (2011) Questionnaire severity measures for depression: a threat to the doctor-patient relationship? Br J Gen Pract 61(583):117–123.
    1. Mitchell C,
    2. Dwyer R,
    3. Hagan T,
    4. Mathers N
    (2011) Impact of the QOF and the NICE guideline in the diagnosis and management of depression: a qualitative study. Br J Gen Pract 61(586):279–289.
  31.
    1. Dowrick C,
    2. Leydon GM,
    3. McBride A,
    4. et al.
    (2009) Patients' and doctors' views on depression severity questionnaires incentivised in UK quality and outcomes framework: qualitative study. BMJ 338:b663.

In this issue

British Journal of General Practice: 61 (588)
British Journal of General Practice
Vol. 61, Issue 588
July 2011

Keywords

  • depression
  • primary care
  • sensitivity
  • severity
  • specificity

British Journal of General Practice is an editorially-independent publication of the Royal College of General Practitioners
© 2022 British Journal of General Practice

Print ISSN: 0960-1643
Online ISSN: 1478-5242