Open access (CC BY-NC)
Research

Reliability of patient responses in pay for performance schemes: analysis of national General Practitioner Patient Survey data in England

BMJ 2009; 339 doi: https://doi.org/10.1136/bmj.b3851 (Published 29 September 2009) Cite this as: BMJ 2009;339:b3851
  1. Martin Roland, professor of health services research (1)
  2. Marc Elliott, senior statistician (2)
  3. Georgios Lyratzopoulos, clinical senior research associate (1)
  4. Josephine Barbiere, research assistant (1)
  5. Richard A Parker, research assistant (1)
  6. Patten Smith, director of research methods (3)
  7. Peter Bower, reader in health services research (4)
  8. John Campbell, professor of general practice and primary care (5)

  (1) Department of Public Health and Primary Care, University of Cambridge, Institute of Public Health, Cambridge CB2 0SR
  (2) RAND Corporation, 1776 Main Street, Santa Monica, CA 90401-3208, USA
  (3) Ipsos MORI, London SE1 1FY
  (4) National Primary Care Research and Development Centre, University of Manchester, Manchester M13 9PL
  (5) Peninsula Medical School, Smeall Building, Exeter EX1 2LU

  • Correspondence to: M Roland mr108{at}cam.ac.uk
  • Accepted 16 September 2009

Abstract

Objective To assess the robustness of patient responses to a new national survey of patient experience as a basis for providing financial incentives to doctors.

Design Analysis of the representativeness of the respondents to the GP Patient Survey compared with those who were sampled (5.5 million patients registered with 8273 general practices in England in January 2009) and with the general population. Analysis of non-response bias looked at the relation between practice response rates and scores on the survey. Analysis of the reliability of the survey estimated the proportion of the variance of practice scores attributable to true differences between practices.

Results The overall response rate was 38.2% (2.2 million responses), which is comparable to that in surveys using similar methodology in the UK. Men, young adults, and people living in deprived areas were under-represented among respondents. However, for questions related to pay for performance, there was no systematic association between response rates and questionnaire scores. The two questions that triggered payments to general practitioners were reliable measures of practice performance, with average practice-level reliability coefficients of 93.2% and 95.0%. Fewer than 3% and 0.5% of practices had too few responses to achieve the conventional reliability levels of 90% and 70% respectively. A change to the payment formula in 2009 resulted in an increase in the average impact of random variation in patient scores on payments to general practitioners compared with payments made in 2007 and 2008.

Conclusions There is little evidence to support the concern of some general practitioners that low response rates and selective non-response bias have led to systematic unfairness in payments attached to questionnaire scores. The study raises issues relating to the validity and reliability of payments based on patient surveys and provides lessons for the UK and for other countries considering the use of patient experience as part of pay for performance schemes.

Introduction

Financial incentives to improve quality of care, sometimes called pay for performance schemes, have been introduced recently in many countries, including the United States,1 2 Spain,3 and Australia.4 The United Kingdom embarked on the most ambitious of these schemes in 2004 with an initiative in which 25% of general practitioners’ pay was tied to a complex set of quality indicators, the quality and outcomes framework.5 In common with other countries, most of the indicators in the original UK framework related to clinical care. These incentives were associated with accelerated improvement for some aspects of chronic disease management6 and a reduction in inequalities in the delivery of primary care.7

Patient experience is an additional important component of quality, and questionnaires to measure patient experience have been widely used to assess care.8 9 10 11 12 13 Some schemes have included patient experience as an incentivised element of quality, either with providers incentivised to report on patient experience14 or with a direct link between patients’ evaluation and doctors’ income.15 16 In 2007 and 2008, financial incentives were attached to the results of a survey in the UK in which patients reported on how easy they found it to get an appointment with their doctor. In 2009, as part of negotiations between NHS Employers and the General Practitioners Committee of the British Medical Association, this incentive was included in the quality and outcomes framework: £68m ($109m) of general practitioners’ pay was tied to patients’ reported experience of access to care (getting urgent appointments and being able to book ahead).17

Following the development of a substantially extended survey instrument during the second half of 2008, the new GP Patient Survey was administered by Ipsos MORI between January and March 2009. Postal questionnaires in English were sent with reply paid envelopes to 5 660 232 patients aged 18 or over who had been continuously registered with a general practice in England for at least six months. Non-respondents were sent reminders and additional questionnaires one month and two months after the initial mailing unless they indicated that they wished to opt out of the survey. Questionnaires could be completed in 13 non-English languages on line or by telephone, and in British Sign Language on line (3.4%, 0.01%, and 0.03% of responses respectively). Patients could also complete the survey in Braille or large print, but these options produced no responses. Out of 8426 practices in England, 135 with fewer than 50 qualifying patients at the time of sampling were excluded, as were another 18 practices, leaving 8273 (mean list size 4946 patients) contributing to the data. To get sufficient precision on responses related to payments, stratified random samples were drawn from practice lists, which resulted in an average of 260 responses from practices of all sizes. The sampling method involved over-sampling from small practices and from practices with low response rates in previous surveys conducted by Ipsos MORI. The mean number of patients invited to take part per practice was 684. Details of the development of the survey—including piloting, cognitive interviewing, sampling, sample size calculations, and details of response rates—have been reported elsewhere.18 19

Results of the survey were published in July 2009 for each general practice in England at www.gp-patient.co.uk/results. Patients generally reported positive experiences and high levels of satisfaction with their general practices. Questions where difficulty or dissatisfaction with care was expressed by more than 15% of patients included privacy in reception, getting through on the telephone, speaking to a doctor on the phone, being able to book appointments ahead, waiting to be seen by a doctor, and being able to see a doctor of the patient’s choice. A change in the payment formula that related questionnaire scores to general practitioner income (see appendix 1 on bmj.com) meant that many general practitioners did not receive the income they expected from their survey results. Political discussion focused on the validity and reliability of the survey, and many appeals were mounted by general practitioners in relation to the payments they had received. These appeals revolved around two core issues—that insufficient patients were surveyed in some practices, and that the poor response rate meant that the survey results were biased.

Using patient experience to measure quality raises issues of reliability and validity that are not encountered with more commonly used measures of clinical quality. Reliability issues include the precision of measurement and the ability of questions to discriminate between practices.20 In terms of validity, there are several reasons why responses to patient questionnaires need to be interpreted with caution, including bias associated with selective non-response,21 inaccuracy in recall,22 the effect of context on patient response,23 different expectations in different population groups,24 and other sources of evidence on which patients draw when making judgments.25 26 In addition, there are few truly independent methods of assessing external validity. For example, compared with the US,27 UK patients rarely change practice because they are dissatisfied with their care, so voluntary movement between practices cannot readily be used as an external measure of validity.

As part of an ongoing programme of research on patient experience, we explored issues of reliability and non-response bias in the UK survey, outlining important lessons for the UK and for other countries wishing to link physician payments to the results of patient questionnaires.

Method

We examined the representativeness of survey respondents in two ways. Firstly, we compared the patients who responded to the survey with those who were sampled in terms of age, sex, and deprivation. Secondly, we compared the respondents with estimates from the general population in terms of age, sex, deprivation, and ethnicity.

We then examined the relation between practice response rates and the scores obtained on the two questions that triggered payments to general practitioners, directly and controlling for patient characteristics. This was done to determine whether practices with lower response rates might have received responses from patients with systematically more positive or negative experiences, leading to biased scores.

Finally, we calculated the reliability of practice-level mean scores for the two questions that related to payments to general practitioners. The reliability coefficient (an index of 0 to 1) represents the proportion of the variance of practice-level mean scores that is attributable to true differences between practices, as opposed to differences which might be attributable to incomplete measurement. Reliability greater than 70% indicates acceptable reliability; reliabilities of 80-90% are regarded as preferable for higher stakes applications such as pay for performance.28

Details of the statistical methods used for these analyses are shown in appendix 2 on bmj.com. Analyses were carried out using SPSS version 15 for descriptive analyses, and STATA version 10 for regression models, weighting analyses, and analyses of reliability and non-response bias.

Results

The overall response rate was 38.2% (2 163 456 responses), which is comparable to that achieved in surveys using similar methodology in the UK, but lower than the 44% and 41% achieved in the shorter 2007 and 2008 surveys, which measured only patients’ experience of access to primary care.29 30 In part, the response rate reflects deliberate over-sampling from practices with historically low response rates to obtain sufficient responses for each practice. The overall response rate weighted to adjust for this aspect of the survey design was 42.3%. An average of 261.7 (standard deviation 39.9) patients responded in each practice; 361 practices (4.4%) had fewer than 200 responses, and 47 practices (0.6%) had fewer than 50 responses.
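The gap between the unweighted and weighted response rates can be illustrated with a minimal sketch. It assumes a simple design weight equal to the inverse of each stratum's sampling fraction; the weighting actually used for the survey is described in appendix 2 and may differ. The two strata and all their numbers are hypothetical, chosen only to show how over-sampling a low-response group pulls the unweighted rate down.

```python
def unweighted_response_rate(strata):
    """Responses divided by invitations, ignoring the sampling design.

    Each stratum is a tuple (population, sampled, responded).
    """
    sampled = sum(s for _, s, _ in strata)
    responded = sum(r for _, _, r in strata)
    return responded / sampled


def weighted_response_rate(strata):
    """Response rate with each invitation weighted by population / sampled.

    This inverse-sampling-fraction weight is an assumption made for
    illustration, not the survey's documented weighting scheme.
    """
    numerator = 0.0
    denominator = 0.0
    for population, sampled, responded in strata:
        weight = population / sampled
        numerator += weight * responded
        denominator += weight * sampled
    return numerator / denominator


# Hypothetical strata: (population, sampled, responded).
# The second stratum responds less often and is sampled four times as heavily.
example_strata = [
    (100_000, 1_000, 500),    # 50% response, lightly sampled
    (100_000, 4_000, 1_200),  # 30% response, over-sampled
]

unweighted = unweighted_response_rate(example_strata)
weighted = weighted_response_rate(example_strata)
```

In this made-up example the unweighted rate is lower than the weighted rate, which is the same direction as the difference between 38.2% and 42.3% reported above.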

Representativeness of patients who responded to the survey

Table 1 shows the sociodemographic factors that predicted survey response. The youngest patients (age 18-29 years) were the least likely to respond. The odds of responding increased slightly in middle aged patients, then increased substantially in elderly patients to a peak at the ages 70-79, where the odds of responding were 5.54 times as high as for the reference group aged 18-29. Response propensity then declined beyond age 79, with patients aged 80-89 (odds ratio 3.59) less likely to respond than those aged 60-69 (odds ratio 4.53). For socioeconomic deprivation, the odds of responding declined approximately linearly with increasing deprivation, with those in the second deprivation fifth having 10% lower odds of response than those in the reference top fifth (odds ratio 0.90 (95% CI 0.90 to 0.91)) and those in the bottom fifth of deprivation having about half the odds of response as those in the top fifth (odds ratio 0.52 (0.52 to 0.53)). For sex, the odds of responding were 41% lower for men than for women, allowing for the effects of age and deprivation.
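Odds ratios are easily misread as ratios of response rates. The sketch below converts an odds ratio into a response probability for a given baseline. The baseline probability of 0.45 for women is a hypothetical value chosen for illustration; the odds ratio of 0.59 is simply the "41% lower odds" for men restated.

```python
def probability_from_odds_ratio(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return a probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical baseline: 45% of sampled women respond.
women_probability = 0.45
# Odds ratio of 0.59 corresponds to 41% lower odds for men.
men_probability = probability_from_odds_ratio(women_probability, 0.59)
relative_reduction = 1.0 - men_probability / women_probability
```

Under this assumed baseline, the response probability for men falls to roughly one third, a relative reduction of well under 41%. The size of the gap depends on the baseline chosen.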

Table 1. Predictors of patient response to GP Patient Survey 2009 by age, sex, and deprivation (n=5 658 740*)

Table 2 shows the proportions of respondents from different age and deprivation bands and ethnic groups compared with recent population based estimates for England from the Office for National Statistics. These are shown in the table in terms of weighted characteristics of respondents, but analyses using unweighted estimates showed a similar overall pattern.

Table 2. Comparison of weighted demographic characteristics of respondents to GP Patient Survey 2009 with estimates for the resident population of England aged ≥18 years (mid-year 2007). All values are percentages

These data show under-representation of men, younger people, and people from deprived areas among survey respondents. For ethnicity, the proportion of respondents describing themselves as white British was similar to the general population. There was under-representation of respondents describing their ethnicity as “Asian/Asian British,” “black/black British,” “mixed,” and “Chinese,” with 2.7% fewer respondents from these four ethnic groups than would have been expected from the general population. At the same time, there was a substantial over-representation of individuals describing their ethnicity as “other” (3% more than the proportion in the general population).

Non-response bias

Practices with high rates of response had slightly higher scores on the two access questions relating to general practitioner payments (PE7 and PE8, see details in appendix 1). Pearson (Spearman) correlations were 0.34 (0.34) for PE7 and 0.15 (0.18) for PE8 (P<0.001 for each). However, these associations were almost entirely a function of the demographic characteristics of practices, as the partial correlations of response rate with the two payment questions were weak and inconsistent in direction once we controlled for demographic factors (0.04 and −0.07 for PE7 and PE8 respectively, P<0.05 for each). The response rate on its own was associated with less than 0.2% of the variance in both PE7 and PE8, and with practice PE7 scores increasing by 0.4% and PE8 scores decreasing by 1.1% for a 10% increase in practice response rate.
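The contrast between a raw correlation and a partial correlation can be sketched as follows. This is an illustration on synthetic data, not a reproduction of the analysis: it controls for a single made-up demographic variable using the first-order partial correlation formula, whereas the analysis above controlled for several demographic factors (see appendix 2). All variable names and the data-generating process are assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic practices: one demographic factor drives both the response
# rate and the score, which are otherwise unrelated to each other.
rng = random.Random(0)
demographic = [rng.gauss(0, 1) for _ in range(5000)]
response_rate = [d + rng.gauss(0, 1) for d in demographic]
score = [d + rng.gauss(0, 1) for d in demographic]

raw = pearson(response_rate, score)
partial = partial_correlation(response_rate, score, demographic)
```

In this synthetic setting the raw correlation is substantial while the partial correlation is close to zero. This mirrors the pattern reported above, in which the association largely disappears once demographic characteristics are taken into account.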

Practice-level reliability

The two questions on access to care which related directly to payments to general practitioners (PE7 and PE8) had intra-class correlations of 0.079 and 0.125 respectively, resulting in practice-level reliability coefficients of 93.2% and 95% at the overall mean number of 262 responses per practice (with a mean number of 160 and 134 responses for the two payment questions). In total, 97.0% and 96.9% of practices achieved the 90% threshold for reliability for PE7 and PE8, and 99.5% of practices achieved the number of responses required for 70% reliability. To meet the higher standard of 90% reliability, 105 and 64 responses to the two payment questions were needed.
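The link between intra-class correlation, number of responses, and practice-level reliability can be sketched with the Spearman-Brown formula, reliability = n × ICC / (1 + (n − 1) × ICC). That this matches the method in appendix 2 is an assumption, although it does reproduce the coefficients reported above from the rounded intra-class correlations. The function names are illustrative.

```python
import math


def practice_reliability(icc, n_responses):
    """Spearman-Brown reliability of a practice mean from n_responses."""
    return n_responses * icc / (1.0 + (n_responses - 1) * icc)


def responses_needed(icc, target_reliability):
    """Smallest whole number of responses that reaches target_reliability."""
    exact = target_reliability * (1.0 - icc) / (icc * (1.0 - target_reliability))
    # A tiny tolerance stops floating point noise from pushing an exact
    # integer result up to the next whole number.
    return math.ceil(exact - 1e-9)


# Published (rounded) intra-class correlations and mean responses per question.
pe7_reliability = practice_reliability(0.079, 160)
pe8_reliability = practice_reliability(0.125, 134)

pe7_needed_90 = responses_needed(0.079, 0.90)
pe8_needed_90 = responses_needed(0.125, 0.90)
```

With these inputs the sketch gives reliabilities that round to 93.2% and 95.0%, and 105 responses for 90% reliability on PE7. For PE8 it gives 63 rather than the reported 64, which would be consistent with the published intra-class correlation of 0.125 having been rounded up from a slightly lower value.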

Although the survey met internationally recognised standards for high reliability,29 a change in the payment formula between 2008 and 2009 increased the effect of random variation on payments for many practices (see appendix 1).

Discussion

The quality and outcomes framework has focused British general practitioners’ attention on clinical aspects of care,31 32 and a focus of incentives on more holistic aspects of care, including patient experience, may seem a logical extension of the scheme. As with clinical indicators, patient experience was measured at the level of the practice (averaging between four and five doctors per practice in the UK) even though this may itself obscure a substantial amount of variation that is likely to occur between individual physicians.33

The 2009 GP Patient Survey had an overall response rate of 38.2%, which is 3-6% lower than the other recent access surveys in primary care, though the rate in part reflects deliberate over-sampling from practices known to have low response rates. Many general practitioners expressed concern that these low rates of response would lead to unreliable scores, with particular concern expressed about the questions which related to payment. Previous research suggests that, as long as rigorous probability sampling processes are followed, the association between response rates and non-response bias is weak.34 35 36 In surveys of healthcare experiences, it is generally the case that, after controlling for patient demographic characteristics, patients with more positive experiences are more likely to respond than those with less positive experiences,37 and there has been some concern that low response rates may disproportionately omit those patients with poor experiences. This would upwardly bias the scores of providers with low response rates relative to their peers with higher response rates, and some have proposed corrections in attempts to address this possibility.37 For the UK GP Patient Survey, we found little evidence that variation in response rates would result in any systematic disadvantage to practices with either low or high rates of response for the questions that were associated with payments to general practitioners.

We observed patterns of non-response that are similar to those in other surveys,37 38 39 with men, young people, and people living in deprived areas being less likely to respond to the questionnaire. In some ways, this reflects consulting experience, since women and older people are more likely to consult a doctor. However, the low response rate from deprived areas means that the views of people living there are systematically under-represented, even though they are on average high users of general practice services.

The sampling procedure produced highly reliable estimates of patient experience at the average number of 262 completed surveys per practice. For the 3% of practices with fewer than the 105 and 64 responses to the two payment questions needed to meet the highest standards of 90% reliability (for PE7 and PE8 respectively), we recommend that larger numbers of patients from these practices should be sampled in future rounds of the survey. The overall effect of random variation on payments was increased compared with previous years because of a change in the payment formula, and we discuss this in more detail in the appendices to this paper.

Information on patient experience from the GP Patient Survey is now publicly available for English practices. However, practitioners may not engage with these new measures of quality if they believe them to be flawed in terms of validity or reliability. The analyses here suggest that the current survey procedures result in reliable survey estimates of performance at the practice level on the pay for performance items which we examined. However, the attachment of large payments may also have led practitioners to focus excessively on some technical aspects of the survey, limiting the value of the survey as a quality improvement tool. Future research should focus on the validity of the responses in addition to establishing whether case mix differences in practice populations affect the equivalence of measures of patient experience.

The UK experience of integrating survey based measures of patient experience into pay for performance initiatives has several lessons for other countries. Firstly, such measures must be designed to produce reliable and valid estimates of performance at the intended level of measurement (such as practice or practitioner). Secondly, financial incentives must be designed so that true differences between practices are large in relation to background random variation. Reliable survey measurement is a necessary but not sufficient condition for robust pay for performance based on patient experience. Designers of payment formulas typically pay close attention to the incentives they intend to create, but may not consider the ways in which the formula can erode the reliability of an otherwise valid measure. Pay for performance initiatives are unlikely to be effective unless there is a strong correspondence between actual performance and compensation. Thirdly, few doctors are likely to be familiar with technical aspects of survey methodology, and the linkage of large payments to surveys may create considerable disquiet. In one US example, public reporting of patient experience data for hospital care preceded an anticipated pay for performance scheme, providing an opportunity to address these concerns before the addition of significant financial stakes.40

Patient reported measures of quality are an important aspect of care, and the GP Patient Survey represents a major opportunity to improve care on a national scale. None the less, additional refinements of the measurement or compensation process and ongoing dialogue with practising doctors will be essential if the survey is to play an important role in improving patient experience in the UK.

What is already known on this topic

  • Patient experience is an important component of quality of health care

  • Some countries are introducing patient experience as part of pay for performance schemes for healthcare workers

  • UK general practices are now paid on the basis of patient responses to two questions on getting appointments which are part of a larger national survey

What this study adds

  • The questions associated with payment to general practitioners showed a high degree of reliability

  • Practices with low rates of response were neither advantaged nor disadvantaged in terms of patient scores or payments received

  • A change to the payment formula in 2009 increased the impact of random variation on practice payments

Footnotes

  • We acknowledge the assistance of the wider team involved in developing the survey at Ipsos MORI, including Beccy Maeso, Helen Rowley, Juliet Brown, Lisa Valade-DeMelo, and Sonja Nissen. We are also grateful to Guy Watkins for his description of the GP payment system.

  • Contributors. All authors were involved in designing the analyses, reviewing intermediate analyses, and contributing to the final paper. MR is the guarantor of the paper.

  • Funding. The study was funded with a grant from the Department of Health. The opinions expressed are those of the authors and not of the department. All authors had full access to the data and take responsibility for the accuracy of the data analysis. The Department of Health was not involved in the analysis of the data.

  • Competing interests: PS is an employee of Ipsos MORI, which developed and delivered the GP Patient Survey for the Department of Health. MR and JC act as academic advisers to Ipsos MORI for the survey.

  • Ethical approval: Not required.

  • Data sharing: The results of the GP Patient Survey are publicly available at www.gp-patient.co.uk/results

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
