
Analysis

Setting performance targets in pay for performance programmes: what can we learn from QOF?

BMJ 2014; 348 doi: https://doi.org/10.1136/bmj.g1595 (Published 04 March 2014) Cite this as: BMJ 2014;348:g1595
  1. Tim Doran, professor 1,
  2. Evangelos Kontopantelis, senior research fellow 2,
  3. David Reeves, reader 3,
  4. Matthew Sutton, professor 4,
  5. Andrew M Ryan, associate professor 5
  1. 1Department of Health Sciences, University of York, York YO10 5DD, UK
  2. 2Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
  3. 3Centre for Biostatistics, Institute of Population Health, University of Manchester
  4. 4Centre for Health Economics, Institute of Population Health, University of Manchester
  5. 5Division of Outcomes and Effectiveness Research, Weill Cornell Medical College, New York, USA
  1. Correspondence to: T Doran tim.doran@york.ac.uk

The UK’s Quality and Outcomes Framework, which was introduced in 2004 to reward general practitioners for meeting performance targets, has been controversial. Tim Doran and colleagues reflect on the difficulties of setting and adjusting targets and the missed opportunities for improving the evidence base

For many policy makers, paying healthcare providers for performance is an intuitively appealing way of improving quality and value in services. However, the results from incentive programmes in healthcare have been inconsistent,1 2 3 4 and progress in designing them has been slow.5 6 Careful calibration of incentives is essential when determining performance indicators and setting targets. As well as being aligned with professional values, targets must be challenging but attainable, and payments must be large enough to promote high quality care without distorting clinical practice. The financial implications for both payers and providers can lead to contentious contractual disputes. We discuss how targets are set in pay for performance programmes and look at the experiences of a national incentive scheme for general practitioners in the UK.

How do payers set targets?

Most pay for performance schemes use either relative targets or absolute targets. The relative target approach compares performance between providers and is used in tournament style programmes, where higher performers are rewarded and low performers may even be penalised—for example, Medicare’s physician value based payment modifier.7 Payers must decide what proportion of providers to reward but do not make judgments on acceptable levels of performance. This approach has several drawbacks for providers: because they do not know the targets in advance they lack a firm reference point to guide quality improvement efforts, and high levels of achievement may go unrewarded, which is particularly unfair when the distribution of performance scores is narrow and differences between providers are clinically insignificant.

With absolute targets, also known as criterion referenced targets, payers determine performance targets in advance, which requires more work on evaluation. This approach is less directly competitive because all providers can “win” or “lose” depending on their level of achievement, and payers must therefore accept greater uncertainty about programme costs. In contrast to relative targets, where targets are adjusted automatically, absolute targets need to be reset in response to provider performance so that they remain appropriate.

Quality and Outcomes Framework: points mean prizes

The UK’s primary care pay for performance programme—the Quality and Outcomes Framework (QOF)—was introduced in 2004 to provide quality related payments to family practices in addition to their existing capitation payments8 and encourage better practice.

The scheme uses a criterion referenced approach, with absolute targets set across several domains (box 1). Most of the targets relate to management of chronic disease and public health measures (such as smoking cessation and immunisation), but organisational and patient experience factors are also included. As an example, table 1 shows the current QOF targets (“indicators”) for the clinical domain of stroke and transient ischaemic attack.

Box 1: Domains in the Quality and Outcomes Framework in 2013-149

Clinical domain:

  • Atrial fibrillation

  • Secondary prevention of coronary heart disease

  • Heart failure

  • Hypertension

  • Peripheral arterial disease

  • Stroke and transient ischaemic attack

  • Diabetes mellitus

  • Hypothyroidism

  • Asthma

  • Chronic obstructive pulmonary disease

  • Dementia

  • Depression

  • Mental health

  • Cancer

  • Chronic kidney disease

  • Epilepsy

  • Learning disability

  • Osteoporosis: secondary prevention of fragility fractures

  • Rheumatoid arthritis

  • Palliative care

Public health domain:

  • Cardiovascular disease—primary prevention

  • Blood pressure

  • Obesity

  • Smoking

  • Additional services:

    • Contraception

    • Cervical screening

    • Child health surveillance

    • Maternity services

Quality and productivity domain

Patient experience domain

Table 1

Performance targets for stroke and transient ischaemic attack (TIA) in Quality and Outcomes Framework, 2013-14


QOF sets a lower and an upper threshold for each indicator. Practices earn an increasing number of points depending on their level of achievement between the two thresholds. The lower threshold sets the minimum level of achievement required to receive any remuneration; the upper threshold sets the level required to receive maximum remuneration, with payment increasing linearly in between (figure). In the absence of robust evidence on performance, thresholds were initially set through discussion between the Department of Health and the British Medical Association (BMA), the representative body for doctors. All lower thresholds were set at 25%, and upper thresholds were set at between 50% and 90%, depending on the estimated difficulty of achieving individual indicators.
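The sliding scale between the two thresholds amounts to a simple linear interpolation, which can be sketched as follows (a hypothetical helper for illustration only, not part of any QOF software):

```python
def payment_fraction(achievement, lower, upper):
    """Fraction of the maximum remuneration earned for one indicator.

    Below the lower threshold no payment is made; at or above the
    upper threshold the full payment is made; in between, payment
    rises linearly with achievement. All arguments are percentages.
    """
    if achievement < lower:
        return 0.0
    if achievement >= upper:
        return 1.0
    return (achievement - lower) / (upper - lower)

# Example: lower threshold 25%, upper threshold 90%, achievement 70%
print(round(payment_fraction(70, 25, 90), 3))  # → 0.692
```

Because the scale is linear, the value of each additional percentage point of achievement depends on how far apart the two thresholds sit, a point that becomes important when the thresholds are later compressed.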

Figure 1

Example of QOF clinical indicator: reported achievement scores for English general practices on DM17 (diabetic patients with total cholesterol level ≤5 mmol/L), 2004-05 to 2012-13

Because the targets are absolute and the size of payments varies over a range of performance, all practices performing below the upper threshold have an incentive to make incremental improvements. However, the financial incentive to improve diminishes as the upper threshold is approached and disappears once it is exceeded. This may partly explain why the practices with the lowest baseline achievement rates improved at the fastest rate,10 although similar patterns have been observed in other incentive schemes.11 12 It could also explain the lack of improvement in achievement rates after the first three years of the scheme (figure).

Difficulties of setting QOF thresholds

Meeting QOF targets has become an established part of clinical practice in the UK. Performance is routinely monitored through practices’ clinical computing systems, and over 90 000 practice years of data are now publicly available. Until recently, however, this wealth of information was not used to set thresholds. In 2006-07 (the third year of the scheme) most lower thresholds were arbitrarily raised to 40%; upper thresholds were raised by 5-20 percentage points for 12 of the 50 clinical indicators and lowered by 10 percentage points for one. No further changes were made until 2011, despite average practice achievement rates exceeding the upper thresholds for 66 of the 71 clinical indicators.13

There were three main reasons for this inertia. Firstly, the purpose of payment thresholds was not well defined at the outset. For example, upper thresholds were intended to represent the “maximum practically achievable level to deliver clinical effectiveness,”14 but no formal definition was provided and it was unclear whether thresholds were intended to maintain performance levels or to encourage improvement. Secondly, changes to QOF must be agreed in annual contract negotiations between the Department of Health and the BMA, and BMA representatives sought to protect practice incomes by resisting threshold increases. Contract changes are also applied uniformly across all practices, precluding piloting. Thirdly, the annual QOF review process prioritised clinical evidence on management of chronic disease over evidence on the effect of incentives on clinician behaviour, so while detailed recommendations were made for developing quality indicators,15 performance thresholds were not substantially revised.

The National Institute for Health and Care Excellence (NICE) took over the annual QOF review process in 200916 and explored alternative methods for setting targets. After reviewing national performance, researchers from the University of Manchester proposed basing upper thresholds on the 75th percentile of reported achievement and lower thresholds on the fifth percentile (box 2).17 The aim was to calibrate targets against historical performance so that they remained appropriate, while retaining a criterion based approach that potentially allowed all practices to obtain maximum rewards. This approach was rejected in the 2010 contract negotiations. Instead, upper thresholds for 2011-12 were increased by one percentage point for two indicators and lower thresholds were left unchanged. The following year lower thresholds were increased by 5-10 percentage points for most indicators and upper thresholds were increased by 4-10 percentage points for 13 indicators. In most cases upper thresholds remained below the average practice achievement rates.

Box 2: Method proposed by NICE working group for payment thresholds under the QOF (not adopted)

Upper thresholds—For each year, upper thresholds are set at the lower of the 75th percentile of reported achievement for all practices two years earlier or 95%.

Rationale: Defining maximally achievable performance is not feasible for individual practices because it requires detailed knowledge of individual patients. Benchmarking against national performance therefore provides a guide to attainable levels of achievement. The 75th percentile was selected as it represents the midway point between the average and the highest performers. For several indicators the 75th percentile is at, or close to, 100% achievement, raising concerns that some practices—particularly those with smaller list sizes and less room for error—would be financially disadvantaged or overtreat patients. Setting a 95% maximum upper threshold was intended to address these concerns.

Lower thresholds—For each year, lower thresholds are set at the greater of the fifth percentile of reported achievement for all practices two years earlier or 40%.

Rationale: The lower threshold sets the acceptable minimum level of performance but also influences the size of payment increments: the closer the lower threshold is to the upper threshold, the greater the increase in remuneration for a one percentage point increase in achievement. The intention was to strike a balance between setting lower thresholds high enough to compress the payment range (and thereby incentivise practices with average and better baseline performance to work towards the upper thresholds) without compressing the range so far that practices with clinically insignificant differences in performance receive substantially different rewards. On modelling, the fifth percentile provided the best compromise.
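The proposed rules in box 2 can be expressed as a short calculation. The sketch below is illustrative only: the function name is our own, and it uses a simple nearest-rank centile rather than whatever interpolation method the working group's modelling employed.

```python
def set_thresholds(achievement_rates):
    """Derive payment thresholds (lower, upper) from a prior year's
    reported achievement rates (in %), following the proposed rules:
    upper = min(75th centile, 95), lower = max(5th centile, 40).
    """
    rates = sorted(achievement_rates)

    def centile(p):
        # Nearest-rank centile, adequate for illustration
        k = round(p / 100 * (len(rates) - 1))
        return rates[k]

    upper = min(centile(75), 95.0)
    lower = max(centile(5), 40.0)
    return lower, upper

# Example: half the practices report 50% achievement, half report 90%
print(set_thresholds([50] * 50 + [90] * 50))  # → (50, 90)
```

Note how both caps can bind: when an indicator's 75th centile approaches 100%, the upper threshold is held at 95%, and when the fifth centile falls below 40%, the lower threshold is held at 40%.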

In 2012 the Department of Health imposed a series of changes to the general practitioners’ contract, including setting upper thresholds for QOF indicators at the 75th percentile of achievement. To allow practices time to adapt, the new thresholds were applied to 20 indicators in 2013-14, with the remaining indicators due to follow in 2014-15.9 Under this new system upper thresholds would increase by an average of 14.7 percentage points (range 2-46 percentage points).18

However, two design decisions diluted the effect of these changes to the upper thresholds. Firstly, lower thresholds were fixed at 40 percentage points below the upper thresholds. Because achievement for most QOF indicators is compressed over a narrow range, most practices could therefore earn close to the maximum payment without increasing achievement. For example, for indicator DM17 (the proportion of patients with diabetes whose total cholesterol concentration was ≤5 mmol/L), median achievement in 2012-13 was 81.3%, within four percentage points of the 75th percentile of achievement (85.1%); most practices would therefore earn at least 90% of maximum remuneration without further improvement (table 2). Secondly, the base payment per point was increased by 17%, from £133.76 to £156.92. The combined effect of these changes is that, under the new system, practices would on average earn 87.5% of their existing income from the clinical indicators without making any improvements. In monetary terms this represents a mean loss of £8324 a year, with practices in the most deprived areas losing less.18 Although this is not a trivial threat to income, it is much smaller than that posed by the original NICE working group proposal (table 2).
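The DM17 figures quoted above can be verified with a few lines of arithmetic, using the article's own published numbers (a worked check, not official QOF software):

```python
# Thresholds for DM17 under the 2013-14 rules
upper = 85.1                 # 75th percentile of 2012-13 achievement (%)
lower = upper - 40           # lower threshold fixed 40 points below (45.1%)
median_achievement = 81.3    # median practice achievement in 2012-13 (%)

# Payment rises linearly between the two thresholds
fraction = (median_achievement - lower) / (upper - lower)
print(round(fraction, 3))    # → 0.905, i.e. over 90% of maximum remuneration

# The base payment per point rose from £133.76 to £156.92
rise = (156.92 / 133.76 - 1) * 100
print(round(rise, 1))        # → 17.3 (the "17%" increase)
```

Because the median practice already sits 36.2 points above a 45.1% lower threshold on a 40-point scale, the compressed range hands it 90.5% of the maximum payment before any improvement.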

Table 2

Proportion of practices with reported achievement rates in 2012-13 above the current and proposed thresholds for indicator DM17 (diabetic patients with total cholesterol level ≤5 mmol/L)


BMA representatives were nevertheless resistant to the changes19 and succeeded in postponing the threshold changes planned for 2014-15 by a year and in removing several unpopular indicators from the scheme.20 Most of the money earmarked for these indicators (representing 23.8% of total QOF funding) will be redirected into increased capitation payments.

Does increasing targets improve performance?

If providers are primarily motivated by extrinsic (financial or reputational) rewards, increasing upper thresholds should improve performance. Whether this improvement is meaningful or is secured through gaming (for example, by inappropriately excluding patients) will depend on the provider. If intrinsic (professional or altruistic) motivations drive performance—which is consistent with practices overachieving relative to the existing thresholds—raising thresholds may have a limited effect, although practices may still respond to new benchmarks for high quality care.

As the adjustments to QOF thresholds have so far been limited it is difficult to draw conclusions on the effect of the changes. Although thresholds for several QOF indicators were raised in 2006-07 and 2012-13, they remained below most practices’ existing levels of achievement. Raising the thresholds seemed to produce little response from the minority of low achieving practices affected by them, particularly in 2012-13 (figure). To date the change that has had the greatest effect on performance was the increase in the threshold for CHD10 (prescription of β blockers for patients with coronary heart disease) from 50% to 60% in 2006-07. In 2005-06, 1748 practices (21%) reported achievement rates of between 50% and 60%. Mean reported achievement for these practices, which had increased by just 0.3 percentage points in the year before the threshold increase (2004-05 to 2005-06), increased by 11.1 percentage points in the year after (from 55.7% to 66.8%). However, actual prescribing rates increased by just 1.9 percentage points; the apparent improvement in performance was largely the result of practices excluding more patients (exception reporting rates increased from 15.8% to 25.8%).

Assessment of the effects of threshold changes has been further hindered by the fact that they are applied to all UK practices, preventing comparative analysis. However, in 2006-07, a natural experiment arose when the upper threshold for immunising patients with coronary heart disease against influenza was raised from 85% to 90%, while thresholds for immunising patients with three other conditions (chronic obstructive pulmonary disease, diabetes, and stroke) remained unchanged. The small increase in the upper threshold was associated with a 0.41 percentage point increase in immunisation rates and a 0.26 percentage point increase in exception reporting rates for patients with coronary heart disease, relative to increases for patients with other conditions. Increases in immunisation rates were greater for practices with lower baseline performance.21

Conclusions

QOF was introduced to deal with an existing emphasis on volume rather than quality of care that ran “counter to general practitioners’ professionalism, the interests of the NHS, and the interests of patients.”14 However, performance thresholds were set relatively low and not adequately adjusted, allowing most practices to secure close to maximum remuneration despite wide variations in performance. Because payments are adjusted for list size and disease prevalence, QOF effectively became a weighted capitation payment with minimum quality requirements, rather than a discriminatory quality payment. The NHS now has two choices: retreat from financial incentives and secure value by alternative means or reinvigorate pay for performance. Recent contract changes move in both directions at once; reallocating a quarter of QOF funding to core capitation payments reduces the share of remuneration linked to quality measures, while setting upper thresholds at the 75th percentile of achievement creates a more discriminating quality scheme.

Future decisions on reforming pay for performance should be based on assessments of the benefits and costs of incentive programmes. This will require not only clinical evidence on individual quality indicators but evidence on how incentive structures affect provider behaviour and patient outcomes. Opportunities to generate evidence on the effects of changing performance thresholds have repeatedly been missed in QOF, with arbitrary changes simultaneously applied to all practices. Given that the framework directly affects the care of over 12 million patients and represents an investment of over £1bn a year, the need to trial changes to incentive structures is as great as the need to trial clinical interventions. In the absence of trials, causal effects are best estimated by exploiting natural experiments, and the phasing of the latest threshold changes over two years will, at least, provide such an opportunity.

Key messages

  • Targets in pay for performance programmes must be adjusted to provide continuing incentives for high quality care

  • For the first nine years of the UK’s Quality and Outcomes Framework, targets were set arbitrarily and improvements stalled

  • Attempts to calibrate targets against historical performance were compromised during contractual negotiations

  • The universal introduction of changes has made it difficult to assess their effects

  • Greater consideration needs to be given to generating evidence on efficacy when implementing pay for performance programmes


Footnotes

  • Contributors and sources: TD, EK, DR, and MS conduct research on the intended and unintended consequences of incentive schemes in healthcare, and between 2009 and 2012 worked as external academic contractors supporting the National Institute for Health and Care Excellence in the development of indicators for the Quality and Outcomes Framework. As part of this work they provided recommendations to the QOF indicator advisory committee on alternative methods for setting performance thresholds. AR conducts research on pay for performance and public quality reporting in healthcare, and the impact of incentive schemes on inequalities and discrimination in healthcare settings. TD and AR designed the paper; EK extracted the data; TD wrote the manuscript; and EK, DR, MS, and AR edited the manuscript. TD is the guarantor of the article. The data we used for original analyses are freely available, and we have provided references and links in the manuscript.

  • Competing interests: All authors have read and understood the BMJ Group policy on declaration of interests and declare the following: TD was supported by a National Institute for Health Research career development fellowship, EK was partly supported by an NIHR school of primary care research fellowship, and AR was supported by a career development award from the Agency for Healthcare Research and Quality. From 2009 to 2012 TD, EK, DR, and MS held external contracts with NICE for work on setting thresholds and removing indicators from the Quality and Outcomes Framework.

  • Provenance and peer review: Not commissioned; externally peer reviewed.
