The diagnostic performance of scoring systems to identify symptomatic colorectal cancer compared to current referral guidance
Tom Marshall (1), Robert Lancashire (1), Debbie Sharp (2), Tim J Peters (2), K K Cheng (1), William Hamilton (2)

  1. Department of Public Health and Epidemiology, University of Birmingham, Edgbaston, Birmingham, UK
  2. Academic Unit of Primary Health Care, Department of Community Based Medicine, University of Bristol, Bristol, UK

Correspondence to Dr Tom Marshall, Department of Public Health and Epidemiology, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK; t.p.marshall{at}bham.ac.uk

Abstract

Objectives To determine the discrimination characteristics of a new algorithm and two existing symptom scoring systems for identification of patients with suspected colorectal cancer.

Design Derivation of an algorithm by a case–control study and assessment of discrimination characteristics using receiver operating characteristic (ROC) curves. Three colorectal cancer scoring systems were investigated: the Bristol–Birmingham (BB) equation, which we derived from a large primary care dataset; the CAPER score, previously derived from a primary care case–control study; and a symptom score derived from National Institute for Clinical Excellence (NICE) guidance for urgent referral of symptomatic patients. Their discrimination characteristics were investigated in two datasets: the BB derivation dataset and the CAPER score derivation dataset. The main analyses were ROC curves and the areas under them for all three algorithms in both datasets.

Setting Electronic primary care databases.

Main outcome measures Diagnosis of colorectal cancer.

Results In the BB dataset, areas under the curve were: BB equation 0.83 (95% CI 0.82 to 0.84); CAPER 0.79 (95% CI 0.79 to 0.80); the NICE guidelines 0.65 (95% CI 0.64 to 0.66). In the CAPER dataset, areas under the curve were: BB 0.92 (95% CI 0.91 to 0.94); CAPER 0.91 (95% CI 0.89 to 0.93); NICE guidelines 0.75 (95% CI 0.72 to 0.79). In subjects under 50 the discrimination characteristics of NICE referral guidelines were no better than chance.

Conclusions Both multivariable symptom scoring systems performed significantly better than NICE referral guidelines.

  • Colorectal cancer
  • diagnostic characteristics
  • symptoms
  • primary healthcare
  • primary care

Significance of this study

What is already known about this subject?

  • Despite the availability of screening, the majority of colorectal cancers will continue to be diagnosed after presentation with symptoms.

  • Selection of patients for further investigation depends on combining information from a number of symptoms and signs.

  • Guidance and symptom scoring systems (NICE guidance and the CAPER score) exist to help primary care physicians identify patients for further investigation, but these have not been evaluated.

What are the new findings?

  • Both the new Bristol–Birmingham equation and the CAPER score are markedly better at discriminating between patients with and without colorectal cancer than current NICE guidelines.

  • In patients aged under 50 years current NICE guidelines for urgent referral do not discriminate between patients with and without colorectal cancer.

  • Both the new Bristol–Birmingham equation and the CAPER score are more useful than current NICE guidelines at identifying patients for further investigation.

How might it impact on clinical practice in the foreseeable future?

  • Automated identification of patients with symptoms of suspected colorectal cancer from electronic primary care records, using either the Bristol–Birmingham equation or the CAPER score, could enhance general practitioners' ability to diagnose colorectal cancer more than following current NICE referral guidelines.

Introduction

Colorectal cancer remains an important cause of death in the UK. Poorer survival rates in international comparisons may be influenced by later presentation.1–3 Delays in presentation to medical care and diagnosis are well recognised.4 Despite introduction of a national screening programme in England for those aged 60 to 69, the majority of cancers will continue to be diagnosed clinically because most cancers occur after this age, some people decline screening, and screening does not detect all cancers.5

Diagnosis of colorectal cancer is difficult because the condition is relatively uncommon in primary care and the symptoms are also features of more common, benign conditions. A typical full-time general practitioner will diagnose only one new case annually.6 Colonoscopy is the main diagnostic test for suspected colorectal cancer, but this is an uncomfortable procedure, requires referral and has a low rate of important complications.

There are a number of different approaches to helping general practitioners select patients for further investigation. Single symptoms have a low specificity for colorectal cancer, but symptom pairs may have more useful test characteristics.7 8 The National Institute for Clinical Excellence (NICE) published national Referral Guidelines for Suspected Cancer in 2000, and updated these in 2005.9 These use an algorithm based on age and the presence of certain clinical features. However, the guidelines concentrate on typical presentations of cancer; it has been argued that they may even delay diagnosis in patients with atypical presentations.10 Although the guidelines do not recommend referral of patients with constipation or abdominal pain, these features are clearly associated with cancer.5 10 It is possible that current referral guidance will reinforce the finding that diagnostic delay is most common in patients who present with change of bowel habit.11

The CAPER score is a risk scoring system using multiple presenting symptoms.12 It was derived from a primary care based case–control study in a single primary care trust.13 14 The cases were 349 colorectal cancers diagnosed in persons aged over 40 in Exeter, UK, between 1998 and 2002. Five age-, sex- and practice-matched controls were obtained for each case. In the CAPER scoring system, some clinical features—abnormal rectal examination, severe anaemia or rectal bleeding—are, on their own, considered sufficiently high risk to warrant investigation. The CAPER score itself is intended for use with patients without these high-risk features, but who have multiple low-risk symptoms. The score seeks to identify those at higher risk from this low-risk pool. The weaknesses of the CAPER study were that it was undertaken in a single geographical area, used paper-based records and was relatively small.12

This paper describes the derivation of the Bristol–Birmingham (BB) equation and compares its performance to the NICE referral guidelines and the CAPER score, using the two different datasets used to derive the BB equation and the CAPER score.

Materials and methods

Derivation of the BB equation: identification of cases, controls and variables

The BB equation was derived from data provided by The Health Improvement Network (THIN), a national database of electronic primary care records. THIN includes all consultations, prescriptions, diagnoses and primary care investigations for all patients in participating practices. There are 2.2 million currently active patients in over 300 practices, distributed across all regions of the UK. Only electronic primary care records within a period of acceptable mortality recording were included in the analysis.15

We have previously reported the derivation of risk estimates for colorectal cancer based on symptom pairs.16 The multivariable model described here is an extension of this. Cases were all patients aged 30 years or older with a diagnosis of colorectal cancer between January 2001 and July 2006 (data before this were excluded as direct laboratory transfer of haemoglobin values began around 2000). Seven controls per case, matched for practice, sex and age, were selected using a computerised random number sequence (seven being the standard number offered in THIN database studies). Where possible, controls were matched to the same age in years as cases (this was possible for 96.4%); the remainder were matched within 1 year, 2 years and so on, up to a maximum of 5 years. Only cases and controls with at least 2 years of electronic records before the date of diagnosis of the case (the index date) were used. For a small number of very old cases in small practices, fewer than seven controls could be found. THIN staff performed these stages.
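The matching procedure described above can be illustrated with a minimal sketch. It assumes each patient is a dictionary with hypothetical keys `practice`, `sex` and `age`; the function name and the seeded generator are illustrative choices, not the procedure THIN staff actually ran.

```python
import random


def select_controls(case, pool, n_controls=7, max_age_gap=5, seed=0):
    """Pick up to n_controls from pool, preferring the closest age match.

    Candidates must share the case's practice and sex. Exact-age matches
    are used first; the age tolerance then widens one year at a time up
    to max_age_gap. Fewer than n_controls may be returned.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    eligible = [
        p for p in pool
        if p["practice"] == case["practice"] and p["sex"] == case["sex"]
    ]
    chosen = []
    for gap in range(max_age_gap + 1):
        tier = [p for p in eligible if abs(p["age"] - case["age"]) == gap]
        rng.shuffle(tier)  # random order within each age tier
        for patient in tier:
            if len(chosen) == n_controls:
                return chosen
            chosen.append(patient)
    return chosen
```

When a practice holds too few eligible patients, the function simply returns a shorter list, which mirrors the situation noted for very old cases in small practices.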

Read codes for 24 clinical features of colorectal cancer were selected (list available from authors) and identified in the medical records. Only the 2 years before the index date were studied. A new prescription may be a proxy for a symptom, such as a laxative for constipation. Initially, symptoms and prescriptions for such symptoms were studied separately. For these symptoms and diagnoses, exposure required a Read code of an acute episode or new prescription within 2 years of the index date.

Weight loss was calculated from the most recent and previous weights and divided into two categories: ≥10% weight loss and 5% to 10% weight loss. Most patients did not have two recorded weights, so were labelled unknown. There is a specific Read code for weight loss—doctors had, at times, used it without recording an actual weight; it was studied separately. Anaemia was defined if the most recent record of haemoglobin was less than 11 g/dl in women and less than 12 g/dl in men.
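The weight loss and anaemia definitions can be expressed as two small functions. This is a sketch under stated assumptions: the function names, the category labels and the "F"/"M" sex coding are hypothetical, and the separate Read code for weight loss is not modelled.

```python
def weight_loss_category(previous_kg, latest_kg):
    """Classify weight loss from the previous and most recent weights."""
    if previous_kg is None or latest_kg is None or previous_kg <= 0:
        return "unknown"  # fewer than two usable recorded weights
    loss_pct = 100.0 * (previous_kg - latest_kg) / previous_kg
    if loss_pct >= 10:
        return ">=10%"
    if loss_pct >= 5:
        return "5-10%"
    return "<5%"


def is_anaemic(hb_g_dl, sex):
    """Apply the thresholds in the text: <11 g/dl in women, <12 g/dl in men."""
    if hb_g_dl is None:
        return None  # no haemoglobin result on record
    threshold = 11.0 if sex == "F" else 12.0
    return hb_g_dl < threshold
```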

Three risk markers (as opposed to diagnostic features) were also studied: diabetes, obesity and deprivation. For diabetes, patients were considered to be exposed if they had been diagnosed with diabetes at any time before the index date. Obesity was defined as a body mass index greater than 30 kg/m² within 2 years of the index date. Each patient was allocated a deprivation quintile based on the Townsend score of their postcode. Irritable bowel syndrome is a potential misdiagnosis: we identified all patients with a record of this diagnosis at any time.

An initial quality control analysis was carried out to determine whether the relationships between the variables and colorectal cancer were stable over time and across practices. Univariable ORs were calculated by conditional logistic regression for all variables for each year of diagnosis and for each practice. This analysis identified large differences in variable recording (and in the univariable odds-ratios for some variables) before 2001. Pre-2001 data were therefore removed and all subsequent analyses carried out on 38 314 controls and 5477 cases diagnosed between January 2001 and July 2006 in a total of 317 practices. Their mean age was 70.6 years (range 30 to 105) and 53.1% were male. Demographic details of subjects are shown in table 1.

Table 1

Demographic details of cases and controls in the Bristol–Birmingham database

One variable was dropped from further analysis (a record of anaemia without a corresponding haemoglobin result) because the frequency of this diagnostic code consistently declined over time alongside a corresponding increase in the frequency of haemoglobin results. The OR for this variable also declined over the same period. Analysis of ORs across practices showed that no practices displayed extreme results. Fuller details of these quality control analyses are available from the authors.

Derivation of the BB equation: data analysis

Once the quality control was complete, the initial univariable conditional logistic regression analysis was carried out with the remaining 24 variables. Several variables were combined after initial analyses. As the ORs for diarrhoea, constipation and abdominal pain were similar to the ORs for their related prescriptions (antidiarrhoeals, laxatives and antispasmodics), the pairs of variables were combined. From now on when we use the terms diarrhoea, constipation or abdominal pain, we are referring to either the symptom or a new related prescription for the symptom. The ORs for weight loss without a recorded weight were similar to that for ≥10% weight loss; for a haemoglobin result >14 g/dl similar to that for no haemoglobin result; for a MCV >85 fl similar to that for no MCV result. These categories were combined.

Variables associated with colorectal cancer with a p-value ≤0.1 were entered into multivariable conditional logistic regression analyses. The model included the following variables: constipation, diarrhoea, change in bowel habit, flatulence, a diagnosis of irritable bowel syndrome, abdominal pain, rectal bleeding, haemoglobin concentration (in 1g/dl bands), microcytosis in two bands (mean cell volume <80 fl and 80–84.99 fl), weight loss (<5%, 5% to 9.9% and ≥10%), venous thrombosis or thromboembolism, diabetes and obesity. The first stage of the multivariable analysis included only symptoms in clinically related groups: intestinal motility symptoms included constipation, diarrhoea, change in bowel habit and flatulence; pain symptoms included irritable bowel syndrome and abdominal pain; bleeding symptoms included rectal bleeding, anaemia and microcytosis; systemic symptoms included weight loss and thromboembolism; obesity symptoms included diabetes and obesity. The next stage included all symptoms. Variables where the p-value was greater than 0.05 at any stage, including the final model, were excluded, though these were checked by adding them individually to the final model, using likelihood ratio testing.

Discrimination characteristics

To investigate the discrimination characteristics of the BB equation, individuals in the test dataset were allocated a score equal to their multivariable OR. Because patients with a positive faecal occult blood test, or an abnormal rectal examination, or an abdominal mass unquestionably qualify for further investigation they were allocated an arbitrary maximum score of 100 points in order to make them the highest priority for referral. In this way the score reflected relative priority for further investigation.

The CAPER scores of participants in this study were derived from the presence or absence of six features of colorectal cancer—constipation (25 points), diarrhoea (10), loss of weight (20), abdominal pain or tenderness (15), and one laboratory finding: low haemoglobin (10–11.9 g/dl 30 points; 12–12.9 g/dl 20 points). The CAPER system was derived for use in a population with only low-risk symptoms, with investigation suggested for scores of ≥35 points. Patients with abnormal rectal examination, severe anaemia (haemoglobin <10 g/dl) or rectal bleeding were considered to need referral, so were also allocated an arbitrary score of 100 (table 2).
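A minimal sketch of how the CAPER points above might be turned into a score. Two assumptions are made here that the text does not state explicitly: points for the features present are summed, and abdominal pain and tenderness share one 15-point entry. The function and key names are hypothetical.

```python
CAPER_POINTS = {
    "constipation": 25,
    "diarrhoea": 10,
    "weight_loss": 20,
    "abdominal_pain_or_tenderness": 15,
}
MANDATORY_REFERRAL_SCORE = 100  # arbitrary maximum, as in the text
INVESTIGATION_THRESHOLD = 35    # investigation suggested at or above this


def haemoglobin_points(hb_g_dl):
    """Points for low haemoglobin; None means no result was recorded."""
    if hb_g_dl is None:
        return 0
    if 10.0 <= hb_g_dl < 12.0:
        return 30
    if 12.0 <= hb_g_dl < 13.0:
        return 20
    return 0


def caper_score(features, hb_g_dl=None,
                abnormal_rectal_exam=False, rectal_bleeding=False):
    """Return 100 for any high-risk feature, else the summed points."""
    severe_anaemia = hb_g_dl is not None and hb_g_dl < 10.0
    if abnormal_rectal_exam or rectal_bleeding or severe_anaemia:
        return MANDATORY_REFERRAL_SCORE
    symptom_points = sum(
        points for name, points in CAPER_POINTS.items() if name in features
    )
    return symptom_points + haemoglobin_points(hb_g_dl)
```

For example, a patient with constipation and diarrhoea and no haemoglobin result would reach 35 points under these assumptions, which is the suggested investigation threshold.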

Table 2

Coefficients for the Bristol–Birmingham (BB) equation, scores for CAPER and for three interpretations of NICE guidelines on urgent referral of patients with suspected colorectal cancer

The NICE guidelines offer a binary choice: urgent referral or no urgent referral, on the basis of a series of categorical variables: age over 40 years, age over 60 years, sex, menopausal status, diarrhoea (looser stools or increasing stool frequency) of 6 weeks' duration, rectal bleeding, abdominal mass, abnormal rectal examination and anaemia. Urgent referral is recommended for patients aged over 60 years with increased stool frequency or with rectal bleeding for 6 weeks; patients aged over 40 years with increased stool frequency and rectal bleeding for 6 weeks; patients with an abdominal mass or abnormal rectal examination; men with iron deficiency anaemia (Hb <11 g/dl); and postmenopausal women with iron deficiency anaemia (Hb <10 g/dl).

In the main analysis (NICE 3) we interpreted NICE guidance as follows. The occurrence of a single consultation with change in bowel habit, or two consultations with diarrhoea separated by more than 35 but fewer than 120 days, was considered to indicate increased stool frequency for at least 6 weeks. Consultations with diarrhoea separated by more than 120 days are likely to be separate episodes. A single consultation with rectal bleeding was taken to indicate rectal bleeding for at least 6 weeks because repeated records of consultations with this symptom were very uncommon. For consistency, we again assigned a score of 100 points for abdominal mass, positive faecal occult blood or abnormal rectal examination. We assigned one point for each of the following features for which urgent investigation is advised: diarrhoea plus rectal bleeding and aged over 40 years; rectal bleeding and aged over 60; diarrhoea and aged over 60; haemoglobin <11 g/dl with microcytosis in a man; and haemoglobin <10 g/dl with microcytosis in a postmenopausal woman (age >52 was taken as a proxy for being postmenopausal). The score thus rose with the number of qualifying symptoms (table 2). In the CAPER dataset, the mean cell volume was not available, so the requirement for microcytosis was dropped; therefore a haemoglobin <11 g/dl in a man was allocated one point.
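The NICE 3 interpretation can be sketched as a scoring function. This is an illustration under assumptions: consultation dates are given as day offsets, any pair of diarrhoea consultations (not only consecutive ones) may satisfy the 35 to 120 day window, and all function and parameter names are hypothetical.

```python
def diarrhoea_six_weeks(diarrhoea_days, change_in_bowel_habit):
    """True if records suggest increased stool frequency for 6 weeks."""
    if change_in_bowel_habit:
        return True  # a single such consultation is enough in NICE 3
    days = sorted(diarrhoea_days)
    return any(
        35 < later - earlier < 120
        for i, earlier in enumerate(days)
        for later in days[i + 1:]
    )


def nice3_score(age, sex, diarrhoea_days=(), change_in_bowel_habit=False,
                rectal_bleeding=False, hb_g_dl=None, microcytosis=False,
                abdominal_mass=False, positive_fob=False,
                abnormal_rectal_exam=False):
    """One point per qualifying NICE feature; 100 for mandatory referral."""
    if abdominal_mass or positive_fob or abnormal_rectal_exam:
        return 100
    loose = diarrhoea_six_weeks(diarrhoea_days, change_in_bowel_habit)
    score = 0
    if loose and rectal_bleeding and age > 40:
        score += 1
    if rectal_bleeding and age > 60:
        score += 1
    if loose and age > 60:
        score += 1
    if hb_g_dl is not None and microcytosis:
        if sex == "M" and hb_g_dl < 11.0:
            score += 1
        # age over 52 stands in for postmenopausal status, as in the text
        if sex == "F" and age > 52 and hb_g_dl < 10.0:
            score += 1
    return score
```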

Two sensitivity analyses used different interpretations of the NICE guidelines. In NICE 1, a single consultation with diarrhoea or change in bowel habit was taken to indicate diarrhoea for 6 weeks. In NICE 2, two consultations with diarrhoea separated by more than 35 but fewer than 120 days, but not a consultation with change in bowel habit, were taken to indicate diarrhoea for 6 weeks.

The receiver operating characteristics of a prediction model are usually superior in the dataset from which it was derived. To avoid this ‘home advantage’, two datasets were used to assess the predictive power of the equations: the dataset used to derive the BB equation (described above) and the CAPER dataset. The CAPER dataset includes 349 colorectal cancer cases and 1744 age- and sex-matched controls.8 The mean age of cases was 71.9 years (range 40 to 96) and 50.1% of the dataset was male. The CAPER dataset was obtained by searching both paper and electronic primary care records for symptoms. In the CAPER dataset, weight loss was recorded only as present or absent; it was therefore taken to be equivalent to a ≥10% weight loss.

Receiver operating characteristic (ROC) curves were constructed and the area under the curve was determined for the three prediction models in both datasets. The large size of the THIN dataset allowed us to undertake extensive sensitivity analyses. We repeated ROC curves for men and women, for each year of diagnosis of cancer from 2001 to 2006 and in 10 year age bands. To determine whether allocating a ‘mandatory referral’ score to abdominal masses, positive faecal occult bloods or abnormal rectal examinations affected the findings, we excluded cases and controls with these features and repeated the analysis. Because the CAPER score was derived from persons aged 40 and over we also repeated the analysis in persons aged over 40.
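For readers unfamiliar with the area under the ROC curve, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name is hypothetical, the calculation ignores the matched design, and this is not necessarily how the analysis in this paper was implemented.

```python
def area_under_roc(case_scores, control_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(case_scores) * len(control_scores))
```

The double loop is quadratic in the number of subjects, so for a dataset the size of THIN a rank-based calculation would be preferable. An area of 0.5 corresponds to discrimination no better than chance.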

Yield

We estimated the yield of colorectal cancers using these systems by calculating the positive predictive values (PPVs) at selected points of the ROC curves, using Bayes' theorem (posterior odds = prior odds×likelihood ratio).17 To compare the systems, we used the points on the three ROC curves with the same sensitivity. We derived the prior odds from national incidence rates for 2006, stratified by age and sex.18
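The Bayes' theorem step can be written out as a short sketch. The function names are hypothetical, and the incidence used in the usage note is an illustrative placeholder, not a figure from the national rates cited above.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def ppv_from_incidence(incidence, likelihood_ratio):
    """Positive predictive value via posterior odds = prior odds x LR."""
    prior_odds = incidence / (1.0 - incidence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

With an assumed annual incidence of 250 per 100 000 (`incidence=0.0025`) and a likelihood ratio of 12.5, `ppv_from_incidence` gives a yield of roughly 3%. Because colorectal cancer is uncommon, the prior odds are close to the incidence itself, and the PPV scales almost linearly with the likelihood ratio.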

Results

We identified 5477 cases and 38 314 controls in a total of 317 practices. Their mean age was 70.6 years (range 30 to 105) and 53.1% were male. Demographic details of subjects are shown in table 1.

Derivation of the Bristol–Birmingham equation

In the univariable analyses, positive faecal occult bloods (OR 24.5, 95% CI 5.1 to 118), abnormal rectal examination (OR 101, 95% CI 13.3 to 765.2) and abdominal mass (OR 35.0, 95% CI 20.8 to 58.9) were strongly associated with a diagnosis of colorectal cancer (table 3). As these features warrant investigation per se they were dropped from further modelling. A family history of colorectal cancer was recorded in only seven cases and eight controls (OR 6.13, 95% CI 2.22 to 16.9).
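As a rough check on the scale of these estimates, a crude (unmatched) OR with a Woolf-type 95% CI can be computed from a 2×2 table. This is only an approximation: the ORs reported here come from conditional logistic regression, which respects the matching, and the function name is hypothetical. The counts used are the family history figures above and the case and control totals.

```python
import math


def crude_odds_ratio(exposed_cases, total_cases,
                     exposed_controls, total_controls, z=1.96):
    """Unmatched odds ratio with a log-scale (Woolf) confidence interval."""
    a = exposed_cases
    b = exposed_controls
    c = total_cases - exposed_cases        # unexposed cases
    d = total_controls - exposed_controls  # unexposed controls
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Family history: 7 of 5477 cases and 8 of 38 314 controls
or_fh, lower_fh, upper_fh = crude_odds_ratio(7, 5477, 8, 38314)
```

For family history the crude estimate comes out close to the reported conditional estimate, which is expected when an exposure is this rare, but the agreement should not be assumed for other variables.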

Table 3

Initial variables considered for inclusion in the Bristol–Birmingham predictive model and their frequency in cases and controls

Multivariable analysis was carried out using 13 variables, plus the deprivation quintile. In multivariable analyses none of flatulence, irritable bowel syndrome, diabetes, obesity, thromboembolism or deprivation quintile was independently associated with cancer. The final model therefore included eight variables: constipation, diarrhoea, change in bowel habit, abdominal pain, rectal bleeding, haemoglobin concentration, mean cell volume and weight loss (table 4).

Table 4

Results of multivariable conditional logistic regression analysis for the Bristol–Birmingham predictive model

Discrimination characteristics 1: in the Bristol–Birmingham dataset

Table 2 summarises the way in which scores were derived from the three equations. In the THIN dataset the area under the curve for the BB equation was 0.83 (95% CI 0.82 to 0.84) and for CAPER was 0.79 (95% CI 0.79 to 0.80). The area under the curve for the NICE 3 interpretation of the NICE guidelines was 0.65 (95% CI 0.64 to 0.66) and was consistently superior to NICE 1 and NICE 2 (figure 1). Excluding patients under 40 made little difference, and excluding those with abdominal mass, positive faecal occult blood and abnormal rectal examination made only modest differences. In all analyses, the BB equation and CAPER score remained superior to NICE.

Figure 1

Receiver operating characteristics curves for a diagnosis of colorectal cancer in the THIN dataset using the Bristol–Birmingham equation, the CAPER score and three algorithms based on NICE guidelines.

For the BB equation and NICE guidelines the areas under the curve were similar in men and women. The CAPER score performed slightly better in women than men. The BB and CAPER scores performed similarly at all ages. However, no interpretation of the NICE guidelines performed better than chance at ages under 50 (table 5). NICE guidelines performed best at age 80 to 89 years.

Discrimination characteristics 2: in the CAPER dataset

In the CAPER dataset, the areas under the curve were BB 0.92 (95% CI 0.91 to 0.94); CAPER 0.91 (95% CI 0.89 to 0.93); NICE 1 0.76 (95% CI 0.73 to 0.80) (table 5). In this dataset the area under the curve for NICE 1 was greater than for NICE 3 (figure 2).

Table 5

Areas under the curve in each age band and both sexes for a diagnosis of colorectal cancer using the Bristol–Birmingham (BB) equation, the CAPER score and three algorithms based on NICE guidelines

Figure 2

Receiver operating characteristics curves for a diagnosis of colorectal cancer in the CAPER dataset using the Bristol–Birmingham equation, the CAPER score and three algorithms based on NICE guidelines.

Yield

Using the more conservative estimate of discrimination characteristics derived from the THIN dataset, NICE 3 has a sensitivity of 0.327 and a specificity of 0.974, giving a positive likelihood ratio of 12.5 and a positive predictive value (yield) of 3.1% at age 70–74. A point on each of the CAPER and BB ROC curves for the same age was selected to have the same sensitivity. These points had the following characteristics: CAPER, positive likelihood ratio 13.4 and PPV 3.3%; BB, positive likelihood ratio 14.7 and PPV 3.7% (table 6).

Table 6

Annual age specific incidence of colorectal cancer and positive predictive values if a patient meets the referral criteria (a positive test result)

Discussion

In both datasets the overall discrimination characteristics of the BB equation were consistently slightly better than those of the CAPER score and both were superior to any of the interpretations of current guidance. NICE guidelines performed no better than chance in subjects aged under 50.

Weakness and strengths

The performance of all methods of identifying colorectal cancer was better in the CAPER dataset than in the THIN dataset. The CAPER dataset has some probable advantages: cases were identified from the cancer registry and clinical features of colorectal cancer were identified from both paper and electronic records.12 The overall standardised incidence of cancer in THIN is consistent with cancer registry data, but there is some under-recording, probably as a result of misclassification of solid tumours. Standardised incidence ratios for colorectal cancer range from 0.69 to 0.84 between 2000 and 2006.19 Data linkage to cancer registry records could improve the recording of outcomes.

It is possible that symptoms may be more likely to be recorded in patients in whom cancer is suspected. The routinely recorded symptoms used in our analysis may therefore be stronger predictors than the presence of symptoms alone but this possibility makes no difference to the predictive power of routinely recorded symptoms. Furthermore, initial quality analysis confirmed the stability of recorded symptoms as predictors of colorectal cancer over time and between practices.

We did not use the ‘free text’ comments in the THIN records.20 This will have meant some symptom recording was missed, though there is no reason to believe this would be more common in cases than controls.12

Comparison with previous literature

NICE guidelines may perform less well than the BB equation and CAPER because they include fewer predictor variables, some of which only apply at certain ages. This means that NICE guidelines perform well for the minority of colorectal cancers with typical features (the first part of the ROC curve), but less well for the majority of cases with low risk but not no-risk symptoms.21 Variables absent from NICE guidelines include constipation, loss of weight and abdominal pain. Diarrhoea, rectal bleeding and anaemia are part of CAPER and BB at all ages, although they have an age restriction within NICE. The BB equation includes two further variables, microcytosis and change in bowel habit, and divides haemoglobin level and weight loss into subcategories. Change in bowel habit is an important predictor of colorectal cancer, and clearly doctors use this term differently to either diarrhoea or constipation. One criticism of NICE guidance is that only the minority of patients with colorectal cancer have a high risk symptom before diagnosis, with the majority experiencing constipation, diarrhoea or abdominal pain (or a combination of these).5 Thus, it is not surprising that NICE guidance fails to identify such patients, and that survival from colorectal cancer has improved little since the guidelines were published.

Other referral guidelines and symptom scoring systems might be investigated in a similar way. For example, the Selva score was derived from patients referred to secondary care for investigation and makes use of a consultation questionnaire to elicit symptoms.22 It has been reported to have an area under the curve of 0.76 in a population of patients referred for investigation of suspected colorectal cancer.23 However, when the consultation questionnaire was administered to a sample of persons aged 50 to 80 it identified 12% as eligible for urgent referral.24 This underlines the difference in predictive power of symptoms elicited from healthy subjects and symptoms recorded by general practitioners following consultation and the value of using recorded symptoms in this analysis.

Our analysis demonstrates that electronic primary care datasets are an invaluable resource for investigating the discrimination characteristics of referral guidelines and therefore of informing guideline or referral recommendations. They are also a resource for deriving such recommendations. Both multivariable models investigated in this analysis showed much better discrimination characteristics than current NICE guidelines. Using either model to routinely analyse electronic primary care records therefore has the potential to significantly add to general practitioners' ability to identify patients with suspected colorectal cancer. There is a strong case for investigating the utility of such approaches in the diagnosis of colorectal cancer.

Acknowledgments

We wish to thank THIN staff, especially Mary Thompson, for their help and diligence.

Footnotes

  • Funding Project funding was received from Cancer Research UK, and sponsorship by the University of Bristol. CRUK reference number: C12345/A7502. Neither body had a role in the study design, data collection, analysis or writing of the report. The researchers are independent of the funding body.

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the London MREC; 06/MRE02/75.

  • Provenance and peer review Not commissioned; externally peer reviewed.
