Abstract
Background Patients with myeloma experience substantial delays in their diagnosis, which can adversely affect their prognosis.
Aim To generate a clinical prediction rule to identify primary care patients who are at highest risk of myeloma.
Design and setting Retrospective open cohort study using electronic health records data from the UK’s Clinical Practice Research Datalink (CPRD) between 1 January 2000 and 1 January 2014.
Method Patients from the CPRD were included in the study if they were aged ≥40 years, had two full blood counts within a year, and had no previous diagnosis of myeloma. Cases of myeloma were identified in the following 2 years. Derivation and external validation datasets were created based on geographical region. Prediction equations were estimated using Cox proportional hazards models including patient characteristics, symptoms, and blood test results. Calibration, discrimination, and clinical utility were evaluated in the validation set.
Results Of 1 281 926 eligible patients, 737 (0.06%) were diagnosed with myeloma within 2 years. Independent predictors of myeloma included: older age; male sex; back, chest and rib pain; nosebleeds; low haemoglobin, platelets, and white cell count; and raised mean corpuscular volume, calcium, and erythrocyte sedimentation rate. A model including symptoms and full blood count had an area under the curve of 0.84 (95% CI = 0.81 to 0.87) and sensitivity of 62% (95% CI = 55% to 68%) at the highest risk decile. The corresponding statistics for a second model, which also included calcium and inflammatory markers, were an area under the curve of 0.87 (95% CI = 0.84 to 0.90) and sensitivity of 72% (95% CI = 66% to 78%).
Conclusion The implementation of these prediction rules would highlight the possibility of myeloma in patients where GPs do not suspect myeloma. Future research should focus on the prospective evaluation of further external validity and the impact on clinical practice.
INTRODUCTION
Myeloma is the second most common haematological malignancy.1 In the UK the 1-year survival rate is 82.7%, 5-year survival is 52.3%, and 10-year survival is 29.1%.2 Myeloma mainly affects older people, with a median age at diagnosis of around 70 years.3,4 Delays in myeloma diagnosis are common: 50% of patients with myeloma experience an interval of >3 months between first presentation to primary care with a myeloma-related symptom and diagnosis, and they consult ≥3 times in primary care before referral to secondary care.5,6 Delays in diagnosis are associated with advanced-stage myeloma at diagnosis, complications, reduced disease-free survival, and poor patient-reported outcomes.7–9
Symptoms alone are poorly predictive of myeloma in primary care because the symptoms associated with myeloma are non-specific and common in patients without myeloma. While GPs may not think to investigate myeloma in patients with non-specific symptoms, they often order simple laboratory tests. When symptoms are combined with blood test abnormalities such as low haemoglobin, raised calcium, or raised creatinine or inflammatory markers, the risk of myeloma increases10,11 and the National Institute for Health and Care Excellence recommends definitive cancer investigation.12 Furthermore, certain blood test abnormalities such as low haemoglobin can be observed up to 2 years before a myeloma diagnosis, providing a potential window of opportunity for earlier diagnosis.10,11
Clinical prediction tools for myeloma are quite limited. Currently, the only one that exists in primary care is based on a Clinical Practice Research Datalink (CPRD) study, in which the authors report the positive predictive values of single/paired symptoms and investigations.10 Another study generated a prediction rule that could be useful for myeloma, but it was developed in hospitalised patients and the outcome was not confirmed diagnosis of myeloma but abnormal serum/urine protein electrophoresis.13 The aim of this study, therefore, was to develop novel prediction rules that combine symptoms and blood tests to identify people attending primary care who are at increased risk of myeloma, with a focus on the most commonly requested blood test group in primary care, the full blood count (FBC).
METHOD
A retrospective open cohort study was conducted using electronic health records data from the CPRD, a representative primary care database that includes 11.3 million patients from 674 practices in the UK.14 People were included in the study if they were aged ≥40 years, had been registered with their practice for at least 1 year, had at least two FBC tests recorded within 1 year (with at least one FBC component recorded: haemoglobin, mean corpuscular volume [MCV], platelets, or white cell count) between 1 January 2000 and 1 January 2014, and had a minimum follow-up of 2 years available. Follow-up started on the date of the second FBC test (the index date).
Multiple myeloma is a haematological cancer in which 50% of patients experience symptoms for at least 3 months before diagnosis and have multiple consultations in primary care before referral to secondary care. Symptoms on their own are not predictive enough to suggest referral and they have to be combined with abnormalities in blood tests. The authors of the present study developed two clinical prediction rules that combine patient characteristics, symptoms, and common blood tests to identify patients at high risk of having undiagnosed myeloma. The prediction rules showed good discrimination and have the potential to reduce the delays observed in the diagnosis of myeloma.
Patients who had been diagnosed with myeloma or monoclonal gammopathy of undetermined significance (MGUS) before the index date were excluded from the study. Patients with MGUS were excluded as they are usually monitored quite closely for progression to symptomatic myeloma because their risk of progressing is approximately 1% per year, which is markedly higher than the baseline risk in the population.15 End of follow-up was the earliest of 2 years’ follow-up or myeloma diagnosis.
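The inclusion and exclusion criteria above can be sketched as a small eligibility check. This is a minimal illustration only: the `Patient` record layout and the function name `find_index_date` are hypothetical and do not reflect the CPRD schema, age and registration length are assumed to be assessed at the second FBC, and the requirement for 2 years of available follow-up is not modelled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

STUDY_START = date(2000, 1, 1)
STUDY_END = date(2014, 1, 1)


@dataclass
class Patient:  # hypothetical record layout, not the CPRD schema
    birth_date: date
    registration_date: date
    fbc_dates: List[date]  # dates of recorded FBC tests
    first_myeloma_or_mgus: Optional[date] = None  # earliest diagnosis, if any


def find_index_date(p: Patient) -> Optional[date]:
    """Return the date of the second FBC of the first qualifying pair, else None."""
    tests = sorted(d for d in p.fbc_dates if STUDY_START <= d <= STUDY_END)
    for first, second in zip(tests, tests[1:]):
        # The two FBC tests must fall within 1 year of each other.
        if second - first > timedelta(days=365):
            continue
        # Assumption: age and registration length are assessed at the second FBC.
        age_years = (second - p.birth_date).days / 365.25
        registered_days = (second - p.registration_date).days
        # Myeloma or MGUS diagnosed before the index date excludes the patient.
        prior_diagnosis = (
            p.first_myeloma_or_mgus is not None and p.first_myeloma_or_mgus < second
        )
        if age_years >= 40 and registered_days >= 365 and not prior_diagnosis:
            return second
    return None
```

In this sketch the returned date plays the role of the index date from which the 2-year follow-up would be counted.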
Predictors and outcome
Possible predictors for myeloma were identified from the literature including demographics (age, sex, and body mass index [BMI]), symptoms (back, chest, bone, rib, and joint pain, shortness of breath, recurrent chest infections, fatigue, nosebleeds, bruising, fracture, weight loss, and nausea), and blood test results (FBC components, inflammatory markers: erythrocyte sedimentation rate [ESR], C-reactive protein [CRP], and plasma viscosity [PV], calcium, and creatinine). Myeloma was defined as a new diagnosis of myeloma within 2 years of the index date using a code in the electronic health records (Read code).
Sample size
Twenty or more events per variable is adequate to eliminate bias in Cox models when there are many low prevalence predictor variables.16 With 25 candidate predictor variables and a target of 20 events per variable, it was estimated that 500 events were necessary for the derivation dataset. Validation datasets should ideally contain ≥200 events.17
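The sample size arithmetic can be checked in a few lines. The sketch below uses only figures stated in this article (25 candidate predictors, 20 events per variable, and the event counts reported in the Results); the function and variable names are illustrative.

```python
def required_events(n_candidate_predictors: int, events_per_variable: int) -> int:
    """Events needed in the derivation data under an events-per-variable rule."""
    return n_candidate_predictors * events_per_variable


needed = required_events(25, 20)      # 500 events
derivation_events = 495               # reported in the Results
validation_events = 242               # reported in the Results

# Events per candidate predictor actually achieved in the derivation set.
achieved_epv = derivation_events / 25  # 19.8
meets_validation_target = validation_events >= 200
```

On these numbers the derivation set provides 19.8 events per candidate predictor, marginally below the target of 20, and the validation set exceeds the suggested minimum of 200 events.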
Statistical analysis
The dataset was split into derivation and validation sets based on English geographical region. Two-thirds were assigned to the derivation dataset and one-third to the validation dataset.14 Descriptive statistics were used to summarise the baseline characteristics, predictor variables, and outcomes. Diagnostic accuracy measures (sensitivity, specificity, positive and negative likelihood ratios, and positive and negative predictive values) were calculated for individual and combined symptoms, and for blood test results.
Multiple imputation was used to address missing data. Ten imputations were created for the derivation and external validation datasets separately. Imputation models contained all the predictors, the binary indicator for the outcome, and the cumulative baseline hazard estimated by the Nelson–Aalen estimator.18 Continuous variables were centred and rescaled to help with the convergence of the models. Fractional polynomials were used to identify the optimal functional form of continuous variables: BMI, age, and blood test results.19 Univariable analysis was used a priori to select the inflammatory marker with the highest hazard ratio for inclusion in the multivariable analysis, as inflammatory marker results are highly correlated. In sensitivity analyses blood test results were classified as normal/abnormal (instead of modelling continuously) depending on the reference range provided by the local laboratory. Normocytic anaemia was defined as low haemoglobin and normal MCV. Macrocytic anaemia was defined as high MCV and low haemoglobin.
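The anaemia definitions used in the sensitivity analyses can be written as a simple classification rule. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the reference limits must be supplied by the caller because the study used each local laboratory's range, and the microcytic branch (low haemoglobin with low MCV) is an assumption that is not defined in the text above.

```python
def classify_anaemia(
    haemoglobin: float,
    mcv: float,
    hb_lower_limit: float,
    mcv_lower_limit: float,
    mcv_upper_limit: float,
) -> str:
    """Classify an FBC result using caller-supplied laboratory reference limits."""
    if haemoglobin >= hb_lower_limit:
        return "no anaemia"
    if mcv > mcv_upper_limit:
        return "macrocytic"   # low haemoglobin and high MCV
    if mcv < mcv_lower_limit:
        return "microcytic"   # assumption: low haemoglobin and low MCV
    return "normocytic"       # low haemoglobin and normal MCV


# Illustrative limits only (haemoglobin in g/L, MCV in fL); not taken from the study.
example = classify_anaemia(110, 90, hb_lower_limit=130, mcv_lower_limit=80, mcv_upper_limit=100)
```

Passing the limits as arguments keeps the rule independent of any single laboratory's reference range.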
Model derivation
Starting with the following variables: demographics (age, sex, and BMI), symptoms (back, chest, bone, rib, and joint pain, shortness of breath, recurrent chest infections, fatigue, nosebleeds, bruising, fracture, weight loss, and nausea), and blood test results (FBC components, inflammatory markers [ESR, CRP, and PV], calcium, and creatinine), the mfpmi command in Stata (version 14) was used to select variables for inclusion in Cox proportional hazards models using backwards elimination with a 5% significance threshold for inclusion.20 For the derivation, multivariable Cox proportional hazards models were fitted as follows:
FBC model: demographics, symptoms, and FBC components;
FBC change model: demographics, symptoms, and the absolute change between the index FBC test and previous FBC test; and
all-test model: demographics, symptoms, and all tests currently used for myeloma diagnosis (FBC components, calcium, creatinine, and inflammatory markers).
External validation
The baseline survival function was calculated at 2 years using Kaplan–Meier estimates and combined with the regression coefficients to derive the final equations. These equations were used to predict the probability of myeloma in the external validation dataset. Model performance was examined in terms of calibration, discrimination, and clinical usefulness. Calibration was assessed by the use of calibration plots, and discrimination using the R² statistic, the D statistic, and the area under the curve (AUC).21,22 Decision curve analysis was used to compare the clinical utility of the models.23 Diagnostic accuracy measures were then estimated for the various cut-offs of myeloma probability.
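Combining a baseline survival estimate with Cox regression coefficients conventionally gives a predicted risk of 1 − S0(t)^exp(LP), where S0(t) is the baseline survival at time t and LP is the linear predictor. The sketch below illustrates that standard form only. The coefficient values, predictor names, and baseline survival are hypothetical placeholders and are not the published model, and predictor values are assumed to be centred and rescaled in the same way as in the derivation data.

```python
import math
from typing import Dict


def predicted_risk(
    baseline_survival: float,
    coefficients: Dict[str, float],
    values: Dict[str, float],
) -> float:
    """Predicted probability of the event by the time point of baseline_survival."""
    # Linear predictor: sum of coefficient times (centred, rescaled) predictor value.
    linear_predictor = sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical numbers for illustration only.
hypothetical_coefficients = {"age": 0.5, "back_pain": 0.7, "haemoglobin": -0.6}
reference_patient = {"age": 0.0, "back_pain": 0.0, "haemoglobin": 0.0}
higher_risk_patient = {"age": 1.0, "back_pain": 1.0, "haemoglobin": -1.0}

risk_reference = predicted_risk(0.9995, hypothetical_coefficients, reference_patient)
risk_higher = predicted_risk(0.9995, hypothetical_coefficients, higher_risk_patient)
```

For a patient at the reference values of all predictors the linear predictor is zero, so the predicted risk reduces to 1 minus the baseline survival.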
RESULTS
A total of 1 281 926 patients were included, with a mean age of 63.7 years (SD 13.8), of whom 41.1% were male. The derivation and validation sets were comparable in terms of age, sex, risk factors, symptoms, and blood tests (see Supplementary Table S1 for details). A total of 737 incident myeloma cases (0.06%) were diagnosed within 2 years: 495 (0.06%) in the derivation set, and 242 (0.05%) in the validation set.
Symptoms and blood tests
The most common symptoms recorded for myeloma patients were back pain (19.0% versus 9.4% in non-myeloma) and chest pain (11.3% versus 6.4% in non-myeloma) (Table 1). Anaemia (irrespective of type) was the most common abnormality observed in the FBC (58.1% compared with 15.3% in non-myeloma), followed by high MCV (20.0% compared with 6.9% in non-myeloma). Of the inflammatory markers, ESR was most frequently abnormal in patients with myeloma (80.1%). Inflammatory markers, calcium, and creatinine had the highest fractions of missing data. In the derivation dataset, 46.4% of patients with myeloma had anaemia at both tests, with the average time between the two abnormal tests being 2 months. The median time to myeloma diagnosis from the index test was 5.6 months (interquartile range = 1.6 to 15.7) (data not shown).
Table 1. Descriptive statistics for derivation dataset
Prediction model derivation and validation
Back pain, chest pain, rib pain, nosebleeds, and all FBC parameters were selected for inclusion in both the FBC and all-test model (Table 2). The FBC change model was dropped because FBC change parameters were not selected for inclusion in the final model. In the validation dataset, the FBC model had an AUC of 0.84 (95% confidence interval [CI] = 0.81 to 0.87) (Figure 1) and the all-test model had an AUC of 0.87 (95% CI = 0.84 to 0.90) (Figure 2). The D statistic values were 2.3 (95% CI = 2.1 to 2.5) and 2.7 (95% CI = 2.4 to 2.9) for the FBC model and all-test model, respectively. Similarly, R² values were 0.56 (95% CI = 0.51 to 0.60) and 0.62 (95% CI = 0.58 to 0.67) for the FBC model and all-test model, respectively. For reference, D statistic values of 0 correspond to a model with an AUC of 0.5, while values ≥3 correspond to models with an AUC >0.9.24 Calibration plots showed good agreement between predicted and observed risk in both the FBC and all-test model (Figures 1 and 2). However, the all-test model under-predicted myeloma risk in the highest decile (Figure 2).
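The D statistic and an explained-variation measure can be related by a closed-form expression. The sketch below assumes that the reported R² is Royston and Sauerbrei's measure based on D, which this article does not state explicitly; under that assumption R² = (D²/κ²) / (π²/6 + D²/κ²) with κ = √(8/π). The function name is illustrative, and because the published D values are rounded the published R² values will not necessarily be reproduced exactly.

```python
import math


def r_squared_from_d(d_statistic: float) -> float:
    """Explained variation implied by a D statistic (assumed Royston-Sauerbrei form)."""
    kappa_squared = 8.0 / math.pi
    scaled = d_statistic ** 2 / kappa_squared
    return scaled / (math.pi ** 2 / 6.0 + scaled)


r2_fbc = r_squared_from_d(2.3)  # D reported for the FBC model
```

Under this assumption a D statistic of 2.3 corresponds to an R² of about 0.56, and a D statistic of 0 corresponds to an R² of 0.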
Table 2. Adjusted hazard ratios (95% CI) for the final models for myeloma
Figure 1. Calibration and discrimination of full blood count model.
AUC = area under curve.
Figure 2. Calibration and discrimination of all-test model.
AUC = area under curve.
Diagnostic accuracy and comparison of different diagnostic approaches
Table 3 presents diagnostic accuracy measures for symptoms, blood tests, their combinations, and a range of predicted myeloma probability. Anaemia (irrespective of type) had a sensitivity of 56% (95% CI = 49% to 63%), a specificity of 83% (95% CI = 83% to 84%), and a positive predictive value of 0.18% (95% CI = 0.15% to 0.21%). The FBC and the all-test clinical prediction rules, using the 90th percentile of the predicted probability, resulted in sensitivities of 62% (95% CI = 55% to 68%) and 72% (95% CI = 66% to 78%), respectively, specificities of 90% (95% CI = 90% to 90%) for both models, and positive predictive values of 0.34% (95% CI = 0.29% to 0.40%) and 0.40% (95% CI = 0.34% to 0.47%), respectively. Decision curve analysis showed that, independently of which threshold is used for the models, decisions made using the prediction models result in fewer false positives and more true positives when compared with single tests or symptoms (see Supplementary Figure S1 for details).
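Decision curve analysis compares strategies by their net benefit at a chosen threshold probability, conventionally computed as the true positive rate per person minus the false positive rate per person weighted by the odds of the threshold. The sketch below applies that standard formula to the rounded sensitivities and specificities quoted above. The prevalence of 0.06% is an assumption taken from the overall cohort figure, the threshold of 0.12% is the 90th percentile risk reported for the FBC model, and the function name is illustrative, so the output is indicative rather than a reproduction of Supplementary Figure S1.

```python
def net_benefit(
    sensitivity: float,
    specificity: float,
    prevalence: float,
    threshold_probability: float,
) -> float:
    """Net benefit of a binary decision rule at a given threshold probability."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    threshold_odds = threshold_probability / (1.0 - threshold_probability)
    return true_positive_rate - false_positive_rate * threshold_odds


ASSUMED_PREVALENCE = 0.0006   # assumption: overall 2-year incidence of 0.06%
THRESHOLD = 0.0012            # 90th percentile of predicted risk (0.12%)

nb_fbc_model = net_benefit(0.62, 0.90, ASSUMED_PREVALENCE, THRESHOLD)
nb_anaemia = net_benefit(0.56, 0.83, ASSUMED_PREVALENCE, THRESHOLD)
```

With these inputs the FBC rule has the higher net benefit, which is the direction of the comparison described in the text.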
Table 3. Comparison of different diagnostic approaches in the validation cohort (after performing imputation)
Table 4 shows the performance of different diagnostic approaches assuming a population of 100 000 tested patients. The FBC model at the 90th percentile threshold of risk (0.12%) would result in 270 false alarms to one myeloma case diagnosed and in one missed myeloma case to 3910 true negatives. Comparatively, investigating based on anaemia would result in 500 false alarms per myeloma case and in one missed myeloma case to 3190 true negatives. Overall, the number of false positives will be lower using the rule at almost all thresholds compared with all other approaches that use single symptoms or tests. High calcium, low platelets, and low white cell count had very high specificity values (>95%) but low sensitivity values (<10%) (Table 3), which results in few false positives but many missed cases of myeloma.
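The per-100 000 figures follow from sensitivity, specificity, and prevalence. The sketch below shows the arithmetic using the rounded estimates quoted in this article and an assumed prevalence of 0.06%; the prevalence actually used for Table 4 is not stated here, so the results approximate the published figures rather than reproduce them. The function name and dictionary keys are illustrative.

```python
from typing import Dict


def counts_per_population(
    sensitivity: float,
    specificity: float,
    prevalence: float,
    population: int = 100_000,
) -> Dict[str, float]:
    """Expected confusion-matrix counts for a tested population."""
    cases = population * prevalence
    non_cases = population - cases
    true_positives = sensitivity * cases
    true_negatives = specificity * non_cases
    return {
        "tp": true_positives,
        "fn": cases - true_positives,
        "tn": true_negatives,
        "fp": non_cases - true_negatives,
    }


ASSUMED_PREVALENCE = 0.0006  # assumption: overall 2-year incidence of 0.06%

fbc = counts_per_population(0.62, 0.90, ASSUMED_PREVALENCE)
anaemia = counts_per_population(0.56, 0.83, ASSUMED_PREVALENCE)

# False alarms for each myeloma case detected.
fbc_false_alarms_per_case = fbc["fp"] / fbc["tp"]
anaemia_false_alarms_per_case = anaemia["fp"] / anaemia["tp"]
```

Under these assumptions the FBC rule gives roughly 270 false alarms per case detected and investigating on anaemia gives roughly 500, in line with the figures quoted above.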
Table 4. Performance of the different diagnostic approaches in a population of 100 000 tested individuals based on the validation cohort measures
DISCUSSION
Summary
In this study, two clinical prediction models were generated to predict myeloma risk and raise the suspicion of myeloma in patients who are tested with FBC and for whom the GP does not necessarily suspect myeloma. The FBC model includes age, sex, back, chest, and rib pain, nosebleeds, and components of the FBC. The all-test model also includes ESR and calcium. The two models were validated in patients attending primary care. Both models discriminated well between people with and without myeloma, but the FBC model was better calibrated than the all-test model. Choosing to investigate people classified in the top decile of predicted myeloma risk (0.12%) would lead to fewer false alarms for each case of myeloma investigated compared with selecting people based on symptoms or blood test abnormalities alone.
Strengths and limitations
These prediction models are immediately relevant to myeloma diagnosis in primary care as they were developed using data from routinely collected primary care records. A split-sample approach, based on geographical region, allowed for meaningful validation and increases the likelihood of model reproducibility in other datasets from primary care. By assessing discrimination, calibration, the performance of the models at different thresholds, and comparing them with single-test approaches using diagnostic accuracy measures and decision curve analysis, this study has demonstrated the benefits of using a prediction modelling approach over decision rules based on symptoms or tests alone.
This study has several limitations. The population was selected based on two FBCs in order to assess whether change between two FBC components was predictive of myeloma. The change variables were not significant, which could be attributed to the fact that in many cases blood test abnormalities were being detected at both tests, that is, patients were presenting with anaemia on multiple occasions. This is likely because abnormalities in the first test are correlated with the likelihood of having a second test. An English study found that 23.5% of primary care patients aged >65 years had two FBCs over a period of 2 years, suggesting that patients in the current study are a selected population and more likely to represent a sicker population.25 It has also been shown that patients who have blood tests are more likely to have cancer.26 Following abnormalities in the initial FBC, 46.4% of patients with myeloma could have been picked up if investigated at that timepoint. The prediction rules can be applied to this population in order to identify which patients should be further investigated for myeloma at the time of the first test, thus shortening the diagnostic process. The prediction rules developed in this study should be validated further in different populations, such as patients receiving one FBC rather than two, and potentially in other countries in order to confirm their generalisability.
As coding in routine health records is done for clinical purposes, it is influenced by the variability in history taking and recording behaviour between primary care clinicians. It is likely that patients do not report all of their symptoms and also that GPs may only record the symptom(s) they consider the most relevant, especially for myeloma symptoms, which are often quite vague and low risk. To what extent this happens in practice and how it affects the accuracy of the prediction models is unclear.
The all-test model had a large proportion of missing data because calcium and ESR recordings were only available for a small number of patients. This meant that only 8% of the whole sample would be included in a complete case analysis. Multiple imputation was used to avoid limiting the analysis but the reason for missingness may not have been accounted for in the imputation model. Furthermore, the number of imputations might not have been sufficient given the large fraction of missing data, but the large sample size meant that additional imputations would have been computationally prohibitive.
Finally, there was no linkage with Hospital Episode Statistics data or cancer registry data, so formal outcome ascertainment is lacking. However, the accuracy, quality, and completeness of CPRD data have been validated previously.27
Comparison with existing literature
To the authors’ knowledge, this is the largest retrospective open cohort study to develop prediction rules for myeloma in primary care. The prediction models in this study perform similarly to established prediction rules for cancer.28,29 The findings in the current study regarding the utility of a normal ESR and normal haemoglobin for ruling out myeloma confirm those of previous primary care studies.10,11
Implications for research and practice
This study presents the diagnostic accuracy of multiple thresholds of predicted myeloma risk to illustrate rule-in and rule-out approaches by maximising specificity or sensitivity. The authors recommend selecting a threshold with a specificity >90%, such as the 90th percentile of the FBC model, leading to more true positives and fewer false positives compared with other approaches, such as acting on anaemia alone. More specifically, at the 90th percentile threshold of risk, the rule would diagnose an extra 18% of patients compared with normocytic anaemia and an extra 6% of patients compared with anaemia of any type (normocytic, microcytic, or macrocytic), with fewer false positives (estimated based on data in Table 3). While other blood tests such as calcium have higher specificity, resulting in fewer false positives, their sensitivity is much lower, meaning that many cancers would be missed. Previous studies have shown that hypercalcaemia develops later in disease progression; thus, while predictive of myeloma, it is less useful for detecting myeloma early.11 The median time to myeloma diagnosis from the index test (second FBC) is 5.6 months (interquartile range = 1.6 to 15.7), suggesting that the prediction rules have the potential to reduce diagnostic delays by a substantial amount.
The prediction rules devised in this study are able to raise the suspicion of myeloma in patients who are regularly tested with FBC either for monitoring purposes or as part of a diagnostic process. Patients who are flagged as being at high risk of having myeloma can be tested with serum and urine protein electrophoresis in primary care, and abnormalities in these tests should result in a haematology referral. Nonetheless, myeloma can be missed even with the use of a prediction rule, subject to the decision threshold that is used and the corresponding sensitivity, so in these patients other follow-up tests could potentially be used, such as ESR or PV, if it is indicated by the clinical presentation of the patient.
As these prediction rules are complex scoring systems, they require software. This could be a web-based calculator or could be integrated within the electronic health records of general practices to trigger alerts to GPs about patients with a high predicted risk of myeloma, or to the local laboratory to automatically process or request a myeloma screen. Electronic trigger interventions have been shown to reduce diagnostic delays in colorectal and prostate cancer.30 Future research should explore the feasibility of such a tool, identify and explore the different barriers that might prevent its implementation, and establish its acceptability. Impact studies are recommended to explore the effect of the prediction rule on the diagnostic pathway and on important outcomes such as stage at diagnosis and survival.
Notes
Funding
This study presents work carried out as part of a DPhil scholarship awarded to Constantinos Koshiaris funded by the Primary Care Research Trust, the University of Oxford, and the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care (CLAHRC) Oxford. The research was supported by NIHR CLAHRC Oxford at Oxford Health NHS Foundation Trust. Constantinos Koshiaris is currently supported by a Wellcome Trust/Royal Society Sir Henry Dale Fellowship (Grant number: 211182/Z/18/Z). Jason L Oke is part-funded from the NIHR Oxford Biomedical Research Centre (BRC), Oxford University Hospitals NHS Foundation Trust. FD Richard Hobbs acknowledges part-funding from the NIHR School for Primary Care Research, the NIHR Applied Research Collaboration (ARC) Oxford, the NIHR Oxford BRC (University Hospital Trust), and the NIHR Oxford Medtech and In-Vitro Diagnostics Co-operative. Sarah Lay-Flurrie is part-funded by the NIHR Oxford BRC and NIHR ARC Oxford and Thames Valley. Brian D Nicholson is funded by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.
Ethical approval
The study was approved by the Clinical Practice Research Datalink (CPRD) Independent Scientific Advisory Committee (protocol number: 17_088).
Provenance
Freely submitted; externally peer reviewed.
Competing interests
The authors have declared no competing interests.