Predicting asthma-related crisis events using routine electronic healthcare data: a quantitative database analysis study

Abstract

Background
There is no published algorithm predicting asthma crisis events (accident and emergency [A&E] attendance, hospitalisation, or death) using routinely available electronic health record (EHR) data.

Aim
To develop an algorithm to identify individuals at high risk of an asthma crisis event.

Design and setting
Database analysis from primary care EHRs of people with asthma across England and Scotland.

Method
Multivariable logistic regression was applied to a dataset of 61 861 people with asthma from England and Scotland using the Clinical Practice Research Datalink. External validation was performed using the Secure Anonymised Information Linkage Databank of 174 240 patients from Wales. Outcomes were ≥1 hospitalisation (development dataset) and asthma-related hospitalisation, A&E attendance, or death (validation dataset) within a 12-month period.

Results
Risk factors for asthma-related crisis events included previous hospitalisation, older age, underweight, smoking, and blood eosinophilia. The prediction algorithm had acceptable predictive ability with a receiver operating characteristic of 0.71 (95% confidence interval [CI] = 0.70 to 0.72) in the validation dataset. Using a cut-point based on the 7% of the population at greatest risk results in a positive predictive value of 5.7% (95% CI = 5.3% to 6.1%) and a negative predictive value of 98.9% (95% CI = 98.9% to 99.0%), with sensitivity of 28.5% (95% CI = 26.7% to 30.3%) and specificity of 93.3% (95% CI = 93.2% to 93.4%); those individuals had an event risk of 6.0% compared with 1.1% for the remaining population. In total, 18 people would need to be followed to identify one admission.

Conclusion
This externally validated algorithm has acceptable predictive ability for identifying patients at high risk of asthma-related crisis events and excluding those not at high risk.


INTRODUCTION
The challenge of reducing unplanned hospital admissions and avoidable deaths for common chronic conditions, such as asthma, remains unresolved. Despite effective treatments, evidence-based guidelines, 1 and financially incentivised community-based chronic disease management (via the Quality and Outcomes Framework 2 ), each year in the UK an average of 1500 people die 3 (on average, 3 a day) and 93 000 are hospitalised due to asthma. 4 A total of 5.4 million people in the UK are currently receiving treatment for asthma: 1.1 million children (1 in 11) and 4.3 million adults (1 in 12). 3 Identification of those at increased risk of these events is beneficial both at an individual level to tailor disease management, and at a population level to inform and modify processes of care.
Many risk factors for poor asthma outcomes have been identified, 5-8 some of which have been combined into risk algorithms, including: Asthma UK's Asthma Attack Risk Checker tool; 9 the Asthma Disease Activity Score; 10 and wheeze frequency, admissions, reliever use, and step on British Thoracic Society medication guidelines (WARS) score. 11 Recently, an algorithm has also been developed to identify children at risk of life-threatening asthma. 12 These have been derived from small datasets, including those from clinical trials, or the variables used in the prediction tools have required up-to-date personal characteristics, including psychosocial characteristics or adherence to medication for which comprehensive data are difficult to obtain in large populations. 13 An algorithm to identify patients at greatest risk of poor outcomes using electronic healthcare data would overcome this problem and enable a register of patients at high risk to be generated efficiently.
Most prediction algorithms define a severe asthma attack as one that requires oral corticosteroid therapy or hospital attendance/admission; 14 however, this composite scoring includes variables that are not necessarily colinear. Early treatment with prednisolone may stop the deterioration and prevent an accident and emergency (A&E) attendance and, as such, this composite definition may mask the benefits of prompt management of an attack, with increased prednisolone treatment and reduced hospitalisations; 13 as such, it is important to develop algorithms that identify these two risks separately.
The authors aimed to develop and validate a prediction tool to identify individuals at high risk of an asthma-related crisis event (A&E attendance, hospital admission, or death due to asthma) during the following 12 months, calculated from routinely captured electronic health record (EHR) data.

METHOD
Data sources
Derivation dataset. An analytical dataset was used from a published cohort study 15 that used a database of people aged 12-80 years registered at one of 650 primary care practices in the UK with physician-diagnosed and recorded asthma (with no subsequent code for asthma resolved), measurement of full blood count (FBC) at any time in the past, and 2 years' continuous data. The dataset comprised data from the Clinical Practice Research Datalink (CPRD) 16 between 2001 and 2012. Although the CPRD database contains record-linked primary and secondary care data, including reason for admission to hospital, only data from primary care were used to derive the algorithm because EHRs in UK primary care do not consistently code secondary care events. However, both primary and secondary care data were used when assessing the outcome.
Validation dataset. A separate dataset of patients from the Secure Anonymised Information Linkage (SAIL) Databank 17,18 who were registered at 340 general practices in Wales was used to validate the algorithm. Record-linked data from primary and secondary care were available for individual patients and included reason for admission to hospital. Data on asthma outcomes, healthcare interactions (including GP consultations), and prescribed medications were obtained from the SAIL Databank.

Eligibility
Patients included in the existing analytical dataset for the derivation of the at-risk algorithm comprised those with:
• active asthma (that is, with a coded diagnosis of asthma and a prescription for asthma treatment in the previous 12 months 19 );
• no diagnosis of any other chronic respiratory disease;
• a valid blood eosinophil count (≤5000 blood eosinophils/microlitre [µL]); and
• complete data for the baseline and outcome years (the year prior to, and the year following, the last eosinophil count, respectively).
Patients included in the SAIL Databank validation dataset comprised those with at least one asthma diagnosis code before 31 December 2011, no 'asthma resolved' codes between 1 January 2010 and 31 December 2011, and at least one asthma prescription (bronchodilator, corticosteroid, or leukotriene receptor antagonist) code between 1 January 2010 and 31 December 2010. Patients were continuously registered at one general practice between 1 January 2010 and 31 December 2010 (baseline data-collection year) and continuously registered (or died) between 1 January 2011 and 31 December 2011 (outcome year).
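The validation-cohort criteria above can be read as a simple filter over patient records. The following is a minimal sketch only: the `Patient` record layout and its field names are hypothetical (they are not SAIL Databank field names), and the treatment of 'before 31 December 2011' as inclusive of that date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

BASELINE_START, BASELINE_END = date(2010, 1, 1), date(2010, 12, 31)
OUTCOME_START, OUTCOME_END = date(2011, 1, 1), date(2011, 12, 31)


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    asthma_diagnosis_dates: List[date]
    asthma_resolved_dates: List[date]
    asthma_prescription_dates: List[date]
    registered_from: date
    registered_to: date  # last day registered at the practice
    death_date: Optional[date] = None


def is_eligible(p: Patient) -> bool:
    """Apply the validation-cohort criteria described in the text."""
    # At least one asthma diagnosis code up to the end of 2011
    # (assumption: the cut-off date itself is included).
    has_diagnosis = any(d <= OUTCOME_END for d in p.asthma_diagnosis_dates)
    # No 'asthma resolved' code during the baseline or outcome year.
    resolved = any(BASELINE_START <= d <= OUTCOME_END
                   for d in p.asthma_resolved_dates)
    # At least one asthma prescription in the baseline year.
    has_prescription = any(BASELINE_START <= d <= BASELINE_END
                           for d in p.asthma_prescription_dates)
    # Continuously registered throughout the baseline year.
    baseline_registered = (p.registered_from <= BASELINE_START
                           and p.registered_to >= BASELINE_END)
    # Registered throughout the outcome year, or died during it.
    died_in_outcome_year = (p.death_date is not None
                            and OUTCOME_START <= p.death_date <= OUTCOME_END)
    outcome_followed = p.registered_to >= OUTCOME_END or died_in_outcome_year
    return (has_diagnosis and not resolved and has_prescription
            and baseline_registered and outcome_followed)
```

Keeping each criterion as a named boolean makes it straightforward to count how many patients are excluded at each step, as in a cohort flow diagram.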

Predictors
Details of all variables considered as potential predictors for the at-risk algorithm are shown in Supplementary Table S1. These included age, sex, smoking history, comorbidities, respiratory-related medication, healthcare contacts, and blood eosinophil count. For diagnostic variables (for example, ischaemic heart disease and diabetes), Read codes were queried any time up to the end of the baseline year (that is, 31 December 2010) from the validation and derivation databases. Similarly, for blood eosinophil count, body mass index (BMI), and smoking status, the most recent codes any time before 31 December 2010 were used. For the rest of the variables (prescriptions for asthma, allergic rhinitis, diabetes, anxiety and depression, paracetamol use [which is positively associated with asthma 20 ], lower respiratory tract infection [LRTI] consultations, and allergic rhinitis diagnosis), the codes were queried between 1 January 2010 and 31 December 2010.

How this fits in
Risk stratification is commonly undertaken in primary care but there are no validated prediction algorithms for people with asthma using routine data. An algorithm was developed using a primary care dataset and externally validated showing acceptable predictive ability with a receiver operating characteristic of 0.71 (95% confidence interval = 0.70 to 0.72). The 7% of the population most at risk had an event rate of 6.0%, compared with 1.1% for the remaining population. This algorithm can be used to identify individuals at high risk of an asthma-related crisis event from primary care electronic health records.

Outcome
For the development of the algorithm, the outcome was defined as ≥1 hospitalisation(s) within 12 months; for the validation of the algorithm, it was defined as a crisis event that comprised an asthma-related hospitalisation, A&E attendance, or death within a 12-month period.

Statistical analysis
Univariate logistic regression models were used to identify baseline measures of disease severity, patient demographics, and comorbidities predictive of ≥1 future event(s). Variables showing an association (P<0.05) with an asthma exacerbation resulting in hospital admission in univariable analyses were entered into a multivariable model, which was reduced using backward elimination to produce a final list of predictors of hospital admission. No model updating was undertaken.
The final model was used to create at-risk scores, indicating the risk of an asthma-related crisis event for each patient in the dataset. To do this, coefficients for those factors present in each patient were summed, along with the intercept, to obtain the risk score ($x$), which is the logit of the probability of asthma-related attendance at A&E or hospital admission; the probability is given by $e^{x}/(1+e^{x})$. Internal validation was not investigated, as a separate dataset was used to perform external validation. The calibration slope coefficient was estimated by splitting the predicted risk into 10 groups based on deciles, calculating the percentage of people in each group with the outcome, and estimating a linear regression model with the predicted risk group against the actual risk.
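The scoring and calibration steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the intercept and coefficient values are placeholders and are not the Table 2 weights, the factor names are hypothetical, and the calibration slope is taken here to be the ordinary least-squares slope of the observed event proportion on the mean predicted risk across the 10 groups, which is one reading of the description.

```python
import math
from typing import Dict, Iterable, Sequence

# Placeholder weights for illustration only; these are NOT the Table 2 values.
EXAMPLE_INTERCEPT = -4.0
EXAMPLE_COEFFICIENTS: Dict[str, float] = {
    "previous_hospitalisation": 0.9,
    "current_smoker": 0.4,
    "underweight": 0.5,
}


def risk_score(intercept: float, coefficients: Dict[str, float],
               factors_present: Iterable[str]) -> float:
    """Sum the intercept and the coefficients of the factors present (the logit x)."""
    return intercept + sum(coefficients[f] for f in factors_present)


def risk_probability(x: float) -> float:
    """Convert the logit x to a probability, e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))


def calibration_slope(predicted: Sequence[float], outcomes: Sequence[int],
                      n_groups: int = 10) -> float:
    """Least-squares slope of observed proportion on mean predicted risk by group."""
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    mean_pred, observed = [], []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # skip empty groups when there are fewer patients than groups
        mean_pred.append(sum(p for p, _ in chunk) / len(chunk))
        observed.append(sum(y for _, y in chunk) / len(chunk))
    mx = sum(mean_pred) / len(mean_pred)
    my = sum(observed) / len(observed)
    sxx = sum((x - mx) ** 2 for x in mean_pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mean_pred, observed))
    return sxy / sxx
```

For example, `risk_probability(risk_score(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, ["current_smoker"]))` gives the predicted probability for a hypothetical patient whose only recorded factor is current smoking. A slope near 1 indicates that observed risk rises in step with predicted risk.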
Discrimination (the ability to distinguish between those who do, and do not, experience the outcome) was assessed by calculating the receiver operating characteristic (ROC) for the risk scores. In addition, the specificity, sensitivity, positive predictive values (PPVs), and negative predictive values (NPVs) were calculated for five different at-risk cut-offs (top 1%, 2%, 5%, 7%, and 10%) for the risk scores for both the derivation and the validation datasets. The overall goodness of fit of the score was assessed by estimating the pseudo-$R^2$ from the logistic regression model. Assuming an asthma prevalence of 6%-7%, a 7% cut-off would, on average, identify the most at risk 42-49 individuals from a practice of 10 000 patients. A sensitivity analysis was undertaken for the validation cohort including only data related to hospitalisation.
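As an illustration of these measures, the sketch below computes the area under the ROC curve by pairwise comparison, the four accuracy measures when the top fraction of risk scores is flagged, and the expected number of flagged patients per practice. The function names are hypothetical, the pairwise method is a simple stand-in for whatever software the authors used, and the code assumes the data contain both events and non-events and that not everyone is flagged.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))  # O(n^2); adequate for a sketch


def metrics_at_top_fraction(scores: Sequence[float], outcomes: Sequence[int],
                            fraction: float) -> Dict[str, float]:
    """Flag the highest-scoring fraction of patients and tabulate accuracy."""
    n = len(scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:k]
    tp = sum(outcomes[i] for i in flagged)  # flagged and had an event
    fp = k - tp                             # flagged, no event
    fn = sum(outcomes) - tp                 # event, not flagged
    tn = n - k - fn                         # no event, not flagged
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def flagged_per_practice(list_size: int, prevalence: float, cutoff: float) -> int:
    """Expected number of patients with asthma flagged at a given cut-off."""
    return round(list_size * prevalence * cutoff)
```

With a practice of 10 000 patients, `flagged_per_practice(10_000, 0.06, 0.07)` and `flagged_per_practice(10_000, 0.07, 0.07)` return 42 and 49, matching the range quoted in the text.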

RESULTS
Participants
The derivation and validation datasets comprised 58 619 and 174 240 people, respectively (Figure 1). The mean age of participants was 50 years in the derivation dataset and 44 years in the validation dataset, with more females than males in both datasets (Table 1). There were proportionally more people receiving Global Initiative for Asthma (GINA) treatment step 4 or 5 (medium- or high-dose inhaled corticosteroid and long-acting beta-agonist/muscarinic antagonist +/- add-on therapies) and more with a diagnosis of, or treatment for, rhinitis in the derivation dataset than in the validation dataset (Table 1). There were differences between the datasets in terms of smoking status, BMI, anxiety and depression, and paracetamol usage. The outcome was present in 1.65% of individuals in the derivation dataset and 1.40% in the validation dataset (Table 1).
The results of the logistic regression are presented in Table 2, which gives the estimated weight of each variable and describes the algorithm used to predict asthma crisis events.
The overall ability of the algorithm to discriminate between patients who subsequently had an asthma-related crisis event and those who did not was acceptable, and similar in the derivation dataset (ROC 0.72, 95% CI = 0.71 to 0.74) and the validation dataset (ROC 0.71, 95% CI = 0.70 to 0.72) (Table 3). Using a cut-point based on the 7% of the population at greatest risk results in a PPV of 5.7% (95% CI = 5.3% to 6.1%) and an NPV of 98.9% (95% CI = 98.9% to 99.0%), with sensitivity and specificity of 28.5% (95% CI = 26.7% to 30.3%) and 93.3% (95% CI = 93.2% to 93.4%), respectively (Table 3). The discriminative ability of the algorithm was similar in the validation cohort when the outcome was confined to hospitalisation only (see Supplementary Table S2). These individuals had a risk of event of 5.68% (Table 4) and 3.31% when considering hospitalisation only (see Supplementary Table S3). The at-risk algorithm showed acceptable prognostic performance in the validation data with a 5.4-fold higher asthma-related crisis event rate in the high-risk group (6.0%) versus the rest of the population (1.1%) at the 7% cut-off (Table 5) or an absolute difference of 4.9%.
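The reported accuracy figures can be checked against one another using the standard relationship between sensitivity, specificity, outcome prevalence, and predictive values. The sketch below is a consistency check on the rounded published values (sensitivity 28.5%, specificity 93.3%, outcome prevalence 1.40% in the validation dataset), not a reanalysis; the function name is hypothetical, and reading 'number needed to follow' as the reciprocal of the PPV is an assumption.

```python
import math
from typing import Dict


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Dict[str, float]:
    """Derive PPV, NPV, and the flagged fraction from accuracy and prevalence."""
    tp = sensitivity * prevalence              # flagged and had an event
    fn = (1 - sensitivity) * prevalence        # event, not flagged
    fp = (1 - specificity) * (1 - prevalence)  # flagged, no event
    tn = specificity * (1 - prevalence)        # no event, not flagged
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "flagged_fraction": tp + fp,
    }


values = predictive_values(sensitivity=0.285, specificity=0.933, prevalence=0.0140)
# Assumption: number needed to follow is taken as 1 / PPV, rounded up.
number_needed_to_follow = math.ceil(1 / values["ppv"])
```

Under these inputs the derived PPV is approximately 5.7%, the NPV approximately 98.9%, and the flagged fraction approximately 7%, and the reciprocal of the PPV rounds up to 18, in line with the figures reported.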
The calibration slopes showed acceptable agreement between deciles of mean risk score and proportions of people experiencing asthma-related crisis events in each decile group, with data points close to the line of equality. The slope coefficient for the derivation dataset was 0.99 (95% CI = 0.92 to 1.05), while that for the validation was 0.85 (95% CI = 0.75 to 0.96) (data not shown).

DISCUSSION
Summary
Using data that are routinely available in UK primary care EHRs, the authors derived and externally validated an algorithm containing hospitalisation, older age, underweight, smoking, and blood eosinophilia variables to identify individuals at increased risk of experiencing an asthma-related crisis event. This had acceptable overall characteristics with an ROC of 0.72 in the derivation and 0.71 in the validation cohorts, respectively. Using the top 7% of the score as a cut-off, the algorithm correctly identified 28.5% of the asthma population most at risk and 93.3% of those not at risk. A practice can expect a crisis event to occur in 6.0% of the group that is at risk compared with 1.1% of the rest of the population with asthma. Eighteen people would need to be followed to identify one admission. The algorithm can identify people who are at a five-fold increased risk (absolute difference of 5%) of an asthma-related crisis event compared with those not at risk.

Ethical approval
The study protocol was approved by the Clinical Practice Research Database Independent Scientific Advisory Committee (ISAC) (ISAC approval number: 10_087).

Provenance
Freely submitted; externally peer reviewed.

Strengths and limitations
The main strength of this study is that it used two separate large databases capturing people from different geographical areas with record linkage between primary and secondary care data. The generalisability of the algorithm is illustrated by its similar behaviour in two different datasets. The data on cause (asthma related or not) for hospital admission when deriving the algorithm were deliberately ignored as this information, although predictive of future events, is not routinely available in primary care datasets. However, by linking primary care data with that from secondary care for the purposes of assessing the outcome, it was possible to confirm that the algorithm identifies people at risk of an asthma-related crisis event.
The limitations were that patients in the derivation, but not the validation, cohort needed to have had a valid FBC to be entered into the database (although specific values, such as eosinophil counts, were not required). This may have resulted in differences in some of the characteristics (for example, age, sex, asthma severity, and number of comorbidities); however, the authors do not believe there is any difference in the diagnosis or management of people with asthma between Wales and England as both countries follow national guidelines. 1 The databases contained data that are now a decade old [...] to abort an asthma attack, 21 vitamin D monitoring and therapy, 22 and the use of monoclonal antibody therapies. 23 However, there have been no marked changes to the understanding of the aetiology of asthma crises or deaths since the data were collected, and the software systems and determinants of coding decisions in day-to-day practice remain comparable. The authors did not, however, have access to information on medication adherence or social circumstances.
Socioeconomic status has been shown to be a risk factor for hospitalisation 24 and an independent predictor for life-threatening asthma in children. 12 Unfortunately, routine data do not contain this information, although algorithms have been developed for assessing prescription uptake 25 and socioeconomic status is available from postcode data, 26 both of which may be applied to future algorithms.
In addition, the authors did not have death or A&E data in the derivation cohort, [...] Although the number of short-acting beta-agonist (SABA) prescriptions was included in the list of potential variables, long-acting beta-agonist as monotherapy (which has been described as a risk factor in asthma deaths 27 ) was not, as this regime is rarely prescribed. 28 This algorithm does not predict community-based asthma attacks requiring oral prednisolone.

Comparison with existing literature
The WARS score had an ROC of 0.83 for prednisolone use, 11 but the performance of the score in terms of crisis events is unknown; likewise, the performance measurements of the risk score developed by Bateman et al 10 for asthma attacks are not published. However, the Respiratory Effectiveness Group initiative published an algorithm to predict the risk of ≥2 attacks in the subsequent 2 years with an ROC of 0.79 (95% CI = 0.78 to 0.79). 29 Recent evidence 27 suggests that disease severity is an unreliable measure of risk and, indeed, the results presented here confirmed that GINA treatment step 'no therapy' was as statistically significant a risk factor as steps 4-5. In terms of non-respiratory hospitalisation prediction algorithms, the QRISK2 score, which is widely used in the NHS to predict cardiovascular events, has an $R^2$ of 43.5 and 38.4, and an ROC statistic of 0.82 and 0.79 for females and males, respectively. 30 A systematic review of risk prediction models to predict emergency admission in community-dwelling adults 31 identified 27 different such models and showed that those using clinical data (as in the algorithm presented here) outperformed those using self-reported data; C-statistics ranged from 0.63 to 0.83. The algorithm presented here, which utilised clinical data, had a comparable level of discrimination (C-statistic 0.72) to other clinically useful algorithms.
Outcome data were collected as events over a 12-month period to avoid seasonal variations. The algorithm, therefore, predicts hospitalisation in the following year; however, an individual's risk status can change if, for example, they had a hospitalisation that fell just inside, or just outside, a 365-day period. Different algorithms can show substantial variation in risk at the individual level 32 and should complement physician assessment based on knowledge about individuals.
Nevertheless, the growing workloads on primary care clinicians and the ongoing challenge of rising unplanned admissions and avoidable deaths make accurate identification and targeting of the individuals at highest risk an essential part of primary care strategy.

Implications for practice
Primary care software systems routinely use prompts to alert clinicians to overdue asthma reviews and the overordering (and, by implication, overuse) of SABAs. Both are helpful markers of risk that are not always recognised as such, 13,33,34 but they do not reflect the range and complexity of factors found in patients who are most at risk of adverse outcomes. 27,35 Current guidelines recommend that patients are assessed for risk of future attacks. The indicators recommended include a history of previous attacks, SABA use, and other markers of disease control, atopy, and environmental tobacco exposure in children; in adults, these include smoking, obesity, and depression. In April 2020, Quality and Outcomes Framework indicators for disease control were changed 36 from 'Royal College of Physicians (RCP) 3 questions' (on asthma): 37
• Have you had difficulty sleeping because of your asthma symptoms (including cough)?
• Have you had your usual asthma symptoms during the day (for example, cough, wheeze, chest tightness, or breathlessness)?
• Has your asthma interfered with your usual activities (for example, housework, work, school)?
to the Asthma Control Test score plus the number of exacerbations in the previous 12 months. Achieving these new indicators requires more clinician time and greater participation from patients. Failure to attend appointments is, in itself, a risk factor for poor outcomes. 35 The algorithm developed and presented here simplifies the collection and weights the statistical significance of multiple risk factors. It has the potential to save clinicians' time and provide accurate real-time assessments of patients' risk and, as it does not require patients to attend a consultation, also bypasses the dangers of inverse care associated with poor attendance at appointments. The algorithm also concurs with, and provides a mechanism to identify, important markers highlighted in the National Review of Asthma Deaths report, 27 such as patients on no treatment for their asthma. It can be used to generate alerts or prompts to identify patients at high risk of asthma crisis events (A&E attendance, hospitalisation, or death) when their EHRs are accessed, so care can be targeted appropriately.
The algorithm is currently being used in a study 38 to validate the role of at-risk asthma registers in primary care. Further work is also needed to explore some of the unexpected indicators, such as low BMI, and to find a way to incorporate important social and behavioural determinants that are not currently captured in primary care EHRs.

Contributors
Annie Burden and Susan Stirling contributed equally to the manuscript as statisticians.

In the validation dataset, actual values are masked due to small frequencies in one category. b Hospitalisation or A&E attendance in derivation dataset, and any of hospitalisation, A&E attendance, or death in validation data. A&E = accident and emergency; BMI = body mass index; GINA = Global Initiative for Asthma; IHD = ischaemic heart disease; LRTI = lower respiratory tract infection.
e956 British Journal of General Practice, December 2021