Predicting unplanned admissions to hospital in older adults using routinely recorded general practice data: development and validation of a prediction model

Background Unplanned admissions to hospital represent a hazardous event for older people. Timely identification of high-risk individuals using a prediction tool may facilitate preventive interventions. Aim To develop and validate an easy-to-use prediction model for unplanned admissions to hospital in community-dwelling older adults using readily available data to allow rapid bedside assessment by GPs. Design and setting This was a retrospective study using the general practice electronic health records of 243 324 community-dwelling adults aged ≥65 years linked with national administrative data to predict unplanned admissions to hospital within 6 months. Method The dataset was geographically split into a development (n = 142 791/243 324, 58.7%) and validation (n = 100 533/243 324, 41.3%) sample to predict unplanned admissions to hospital within 6 months. The performance of three different models was evaluated with increasingly smaller selections of candidate predictors (optimal, readily available, and easy-to-use models). Logistic regression was used with backward selection for model development. The models were validated internally and externally. Predictive performance was assessed by area under the curve (AUC) and calibration plots. Results In both samples, 7.6% (development cohort: n = 10 839/142 791, validation cohort: n = 7675/100 533) had ≥1 unplanned hospital admission within 6 months. The discriminative ability of the three models was comparable and remained stable after geographic validation. The easy-to-use model included age, sex, prior admissions to hospital, pulmonary emphysema, heart failure, and polypharmacy. Its discriminative ability after validation was AUC 0.72 (95% confidence interval = 0.71 to 0.72). Calibration plots showed good calibration. Conclusion The models showed satisfactory predictive ability. 
Reducing the number of predictors and geographic validation did not have an impact on predictive performance, demonstrating the robustness of the model. An easy-to-use tool has been developed in this study that may assist GPs in decision making and with targeted preventive interventions.


Introduction
Increasing rates of unplanned admissions to hospital in older adults are a major burden on healthcare systems worldwide.[4] Preventing unplanned admissions is critical to ensure patient safety and wellbeing, and aligns with the World Health Organization's philosophy of providing tailored care in appropriate settings for older adults.[5] A proactive approach optimises the allocation of scarce healthcare resources and addresses a pervasive concern in healthcare systems worldwide, where increasing demand outpaces the capacity of healthcare professionals. In the Netherlands, the integral care agreement (ICA) of 2022 prioritises preventive measures for acute care, particularly for older adults. Through education, prevention, and early signalling initiatives, the ICA aims to reduce unplanned admissions to hospital.[6][9][10] However, GPs are patients' primary point of contact and act as gatekeepers in many healthcare systems, such as that of the Netherlands.[11] Therefore, they play a pivotal role in identifying those at risk of unplanned admission to hospital and in targeting preventive interventions. A prediction model that accurately identifies high-risk individuals by reusing patient registration data could support GPs in this task. The use of electronic health record (EHR) data offers opportunities for the development, integration, and automated calculation of an individual's risk, because these data contain comprehensive patient information and are derived from routine health care. Using these readily available data to develop a prediction model facilitates ease of use and reduces the time burden on GPs. Previous research has shown that administrative data can be useful in accurately predicting unplanned admissions to hospital.[12,13]
However, the methodological quality of these studies was limited and many models required additional data collection, making clinical use difficult. Models based on routine care data have a lower threshold for use and might therefore be used more frequently. As a result, their potential impact would be greater, even if the predictive power is similar.
The aim of this study was to develop and validate a practical and easy-to-use prediction model for unplanned admissions to hospital using a Dutch representative sample of older people in general practice. The model was developed using current state-of-the-art methods and incorporating readily available EHR data complemented with national administrative data. Also, the study specifically assessed the predictive performance of the model in a subsample of individuals with cognitive decline or dementia.

Method
This study is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.[14]

Sources of data
Pseudonymised EHR data from GPs linked with data from national administrative databases were used in this study. The baseline data covered the year 2013; outcomes were assessed in 2014. The routine EHRs used were from a nationally representative sample of 417 Dutch general practices participating in NIVEL-Primary Care Database (NIVEL-PCD).[15] This database covers about 10% of the Dutch population and is representative in terms of practice type, urbanisation level, and age and gender distribution.[15,16] Data include information on chronic conditions, medication, and GP consultations. GPs receive support to assist them with coding and also feedback about the quality of their recording.[16] In the Netherlands, all inhabitants are registered with a GP and have mandatory health insurance. GP care is fully insured; the threshold for consultation is therefore low. Nine out of ten people aged ≥65 years visit their GP at least once a year, with an average of eight consultations per year.[17] Administrative data were provided by Statistics Netherlands, the governmental institution responsible for processing statistical data in the Netherlands. These included demographic information and data on admissions to long-term care facilities and death. Data on admissions to hospital were derived from the Dutch Hospital Data (DHD) database, made available by Statistics Netherlands. In 2013 and 2014, DHD contained data from 87 out of 88 general and academic hospitals in the Netherlands.

Study population
The study population consisted of individuals aged ≥65 years, living at home, and registered uninterrupted in one practice between 1 January 2013 and 31 December 2013 (baseline period). To avoid potential noise from admissions to a long-term care facility and from deaths in predicting the outcome, individuals who died or were admitted to a long-term care facility within the prediction period, and who did not experience an unplanned admission, were excluded from the analysis (see Supplementary Figure S1). The number of excluded individuals varied depending on the follow-up period (3, 6, and 12 months) (Figure 1).

Outcome
The primary outcome was unplanned hospital admissions with ≥1 overnight stay within 6 months, derived from national administrative data. Admissions were defined as unplanned when immediate treatment or assistance within 24 h was necessary according to the medical specialist.[18] Admissions without an overnight stay and admissions for psychiatric conditions were excluded as these often require different care trajectories. Secondary outcomes were unplanned admissions within 3 and 12 months.

How this fits in
Unplanned hospital admissions in older adults are a critical concern for patients, family caregivers, healthcare professionals, and service planners. In this study a robust and easy-to-use prediction model has been developed and validated using routinely recorded data from general practices to predict the risk of unplanned hospital admissions in community-dwelling older adults. Identifying older adults at high risk can facilitate targeted preventive interventions, such as case management, telemedicine, or anticipatory care planning. Moreover, the model could also be utilised by policymakers for capacity planning of hospital beds.
British Journal of General Practice, September 2024

Predictors
Updating existing prediction models was not feasible because of the incomparability of the predictors in this study's dataset with the predictors in existing models, as well as the low methodological quality of these studies.[12] To (partially) incorporate information from existing models, in the current study variables commonly included in existing models were selected as candidate predictors for the model, for example, prior admissions to hospital and several chronic conditions.[12] In addition, variables were selected based on the insights from a focus group study that was organised among primary healthcare professionals (to be published) and based on the clinical expertise of the authors.
Ultimately, 29 candidate predictors were selected including age, sex, migration background, income, living situation, chronic conditions, prescription medications, and healthcare utilisation (see Supplementary Table S1 for a detailed description). Chronic conditions were derived from World Health Organization International Classification of Primary Care (ICPC-1)[19] coded EHR data recorded up to the end of the baseline period. In NIVEL-PCD, GPs received feedback on the quality of recording and support to assist them with coding.[16] Chronic conditions were selected because of their high prevalence in older adults.[20] Dementia was added because of its strong association with admissions to hospital.[2] Medication variables were derived from prescription data coded with the Anatomical Therapeutic Chemical classification system and included when a medication was prescribed in a chronic fashion (that is, >2 prescriptions[21]) in the year before baseline. Consultation declarations (CTG codes in Dutch) were derived from coded claims data recorded in general practices in the year before baseline.

Missing data
As the data were derived from routine care processes, information that was not documented in the EHR was not flagged as missing. For the data provided by Statistics Netherlands, income data had missing values for 116 individuals (<0.1%). Given this negligible proportion of missing data and its minimal potential impact on the results, a complete case analysis was conducted.[12,13,22,23]

Statistical analysis
Linearity was assessed for continuous variables using restricted cubic splines.[24] Non-linear variables were tested as splines and as categorical variables in the logistic model. If the spline did not improve performance, the categorical variant was chosen because the authors wanted this to be a practical model. Collinearity was evaluated using variance inflation factors (VIFs). VIFs ranged between 1.01 and 2.43, therefore problematic collinearity was absent.[25]

Model development
The large sample provided sufficient statistical power to split the sample into a development and validation sample based on geographic region. The larger sample, that is, the six southernmost provinces (n = 142 791/243 324, 58.7%), was used for development and the smaller sample for validation (see Supplementary Figure S2). Geographic validation is considered a stronger approach compared with a random split sample procedure.[14,26]
For model building, this study followed the recommended steps outlined in the TRIPOD guidelines[14] and by Steyerberg.[27] Multivariable logistic regression with backward stepwise selection (P<0.01) was performed using all 29 candidate predictors to design an optimal model. Given the sample size, there was sufficient power to fit a more parsimonious model by incrementally removing the variables with weakest association, until the area under the curve (AUC) deteriorated by ≥0.01. Internal validation was performed through bootstrapping (n = 250). This procedure was repeated twice with smaller subsets of candidate predictors to develop a model with only variables readily available from the EHR (readily available model) and with only easy-to-use variables (easy-to-use model), using 24 and 22 candidate predictors, respectively (see Supplementary Table S1). The easy-to-use model was designed to allow rapid completion by a GP; variables were therefore selected that are quick and easy to fill in. All three models were validated in the northern sample.
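Discrimination in this study is summarised by the AUC. As a minimal sketch, assuming outcomes coded 0/1 and one predicted probability per person, the AUC can be computed from ranks (the Mann-Whitney formulation). The function name `auc` and the toy data are illustrative assumptions and are not taken from the study, whose analysis software is not described in this section.

```python
def auc(y_true, y_score):
    """Rank-based AUC: probability that a randomly chosen case receives a
    higher score than a randomly chosen non-case (ties count as 0.5)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy data (hypothetical): two non-cases followed by two cases.
toy_outcomes = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc(toy_outcomes, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a drop of ≥0.01 can serve as a simple stopping rule when predictors are removed.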

Model development and validation
The optimal model included eight predictors: sex, age, prior admissions to hospital, chronic obstructive pulmonary disease (COPD), polypharmacy, use of blood thinners, number of GP or practice nurse consultations, and percentage of home visits compared with all contacts with a GP (Table 2). When applied to the validation sample, the AUC was 0.73 (95% confidence interval [CI] = 0.72 to 0.73). Youden's optimal probability threshold was 0.07, reflecting a sensitivity of 65.7% and a specificity of 68.5% in the validation sample (Table 3 and Figure 2). Performance measures are reported for multiple probability thresholds to accommodate varying clinician preferences for risk estimation.
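Youden's index is J = sensitivity + specificity − 1, and the optimal threshold is the one that maximises J. The sketch below illustrates the selection on a small invented dataset; the function name `youden_threshold`, the candidate thresholds, and the toy values are assumptions for illustration only and do not reproduce the study data.

```python
def youden_threshold(y_true, y_prob, thresholds):
    """Return the candidate threshold that maximises Youden's J,
    together with the sensitivity and specificity at that threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")

    best = None
    for t in thresholds:
        # A person is classified as high risk when the predicted risk is >= t.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best["j"]:
            best = {"threshold": t, "j": j,
                    "sensitivity": sensitivity, "specificity": specificity}
    return best


# Toy data (hypothetical): predicted risks and observed outcomes for 8 people.
toy_prob = [0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.12, 0.20]
toy_outcome = [0, 0, 0, 0, 1, 0, 1, 1]
best = youden_threshold(toy_outcome, toy_prob, [0.05, 0.07, 0.10, 0.15])
print(best["threshold"], round(best["j"], 3))
```

With the values reported for the validation sample, J at the 0.07 threshold is 0.657 + 0.685 − 1 = 0.342.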
The readily available model contained all predictors of the optimal model except for prior admissions to hospital (Table 2). Compared with the optimal model, the AUC in the validation sample was marginally lower (AUC 0.72, 95% CI = 0.71 to 0.72).
The easy-to-use model included age, sex, admissions to hospital in the past year, heart failure, COPD, and polypharmacy (Table 2). When applied to the validation sample, this resulted in an AUC of 0.72 (95% CI = 0.71 to 0.72). To allow for individualised predictions of this model, a Microsoft Excel spreadsheet is provided as a supplement (see Supplementary Information S1).
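For orientation, a logistic regression model of this kind converts an individual's predictor values into a predicted risk as sketched below. The symbols β0 to β6 are placeholders for the fitted intercept and coefficients (each coefficient is the natural logarithm of the corresponding odds ratio in Table 2), and the coding of each variable (for example, age as a continuous or a categorical term) is an assumption of this sketch rather than the authors' specification.

```latex
\begin{align}
\mathrm{LP} &= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{sex}
  + \beta_3\,\text{prior admission} + \beta_4\,\text{heart failure}
  + \beta_5\,\text{COPD} + \beta_6\,\text{polypharmacy} \\
\hat{p} &= \frac{1}{1 + e^{-\mathrm{LP}}}
\end{align}
```

Here LP is the linear predictor and p̂ is the predicted probability of ≥1 unplanned admission within 6 months, which can then be compared with a chosen probability threshold.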
For all three models, bootstrapping resulted in an optimism of the AUC, intercept, and slope of <0.001; therefore, no adjustments of the coefficients were required. Calibration of all models was good; the slope and intercept did not deviate to the extent that model updating was undertaken (see Supplementary Figure S3).

Clinical implications of choice of cut-off value
Choosing a cut-off value provides the opportunity to stratify patients into low- and high-risk groups. This facilitates clinical decision making. To illustrate this, a practice consisting of 500 community-dwelling patients aged ≥65 years was considered. The consequences of two different cut-off values (or probability thresholds) were compared: 0.07 and 0.15. A prior probability of 7.6% for each patient (prevalence) was assumed. The 2 × 2 contingency tables for both cut-offs are shown (Tables 4 and 5).
Using a cut-off of 0.07 stratifies approximately one-third of the practice's older population as high risk, requiring screening or intervention. However, this choice results in a high number of false positives, where individuals are identified as high risk but do not experience the predicted outcome. At the individual level, being classified as high risk at the 0.07 threshold increases a patient's probability of an unplanned hospital admission by a factor of 2, from 7.6% to 14.9%. This means that, out of 100 high-risk patients, 15 will have an unplanned hospital admission within 6 months.
Alternatively, using a cut-off value of 0.15, one in ten older patients will be classified as high risk, resulting in a substantially lower number of false positives. However, the number of false negatives doubles, indicating that some potential patients are missed. For a high-risk patient at the 0.15 threshold, the probability of unplanned hospital admissions increases by a factor of 3 to 23.2%. Consequently, out of 100 high-risk patients, 23 would experience an unplanned hospital admission within 6 months.
How much risk of missing an unplanned admission is acceptable will depend on the clinician's judgement. Opting for a lower threshold results in a low number of false negatives, but raises the probability of false positives, requiring a more extensive and labour-intensive screening process.
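The 0.07 scenario can be approximated from three numbers given in the text: a practice of 500 patients, a prevalence of 7.6%, and the validation-sample sensitivity (65.7%) and specificity (68.5%) at that threshold. The sketch below is illustrative only; the function name `contingency_table` is an assumption, and because its inputs are rounded values its output is not expected to match Tables 4 and 5 to the decimal.

```python
def contingency_table(n, prevalence, sensitivity, specificity):
    """Expected 2 x 2 table and derived measures for a practice of n patients."""
    cases = n * prevalence
    non_cases = n - cases
    tp = cases * sensitivity          # admitted and flagged as high risk
    fn = cases - tp                   # admitted but not flagged
    tn = non_cases * specificity      # not admitted and not flagged
    fp = non_cases - tn               # flagged but not admitted
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "flagged_fraction": (tp + fp) / n,   # share of practice labelled high risk
        "ppv": tp / (tp + fp),               # risk of admission if flagged
        "npv": tn / (tn + fn),               # chance of no admission if not flagged
    }


table_007 = contingency_table(n=500, prevalence=0.076,
                              sensitivity=0.657, specificity=0.685)
for key, value in table_007.items():
    print(f"{key}: {value:.3f}")
```

Under these inputs roughly one-third of the practice is flagged and the positive predictive value is close to 15%, about twice the 7.6% prior probability, in line with the description above.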

Sensitivity analyses
Testing the optimal model in people with cognitive decline resulted in an AUC of 0.67 (95% CI = 0.65 to 0.69) in both samples. The optimal model showed good predictive ability when fitted in the 3- and 12-month follow-up samples. However, the calibration plots showed systematic over- and underestimation in the 3- and 12-month samples, respectively. Finally, evaluating the optimal model in a sample including those who died or were admitted to a long-term care facility within 6 months resulted in an AUC of 0.72 (95% CI = 0.72 to 0.73). See Supplementary Tables S2-S4 and Supplementary Figure S4 for details of these analyses.

Summary
In this study, routinely recorded and linked health and census data were used to develop and validate an easy-to-use prediction model for unplanned admissions to hospital in community-dwelling older adults. Predictors associated with unplanned hospital admission included age, sex, admission to hospital in the past year, polypharmacy, the use of blood thinners, COPD, heart failure, number of consultations including telephone consultations and home visit contacts, and the percentage of home visits. The optimal model showed satisfactory discrimination and good calibration. Moreover, geographic validation, reducing the number of predictors, changing the prediction horizon, and including individuals who died or were admitted to long-term care facilities within the prediction period all resulted in a negligible decrease in discriminative ability. These results should enable GPs to identify patients who may benefit from targeted admission prevention strategies. To improve predictions, the authors of the current study emphasise the importance of routine recording or incorporation of hospital admission data into the EHR.

Strengths and limitations
A strength of this study is the use of multiple approaches for model development, providing valuable insights into relative effectiveness and practical utility. By considering the advantages and limitations of each approach, healthcare providers and policymakers can make informed decisions about which model is suitable for their specific needs and resources. The use of EHR data enriched with national administrative data resulted in the best predictive model, that is, the optimal model. Using structured EHR data allows the readily available model to be implemented nationwide. However, it includes more time-consuming variables compared with the easy-to-use model. By facilitating rapid bedside assessment, the easy-to-use model is more accessible to GPs, while incorporating the most predictive variable: prior admissions to hospital. Furthermore, the large longitudinal sample and its nationwide representativeness suggest these findings could be generalised across the Netherlands.
This study also has limitations. As advocated in the literature,[29,30] updating an existing prediction model is preferred over simply developing a new model, so information from the previous models is not neglected. However, model updating is only valuable provided the original model's development is appropriately performed, and variables and outcomes are determined in a similar way.[29] For this study, however, the low quality of reporting in the previous studies,[12] and the lack of several variables in the current dataset, made updating infeasible. Moreover, differences in care systems between countries complicate the transportability of existing models to other geographical populations,[31] and no model had yet been developed in the Netherlands. Altogether, this large sample called for the development of a new model rather than updating an existing one. Nevertheless, to incorporate data from previous models as much as possible, the current study assessed the variables most frequently included in previous models as candidate predictors for inclusion in this model. Furthermore, although in this study the data are approximately 10 years old, the relevance remains. Reviews have shown the long-term trends and relative stability over time of included predictors of unplanned hospital admissions, such as

Figure 1. Flow of participants through study. LTCF = long-term care facility.

Figure 2. Graphical presentation of performance measures of the optimal model in the validation sample.

CTV contacts with GP or practice nurse past year, median (IQR)
Data are n (%) unless otherwise specified. a P<0.05 between groups. b Annual household income had 83 missing in the southern sample and 33 missing in the northern sample. c Concurrent use of an antidiuretic, ACE inhibitor, and NSAID (see Supplementary Table S1). d Defined as ≥1 chronic condition and no registered contact with GP or practice nurse in the past year (see Supplementary Table S1). ACE = angiotensin-converting enzyme. COPD = chronic obstructive pulmonary disease. CTV = consultations, telephone consultations, and home visits. DOAC = direct oral anticoagulants. FRIDs = fall risk increasing drugs. IQR = interquartile range. NSAID = non-steroidal anti-inflammatory drug. TIA = transient ischaemic attack. VKA = vitamin K antagonists.

Table 2. The final prediction models from the multivariable logistic regression based on the development sample together with OR (95% CI) and the AUC in the development and validation samples
AUC = area under the curve. COPD = chronic obstructive pulmonary disease. CTV = consultations, telephone consultations, and home visits. OR = odds ratio.

Table 3. Measures of predictive performance of the optimal model in the development and validation sample for multiple probability thresholds
The 0.07 threshold is defined by Youden's index as the optimal probability threshold. NPV = negative predictive value. PPV = positive predictive value.