Accuracy of the NICE Traffic Light system for detecting serious illness in acutely unwell children presenting to general practice: a retrospective cohort study. British Journal of General Practice Open.

Abstract

Background
The NICE Traffic Light system was created to facilitate the assessment of unwell children in primary care. No studies have validated this tool in UK general practice.

Aim
To evaluate the accuracy of this system for detecting serious illness in children presenting to general practice.

Design and Setting
Retrospective diagnostic accuracy study, using a cohort of acutely unwell children under five years presenting to general practice in England and Wales.

Method
The Traffic Light categories of 6,703 children were linked with hospital data to identify admissions and diagnoses. The sensitivity and specificity of these categories were calculated against the reference standard: a hospital diagnosed serious illness within seven days of GP consultation, measured using ICD-10 codes.

Results
2,116 (31·6%) children were categorised as red; 4,204 (62·7%) amber; and 383 (5·7%) green. 139 (2·1%) children were admitted to hospital within seven days of consultation, of whom 17 (12·2%; 0·3% overall) had a serious illness. The sensitivity of the ‘red’ category (vs. amber/green) was 58·8% (95% confidence interval: 32·9% to 81·6%) and the specificity 68·5% (67·4% to 69·6%). The sensitivity and specificity of ‘red’ and ‘amber’ combined (vs. green) was 100% (80·5% to 100%) and 5·7% (5·2% to 6·3%) respectively.

Conclusion
The Traffic Light system cannot accurately detect children admitted with a serious illness, nor those not seriously ill who can be managed at home. This system is not suitable for use as a clinical tool in general practice. Further research is required to update or replace the Traffic Light system.

How this fits in
The NICE Traffic Light system is widely used in general practice for the assessment of unwell children; however, the majority of previous studies validating this tool have been conducted in secondary care settings. No studies have validated this tool within UK general practice. This study found that the Traffic Light system cannot accurately detect or exclude serious illness in

INTRODUCTION
Children with acute illnesses are a common presentation in general practice, and can be a diagnostic challenge for clinicians. 1,2 This is primarily due to their presentation at an early stage of illness, where the signs and symptoms of non-serious and serious illnesses can appear similar. 3 The majority of acute illnesses are self-limiting viral infections, although a minority of children may have a serious illness such as pneumonia or meningitis. 4 The estimated incidence of these illnesses in children presenting to general practice is 1% per year. 3,5 Despite this, fear and anxiety in parents with an unwell child are common. 6,7 In an effort to help primary care clinicians confidently assess unwell children, the National Institute for Health and Care Excellence (NICE) created the 'Traffic Light' system. 8 This tool categorises children as either green, amber, or red depending on their consulting clinical features, corresponding to a low, intermediate, or high risk of serious illness respectively. According to the tool, 'green' children can be managed at home. Children in the 'amber' category can either be referred to hospital for assessment or sent home with safety-net advice. Children in the 'red' category should be referred urgently for assessment in hospital.
The recent relaxation of restrictions during the coronavirus disease 2019 (COVID-19) pandemic has led to a dramatic increase in the prevalence of respiratory illness in young children, with emergency departments experiencing high demand from children, many of whom are not seriously ill. 9,10 This illustrates the importance of an accurate primary care tool to identify those needing secondary care assessment.
Validation of the Traffic Light system within general practice has been performed once previously in 2013, using a small Dutch cohort of 506 febrile children. 11 Additional studies have evaluated this tool within emergency department settings. 4,11-13 However, no studies have evaluated the NICE Traffic Light system in UK general practice.
The aim of this study is to evaluate the accuracy of the NICE Traffic Light system for predicting serious illness in acutely unwell children aged under five presenting to UK general practice.

METHODS
The Standards for Reporting of Diagnostic Accuracy (STARD) guidelines were followed in the reporting of this study. 14

Sample Formation
This retrospective cohort study involved the secondary analysis of a dataset collected for the Diagnosis of Urinary Tract Infection in Young Children (DUTY) study. 15 The DUTY study was a prospective cohort study of acutely unwell children aged under five years in primary care, which evaluated the presenting signs and symptoms of urinary tract infections (UTIs). Children from 233 sites in England and Wales (general practices, walk-in centres, and emergency departments) were consecutively recruited between 2010 and 2012 if they were constitutionally unwell due to any acute illness and/or presenting with a UTI symptom described by NICE. The eligibility criteria for recruitment into the DUTY study are presented in Supplementary Table 1. The study presented here only includes the children from the DUTY study who presented to general practice (GP).

Data Processing
The children's clinical features at the time of their GP presentation were mapped retrospectively to equivalent variables within the Traffic Light system as part of a separate study. 16 Children were categorised as 'red' if they had at least one red feature, 'amber' if they had at least one amber feature (and no red), and 'green' if they had no amber or red features. Children whose missing data prevented Traffic Light categorisation were excluded from our analyses.
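The categorisation rule above can be written as a short function. This is a minimal sketch, not the mapping used in the separate study: the function name, the use of `None` to represent missing data, and the example feature labels are all assumptions made for illustration.

```python
def traffic_light_category(red_features, amber_features):
    """Assign a Traffic Light category from the features present in a child.

    red_features / amber_features: lists of the red and amber features that
    were recorded as present (feature labels here are illustrative only).
    None stands in for missing data that prevents categorisation; such
    children were excluded from the analyses.
    """
    if red_features is None or amber_features is None:
        return None  # cannot categorise, so the child would be excluded
    if len(red_features) >= 1:
        return "red"  # at least one red feature
    if len(amber_features) >= 1:
        return "amber"  # at least one amber feature and no red
    return "green"  # no amber or red features


# Hypothetical children, for illustration only
print(traffic_light_category(["grunting"], ["nasal flaring"]))  # red
print(traffic_light_category([], ["nasal flaring"]))            # amber
print(traffic_light_category([], []))                           # green
```

A red feature takes precedence over any number of amber features, which is why the red check comes first.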
Routinely collected hospital data were accessed to identify admissions: Hospital Episode Statistics (collected by NHS Digital) for England and the Patient Episode Database for Wales. The data collected from children during the DUTY study were linked to these hospital data using the Secure Anonymised Information Linkage (SAIL) Databank.

Outcome Definitions
The primary outcome of interest was an unplanned hospital admission with a serious illness, within seven days of initial GP consultation.
A 'hospital admission' was defined as a spell in hospital under the care of a consultant; assessment in the emergency department was not recorded as an admission unless the treating team decided to admit the child. To ensure a strong association between consulting clinical features and any subsequent hospital admission, a seven-day follow-up period was chosen. 4,12,17-19 Our definition of 'serious illness' was created by identifying the NICE definitions used during the creation of the Traffic Light system and exploring previous literature. 2,4,13,20-22 The International Classification of Diseases: 10th revision (ICD-10) codes for these serious illnesses were used to create a reference table (Supplementary Table 2). The principal serious illnesses included: sepsis, pneumonia, meningitis, and UTI.
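The outcome definition can be sketched as a simple check on each linked admission. The code below is illustrative only: the function name and arguments are hypothetical, the ICD-10 prefixes are a small example set chosen to match the principal illnesses named above (they are not the contents of Supplementary Table 2), and the check that an admission was unplanned is assumed to have been applied beforehand.

```python
from datetime import date

# Example ICD-10 prefixes only (sepsis, pneumonia, bacterial meningitis, UTI).
# The study's actual reference list is its Supplementary Table 2.
SERIOUS_ICD10_PREFIXES = ("A41", "J18", "G00", "N39.0")


def is_primary_outcome(consult_date, admission_date, diagnosis_codes, window_days=7):
    """True if an (unplanned) admission falls within the follow-up window
    after the GP consultation and carries a serious-illness diagnosis code."""
    days = (admission_date - consult_date).days
    if not 0 <= days <= window_days:
        return False
    # str.startswith accepts a tuple of prefixes
    return any(code.startswith(SERIOUS_ICD10_PREFIXES) for code in diagnosis_codes)


# Hypothetical records, for illustration only
print(is_primary_outcome(date(2011, 1, 1), date(2011, 1, 3), ["J18.9"]))   # True
print(is_primary_outcome(date(2011, 1, 1), date(2011, 1, 12), ["J18.9"]))  # False
```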

Statistical Analysis
To evaluate the test performance of the NICE Traffic Light system, sensitivity and specificity were calculated separately for two thresholds of test positivity: designation of 'red' category and designation of 'red' or 'amber' category. This allowed comparison of test performance between 'high-risk' and 'intermediate to high-risk' populations. Further calculations included 95% confidence intervals (CIs) and positive and negative predictive values.
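As an illustration of these calculations, the sketch below rebuilds the two 2x2 tables from counts reported in this paper (17 children with a serious illness, 10 of them 'red'; 2,116 'red', 4,204 'amber', and 383 'green' among 6,703 children) and computes sensitivity, specificity, and predictive values. The interval method is an assumption: the text does not name one, so exact (Clopper-Pearson) binomial intervals are used here, solved by bisection. All function names are illustrative, and the study's own analysis was run in SPSS.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.
    Each term is built in log space so large n does not overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def solve_increasing(g, iters=40):
    """Bisection root of g on (0, 1), assuming g is increasing and changes sign."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n (assumed method)."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def accuracy(tp, fn, fp, tn):
    """Test performance measures from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity_ci": exact_ci(tp, tp + fn),
        "specificity_ci": exact_ci(tn, tn + fp),
    }


# Threshold 1: 'red' vs 'amber'/'green'
red = accuracy(tp=10, fn=7, fp=2116 - 10, tn=6703 - 2116 - 7)
# Threshold 2: 'red' or 'amber' vs 'green'
red_amber = accuracy(tp=17, fn=0, fp=2116 + 4204 - 17, tn=383)

for name, result in (("red", red), ("red or amber", red_amber)):
    print(name, {k: v for k, v in result.items()})
```

Under these assumptions the point estimates match the reported sensitivities and specificities; the lower limit for 17 of 17 has the closed form 0.025^(1/17), about 80·5%.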
Two sensitivity analyses were conducted. The first assessed the ability of the Traffic Light system to detect children admitted to hospital (with or without a serious illness), to reduce the impact of incorrect diagnosis coding within the routine data. The second evaluated the Traffic Light system when applied to a cohort of febrile children, chosen because the Traffic Light system was created by NICE to assess febrile children specifically. The NICE definitions of fever were used to identify this subgroup: "measured or perceived elevation of body temperature above the normal daily variation (≥38°C) by a parent or clinician". 8 Analyses were performed using SPSS v.26. 23 To comply with SAIL regulations, data with frequencies less than five were suppressed and presented as '<5' or rounded to the nearest five.
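The small-number disclosure rule can be sketched as two helper functions. The names are hypothetical, and treating a count of zero as disclosable is an assumption rather than something stated in the SAIL rule quoted above.

```python
def disclose(count):
    """Return a count as text, suppressing frequencies below five as '<5'.
    Zero is shown as-is here, which is an assumption."""
    return "<5" if 0 < count < 5 else str(count)


def round_to_five(count):
    """Round a count to the nearest multiple of five."""
    return 5 * round(count / 5)


print(disclose(3))        # <5
print(disclose(7))        # 7
print(round_to_five(88))  # 90
```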

RESULTS
During the DUTY study period, 7,374 children were recruited from primary care. Children presenting to care providers outside of general practice were excluded, in addition to those without available Traffic Light and hospital admissions data. Overall, 6,703 (91·0%) children were eligible for inclusion in this study (Figure 1). This population of children presenting to general practice will be referred to as the 'DUTY cohort'.

Hospital Admissions
Linkage to hospital admissions data was achieved for 98·7% of children with available Traffic Light categories (6,703/6,791). To check for selection bias, children who could not be linked to hospital data (n=88) were compared with children who could (n=6,703) using descriptive statistics (Supplementary Table 3). These showed no clinically significant differences in median age, duration of illness, or distribution of Traffic Light categories.
Within seven days of presenting to general practice, 139 of 6,703 children (2·1%) were admitted to hospital. The children admitted to hospital were younger than the children who were not admitted (Table 1). Additionally, a greater proportion of admitted children were 'red' at initial presentation to general practice (Table 1).
The median duration between general practice consultation and admission was 1 day (25th to 75th centiles: 0 to 3 days), with just under half of cases admitted the same day (n=57, 41·0%). Children were most commonly discharged on the same day (n=81, 58·3%) or the day after (n=36, 25·9%). The median length of hospital stay was 0 days (0 to 1 day). The most common diagnosis in hospital was an unspecified viral infection (14·4%, Supplementary Table 4).

Serious Illnesses
The prevalence of serious illness in our cohort was 0·3% (17/6,703, 95% CI: 0·2% to 0·4%). Pneumonia was the most common serious illness (n=8, 47·1%). No cases of sepsis or meningitis were reported. Information on all diagnosed serious illnesses cannot be disclosed due to small numbers. Of the children diagnosed with a serious illness, 10 (58·8%) were categorised as 'red' at presentation to general practice (Figure 2). The median duration between general practice consultation and hospital admission in this group was 2 days (25th to 75th centiles: 0 to 2 days) and the median length of hospital stay was 1 day (0 to 2 days). Children were most commonly discharged on the same day (n=7, 41·2%) or two days later (n<5).
The performance of the Traffic Light system for detecting any admission to hospital was also calculated (Table 3). The 'red' category had a sensitivity of 61·9% (53·3% to 70·0%) and specificity of 69·1% (67·9% to 70·2%). Combining 'red' and 'amber' improved the sensitivity to 97·8% but reduced specificity to 5·8%. The cross-tabulations used to calculate these measures of test performance are displayed in Supplementary Table 5.

Sensitivity Analysis for Febrile Children
In our sample, 5,032 children (75·1%) were febrile. Of these, 120 (2·4%) were hospitalised and 16 (0·3%) were diagnosed with a serious illness. Therefore, 86·3% (120/139) of children admitted to hospital and 94·1% (16/17) of children with a serious illness were febrile at GP presentation. The Traffic Light system had improved sensitivity but poorer specificity for detecting serious illness when used in this population (Table 4).

DISCUSSION

Summary
We found that the prevalence of serious illness in 6,703 acutely unwell young children presenting to UK general practice was 0·3% and that the NICE Traffic Light system categorised 31·6% of all children as red; 62·7% as amber; and 5·7% as green. Overall, 139 (2·1%) children were admitted within seven days of their initial presentation. The Traffic Light tool had a sensitivity of 58·8% and specificity of 68·5% for the identification of children admitted to hospital with a serious illness, when comparing 'red' with 'amber'/'green' categories. Changing the threshold to include 'red'/'amber' categories combined, compared to 'green', improved the sensitivity to 100% but worsened the specificity to 5·7%. The results were similar when the outcome was hospital admission for any reason, and when the analysis was restricted to febrile children.

Strengths and Limitations
To our knowledge, this is the first study to validate the Traffic Light system in UK general practice. We used a dataset of 6,703 children, representing one of the largest and most detailed characterisations of clinical features among acutely unwell young children presenting to general practice in the UK, linked to hospital admissions data. Although the NICE guidelines and Traffic Light system were designed for febrile children, our inclusion of all children with an acute illness better represents the variety of children requiring assessment by GPs in clinical practice. Furthermore, we included all illnesses defined by NICE as 'serious', which matches the illnesses that the Traffic Light system was designed to detect. We were also able to provide a current prevalence estimate of serious illness in children presenting to general practice.
There are several limitations to this study. The prospective data were not collected specifically to answer this research question. Consequently, not all clinical features present in the Traffic Light system could be matched to the DUTY dataset during the assignment of Traffic Light categories. 16 The variables which could not be mapped were mostly within the 'red' category and principally involved neurological features such as neck stiffness, or orthopaedic signs such as limb swelling. If these features had been matched, potentially more children would have been 'red', and the sensitivity of the Traffic Light system may have improved. However, most of the key constitutional features of the Traffic Light system were captured by the DUTY study, and 64% of data fields were mapped. 16 Our 'serious illness' reference standard was dependent on diagnostic codes within routinely collected hospital data; reference standards such as clinical, laboratory, and radiological data were not available to assess the evidence supporting a final diagnosis of a serious illness. It is possible that some children with a 'serious illness' were not identified in our study, due to incorrect coding of their diagnoses, thus underestimating our disease prevalence and tool sensitivity. However, the coding in our dataset represents the diagnoses recorded on discharge by a clinician with access to all investigations.
Finally, our sample only included children who fulfilled the DUTY study eligibility criteria. Consequently, our cohort may not be entirely representative of all acutely unwell children within this age group. The criteria did not include children who were constitutionally well unless they had symptoms suggestive of a urinary tract infection. Children with focal illnesses or mild respiratory infections may have been excluded, and children with urinary tract infections may have been overrepresented due to the aim of the DUTY study. However, the Traffic Light system was principally designed for constitutionally unwell children; therefore, we believe our cohort is adequately representative of this population. There is a possibility that some of the children presenting to general practice were too ill to be recruited for the original study. However, our low disease prevalence is similar to previous estimates of serious illnesses presenting to general practice. 5

Comparison With Existing Literature
To our knowledge, only one other study has validated the Traffic Light system within general practice. 11 This was a retrospective cohort study conducted by Verbakel et al in the Netherlands, using a sample of 506 febrile children aged 3 months to 6 years. They reported a sensitivity of 100% and specificity of 1% for the identification of serious bacterial infections, using the presence of any 'red' or 'amber' features as a positive test. 11 These results are similar to ours when 'red/amber' categories were combined. However, it was not clear how the authors defined 'serious infection': whether by clinical judgement, hospital admission, or investigations performed in secondary care. Furthermore, information regarding the designation of Traffic Light categories was not provided.
Further studies have assessed the Traffic Light system in emergency department settings, reporting sensitivities between 85% and 99%, and specificities between 2% and 29% (using 'red' or 'amber' combined as a positive test). 4,11-13 Our results for 'red' and 'amber' combined were similar, with a particularly poor specificity. This may be due to the lower prevalence of serious illnesses seen in a GP setting. Notably, these studies limited their outcome to serious bacterial infections only, and included children up to the age of 16 years in some cases. 11,12

Implications for Research and Practice
We conclude that the NICE Traffic Light system is not able to accurately detect or exclude serious illness in acutely unwell children presenting to general practice when the 'red' category is used as a positive threshold. If our cohort's Traffic Light classifications had been followed by GPs, one-third of children (categorised as 'red') would have been urgently referred to hospital. Additionally, using the 'red' category as a threshold for hospital referral would have missed 41·2% of children with a serious illness who were 'amber', although NICE does recommend that clinicians should refer 'amber' children if indicated. Combining 'red' and 'amber' categories improved the sensitivity of the Traffic Light system, such that all seriously ill children were identified. This threshold would allow GPs to be confident sending 'green' children home, but at the cost of referring a substantial number of children to hospital; 94·3% of patients were categorised as 'red' or 'amber'. Moreover, GPs would only be able to confidently exclude serious illness in a minority (5·7%) of children classified as 'green'. The Traffic Light system was created to help GPs confidently assess unwell children, aiding their decisions about who to refer and who to send home by identifying those at risk of serious illness (thus prioritising sensitivity over specificity), but we have shown that it is unable to accurately achieve this. This is an important finding in light of the current strain experienced by primary care services as a consequence of the COVID-19 pandemic. Secondary care can only function to assess and treat patients with serious illnesses provided there is effective functioning of primary care in serving the remainder.
Research is required to derive an updated tool for the assessment of acutely unwell children presenting to general practice. This assessment tool should correctly identify the most unwell children, while preventing unnecessary hospital referrals for children who are more likely to have a self-limiting illness. It should be derived and validated using data from UK general practice or primary care and must be suitable for use in a typical general practice consultation. This may require combining multiple datasets, due to the low prevalence of serious illness within this population. Future research could also assess whether 'point-of-care' markers of illness, such as C-reactive protein (CRP) and procalcitonin (PCT), combined with clinical features in a single assessment tool could provide a more accurate indication of illness severity in children.

FUNDING
No funding was granted for the completion of this study. The lead author was granted a monetary award from the Wolfson Foundation (Wolfson Intercalated Award) to help their completion of this research as part of their intercalated Bachelor of Science degree. This funding was not associated with the study design, result analyses, write-up, or the decision to submit this