Characterisation of type 2 diabetes subgroups and their association with ethnicity and clinical outcomes: a UK real-world data study using the East London Database

Abstract
Background
Subgroups of type 2 diabetes (T2DM) have been well characterised in experimental studies. It is unclear, however, whether the same approaches can be used to characterise T2DM subgroups in UK primary care populations and their associations with clinical outcomes.

Aim
To derive T2DM subgroups using primary care data from a multi-ethnic population, evaluate associations with glycaemic control, treatment initiation, and vascular outcomes, and to understand how these vary by ethnicity.

Design and setting
An observational cohort study in the East London Primary Care Database from 2008 to 2018.

Method
Latent-class analysis using age, sex, glycated haemoglobin, and body mass index at diagnosis was used to derive T2DM subgroups in White, South Asian, and Black groups. Time to treatment initiation and vascular outcomes were estimated using multivariable Cox-proportional hazards regression.

Results
In total, 31 931 adults with T2DM were included: 47% South Asian (n = 14 884), 26% White (n = 8154), 20% Black (n = 6423). Two previously described subgroups were replicated, ‘mild age-related diabetes’ (MARD) and ‘mild obesity-related diabetes’ (MOD), and a third was characterised ‘severe hyperglycaemic diabetes’ (SHD). Compared with MARD, SHD had the poorest long-term glycaemic control, fastest initiation of antidiabetic treatment (hazard ratio [HR] 2.02, 95% confidence interval [CI] = 1.76 to 2.32), and highest risk of microvascular complications (HR 1.38, 95% CI = 1.28 to 1.49). MOD had the highest risk of macrovascular complications (HR 1.50, 95% CI = 1.23 to 1.82). Subgroup differences in treatment initiation were most pronounced for the White group, and vascular complications for the Black group.

Conclusion
Clinically useful T2DM subgroups, identified at diagnosis, can be generated in routine real-world multi-ethnic populations, and may offer a pragmatic means to develop stratified primary care pathways and improve healthcare resource allocation.


INTRODUCTION
Type 2 diabetes (T2DM) is a heterogeneous, multifactorial condition with a major impact on the health of global populations and with a high economic cost, largely mediated through its vascular complications. 1 Recent studies have identified distinct and replicable subgroups of patients with T2DM in both experimental [2][3][4] and non-experimental (real-world) cohorts [5][6][7][8] using data-driven clustering methods. These studies have defined T2DM subgroups using clinical variables typically assessed in secondary care settings, including measures of insulin secretion and resistance (determined using C-peptide and glucose assays), to delineate subgroups and their likely aetiology. [2][3][4][7] However, in the UK, the majority of T2DM management takes place in primary care settings, where these investigations are rarely performed. It is not known whether data routinely available in the primary care setting can also be used to identify T2DM subgroups and their association with clinical outcomes. The impact of ethnicity on the characterisation of T2DM subgroups is also poorly understood but is an important area of study given the varied aetiological processes, disease prevalence, and outcomes seen in different ethnic groups with the condition. [9][10][11]
The current study therefore set out to characterise T2DM subgroups in a large UK-based multi-ethnic population using routinely collected clinical measures captured in the primary care record. It then sought to investigate differences in control of glycated haemoglobin (HbA1c), time to initiation of antidiabetic treatment, and risk of vascular outcomes by diabetes subgroup and ethnicity. In doing so, the potential utility of data-driven T2DM subgroup identification to stratify care delivery in NHS primary care was evaluated.

METHOD
Study population
An observational cohort study was conducted using the East London Primary Care Database, which includes anonymised longitudinal health record data from 1 million individuals registered at 125 general practices across the three multi-ethnic Inner London boroughs of Tower Hamlets, Newham, and Hackney (http://www.blizard.qmul.ac.uk/ceg-home.html). The study population included all adults aged ≥18 years newly diagnosed with T2DM between January 2008 and January 2018 with at least 12 months of continuous registration before first diagnosis of T2DM. T2DM diagnosis was identified using the C10F% Read codes as defined in the UK Quality and Outcomes Framework (see Supplementary Table S1 for the codelists). 12 Study follow-up commenced on the date of T2DM diagnosis and ended at the earliest of leaving the GP practice, death, or 1 January 2018.

Keywords
epidemiology; inequalities; primary care; type 2 diabetes.

Covariates
Self-reported ethnicity was identified using Read codes and collapsed into the five high-level categories of the 2011 Census (White, South Asian, Black African/Caribbean, Mixed, and other) (Supplementary Table S1). Age at T2DM diagnosis was calculated from the date of diagnosis and age at data extraction (January 2018). Deprivation was measured using the Townsend score and divided into quintiles. Baseline was defined as the date of T2DM diagnosis. Baseline measures of HbA1c, body mass index (BMI), systolic and diastolic blood pressure, total cholesterol, and estimated glomerular filtration rate (eGFR) were derived from the last value in the year before diagnosis. Comorbid conditions were considered prevalent at baseline if present on the clinical record at any date before T2DM diagnosis. These included hypertension, coronary heart disease (CHD), stroke, heart failure, chronic kidney disease (CKD), retinopathy, and neuropathy.
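As an illustration of the baseline rule described above (the last value recorded in the year before diagnosis), the following minimal Python sketch selects a baseline measurement from a list of dated values. The function name, the 365-day window, the inclusion of the diagnosis date itself in the window, and the example values are assumptions made for illustration; they are not taken from the study code.

```python
from datetime import date, timedelta


def baseline_value(measurements, diagnosis_date, lookback_days=365):
    """Return the most recent value recorded in the lookback window.

    measurements: iterable of (date, value) pairs for one person and one measure.
    Returns None when nothing was recorded in the window.
    """
    window_start = diagnosis_date - timedelta(days=lookback_days)
    # Assumption: a value recorded on the diagnosis date itself counts as baseline.
    in_window = [
        (when, value)
        for when, value in measurements
        if window_start <= when <= diagnosis_date
    ]
    if not in_window:
        return None
    # The pair with the latest date is the last recorded value.
    return max(in_window, key=lambda pair: pair[0])[1]


# Hypothetical HbA1c readings (mmol/mol) for one person.
hba1c = [
    (date(2011, 1, 10), 48.0),  # more than a year before diagnosis: ignored
    (date(2012, 2, 1), 61.0),
    (date(2012, 5, 20), 75.0),  # last value before diagnosis
    (date(2012, 9, 1), 70.0),   # after diagnosis: ignored
]
diagnosis = date(2012, 6, 1)
baseline = baseline_value(hba1c, diagnosis)
```

In this example the reading from 20 May 2012 is selected, because it is the latest one inside the window.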

Outcomes
HbA1c in the 5 years following initial diagnosis was calculated for each individual by taking the mean of all HbA1c values recorded in each 12-month period. Antidiabetic medications were categorised as all oral non-insulin antidiabetic drugs and insulin. Macrovascular disease was defined as a composite of CHD, heart failure, myocardial infarction, and stroke. Microvascular disease was defined as a composite of CKD, neuropathy, and retinopathy. Incident macrovascular and microvascular events were defined as diagnoses recorded at any point after T2DM diagnosis.
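The annual HbA1c summary described above can be sketched as follows. This is a minimal Python illustration, assuming that each 12-month period is a block of 365 days counted from the day after diagnosis and that values recorded on or before the diagnosis date are treated as baseline rather than follow-up; the function name and example readings are hypothetical.

```python
from datetime import date
from statistics import mean


def yearly_mean_hba1c(measurements, diagnosis_date, years=5, days_per_year=365):
    """Mean HbA1c in each consecutive 12-month period after diagnosis.

    Returns a list of length `years`; a period with no recorded value is None.
    """
    buckets = [[] for _ in range(years)]
    for when, value in measurements:
        offset = (when - diagnosis_date).days
        if offset <= 0:
            continue  # assumption: on or before diagnosis counts as baseline
        index = (offset - 1) // days_per_year  # days 1-365 fall in period 0
        if index < years:
            buckets[index].append(value)
    return [mean(bucket) if bucket else None for bucket in buckets]


# Hypothetical follow-up readings (mmol/mol) for one person.
readings = [
    (date(2010, 3, 1), 80.0),
    (date(2010, 9, 1), 60.0),
    (date(2011, 6, 1), 56.0),
    (date(2013, 2, 1), 58.0),
]
trajectory = yearly_mean_hba1c(readings, date(2010, 1, 1))
```

Periods without any recorded HbA1c are returned as None rather than imputed, so that gaps in monitoring remain visible when trajectories are plotted.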

Statistical analysis
Latent-class analysis was used to identify subgroups of T2DM for the whole cohort and separately for individuals of White, South Asian, and Black ethnicity. Subgroups were derived using data from four observed indicator variables: age at diagnosis, sex, HbA1c at diagnosis, and BMI at diagnosis. Models with between two and five classes were compared and the optimal number of classes was chosen by evaluating the Bayesian Information Criterion (BIC, with lower values indicating better fit), clinical interpretability, minimum posterior probability of group membership over 70%, and sufficient group membership, defined as >1% of the study population in each class.
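The quantitative parts of this selection rule can be sketched in Python. The BIC formula is the standard one, k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximised likelihood. Everything else is an assumption for illustration: the dictionary layout, the reading of the posterior probability criterion as the mean posterior probability within each class, and the candidate values, which are invented and are not results from this study. Clinical interpretability is a judgement and is not represented in the code.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian Information Criterion; lower values indicate better fit."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


def is_admissible(model, n_observations, min_posterior=0.70, min_share=0.01):
    """Check the posterior probability and class size criteria."""
    smallest_share = min(model["class_sizes"]) / n_observations
    # Assumption: the criterion applies to the mean posterior probability per class.
    return min(model["mean_posteriors"]) > min_posterior and smallest_share > min_share


def rank_candidates(models, n_observations):
    """Return admissible models ordered from lowest to highest BIC."""
    admissible = [m for m in models if is_admissible(m, n_observations)]
    return sorted(
        admissible,
        key=lambda m: bic(m["log_likelihood"], m["n_parameters"], n_observations),
    )


# Invented candidate fits for a toy cohort of 1000 people.
candidates = [
    {"n_classes": 2, "log_likelihood": -5200.0, "n_parameters": 9,
     "class_sizes": [700, 300], "mean_posteriors": [0.90, 0.85]},
    {"n_classes": 3, "log_likelihood": -5100.0, "n_parameters": 14,
     "class_sizes": [600, 300, 100], "mean_posteriors": [0.88, 0.80, 0.75]},
    {"n_classes": 4, "log_likelihood": -5090.0, "n_parameters": 19,
     "class_sizes": [600, 295, 100, 5], "mean_posteriors": [0.85, 0.78, 0.72, 0.60]},
]
ranked = rank_candidates(candidates, n_observations=1000)
```

In this toy example the four-class model is excluded because one class holds 0.5% of the cohort and has a mean posterior probability of 0.60. The remaining models are ranked by BIC, leaving the final choice to be made alongside clinical interpretability.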
Mean HbA1c in the 5 years following initial diagnosis was plotted over time by ethnic group and T2DM subgroup. Among those free from prevalent vascular disease at baseline, age- and sex-adjusted Cox-proportional hazards regression was used to estimate differences in the cause-specific risk of incident macro- and microvascular disease between T2DM subgroups by ethnic group. Among individuals free from any antidiabetic treatment at diagnosis, differences in time to initiation of antidiabetic treatment between subgroups for the whole population, and by ethnic group, were modelled using age- and sex-adjusted multivariable Cox-proportional hazards regression. Individuals who initiated treatment in the 30 days before diagnosis were included in the analysis by moving their treatment initiation date to 1 day after T2DM diagnosis as they were considered to be 'baseline initiators', for whom the initial prescription formed part of the diabetes diagnosis process.
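The handling of 'baseline initiators' can be illustrated with a short Python sketch that derives the time to treatment initiation in days. The function name and the return conventions are hypothetical. Two points are assumptions rather than statements from the text: a prescription issued on the diagnosis date itself is treated as a baseline initiation, and a prescription issued more than 30 days before diagnosis is taken to mean the person was not treatment-free at diagnosis and so falls outside this analysis.

```python
from datetime import date


def days_to_initiation(diagnosis_date, first_prescription_date, window_days=30):
    """Days from diagnosis to treatment initiation for the time-to-event model.

    Returns None when no prescription was recorded (censoring is handled elsewhere).
    Raises ValueError when treatment began before the baseline window.
    """
    if first_prescription_date is None:
        return None
    lead = (diagnosis_date - first_prescription_date).days  # > 0: before diagnosis
    if lead > window_days:
        raise ValueError("not treatment-free at diagnosis; excluded from this analysis")
    if lead >= 0:
        # Baseline initiator: initiation date is moved to 1 day after diagnosis.
        return 1
    return -lead


diagnosis = date(2015, 3, 10)
baseline_initiator = days_to_initiation(diagnosis, date(2015, 2, 20))
later_initiator = days_to_initiation(diagnosis, date(2015, 4, 9))
never_treated = days_to_initiation(diagnosis, None)
```

Shifting the date gives baseline initiators a small positive event time, so they contribute to the Cox model instead of being dropped for having an event at or before time zero.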

How this fits in
Previous studies of predominantly White European populations have identified four type 2 diabetes subgroups. In the UK the clinical measures necessary to replicate these subgroups are only available in secondary care data, limiting their usefulness for diabetes management in primary care settings. The current study demonstrated how clinically meaningful type 2 diabetes subgroups can be pragmatically generated using real-world primary care data. Furthermore, it highlighted important differences between type 2 diabetes subgroups with respect to vascular outcomes, treatment initiation, and glycated haemoglobin control. Diabetes subgroups are a useful heuristic for assisting decision making by clinicians that, in turn, can lead to a more personalised design of diabetes care focused on more intensive management of subgroups most at risk of complications, such as those with severe hyperglycaemia at time of diagnosis.

RESULTS
A total of 31 931 adults with T2DM were included in the study, of whom 47% were of South Asian ethnicity (n = 14 884), 26% were of White ethnicity (n = 8154), 20% were of Black ethnicity (n = 6423), and 6% were of mixed or other ethnicities (n = 1957). Ethnicity was unknown for 1.6% of the study population (n = 513). A three-group latent-class model was chosen because four- and five-group models offered minimal improvement in BIC or clinical interpretability, and this was unchanged when the analysis was stratified by ethnicity. Maximum follow-up time in this study cohort was 10 years, with median follow-up time at 2.9 years (interquartile range 1.0-5.4 years).

T2DM subgroups
Across the whole population, two T2DM subgroups identified in previous studies were characterised: 'mild age-related diabetes' (MARD), which was driven by age at diagnosis and was the most prevalent cluster (82% of the total population, n = 26 294), and 'mild obesity-related diabetes' (MOD), which was driven by BMI at onset (10%, n = 3059) (Figure 1 and Table 1). The current study also identified a third subgroup, 'severe hyperglycaemic diabetes' (SHD), characterised by severe hyperglycaemia (determined by HbA1c) at diagnosis; this was the least prevalent cluster (8%, n = 2578).
As shown in Figure 1, the pattern and features of subgroups seen in South Asians were mirrored in the Black African Caribbean population, with 76% (4895/6423) having MARD (mean age 56.0), 13% (858/6423) with MOD, and 10% (670/6423) with SHD. The clinical features driving MOD (BMI 41.4 kg/m²) and SHD (mean HbA1c 12.8%/115.9 mmol/mol) were more extreme in Black African Caribbean groups than in White and South Asian groups (Table 1).
In both White and Black ethnic groups, the proportion of people with T2DM increased with increasing levels of deprivation. However, this gradient was reversed in the South Asian group with the majority of individuals contributing to the least deprived quintile. Although the sex split was comparable between ethnic groups for the MARD and SHD subgroups, the MOD subgroup was 40% female for the White group, 80.4% female for the South Asian group, and 92.5% female in the Black group (Table 1).

HbA1c trajectories over time
In the MARD and MOD subgroups, HbA1c was below 7.6%/60 mmol/mol at diagnosis and remained so over the first 5 years of follow-up. In the SHD subgroup, HbA1c was significantly elevated at diagnosis (>12.7%/90 mmol/mol) and brought down rapidly within the first 12 months, although never achieving target control. Patterns of HbA1c trajectories in each of the ethnic groups mirrored those of the overall population (Figure 2).

Time to vascular outcomes
Compared with the MARD subgroup, the SHD subgroup had the highest risk of microvascular complications (hazard ratio [HR] 1.38, 95% confidence interval [CI] = 1.28 to 1.49) and the MOD subgroup had the highest risk of macrovascular complications (HR 1.50, 95% CI = 1.23 to 1.82) (Figure 3). These differences were most pronounced in the South Asian and Black groups, for whom subgroup differences in risk of microvascular and macrovascular outcomes were similar. For individuals of White ethnicity, no differences in the risk of either micro- or macrovascular disease by subgroup were evident.
Despite the significant variation in age at onset across the MARD subgroups between the three ethnic categories, there was no differential association between MARD subgroup membership and vascular outcomes by ethnicity.

Time to treatment initiation
After adjustment for age and sex, initiation of non-insulin and insulin antidiabetic treatment was approximately twice as fast in the SHD subgroup as in the MARD subgroup (non-insulin HR 1.92, 95% CI = 1.83 to 2.02; insulin HR 2.02, 95% CI = 1.76 to 2.32), with differences between the MOD and MARD subgroups smaller in magnitude (non-insulin HR 1.16, 95% CI = 1.11 to 1.22; insulin HR 1.02, 95% CI = 0.87 to 1.21). Subgroup differences in age- and sex-adjusted time to treatment initiation were largest in the White population and smallest in the Black population (Table 2).
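For readers wishing to relate the hazard ratios to the underlying Cox model coefficients, the short Python sketch below recovers the log hazard ratio and its standard error from a reported 95% confidence interval. It assumes the intervals are standard Wald intervals of the form exp(β ± 1.96 × SE), which is the usual default but is not stated in the text; the function name is hypothetical.

```python
import math


def log_hr_and_se(lower, upper, z=1.96):
    """Recover the log hazard ratio and its standard error from a Wald interval.

    Assumes lower = exp(beta - z * se) and upper = exp(beta + z * se).
    """
    log_hr = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return log_hr, se


# Insulin initiation, SHD versus MARD: reported interval 1.76 to 2.32.
log_hr, se = log_hr_and_se(1.76, 2.32)
point_estimate = math.exp(log_hr)
```

Under this assumption the point estimate implied by the interval is the geometric mean of its limits, which for the insulin comparison rounds to the reported HR of 2.02.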

DISCUSSION
Summary
In this observational cohort study, three T2DM subgroups were identified in an ethnically diverse UK population. Using only routinely recorded primary care clinical observations, the current study replicated two subgroups previously reported in experimental and trial cohorts. However, it was unable to identify subgroups based on insulin secretion or insulin resistance, as these rely on biomarkers not widely available in primary care settings.
The current study showed that T2DM subgroups defined at the time of diagnosis were strong predictors of clinically important differences in time to onset of vascular disease, time to initiation of antidiabetic treatment, and attainment of glycaemic control. People classified as having MARD at diagnosis had the slowest progression to vascular complications, slowest initiation of antidiabetic treatment, and best long-term glycaemic control. This association was observed consistently across all ethnic groups, despite age at diagnosis being 22 and 16 years earlier in South Asian and Black groups, respectively, than in the White group. Future research will need to investigate the longer-term impact of the MARD phenotype on vascular complications in South Asian and Black populations.
People classified as having MOD at diagnosis maintained target HbA1c control over the study duration, had faster treatment initiation than those in the MARD subgroup, and had the fastest progression to macrovascular complications. Those classified as having SHD at diagnosis had persistently poor glycaemic control that never reached target thresholds and had the highest risk of microvascular outcomes despite having the fastest initiation of antidiabetic medication. These findings suggest that people in the age-related subgroup may require less intensive clinical care processes, but that those in the SHD subgroup are likely to benefit from enhanced monitoring, support, and management of their condition.
There were significant differences in the clinical features of the diabetes subgroups according to ethnicity. For example, the MOD subgroup had disproportionately more women from South Asian and Black ethnicities than the White ethnic group. The age at diagnosis of people with MARD was 20 years younger in those of South Asian and Black ethnicity compared with White ethnicity. South Asian and Black people in the MOD and SHD subgroups had higher risk of vascular complications than those of White ethnicity in the MOD and SHD subgroups. Future research in larger multi-ethnic populations will be needed to further investigate the impact of sex and age at diabetes onset on disease outcomes.

Strengths and limitations
The current study benefited from a significantly larger and more ethnically diverse population than many earlier studies used for identifying T2DM subgroups. 2,4,5 Furthermore, ethnicity was recorded for 98% of the study population, ensuring that the ethnicity-specific T2DM analyses were unlikely to be biased. This study captured all people with T2DM registered within a contiguous geographic area representative of the general population and other urban centres in the UK that are also ethnically and socially diverse. Furthermore, all practices contributing to this study were following standard diabetes management guidelines as outlined by the National Institute for Health and Care Excellence 13 and performance standards as per the Quality and Outcomes Framework. 14
Measures of serum high-density lipoprotein and triglycerides were not included in the cluster analysis as these are not uniformly collected in routine primary care practice. Their inclusion would have added further refinement to diabetes subgroups as they are surrogate measures of insulin resistance. As a result of the small number of people of mixed ethnicity, there was not sufficient statistical power to generate reproducible latent classes in this ethnic group or estimate interpretable associations with vascular complications, treatment initiation, and HbA1c trajectories. The East London Primary Care Database is subject to the same strengths and biases as all routine data. 15 It is possible that, by using diagnostic codes to define diabetes, some individuals with type 1 diabetes may have been misclassified as having T2DM. Prescriptions for antidiabetic medications issued in primary care were identified in this study but it was not possible to determine whether prescriptions were filled or taken as indicated.
Finally, linked secondary care data were not available and acute vascular events coded in hospital settings only may have been missed.

Provenance
Freely submitted; externally peer reviewed.

Competing interests
The authors have declared no competing interests.

Discuss this article
Contribute and read comments about this article: bjgp.org/letters

Comparison with existing literature
A 2020 systematic review of cluster-based approaches to diabetes subtypes identified 14 studies, of which the majority found identical clusters. 16 First reported by Ahlqvist and colleagues in 2018, clusters related to T2DM included: 'severe insulin deficient diabetes', 'severe insulin resistant diabetes', MOD, and MARD. 4 These subgroups have been replicated across a number of settings including the Netherlands and Scotland, 7 Ukraine, 17 China, 18 and India. 5 Only one included study was conducted in the UK; however, this was a cross-sectional hospital-based study of 33 children with type 1 diabetes. 19 The subgroups identified in the current study closely align with those previously reported. The MARD and MOD subgroups in the current study resembled the MARD and MOD clusters in previous studies. The SHD cluster was specific to the current study and is likely to include the previously reported 'severe insulin deficient diabetes' and 'severe insulin resistant diabetes' clusters.

Implications for research and practice
This study has demonstrated that pragmatic T2DM subgroups can be generated using real-world primary care data and these subgroups can identify important differences in clinical characteristics and vascular outcomes. These findings have wider generalisability to national and global populations. The identification of these subgroups provides a useful heuristic for characterising differences between patients at the population level, including in ethnically diverse populations. The identification of these subgroups at diagnosis could help move away from a 'one size fits all' care pathway and instead offer a stratified care pathway that is readily enabled by clinical data systems. This stratification could enhance the care of those most at risk of complications, and de-intensify care for those who are not. Opportunities to stratify care are particularly relevant in the context of healthcare services constrained in time and financial resources in which many people with T2DM, and clinicians managing their care, feel their care needs are not met. 20 The ability to apply data-driven clustering to real-world data offers wider generalisability to other chronic diseases largely managed in primary care such as hypertension and CKD. Important next steps are to reproduce these findings at scale in other multi-ethnic populations, using larger sample sizes, longer follow-up duration, and lipid profile measures. Subsequently, empirical evaluation of subgroup-stratified care using a cluster randomised controlled trial with long-term measurement of outcomes is likely to be necessary.