Table 1

Baseline characteristics of women in the derivation and validation cohorts

                                              Derivation cohort (n = 1 240 864)   Validation cohort (n = 667 603)
Mean age (SD), years                          50.3 (17.5)                         50.1 (17.4)
BMI recorded, n (%)                           924 268 (74.5)                      480 001 (71.9)
Mean BMI (SD), kg/m²                          25.8 (4.9)                          25.8 (4.9)
Mean deprivation score (SD)                   −0.4 (3.3)                          −0.2 (3.5)
Smoking status, n (%)
  Non-smoker                                  640 775 (51.6)                      342 137 (51.2)
  Ex-smoker                                   215 060 (17.3)                      105 411 (15.8)
  Current: amount not recorded                26 749 (2.2)                        14 151 (2.1)
  Light (<10/day)                             68 059 (5.5)                        35 608 (5.3)
  Moderate (10–19/day)                        92 337 (7.4)                        50 146 (7.5)
  Heavy (≥20/day)                             50 831 (4.1)                        27 711 (4.2)
  Smoking not recorded                        147 053 (11.9)                      92 439 (13.9)
Alcohol status, n (%)
  None                                        325 730 (26.3)                      169 033 (25.3)
  Trivial (<1 unit/day)                       402 453 (32.4)                      203 775 (30.5)
  Light (1–2 units/day)                       192 736 (15.5)                      100 051 (15.0)
  Moderate or heavy (≥3 units/day)            25 003 (2.0)                        13 039 (2.0)
  Alcohol not recorded                        294 942 (23.8)                      181 705 (27.2)
Medical and family history, n (%)
  Prior cancer                                34 324 (2.8)                        17 863 (2.7)
  Family history of breast cancer             45 621 (3.7)                        22 043 (3.3)
  Family history of gastrointestinal cancer   18 759 (1.5)                        8780 (1.3)
  Family history of ovarian cancer            2417 (0.2)                          1192 (0.2)
  Benign breast disease                       41 728 (3.4)                        20 687 (3.1)
  Chronic pancreatitis                        1042 (0.1)                          539 (0.1)
  Chronic obstructive pulmonary disease       21 516 (1.7)                        11 358 (1.7)
  Type 1 diabetes                             3523 (0.3)                          1921 (0.3)
  Type 2 diabetes                             37 827 (3.0)                        20 372 (3.1)
  Endometriosis                               13 563 (1.1)                        7153 (1.1)
  Endometrial hyperplasia or polyp            3235 (0.3)                          1621 (0.2)
  Fibroids                                    18 796 (1.5)                        10 291 (1.5)
  Polycystic ovarian disease                  10 756 (0.9)                        5993 (0.9)
  Rheumatoid arthritis                        13 825 (1.1)                        7153 (1.1)
  Systemic lupus erythematosus                1443 (0.1)                          687 (0.1)
  HIV or AIDS                                 4186 (0.3)                          2907 (0.4)
  Oral contraceptive                          120 840 (9.7)                       61 830 (9.3)
  Hormone replacement therapy                 26 275 (2.1)                        13 402 (2.0)
  Anaemia                                     38 804 (3.1)                        19 921 (3.0)
BMI = body mass index. SD = standard deviation.
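Each n (%) entry is a count divided by the size of its cohort. The smoking and alcohol categories are mutually exclusive, so their counts should add up to each cohort's n. The medical and family history rows are not exclusive, because one woman can have several conditions. The Python sketch below recomputes the percentages and runs the sum check using only counts copied from the table. The dictionary names (`COHORT_N`, `SMOKING`, `ALCOHOL`) and helper functions (`percent`, `sums_to_cohort`) are hypothetical, introduced for illustration only. Category labels use ASCII `-` and `>=` in place of the table's symbols.

```python
"""Recompute n (%) entries for the mutually exclusive categories in Table 1."""

# Cohort sizes taken from the table header (hypothetical variable name).
COHORT_N = {"derivation": 1_240_864, "validation": 667_603}

# (derivation count, validation count) per category, copied from Table 1.
SMOKING = {
    "Non-smoker": (640_775, 342_137),
    "Ex-smoker": (215_060, 105_411),
    "Current: amount not recorded": (26_749, 14_151),
    "Light (<10/day)": (68_059, 35_608),
    "Moderate (10-19/day)": (92_337, 50_146),
    "Heavy (>=20/day)": (50_831, 27_711),
    "Smoking not recorded": (147_053, 92_439),
}

ALCOHOL = {
    "None": (325_730, 169_033),
    "Trivial (<1 unit/day)": (402_453, 203_775),
    "Light (1-2 units/day)": (192_736, 100_051),
    "Moderate or heavy (>=3 units/day)": (25_003, 13_039),
    "Alcohol not recorded": (294_942, 181_705),
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def sums_to_cohort(rows: dict[str, tuple[int, int]]) -> bool:
    """Check that exclusive category counts add up to each cohort's n."""
    totals = (COHORT_N["derivation"], COHORT_N["validation"])
    return all(
        sum(counts[i] for counts in rows.values()) == totals[i]
        for i in range(2)  # column 0 = derivation, column 1 = validation
    )


if __name__ == "__main__":
    # Print recomputed percentages for each smoking category in both cohorts.
    for label, (d, v) in SMOKING.items():
        print(f"{label:<30} {percent(d, COHORT_N['derivation']):5.1f} "
              f"{percent(v, COHORT_N['validation']):5.1f}")
    print("Smoking categories sum to n:", sums_to_cohort(SMOKING))
    print("Alcohol categories sum to n:", sums_to_cohort(ALCOHOL))
```

Both sum checks print `True` for the counts above. Because the medical and family history rows can overlap, the sum check does not apply to them, although `percent` still does.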