INTRODUCTION
Primary care in the UK generates an extraordinary amount of data. There are more than 300 million consultations annually, creating unrivalled opportunities for research.1 The volume of patients who consult primary care practitioners daily, the variety of clinical conditions, the diversity of populations, and the transition from hand-written records to comprehensive electronic medical systems have heralded a new era in primary care research. Furthermore, the linkage of primary and secondary care data systems creates opportunities for prospective and retrospective studies and epidemiological insights into population health.2
Increasing accessibility of rich data has changed the landscape of research in the community. As well as large datasets based around electronic medical records, primary care researchers also have access to alternative sources of data, which are often free. Also, record linkage among datasets in safe havens can enhance the value of records further.3 The aim of this article is to highlight datasets that are available to primary care researchers and to give examples of how they have been used in primary care research. A new resource detailing primary care and community-based datasets is now available. This resource has been developed by the Farr Institute, an organisation that aims to build capacity in health informatics research (http://www.farrinstitute.org/). The resource is a catalogue of UK-based datasets with metadata (data that provide information about other data) that may be useful to both novice and experienced researchers in primary care.
DATASETS IN THE CATALOGUE
The catalogue has been divided into the following categories: electronic medical record data, quality of primary care services, prescribing data, audit, health surveys, special datasets, cohort studies, administrative datasets, and screening datasets. Available metadata include type of data, context and method of extraction, coverage, geography, duration, volume, granularity (level of detail), coding, consent, and access (including websites and contact details). The metadata were reviewed by dataset custodians. A brief overview of some categories, with examples of how the datasets have been used in research, follows below.
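To make the idea of a catalogue entry concrete, the following is a minimal sketch of how such metadata might be represented and searched. It is not the catalogue's actual format: the class name `CatalogueEntry`, the function `find_by_category`, and the choice of fields are assumptions made for illustration, and only details stated in this article are filled in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical record holding a subset of the metadata fields listed above."""
    name: str
    category: str
    geography: Optional[str] = None  # left empty where this article gives no detail
    volume: Optional[str] = None
    access: Optional[str] = None


# Entries use only details mentioned in this article; other fields stay unset.
CATALOGUE = [
    CatalogueEntry("CPRD", "electronic medical record data", geography="national"),
    CatalogueEntry("QResearch", "electronic medical record data", geography="national"),
    CatalogueEntry("SAIL", "electronic medical record data", geography="regional (Wales)"),
    CatalogueEntry("Lambeth DataNet", "electronic medical record data", geography="local"),
    CatalogueEntry("UK Biobank", "cohort studies", volume="500 000 participants"),
]


def find_by_category(entries: List[CatalogueEntry], category: str) -> List[str]:
    """Return the names of all entries filed under the given category."""
    return [entry.name for entry in entries if entry.category == category]


if __name__ == "__main__":
    print(find_by_category(CATALOGUE, "cohort studies"))
```

A researcher could extend the same structure with the remaining fields (coverage, duration, granularity, coding, consent) as they consult the catalogue itself.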
Electronic medical record data
This section includes large national datasets such as Clinical Practice Research Datalink (CPRD) and QResearch, regional datasets such as the Secure Anonymised Information Linkage Databank (SAIL) based in Wales, and local databases such as Lambeth DataNet, all of which draw on electronic medical records. Some datasets can be linked with secondary care data to carry out cross-sectional or cohort studies. A recent example is a cohort study that used data from the QResearch database, linked to the national cancer registry, to develop and validate risk prediction equations to estimate survival in patients with colorectal cancer.4
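The principle of linking a primary care extract to a second source can be sketched as a simple exact-match join on a shared pseudonymised identifier. This is an illustration only, under strong assumptions: the field names (`pseudo_id`, `practice`, `diagnosis_year`) and the toy records are invented, and the sketch does not describe how QResearch or any other named dataset performs linkage.

```python
from typing import Dict, List


def link_records(primary: List[Dict], registry: List[Dict], key: str = "pseudo_id") -> List[Dict]:
    """Exact-match linkage: keep primary care rows that have a registry match on `key`."""
    # Index the registry once so each lookup is constant time.
    registry_by_id = {row[key]: row for row in registry}
    linked = []
    for row in primary:
        match = registry_by_id.get(row[key])
        if match is not None:
            # Merge the two records into one combined row.
            linked.append({**row, **match})
    return linked


# Invented toy data: no real patients, practices, or registry fields.
primary_care = [
    {"pseudo_id": "A1", "practice": "P01"},
    {"pseudo_id": "B2", "practice": "P02"},
]
cancer_registry = [
    {"pseudo_id": "A1", "diagnosis_year": 2010},
]

linked = link_records(primary_care, cancer_registry)

if __name__ == "__main__":
    print(linked)
```

Only the record present in both sources is retained, which is why linked cohorts are usually smaller than either source dataset.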
Quality of primary care services
UK Quality and Outcomes Framework (QOF) data are routinely collected by GP surgeries. Martin et al used QOF data to look at the recording of physical health targets of those with major mental illness compared with those with chronic kidney disease across the UK. Their findings suggested inequality in access to certain aspects of health care for patients with major mental illness.5
Prescribing data
Regional prescribing data are available across the UK and are often used in research studies looking at the cost-effectiveness of interventions or at prescribing patterns. By linking national patient survey data and prescribing data for England, Ashworth and colleagues were able to show that reduced antibiotic prescribing in general practice was associated with decreased patient satisfaction.6
Audit
Although the majority of audits are based in secondary care, some are based in the community. The National Audit of Cancer Diagnosis in Primary Care, for example, has been used to study variation in the promptness of presentation among patients subsequently diagnosed with cancer.7
Health surveys
Health surveys for England, Scotland, Wales, and Northern Ireland are carried out on an annual basis and provide rich data on the health of the nation. Results from the Scottish Health Survey were used to show the relationship between dental health and cardiovascular disease mortality.8 Each country also has a national cancer patient experience survey, which has been used to look at regional variations in cancer patient experience.9
Special datasets
An example of a special dataset is the Aberdeen Maternity and Neonatal Databank, which collects data from primary and secondary care. Lee et al used this dataset to look at maternal obesity during pregnancy and its association with major cardiovascular events in later life.10 By linking the data with the national register of deaths and Scottish Morbidity Record, they were able to determine that maternal obesity is associated with increased risk of premature death and cardiovascular disease.
Cohort studies
A number of cohort studies exist at national and regional level that collect patient data in the community. The largest cohort study is the UK Biobank, which holds data on 500 000 participants and has been used for a large variety of research studies. Recently, Flint and Cummins used Biobank data to confirm the association between active commuting and healthier bodyweight and composition, supporting the case for promoting active travel to prevent obesity in later life.11
Screening datasets
NHS screening datasets for each UK country are available for breast, cervical, and bowel cancer screening. Massat et al used screening data to look at variation in cervical and breast cancer screening coverage in England, determining the effect of deprivation, ethnicity, and urbanisation on screening uptake.12