Endgames Statistical Question

Randomised controlled trials: balance in baseline characteristics

BMJ 2014; 349 doi: https://doi.org/10.1136/bmj.g5721 (Published 19 September 2014) Cite this as: BMJ 2014;349:g5721
Philip Sedgwick, reader in medical statistics and medical education
Institute for Medical and Biomedical Education, St George’s, University of London, London, UK
p.sedgwick{at}sgul.ac.uk

Researchers investigated the effectiveness of helmet therapy for positional skull deformation in infants aged 5-6 months. A randomised controlled trial study design was used. Helmet therapy was delivered for six months. Control treatment was no treatment, which consisted of the natural course of skull deformation. Participants were 84 infants aged 5-6 months with moderate to severe skull deformation. Infants were assigned to helmet therapy (n=42) or the natural course of the condition (n=42) using random allocation with a block size of eight.1

The main outcomes were anthropometric measurements of the change in skull shape from baseline to 24 months of age, including the oblique diameter difference index and cranioproportional index. Statistical hypothesis testing was used to compare the treatment groups in baseline characteristics, including sex distribution, age, birth rank, health problems, and ethnicity, plus education level of the parents. The researchers reported that the treatment groups were comparable in baseline characteristics.

There were no significant differences between the treatment groups in the main outcome measures. Furthermore, helmet therapy was associated with a high prevalence of side effects. Because helmet therapy also involved high costs, the researchers discouraged the use of this treatment in healthy infants with moderate to severe skull deformation.

Which of the following statements, if any, are true?

  • a) The purpose of randomisation was to achieve treatment groups with similar baseline characteristics

  • b) Block randomisation was used to achieve similar numbers of participants in the treatment groups

  • c) Comparability between treatment groups in baseline characteristics was essential to promote internal validity

  • d) Any imbalance in baseline characteristics between treatment groups would be unaffected by sample size

Answers

Statements a, b, and c are true, whereas d is false.

Participants were assigned to treatment groups using random allocation with a block size of eight. Random allocation, often referred to simply as randomisation, resulted in each infant having an equal probability of being assigned to the helmet therapy or control group. The purpose of randomisation was to eliminate allocation bias and achieve treatment groups similar in baseline characteristics (a is true). Allocation bias is the systematic difference between participants in how they are allocated to treatment groups.2 If the treatment groups had differed in baseline characteristics, confounding might have occurred. Confounding is a difference between treatment groups in those factors that influence treatment and outcome measures, such as demographic characteristics, prognostic factors, and other characteristics that may influence someone to participate in or withdraw from a trial. Therefore, if confounding had existed, any differences between treatment groups in outcome may not have been due to differences in the treatment received but to differences in baseline characteristics.

Random allocation does not guarantee that treatment groups will have similar baseline characteristics. Nonetheless, any imbalance would matter only if such characteristics influenced treatment or the outcome measures. The risk of imbalance is greater for trials with small samples. In particular, it is not guaranteed that treatment groups will have equal or similar numbers of participants after randomisation. A balance in numbers is essential for confounding to be minimised. A greater balance in numbers of participants is achieved as the sample size increases. The sample size in the above trial was small. To ensure similar numbers of participants in the treatment groups and thereby minimise confounding, block random allocation with blocks of size eight was used (b is true). Block random allocation, also known simply as block randomisation, has been described in a previous question.3
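The block allocation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the trial: the function name `block_randomise`, the arm labels, and the seed are assumptions, and the final block is simply truncated when the sample size is not a multiple of the block size.

```python
import random


def block_randomise(n_participants, block_size=8, seed=None):
    """Allocate participants to two arms using permuted blocks.

    Illustrative sketch: each block holds equal numbers of "helmet" and
    "control" labels in random order; a final incomplete block is truncated.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        half = block_size // 2
        block = ["helmet"] * half + ["control"] * half
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_participants]


if __name__ == "__main__":
    # Hypothetical run with the trial's sample size and block size.
    arms = block_randomise(84, block_size=8, seed=1)
    print(arms.count("helmet"), arms.count("control"))
```

Within every complete block the two arms receive exactly the same number of participants. If the final block is truncated, as it would be in this sketch for 84 participants and blocks of eight, the difference in numbers between the arms can be at most half the block size.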

The researchers presented the baseline characteristics for both treatment groups (table). This was essential for the success of randomisation to be assessed. Inspection of the baseline characteristics suggests that the treatment groups were comparable. It was important for the treatment groups to be similar in baseline characteristics for the trial to have internal validity (c is true). Described in a previous question,4 internal validity is the extent to which the observed treatment effects can be attributed to differences in treatment and not confounding, thereby allowing the inference of causality to be ascribed to treatment. Description of the baseline characteristics also permitted external validity to be assessed. External validity is the extent to which the study results can be generalised to the population that the sample was meant to represent.4 More generally, presenting the baseline characteristics allows readers to assess whether the results of the trial can be generalised to the patients in their clinical practice.

Characteristics of infants with skull deformation. Values are numbers (percentages) unless stated otherwise

In the trial above, the researchers compared the treatment groups in baseline characteristics using statistical significance testing. The intention was to assess whether any important differences existed at baseline. Such testing could detect possible subversion of the allocation process. However, it is generally considered inappropriate to compare treatment groups in baseline characteristics using statistical testing. Differences between treatment groups would have been investigated using traditional statistical hypothesis testing, with a null and alternative hypothesis.5 For each baseline characteristic, the null hypothesis would have stated that no difference existed between the treatment groups in the population from which the sample was obtained. However, because the participants were allocated using randomisation, by definition no such difference should exist. Hence, if randomisation was carried out correctly, there should be no evidence to reject the null hypothesis. Statistical hypothesis testing would therefore be assessing only whether randomisation achieved its aim, that is, to generate treatment groups with similar baseline characteristics. For the trial above, any differences in baseline characteristics would have occurred because of random variation when allocating participants from a small sample. Generally, as sample size increases, differences between treatment groups in baseline characteristics are expected to become smaller, so any imbalance is affected by sample size (d is false).
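The relation between sample size and chance imbalance can be illustrated with a small simulation. The sketch below is not based on the trial data: it assumes a hypothetical binary baseline characteristic with a prevalence of 50% in the population, two arms of fixed equal size, and an arbitrary number of simulated trials. The function name and its parameters are assumptions made for illustration.

```python
import random


def mean_abs_imbalance(n_per_arm, prevalence=0.5, n_trials=200, seed=0):
    """Average absolute between-arm difference in the proportion of
    participants with a binary baseline characteristic, by chance alone.

    Every participant in both arms is drawn from the same population,
    so any difference between arms is due to random variation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        arm_a = sum(rng.random() < prevalence for _ in range(n_per_arm))
        arm_b = sum(rng.random() < prevalence for _ in range(n_per_arm))
        total += abs(arm_a - arm_b) / n_per_arm
    return total / n_trials


if __name__ == "__main__":
    # 42 per arm matches the trial; the larger sizes are hypothetical.
    for n in (42, 420, 4200):
        print(n, round(mean_abs_imbalance(n), 3))
```

Under these assumptions the average difference between arms shrinks as the number of participants per arm grows, which is the behaviour described in the paragraph above.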

Researchers often collect information on many baseline characteristics. In such cases, comparison of treatment groups using statistical significance testing involves multiple testing. As described in a previous question,6 this increases the risk of type I errors; that is, as a result of significance testing for a baseline characteristic, the null hypothesis is rejected in favour of the alternative when no difference exists between treatment groups in the population. Therefore, undertaking statistical hypothesis testing on multiple baseline characteristics may produce misleading results.
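The effect of multiple testing on the risk of a type I error can be shown with a short calculation. The sketch below assumes that the tests are independent and that each is carried out at the same significance level; baseline characteristics are often correlated in practice, so the figures are indicative only. The function name is an assumption made for illustration.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one type I error across n_tests independent
    tests, each at significance level alpha, when every null hypothesis
    is true."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 6, 20):
        print(k, round(familywise_error_rate(k), 3))
```

Under these assumptions, testing six baseline characteristics at the 5% level gives a probability of about 0.26 of at least one significant result when no difference exists in the population.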

Testing for significance between treatment groups in baseline characteristics is generally considered inappropriate, unless it is suspected that the randomisation of participants was not conducted properly. Nonetheless, the visual inspection of differences between groups would be considered a sound method. Major differences in baseline characteristics are rare for clinical trials with large sample sizes. However, many trials are conducted on a small number of participants, and in such trials an imbalance between treatment groups in baseline characteristics is possible. Such an imbalance would matter only if the characteristics were important and influenced treatment or the outcome measures. Methods of randomisation exist to minimise imbalance between treatment groups in important baseline characteristics. These include block randomisation, as used in the above trial, and stratified randomisation.7 Advanced statistical techniques can be used to adjust outcome measures for baseline characteristics. However, such approaches should be used only on the basis of previous knowledge of the influence of a baseline characteristic on the outcomes, and not on evidence of a significant difference between treatment groups at baseline.

Footnotes

  • Competing interests: None declared.
