Cracking the Code: Providing Insight Into the Fundamentals of Research and Evidence-Based Practice
A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research
Introduction
Before any measurement instrument or assessment tool can be used for research or clinical applications, its reliability must be established. Reliability is defined as the extent to which measurements can be replicated.1 In other words, it reflects not only the degree of correlation but also the agreement between measurements.2, 3 Mathematically, reliability represents the ratio of true variance to true variance plus error variance.4, 5 This concept is illustrated in Table 1. As the calculation indicates, reliability values range between 0 and 1, with values closer to 1 representing stronger reliability. Historically, the Pearson correlation coefficient, the paired t test, and the Bland-Altman plot have been used to evaluate reliability.3, 6, 7, 8 However, the paired t test and the Bland-Altman plot are methods for analyzing agreement, and the Pearson correlation coefficient is only a measure of correlation; hence, they are nonideal measures of reliability. A more desirable measure of reliability should reflect both the degree of correlation and the agreement between measurements. The intraclass correlation coefficient (ICC) is such an index.
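As a rough illustration of the two ideas above, the Python sketch below computes the reliability ratio for two made-up pairs of variances and then shows that the Pearson correlation coefficient stays at 1 when one rater scores systematically higher than another. All numbers are hypothetical, and `reliability_ratio` and `pearson_r` are helper names introduced here for illustration only; they are not taken from the article.

```python
from statistics import mean


def reliability_ratio(true_variance, error_variance):
    """Reliability as true variance / (true variance + error variance)."""
    return true_variance / (true_variance + error_variance)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical variances: more error variance means lower reliability.
high = reliability_ratio(true_variance=9.0, error_variance=1.0)
low = reliability_ratio(true_variance=9.0, error_variance=9.0)

# Hypothetical ratings: rater B always scores 5 units above rater A.
rater_a = [10.0, 12.0, 14.0, 16.0, 18.0]
rater_b = [value + 5.0 for value in rater_a]

r = pearson_r(rater_a, rater_b)
mean_difference = mean(b - a for a, b in zip(rater_a, rater_b))

print(f"reliability (low error):  {high:.2f}")
print(f"reliability (high error): {low:.2f}")
print(f"Pearson r: {r:.2f}, mean rater difference: {mean_difference:.1f}")
```

In this toy case the correlation is perfect although the two raters never give the same score, which is why correlation alone is a weak stand-in for reliability.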
The intraclass correlation coefficient was first introduced by Fisher9 in 1954 as a modification of the Pearson correlation coefficient. However, the modern ICC is calculated from mean squares (ie, estimates of the population variances based on the variability among a given set of measures) obtained through analysis of variance. ICC is now widely used in conservative care medicine to evaluate interrater, test-retest, and intrarater reliability (see Table 2 for their definitions).10, 11, 12, 13, 14, 15, 16, 17 These evaluations are fundamental to clinical assessment because, without them, we can have no confidence in our measurements, nor can we draw any rational conclusions from them.
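The mean squares mentioned above can be obtained from a subjects-by-raters table with a standard analysis of variance decomposition. The sketch below is one minimal way to do this in plain Python, assuming a complete table with no missing scores. The function name `anova_mean_squares`, the labels MSR, MSC, MSE, and MSW (mean squares for rows, columns, error, and within-subject variation), and the ratings themselves are assumptions made for illustration.

```python
def anova_mean_squares(ratings):
    """Return mean squares for an n-subjects x k-raters table of scores.

    ratings: list of n rows (subjects), each a list of k scores (raters).
    Assumes a complete table with no missing values.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares: total, between subjects (rows), between raters (columns).
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols  # two-way residual
    ss_within = ss_total - ss_rows           # one-way within-subject

    return {
        "MSR": ss_rows / (n - 1),
        "MSC": ss_cols / (k - 1),
        "MSE": ss_error / ((n - 1) * (k - 1)),
        "MSW": ss_within / (n * (k - 1)),
    }


# Hypothetical data: 4 subjects (rows) each scored by 3 raters (columns).
ratings = [
    [9, 10, 11],
    [6, 8, 7],
    [4, 5, 6],
    [1, 3, 2],
]

ms = anova_mean_squares(ratings)
for name, value in ms.items():
    print(f"{name}: {value:.3f}")
```

A large MSR relative to MSE or MSW indicates that most of the variability comes from real differences between subjects rather than from raters or error.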
There are different forms of ICC that can give different results when applied to the same set of data, and the way ICC is reported varies between researchers. Given that different forms of ICC involve distinct assumptions in their calculations and lead to different interpretations, it is important that researchers are aware of the correct application of each form of ICC, use the appropriate form in their analyses, and accurately report the form they used. The purpose of this article is to provide a practical guideline for clinical researchers to choose the correct form of ICC for their reliability analyses and to suggest best practice for reporting ICC parameters in scientific publications. This article also aims to help readers understand the basic concept of ICC so that they can better interpret reliability data when reading articles on related topics.
How to Select the Correct ICC Form for Interrater Reliability Studies
McGraw and Wong18 defined 10 forms of ICC based on the “Model” (1-way random effects, 2-way random effects, or 2-way fixed effects), the “Type” (single rater/measurement or the mean of k raters/measurements), and the “Definition” of relationship considered to be important (consistency or absolute agreement). These ICC forms and their formulation are summarized in Table 3.
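Table 3 is not reproduced in this excerpt, so the sketch below uses the commonly cited mean-square expressions for these forms and should be read as an illustration under that assumption, not as the article's own table. In that presentation the 2-way random-effects and 2-way fixed-effects (often called mixed-effects) models share the same computational formulas and differ only in interpretation, which is how 6 distinct expressions cover 10 forms. The function name `icc_forms`, the dictionary keys, and the input mean squares are hypothetical.

```python
def icc_forms(msr, msc, mse, msw, n, k):
    """Compute ICC estimates from ANOVA mean squares.

    msr: mean square for rows (subjects)
    msc: mean square for columns (raters)
    mse: mean square for error (two-way residual)
    msw: mean square within subjects (one-way residual)
    n:   number of subjects; k: number of raters/measurements
    """
    return {
        "one-way, absolute agreement, single":
            (msr - msw) / (msr + (k - 1) * msw),
        "one-way, absolute agreement, mean of k":
            (msr - msw) / msr,
        "two-way, consistency, single":
            (msr - mse) / (msr + (k - 1) * mse),
        "two-way, consistency, mean of k":
            (msr - mse) / msr,
        "two-way, absolute agreement, single":
            (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse)),
        "two-way, absolute agreement, mean of k":
            (msr - mse) / (msr + (msc - mse) / n),
    }


# Hypothetical mean squares for 4 subjects scored by 3 raters.
forms = icc_forms(msr=34.0, msc=3.0, mse=1.0 / 3.0, msw=1.0, n=4, k=3)
for label, value in forms.items():
    print(f"{label}: {value:.3f}")
```

With the same inputs, the absolute-agreement estimates are lower than the consistency estimates because systematic differences between raters (MSC) count against agreement, which illustrates why the form used needs to be reported.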
Selection of the correct ICC form for an interrater reliability study can be guided by 4 questions: (1) Do we have the same set
Why ICC Matters
Conservative care practitioners regularly perform various measurements. Knowing how reliable these measurements are is essential for practitioners deciding whether a particular measurement is of any value. Without conducting a reliability study personally, this knowledge can only be obtained from the scientific literature. Given that ICC is a widely used reliability index in the literature, an understanding of ICC will help readers to make sense of their own
Conclusion
In summary, ICC is a reliability index that reflects both the degree of correlation and the agreement between measurements. It has been widely used in conservative care medicine to evaluate interrater, test-retest, and intrarater reliability of numerical or continuous measurements. Given that there are 10 different forms of ICC, and each form involves distinct assumptions in its calculation and leads to a different interpretation, it is important for researchers and readers to understand the
Funding Sources and Conflicts of Interest
No funding sources or conflicts of interest were reported for this study.
References (21)
- et al. Reliability: what is it, and how is it measured? Physiotherapy (2000)
- et al. Reliability of zygapophysial joint space measurements made from magnetic resonance imaging scans of acute low back pain subjects: comparison of 2 statistical methods. J Manipulative Physiol Ther (2010)
- et al. Paraspinal skin temperature patterns: an interexaminer and intraexaminer reliability study. J Manipulative Physiol Ther (2004)
- et al. A mechano-acoustic indentor system for in vivo measurement of nonlinear elastic properties of soft tissue. J Manipulative Physiol Ther (2011)
- et al. Reliability of detection of lumbar lateral shift. J Manipulative Physiol Ther (2003)
- et al. Reliability of the Goutallier classification in quantifying muscle fatty degeneration in the lumbar multifidus using magnetic resonance imaging. J Manipulative Physiol Ther (2014)
- et al. PulStar differential compliance spinal instrument: a randomized interexaminer and intraexaminer reliability study. J Manipulative Physiol Ther (2003)
- et al. Measurement of lumbar lordosis in static standing posture with and without high-heeled shoes. J Chiropr Med (2012)
- et al. Interpretation and use of medical statistics (2000)
- et al. Foundations of clinical research: applications to practice (2000)