Assessing rater performance without a "gold standard" using consensus theory

S C Weller; N C Mann

doi:10.1177/0272989X9701700108

Med Decis Making. 1997 Jan-Mar;17(1):71-9. doi: 10.1177/0272989X9701700108.

Authors

S C Weller¹, N C Mann

Affiliation

¹ Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston 77555-1153, USA.

PMID: 8994153

Abstract

This study illustrates the use of consensus theory to assess the diagnostic performances of raters and to estimate case diagnoses in the absence of a criterion or "gold" standard. A description is provided of how consensus theory "pools" information provided by raters, estimating rater competencies and differentially weighting their responses. Although the model assumes that raters respond without bias (i.e., sensitivity = specificity), a Monte Carlo simulation with 1,200 data sets shows that model estimates appear to be robust even with bias. The model is illustrated on a set of elbow radiographs, and consensus-model estimates are compared with those obtained from follow-up data. Results indicate that with high rater competencies, the model retrieves accurate estimates of competency and case diagnoses even when raters' responses are biased.
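
The pooling step described above can be sketched in a few lines. This is only an illustration, assuming the standard dichotomous consensus formulation with no response bias: agreement between two raters, corrected for guessing, is treated as the product of their competencies, and each rater's call is weighted by the log odds of being correct. The function names, the simple iterative fit, the equal prior on the two diagnoses, and the simulated data are all assumptions made for this example. They are not taken from the paper, and the record does not say how the authors implemented their estimates.

```python
import math
import random


def match_matrix(responses):
    """Guess-corrected agreement for every pair of raters.

    responses[i][k] is rater i's 0/1 call on case k. With two answer
    choices and no bias, 2 * (proportion of matches) - 1 is expected to
    equal the product of the two raters' competencies.
    """
    n = len(responses)
    n_cases = len(responses[0])
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                agree = sum(a == b for a, b in zip(responses[i], responses[j]))
                m[i][j] = 2.0 * agree / n_cases - 1.0
    return m


def estimate_competence(m, iters=200):
    """Fit m[i][j] ~ d[i] * d[j] off the diagonal by alternating least squares.

    This is a hypothetical stand-in for a one-factor solution; estimates
    are clamped to (0, 1) so the log-odds weights stay finite.
    """
    n = len(m)
    d = [0.5] * n
    for _ in range(iters):
        for i in range(n):
            num = sum(m[i][j] * d[j] for j in range(n) if j != i)
            den = sum(d[j] ** 2 for j in range(n) if j != i)
            d[i] = min(max(num / den, 0.01), 0.99)
    return d


def estimate_diagnoses(responses, d):
    """Competency-weighted vote per case, assuming equal prior odds.

    A rater with competency d is correct with probability (1 + d) / 2,
    so each call carries weight log((1 + d) / (1 - d)).
    """
    weights = [math.log((1.0 + di) / (1.0 - di)) for di in d]
    n_cases = len(responses[0])
    calls = []
    for k in range(n_cases):
        score = sum(w if r[k] == 1 else -w for w, r in zip(weights, responses))
        calls.append(1 if score > 0 else 0)
    return calls


def simulate(truth, competencies, rng):
    """Unbiased raters: know the answer with probability d, else guess 50/50."""
    return [
        [t if rng.random() < d else rng.randint(0, 1) for t in truth]
        for d in competencies
    ]


# Illustrative run on made-up data (not the elbow radiograph set).
rng = random.Random(0)
truth = [rng.randint(0, 1) for _ in range(400)]
true_d = [0.9, 0.8, 0.7, 0.6, 0.85]
data = simulate(truth, true_d, rng)
est_d = estimate_competence(match_matrix(data))
est_dx = estimate_diagnoses(data, est_d)
accuracy = sum(a == b for a, b in zip(est_dx, truth)) / len(truth)
```

Here `truth` plays the role that follow-up data play in the study: it is used only to check the estimates and is never seen by the estimation functions. To probe the robustness question raised in the abstract, the 50/50 guess in `simulate` could be replaced with a skewed one, so that sensitivity and specificity differ.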

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Adult
Bias
Child
Clinical Competence / statistics & numerical data
Diagnosis*
Elbow / diagnostic imaging
Elbow Injuries
Follow-Up Studies
Humans
Models, Statistical*
Monte Carlo Method
Observer Variation*
Patient Care Team / statistics & numerical data
Radiography