How well do raters agree on the development stage of Caenorhabditis elegans?

Annabel A. Ferguson, Richard A. Bilonick, Jeanine M. Buchanich, Gary M. Marsh, Alfred L. Fisher

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The assessment of inter-rater reliability is a topic that is infrequently addressed in Caenorhabditis elegans research, despite the existence of sophisticated statistical methods and the strong interest in the field in obtaining reliable and accurate data. This study applies statistical modeling as a robust means of analyzing the performance of worm researchers measuring the stage of worm development in terms of the two independent factors that comprise "agreement", which are (1) accuracy, representing trueness, a lack of systematic differences, or lack of bias, and (2) precision, representing reliability or the extent to which random differences are small. In our study, multiple raters assessed the same sample of worms to determine the developmental stage of each animal, and we collected data linking each scorer with their assessment for each worm. To describe the agreement of the raters, we developed a structural equation model with latent variables and thresholds, which assumes that all the raters are jointly scoring each worm. This common factor model separately quantifies the two aspects of agreement. The stage-specific thresholds examine accuracy and characterize the relative biases of each rater during the scoring process. The factor loadings for each rater examine precision and characterize the random error of the rater. Within our group, we found that the overall agreement was good, while certain adjustments in particular raters would have decreased systematic differences. Hence, the use of developmental stage as an experimental outcome can be both accurate and precise.
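The split between accuracy (thresholds) and precision (factor loadings) described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a generic ordinal common factor setup in which each worm has one latent developmental value, each rater's latent response is `loading * latent + noise`, and the assigned stage is the number of rater-specific thresholds that response exceeds. The three raters, the four stage categories, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def score(latent, loading, thresholds, rng):
    """Return the ordinal stage (0..len(thresholds)) one rater assigns to one worm."""
    # Assumption: noise is scaled so the latent response has unit variance
    # when the worm's latent value has unit variance.
    noise_sd = (1.0 - loading ** 2) ** 0.5
    response = loading * latent + rng.gauss(0.0, noise_sd)
    # The stage is the number of thresholds the latent response exceeds.
    return sum(response > t for t in thresholds)


def simulate(n_worms=2000, seed=0):
    """Simulate three hypothetical raters scoring the same worms."""
    rng = random.Random(seed)
    raters = {
        # name: (factor loading, stage thresholds)
        "reference": (0.95, [-1.0, 0.0, 1.0]),
        "biased": (0.95, [-0.6, 0.4, 1.4]),  # shifted thresholds: systematic difference
        "noisy": (0.60, [-1.0, 0.0, 1.0]),   # low loading: larger random error
    }
    worms = [rng.gauss(0.0, 1.0) for _ in range(n_worms)]
    return {
        name: [score(x, loading, taus, rng) for x in worms]
        for name, (loading, taus) in raters.items()
    }


def mean_diff(a, b):
    """Average signed difference in assigned stage (a measure of relative bias)."""
    return sum(x - y for x, y in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation between two raters' scores (a rough precision proxy)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    s = simulate()
    for name in ("biased", "noisy"):
        print(name,
              "mean diff vs reference:", round(mean_diff(s[name], s["reference"]), 3),
              "correlation:", round(pearson(s[name], s["reference"]), 3))
```

Under these assumptions, the rater with shifted thresholds scores systematically earlier stages than the reference rater while still tracking it closely, whereas the rater with the low loading shows little average shift but a weaker correlation. This mirrors the distinction the abstract draws between bias and random error; fitting such a model to real scoring data would require dedicated structural equation modeling software.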

Original language: English (US)
Article number: e0132365
Journal: PLoS One
Volume: 10
Issue number: 7
DOI: 10.1371/journal.pone.0132365
State: Published - Jul 14 2015

Fingerprint

Structural Models
Caenorhabditis elegans
Research Personnel
Random errors
Research
developmental stages
Statistical methods
Animals
statistical analysis
researchers
animals
sampling

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • General

Cite this

How well do raters agree on the development stage of Caenorhabditis elegans? / Ferguson, Annabel A.; Bilonick, Richard A.; Buchanich, Jeanine M.; Marsh, Gary M.; Fisher, Alfred L.

In: PLoS One, Vol. 10, No. 7, e0132365, 14.07.2015.

@article{7478e1ef12734f4f86d0ec2b758c0153,
title = "How well do raters agree on the development stage of Caenorhabditis elegans?",
author = "Ferguson, {Annabel A.} and Bilonick, {Richard A.} and Buchanich, {Jeanine M.} and Marsh, {Gary M.} and Fisher, {Alfred L.}",
year = "2015",
month = "7",
day = "14",
doi = "10.1371/journal.pone.0132365",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR

T1 - How well do raters agree on the development stage of Caenorhabditis elegans?

AU - Ferguson, Annabel A.

AU - Bilonick, Richard A.

AU - Buchanich, Jeanine M.

AU - Marsh, Gary M.

AU - Fisher, Alfred L.

PY - 2015/7/14

Y1 - 2015/7/14

AB - The assessment of inter-rater reliability is a topic that is infrequently addressed in Caenorhabditis elegans research, despite the existence of sophisticated statistical methods and the strong interest in the field in obtaining reliable and accurate data. This study applies statistical modeling as a robust means of analyzing the performance of worm researchers measuring the stage of worm development in terms of the two independent factors that comprise "agreement", which are (1) accuracy, representing trueness, a lack of systematic differences, or lack of bias, and (2) precision, representing reliability or the extent to which random differences are small. In our study, multiple raters assessed the same sample of worms to determine the developmental stage of each animal, and we collected data linking each scorer with their assessment for each worm. To describe the agreement of the raters, we developed a structural equation model with latent variables and thresholds, which assumes that all the raters are jointly scoring each worm. This common factor model separately quantifies the two aspects of agreement. The stage-specific thresholds examine accuracy and characterize the relative biases of each rater during the scoring process. The factor loadings for each rater examine precision and characterize the random error of the rater. Within our group, we found that the overall agreement was good, while certain adjustments in particular raters would have decreased systematic differences. Hence, the use of developmental stage as an experimental outcome can be both accurate and precise.

UR - http://www.scopus.com/inward/record.url?scp=84940764389&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940764389&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0132365

DO - 10.1371/journal.pone.0132365

M3 - Article

C2 - 26172989

AN - SCOPUS:84940764389

VL - 10

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 7

M1 - e0132365

ER -