
CTLT and Friends: Group items tagged "inter-rater reliability"


Gary Brown

Matthew Lombard

  • Which measure(s) of intercoder reliability should researchers use? There are literally dozens of different measures, or indices, of intercoder reliability. Popping (1988) identified 39 different "agreement indices" for coding nominal categories, a count that excludes several techniques for interval- and ratio-level data. But only a handful of techniques are widely used. In communication, the most widely used indices are percent agreement, Holsti's method, Scott's pi (π), Cohen's kappa (κ), and Krippendorff's alpha (α). Just some of the indices proposed, and in some cases widely used, in other fields are Perreault and Leigh's (1989) Ir measure; Tinsley and Weiss's (1975) T index; Bennett, Alpert, and Goldstein's (1954) S index; Lin's (1989) concordance coefficient; Hughes and Garrett's (1990) approach based on Generalizability Theory; and Rust and Cooil's (1994) approach based on "Proportional Reduction in Loss" (PRL). It would be nice if there were one universally accepted index of intercoder reliability, but despite all the effort that scholars, methodologists, and statisticians have devoted to developing and testing indices, there is no consensus on a single "best" one. While there are several recommendations for Cohen's kappa (e.g., Dewey (1983) argued that despite its drawbacks, kappa should still be "the measure of choice"), and this index appears to be commonly used in research that involves the coding of behavior (Bakeman, 2000), others (notably Krippendorff, 1978, 1987) have argued that its characteristics make it inappropriate as a measure of intercoder agreement. (A minimal computational sketch of several of these indices appears after this entry.)
  •   for our formalizing of assessment work
  •   inter-rater reliability
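The indices named in the excerpt above differ mainly in how they estimate chance agreement. As a minimal illustrative sketch (not code from Lombard's article; the function names and example ratings are hypothetical), the following Python computes percent agreement, Scott's pi, and Cohen's kappa for two coders assigning nominal categories to the same set of units:

    from collections import Counter

    def percent_agreement(a, b):
        """Proportion of units on which the two coders assign the same category."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def scotts_pi(a, b):
        """Scott's pi: expected agreement estimated from the pooled category distribution."""
        obs = percent_agreement(a, b)
        pooled = Counter(a) + Counter(b)
        total = len(a) + len(b)
        exp = sum((count / total) ** 2 for count in pooled.values())
        return (obs - exp) / (1 - exp)

    def cohens_kappa(a, b):
        """Cohen's kappa: expected agreement estimated from each coder's own distribution."""
        obs = percent_agreement(a, b)
        n = len(a)
        ca, cb = Counter(a), Counter(b)
        exp = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
        return (obs - exp) / (1 - exp)

    # Hypothetical example: two coders rate ten messages as pos/neg/neu.
    coder1 = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg"]
    coder2 = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "pos", "neg"]
    print(f"Percent agreement: {percent_agreement(coder1, coder2):.3f}")
    print(f"Scott's pi:        {scotts_pi(coder1, coder2):.3f}")
    print(f"Cohen's kappa:     {cohens_kappa(coder1, coder2):.3f}")

On these made-up ratings the two coders' marginal distributions happen to be identical, so Scott's pi and Cohen's kappa coincide; in general, kappa estimates chance agreement from each coder's own distribution while pi pools the two distributions.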