Annals of Emergency Medicine
Volume 54, Issue 6, Pages 843-853, December 2009

A Consideration of the Measurement and Reporting of Interrater Reliability: Answers to the July 2009 Journal Club Questions

References 

  1. Cruz CO, Meshberg EG, Shofer FS, et al. Interrater reliability and accuracy of clinicians and trained research assistants performing prospective data collection in emergency department patients with potential acute coronary syndrome. Ann Emerg Med. 2009;54:1–7
  2. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174
  3. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measure. 1960;20:37–46
  4. Uebersax J. Statistical methods for rater agreement. http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm. Accessed May 31, 2009
  5. Wuensch KL. Inter-rater agreement. http://core.ecu.edu/psyc/wuenschk/docs30/InterRater.doc. Accessed May 18, 2009
  6. Kendall M. A new measure of rank correlation. Biometrika. 1938;30:81–89
  7. Spearman C. The proof and measurement of association between two things. Am J Psychol. 1904;15:72–101
  8. Noether GE. Why Kendall tau? http://rsscse.org.uk/ts/bts/noether/text.html. Accessed May 18, 2009
  9. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–428
  10. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310
  11. Dewitte K, Fierens C, Stockl D, et al. Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem. 2002;48:799–801
  12. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981
  13. Feinstein AR, Cicchetti DV. High agreement but low kappa, I: the problems of two paradoxes. J Clin Epidemiol. 1990;43:543–549
  14. Cicchetti DV, Feinstein AR. High agreement but low kappa, II: resolving the paradoxes. J Clin Epidemiol. 1990;43:551–558

 Section editors: Tyler W. Barrett, MD; David L. Schriger, MD, MPH

 Editor's Note: This 10th installment of Annals of Emergency Medicine Journal Club departs slightly from previous installments by focusing on a single methodological issue, the measurement of reliability. We use the Cruz et al article as a jumping-off point for our discussion.1 Although this installment may be appropriate for some residency journal clubs (particularly if they use our more basic questions and add some clinical questions about the article), we suspect that it will be of greater value to research fellows and researchers. Readers should recognize that these are suggested answers and, although it is hoped that they are correct, they are by no means comprehensive. There are many other points that could be made about these questions or about the article in general. Questions are rated “novice,” “intermediate,” and “advanced.”

PII: S0196-0644(09)01258-X

doi: 10.1016/j.annemergmed.2009.07.013
