Cohen's kappa

Cohen's kappa coefficient (κ) is a statistic used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items. It is generally thought to be a more robust measure than a simple percent-agreement calculation, because κ takes into account the possibility of the agreement occurring by chance. There is controversy surrounding Cohen's kappa due to the difficulty in interpreting indices of agreement; some researchers have suggested that it is conceptually simpler to evaluate disagreement between items. See the Limitations section for more detail.

The first mention of a kappa-like statistic is attributed to Galton (1892); see Smeeton (1985). The seminal paper introducing kappa as a new technique was published by Jacob Cohen in the journal Educational and Psychological Measurement in 1960.

Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The definition of κ is

\kappa = \frac{p_o - p_e}{1 - p_e},

where p_o is the relative observed agreement among raters (identical to accuracy), and p_e is the hypothetical probability of chance agreement, using the observed data to calculate the probability of each observer randomly selecting each category. If the raters are in complete agreement, then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by p_e), then κ = 0. The statistic can also be negative, which implies that there is no effective agreement between the two raters or that the agreement is worse than random.

For C categories, N items, and n_{ki} the number of times rater i predicted category k,

p_e = \frac{1}{N^2} \sum_{k} n_{k1}\, n_{k2}.
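
To make the definition concrete, the following is a minimal sketch in Python (NumPy is assumed to be available; the function name cohens_kappa and the toy labels are illustrative and not from the original article). It computes p_o as the fraction of items on which the two raters agree, estimates p_e from each rater's marginal category frequencies, and then applies the formula above.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical labels on the same N items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement estimated from the raters' marginals.
    """
    rater1 = np.asarray(rater1)
    rater2 = np.asarray(rater2)
    n = len(rater1)
    categories = np.union1d(rater1, rater2)

    # Observed agreement: fraction of items both raters labelled identically.
    p_o = np.mean(rater1 == rater2)

    # Chance agreement: for each category k, multiply the two raters'
    # marginal probabilities of choosing k, then sum over categories.
    p_e = sum(
        (np.sum(rater1 == k) / n) * (np.sum(rater2 == k) / n)
        for k in categories
    )

    # Note: undefined when p_e == 1 (both raters always use the same single
    # category); a production implementation would guard against that case.
    return (p_o - p_e) / (1 - p_e)


# Toy example: 50 items, binary labels, 40 agreements.
# p_o = 0.8 and both raters use each label 25 times, so p_e = 0.5.
a = ["yes"] * 20 + ["no"] * 20 + ["yes"] * 5 + ["no"] * 5
b = ["yes"] * 20 + ["no"] * 20 + ["no"] * 5 + ["yes"] * 5
print(cohens_kappa(a, b))  # ~0.6
```

In the toy example, p_o = 0.8 and p_e = 0.5, so κ = (0.8 − 0.5) / (1 − 0.5) = 0.6: the raters agree substantially more than chance alone would produce, but far from perfectly.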

[ "Statistics", "Machine learning", "Kappa", "chance agreement" ]