Classifier chains

Classifier chains is a machine learning method for problem transformation in multi-label classification. It retains the computational efficiency of the Binary Relevance method while still taking label dependencies into account during classification.

Problem transformation methods transform a multi-label classification problem into one or more single-label classification problems. In this way, existing single-label classification algorithms such as SVM and Naive Bayes can be used without modification.

Several problem transformation methods exist. One of them is the Binary Relevance method (BR). Consider a set of labels $L$ and a data set with instances of the form $(x, Y)$, where $x$ is a feature vector and $Y \subseteq L$ is the set of labels assigned to the instance. BR transforms the data set into $|L|$ data sets and learns $|L|$ binary classifiers, one classifier $H: X \rightarrow \{l, \neg l\}$ for each label $l \in L$. During this process, information about dependencies between labels is not preserved. As a result, a set of labels can be assigned to an instance even though these labels never co-occur in the data set. Information about label co-occurrence can therefore help to assign correct label combinations, and losing it can in some cases decrease classification performance.

Another approach, which takes label correlations into account, is the Label Powerset method (LP).
In LP, each distinct combination of labels in a data set is treated as a single label. After the transformation, a single-label classifier $H: X \rightarrow \mathcal{P}(L)$ is trained, where $\mathcal{P}(L)$ is the power set of $L$. The main drawback of this approach is that the number of label combinations grows exponentially with the number of labels. For example, a multi-label data set with 10 labels can have up to $2^{10} = 1024$ label combinations. This increases the run-time of classification.
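
The two data set transformations described above can be illustrated with a minimal Python sketch. It covers only the transformation step, not classifier training. The function names `binary_relevance` and `label_powerset`, the label names, and the toy data are assumptions made for this illustration and do not come from any particular library.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# An instance is a pair (x, Y): feature vector x and assigned label set Y.
Instance = Tuple[Sequence[float], Set[str]]


def binary_relevance(
    data: List[Instance], labels: List[str]
) -> Dict[str, List[Tuple[Sequence[float], bool]]]:
    """Build one binary data set per label: the target is True iff l is in Y."""
    return {l: [(x, l in y) for x, y in data] for l in labels}


def label_powerset(
    data: List[Instance],
) -> List[Tuple[Sequence[float], FrozenSet[str]]]:
    """Build one single-label data set whose classes are whole label sets."""
    return [(x, frozenset(y)) for x, y in data]


# Hypothetical toy data with three labels.
labels = ["a", "b", "c"]
data: List[Instance] = [
    ([0.1, 0.9], {"a", "b"}),
    ([0.8, 0.2], {"c"}),
    ([0.4, 0.5], {"a", "b"}),
    ([0.7, 0.3], set()),
]

br = binary_relevance(data, labels)
lp = label_powerset(data)

# BR yields |L| independent binary problems.
print(len(br))
# Targets for label "a" across the four instances.
print([target for _, target in br["a"]])
# LP classes actually observed in the data.
print(len({combo for _, combo in lp}))
# Upper bound on the number of LP classes for |L| labels.
print(2 ** len(labels))
```

In this sketch the BR data sets for "a" and "b" have identical targets, yet nothing in the transformed data records that the two labels always co-occur. LP keeps that information by treating {"a", "b"} as one class, at the cost of a class space that can grow up to $2^{|L|}$.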
