Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014
2014
We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, i.e., the task of classifying social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data, which we then use to train two classifiers in a supervised way. We explore three sampling strategies for selecting training examples and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy.
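The abstract does not spell out the three sampling strategies. To illustrate what training on a "balanced dataset" can mean, the snippet below is a minimal sketch of random oversampling, one common balancing technique, assuming the training data is a list of texts with exactly one category label each. The helper name `balance_by_oversampling` and the toy labels are hypothetical; this is not necessarily one of the strategies used in the submitted runs.

```python
import random
from collections import Counter, defaultdict


def balance_by_oversampling(texts, labels, seed=0):
    """Hypothetical helper: randomly oversample minority classes so that
    every class ends up with as many examples as the largest class.

    Illustrative only; not the paper's sampling strategy.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not labels:
        return [], []

    rng = random.Random(seed)  # fixed seed for reproducible sampling
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    # Group the training texts by their category label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    out_texts, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original example, then draw extra copies with
        # replacement until this class reaches the target size.
        extra = [rng.choice(members) for _ in range(target - counts[label])]
        for text in members + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


texts = ["t1", "t2", "t3", "t4", "t5"]
labels = ["A", "A", "A", "B", "C"]
bal_texts, bal_labels = balance_by_oversampling(texts, labels)
print(Counter(bal_labels))
```

Oversampling keeps every labeled example, which matters when some categories have very few training instances; undersampling the majority classes is the usual alternative when duplicated examples risk overfitting.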