Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014

2014
We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, i.e., to classify social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data to train two classifiers in a supervised way. We explore three sampling strategies for selecting training examples, and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy.
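The abstract does not name the three sampling strategies, so the following is only a minimal sketch of one common way to obtain a balanced training set: randomly undersampling every class to the size of the smallest one. The function name `balance_by_undersampling`, the toy texts, and the labels `dim_a` and `dim_b` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Assumption: balancing is done by randomly downsampling each class
    to the size of the smallest class. The paper may do it differently.
    """
    if not examples:
        return []
    rng = random.Random(seed)

    # Group the labeled examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Every class is cut down to the size of the rarest class.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))

    # Shuffle so that classes are interleaved in the training order.
    rng.shuffle(balanced)
    return balanced


# Toy, hypothetical data: six updates of one class and two of another.
toy = [(f"update {i}", "dim_a") for i in range(6)]
toy += [(f"note {i}", "dim_b") for i in range(2)]

balanced = balance_by_undersampling(toy)
label_counts = {
    label: sum(1 for _, l in balanced if l == label)
    for label in ("dim_a", "dim_b")
}
```

Undersampling discards data from the frequent classes, so it trades training-set size for balance; oversampling the rare classes is the usual alternative when labeled data is scarce.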