Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014

Cristina Gârbacea, Manos Tsagkias, M. de Rijke

2014

We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, which asks participants to classify social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data and then training two classifiers in a supervised way. We explore three sampling strategies for selecting training examples and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy.
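The abstract does not spell out the three sampling strategies, so the following is only a minimal sketch of one common way to build a balanced training set: downsampling every class to the size of the smallest one. The function name `balance_by_undersampling`, the `(text, label)` pair layout, and the toy labels `"A"` and `"B"` are assumptions made for illustration and do not come from the paper.

```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, seed=0):
    """Downsample every class to the size of the smallest class.

    `examples` is a list of (text, label) pairs. This is an illustrative
    balancing scheme, not necessarily one of the paper's three strategies.
    """
    rng = random.Random(seed)

    # Group the labeled examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # The smallest class determines how many examples each class keeps.
    target = min(len(items) for items in by_label.values())

    # Draw `target` examples without replacement from every class.
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: six examples of class "A", two of class "B".
toy = [("a%d" % i, "A") for i in range(6)] + [("b%d" % i, "B") for i in range(2)]
balanced = balance_by_undersampling(toy)
```

Downsampling discards data from the larger classes; oversampling the smaller classes is the usual alternative when training examples are scarce.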

Keywords:

Social media
Training set
Feature selection
Sampling (statistics)
Data mining
Computer science
Reputation
Data sampling
Artificial intelligence
Machine learning
CLEF
