Incorporate Lexicon into Self-training: A Distantly Supervised Chinese Medical NER.

Zhen Gan, Zhucong Li, Baoli Zhang, Jing Wan, Yubo Chen, Kang Liu, Jun Zhao, Yafei Shi, Shengping Liu

2021

Medical named entity recognition (NER) tasks usually lack sufficient annotated data. Distant supervision, which can quickly and automatically generate annotated training datasets from dictionaries, is often used to alleviate this problem. However, current distantly supervised methods suffer from noisy labels because of the limited coverage of dictionaries, which leaves a large number of entities unlabeled. We call this phenomenon the incomplete annotation problem. To tackle it, we propose a novel distantly supervised method for Chinese medical NER. Specifically, we propose a high-recall self-training mechanism to recall potential unlabeled entities in the distantly supervised dataset. To reduce the errors introduced by high-recall self-training, we propose a fine-grained lexicon-enhanced scoring and ranking mechanism. Our method improves on the baseline models by 3.2% on the dataset we propose and by 5.03% on a benchmark dataset for Chinese medical NER.
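To make the incomplete annotation problem concrete, here is a minimal sketch of dictionary-based distant labeling. It assumes character-level BIO tagging and forward maximum matching against a lexicon. Neither detail is stated above, and this is not the authors' pipeline. The function name `distant_label`, the `DIS` entity type, and the toy lexicon are all hypothetical.

```python
from typing import List, Set


def distant_label(sentence: str, lexicon: Set[str], entity_type: str = "DIS") -> List[str]:
    """Hypothetical distant annotator: character-level BIO tags via forward maximum matching."""
    max_len = max((len(w) for w in lexicon), default=0)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        # Try the longest possible lexicon entry starting at position i first.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in lexicon:
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + length):
                    tags[j] = f"I-{entity_type}"
                i += length
                break
        else:
            # No lexicon entry starts here, so this character stays "O",
            # even if it is part of a real entity the dictionary does not cover.
            i += 1
    return tags


if __name__ == "__main__":
    toy_lexicon = {"糖尿病", "高血压"}  # hypothetical, deliberately incomplete dictionary
    print(distant_label("患者有糖尿病和冠心病", toy_lexicon))
```

In this example, 冠心病 (coronary heart disease) is a genuine disease mention, but it is missing from the toy lexicon, so every one of its characters is tagged `O`. That is the kind of unlabeled entity that a high-recall self-training step would try to recover.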

