Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model.
2019
In this work, we introduce a simple yet efficient post-processing model for
automatic speechrecognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and the initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates significant improvement in
word error rateover the baseline
acoustic modelwith greedy decoding, especially on much noisier
dev-other and test-other portions of the evaluation dataset. Our model also outperforms baseline with 6-gram
language modelre-scoring and approaches the performance of re-scoring with Transformer-XL neural
language model.
Keywords:
-
Correction
-
Source
-
Cite
-
Save
29
References
1
Citations
NaN
KQI