Offline to online speaker adaptation for real-time deep neural network based LVCSR systems

2018
In this study, we investigate an offline-to-online strategy for speaker adaptation of automatic speech recognition systems. These systems are trained using traditional feed-forward and recently proposed lattice-free maximum mutual information (MMI) time-delay deep neural networks. In this strategy, the test speaker's identity is modeled as an iVector that is estimated offline and then used online during speech decoding. To ensure the quality of the iVectors, we introduce a speaker enrollment stage that provides sufficient reliable speech for estimating an accurate and stable offline iVector. Furthermore, different iVector estimation techniques are reviewed and investigated for speaker adaptation in large vocabulary continuous speech recognition (LVCSR) tasks. Experimental results on several real-time speech recognition tasks demonstrate that the proposed strategy not only provides fast decoding but also yields significant reductions in word error rates (WERs) relative to traditional iVector-based speaker adaptation frameworks.
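
The offline-to-online idea can be illustrated with a minimal sketch. It assumes the common iVector adaptation setup in which the speaker vector is appended to each acoustic feature frame at the network input; the abstract does not state this detail. The function name `adapt_stream` and the toy feature dimensions are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Sequence


def adapt_stream(
    frames: Iterable[Sequence[float]],
    speaker_ivector: Sequence[float],
) -> Iterator[List[float]]:
    """Yield each incoming frame with a fixed speaker vector appended.

    The speaker vector is assumed to have been estimated offline
    (for example, from enrollment speech), so no per-frame iVector
    estimation happens while frames are streaming in.
    """
    fixed = list(speaker_ivector)  # computed once, reused for every frame
    for frame in frames:
        # Assumption: adaptation is done by feature concatenation.
        yield list(frame) + fixed


if __name__ == "__main__":
    # Toy data: two 2-dimensional frames and a 1-dimensional "iVector".
    toy_frames = [[1.0, 2.0], [3.0, 4.0]]
    toy_ivector = [0.5]
    for adapted in adapt_stream(toy_frames, toy_ivector):
        print(adapted)
```

Because the vector is fixed before decoding starts, the per-frame cost in this sketch is a single concatenation, which is one plausible reading of why an offline estimate suits real-time decoding.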