Methods for integrating rule-based and statistical systems for Arabic to English machine translation

2012
This article presents several techniques for integrating information from a rule-based machine translation (RBMT) system into a statistical machine translation (SMT) framework. These techniques are grouped into three parts that correspond to the type of information integrated: the morphological, lexical, and system levels. The first part presents techniques that use information from a rule-based morphological tagger to perform morpheme splitting of the Arabic source text. We also compare these results with those obtained using a statistical morphological tagger. In the second part, we present two ways of using Arabic diacritics to improve SMT results, both based on binary decision trees. The third part presents a system combination method that combines the outputs of the RBMT and SMT systems, leveraging the strengths of each. This article shows how language-specific information obtained through a deterministic rule-based process can be used to improve SMT, which is mostly language-independent.