Methods for integrating rule-based and statistical systems for Arabic to English machine translation
2012
This article presents several techniques for integrating information from a rule-based machine translation (RBMT) system into a statistical machine translation (SMT) framework. These techniques are grouped into three parts that correspond to the type of information integrated: the morphological, lexical, and system levels. The first part presents techniques that use information from a rule-based morphological tagger to split the Arabic source text into morphemes. We also compare these results with those obtained using a statistical morphological tagger. In the second part, we present two ways of using Arabic diacritics to improve SMT results, both based on binary decision trees. The third part presents a system combination method that combines the outputs of the RBMT and SMT systems, leveraging the strengths of each. This article shows how language-specific information obtained through a deterministic rule-based process can be used to improve SMT, which is mostly language-independent.
Keywords:
- Natural language processing
- Machine translation
- Computer science
- Arabic diacritics
- Artificial intelligence
- Rule-based system
- Machine translation software usability
- Example-based machine translation
- Source text
- Transfer-based machine translation
- Rule-based machine translation
- Computational linguistics