Project Details
Abstract
Differences in word order between the languages of a translation pair cause difficulties for machine
translation. The task is particularly challenging for a pair such as Mandarin and English, where the
differences in word order require not only local reordering but also long-range reordering. With a
state-of-the-art statistical machine translation (SMT) system like Google Translate, local
reordering can be handled given sufficient language data, and a natural-sounding translation is
achievable. However, the reordering limit applied in SMT to control overall
translation quality and reduce computational complexity discourages long-range reordering.
Consequently, long-range reordering issues manifest in SMT for some difficult translation pairs. Although
measures have been proposed to tackle reordering issues, the results have not been overwhelmingly positive.
One of the key reasons is that the methods targeting long-range reordering issues are less efficient or even
less accurate in translating sentences requiring either no reordering or merely local reordering. If sentences
requiring different reordering levels can be separated, measures targeting long-range reordering issues can be
used on the right sentences without damaging other acceptable translations.
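To make the effect of a reordering limit concrete, the following is a minimal sketch, not the mechanism of any particular SMT system. It assumes a word-level view in which the decoder visits source positions in some order, and it measures each jump as the distance between the position after the previously translated word and the next word translated. The function names, the example orders, and the limit of 6 are illustrative assumptions.

```python
def max_jump(visit_order):
    """Return the largest jump between consecutively translated source positions."""
    prev = -1  # position just before the first source word
    worst = 0
    for pos in visit_order:
        # A monotone step (pos == prev + 1) has jump size 0.
        worst = max(worst, abs(pos - prev - 1))
        prev = pos
    return worst


def within_limit(visit_order, limit):
    """True if every jump in the visiting order respects the reordering limit."""
    return max_jump(visit_order) <= limit


# Hypothetical 8-word source sentence, positions 0..7.
local_swap = [0, 2, 1, 3, 4, 5, 6, 7]  # two adjacent words swapped
long_range = [0, 5, 6, 7, 1, 2, 3, 4]  # a final phrase moved to the front

ASSUMED_LIMIT = 6  # illustrative value only
print(within_limit(local_swap, ASSUMED_LIMIT))
print(within_limit(long_range, ASSUMED_LIMIT))
```

Under these assumptions the local swap stays within the limit, while the long-range order contains a jump that exceeds it, so a decoder bound by the limit could not produce that order.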
We plan to use machine learning algorithms to predict whether a sentence to be translated requires
long-range reordering treatment, based on the recent performance of the MT system under investigation.
Source-language sentences will be categorised as either “without long-range reordering issues” or “with
long-range reordering issues”. The features used for the categorisation task are mostly linguistically
motivated, including the types of oblique phrases that modify the verb, types of subordinate clauses, and
types of adpositional phrases. The machine learning method will be compared with methods that calculate
translatability based solely on features of the source text, without knowledge of current MT capability.
In addition, the annotated data used to build the model will also be analysed in order to gain a better
understanding of the reordering issues in SMT and provide directions for developing comprehensive
solutions to long-range reordering problems.
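The categorisation task described above can be sketched as a binary classifier over categorical features. The example below is only an illustration: the abstract does not name a learning algorithm, so a simple perceptron is swapped in, and the feature values and labels in the toy data are invented rather than taken from the project's annotated data.

```python
from collections import defaultdict


def featurise(info):
    """Turn a dict of categorical features into a set of 'name=value' strings."""
    return {f"{name}={value}" for name, value in info.items()}


def train_perceptron(examples, epochs=10):
    """Learn one weight per feature string; label 1 = long-range reordering issue."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for info, label in examples:
            feats = featurise(info)
            predicted = 1 if sum(weights[f] for f in feats) > 0 else 0
            if predicted != label:
                # Move the weights towards the correct label on a mistake.
                delta = 1.0 if label == 1 else -1.0
                for f in feats:
                    weights[f] += delta
    return weights


def predict(weights, info):
    """Return 1 if the sentence is predicted to need long-range reordering treatment."""
    score = sum(weights.get(f, 0.0) for f in featurise(info))
    return 1 if score > 0 else 0


# Invented toy data: (features of a source sentence, hypothetical label).
TOY_DATA = [
    ({"oblique": "locative", "subordinate": "relative", "adposition": "post_verbal"}, 1),
    ({"oblique": "none", "subordinate": "none", "adposition": "none"}, 0),
    ({"oblique": "temporal", "subordinate": "none", "adposition": "post_verbal"}, 1),
    ({"oblique": "none", "subordinate": "complement", "adposition": "none"}, 0),
]

weights = train_perceptron(TOY_DATA)
new_sentence = {"oblique": "locative", "subordinate": "none", "adposition": "post_verbal"}
print(predict(weights, new_sentence))
```

In the proposed setting the labels would come from annotated MT output rather than from the source text alone, which is what distinguishes this approach from source-only translatability measures.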
Although the proposed research will be based on the outputs of English-to-Chinese translation generated
by Google Translate, by changing the training data and adapting lexicon-specific features for the source
language, the same method can be applied to other MT systems for other language pairs with long-range
reordering issues.
Project IDs
Project ID: PB10406-1540
External Project ID: MOST104-2410-H182-008
Status | Finished |
---|---|
Effective start/end date | 01/08/15 → 31/07/16 |