Beam Search

Related: cs344-beam-search-2feb11

Machine Translation

The goal is to find, for a given foreign-language sentence f, the English sentence e that maximizes p(e|f).
Translations are generated by a statistical model.
The model's parameters are estimated from bilingual parallel corpora.

During decoding, the foreign input sentence f is segmented into a sequence of I phrases f_1^I = f_1, ..., f_I. We assume a uniform probability distribution over all possible segmentations.
Each foreign phrase f_i in f_1^I is translated into an English phrase e_i. The English phrases may be reordered.
Phrase translation is modeled by a probability distribution φ(f_i|e_i).
Reordering of the English output phrases is modeled by a relative distortion probability distribution d(start_i, end_{i-1}), where start_i is the start position of the foreign phrase translated into the i-th English phrase and end_{i-1} is the end position of the foreign phrase translated into the (i-1)-th English phrase.
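
To make the uniform segmentation assumption concrete, here is a minimal Python sketch that enumerates every way of splitting a sentence into contiguous phrases. The function name `segmentations` and the example sentence are illustrative assumptions, not part of the original slides; a real decoder would only consider segmentations whose phrases appear in its phrase table.

```python
from itertools import combinations

def segmentations(words):
    """Yield every split of `words` into contiguous, non-empty phrases."""
    n = len(words)
    # A segmentation is fixed by choosing which of the n-1 gaps
    # between adjacent words act as phrase boundaries.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [tuple(words[a:b]) for a, b in zip(bounds, bounds[1:])]

# Hypothetical example sentence (an assumption for illustration).
sentence = "das ist ein Haus".split()
all_segs = list(segmentations(sentence))

# Under the uniform assumption, each segmentation is equally likely.
uniform_p = 1 / len(all_segs)
print(len(all_segs), uniform_p)
```

A sentence of n words has 2^(n-1) such segmentations, so the uniform prior is the same constant for every candidate and does not affect which translation scores best.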

We use a simple distortion model d(start_i, end_{i-1}) = α^|start_i - end_{i-1} - 1| with an appropriate value for the parameter α.
In order to calibrate the output length, we introduce a factor ω (called the word cost) for each generated English word, in addition to the trigram language model p_LM.
This is a simple means to optimize performance. Usually, this factor is larger than 1, which biases the model toward longer output.
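
The distortion model and the word cost can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the decoder from the slides: the names `distortion` and `log_score`, the default values of α and ω, the convention that the "previous end" before the first phrase is position 0, and the idea of passing the language model probability in as a plain number are all assumptions made for the example.

```python
import math

def distortion(start_i, end_prev, alpha):
    """d(start_i, end_{i-1}) = alpha ** |start_i - end_{i-1} - 1|."""
    return alpha ** abs(start_i - end_prev - 1)

def log_score(phrase_pairs, lm_prob, alpha=0.5, omega=1.1):
    """Log of: product of phi_i * d_i, times p_LM(e), times omega ** len(e).

    phrase_pairs: list, in English output order, of
        (phi, start, end, english_words), where start/end are the
        1-based foreign word positions covered by the phrase.
    lm_prob: language model probability of the English output,
        supplied by the caller (assumption: no LM is implemented here).
    """
    total = math.log(lm_prob)
    end_prev = 0  # assumed convention: nothing translated yet
    n_words = 0
    for phi, start, end, english_words in phrase_pairs:
        total += math.log(phi) + math.log(distortion(start, end_prev, alpha))
        end_prev = end
        n_words += len(english_words)
    # The word cost applies once per generated English word.
    return total + n_words * math.log(omega)

# Hypothetical phrase pairs with made-up probabilities.
monotone = [(0.5, 1, 2, ["this", "is"]), (0.4, 3, 4, ["a", "house"])]
swapped = [(0.4, 3, 4, ["a", "house"]), (0.5, 1, 2, ["this", "is"])]
print(log_score(monotone, lm_prob=0.01))
print(log_score(swapped, lm_prob=0.01))
```

Translating the phrases in foreign order gives an exponent of 0 at every step, so no distortion penalty is paid, while the swapped order is penalized at both steps. Working in log space avoids underflow when many small probabilities are multiplied.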
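In summary, the best English output sentence e_best given a foreign input sentence f according to our model is

e_best = argmax_e p(e|f) = argmax_e p(f|e) · p_LM(e) · ω^length(e),

where p(f|e) is decomposed into

p(f_1^I | e_1^I) = Π_{i=1..I} φ(f_i|e_i) · d(start_i, end_{i-1}).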
