Machine translation


Date: 30.04.2023

Machine Translation (MT)

Introduction

  • Sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/)
  • Use: translation of large amounts of data in the shortest possible time
    • Standard documents
    • Instructions and manuals
    • Web sites, multilingual search
    • Reference information (addresses, recipes, etc.)
  • Aim: to understand the main content of a document written in a foreign language unknown to the user
  • NOT to be used instead of human translation!

Approaches to machine translation

  • Rule-based approach
  • Statistical approach
  • Example-based approach
  • Hybrid machine translation

Rule-based translation

Stages

  • Morphological analysis of the source language
  • Parsing the source language (syntactic groups)
  • Getting syntactic information about each word
  • Dictionary-based translation
  • Example: A girl eats an apple. (Eng.-Ger.)
  • Stages of translation:
    • 1st: getting basic part-of-speech information for each source word: a = ind.art.; girl = n.; eats = v.; an = ind.art.; apple = n.
    • 2nd: getting syntactic information about the verb "to eat": here, eats = Present Simple, 3rd person singular, active voice
    • 3rd: parsing the source sentence: (an apple) = the object of eat
    • 4th: translating English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen…
    • 5th: finding appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
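
The five stages above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the lexicons, the inflection tables and the function names are hypothetical, and the parser accepts nothing but sentences of the form "article noun verb article noun".

```python
# Toy rule-based English-to-German translator for "Det N V Det N" sentences.
# All tables are hypothetical and cover only the example sentence.

# Stage 1: part-of-speech lexicon for the source language.
POS = {"a": "ind.art", "an": "ind.art", "girl": "n", "apple": "n", "eats": "v"}

# Stage 2: morphological information about source verb forms.
VERB_FORMS = {"eats": {"lemma": "eat", "tense": "present", "person": 3, "number": "sg"}}

# Stage 4: bilingual dictionary (English lemma -> German lemma, gender for nouns).
DICTIONARY = {
    "girl": {"lemma": "Mädchen", "gender": "n"},
    "apple": {"lemma": "Apfel", "gender": "m"},
    "eat": {"lemma": "essen"},
}

# Stage 5: inflection tables for the target language.
INDEF_ARTICLE = {
    ("m", "nom"): "ein", ("f", "nom"): "eine", ("n", "nom"): "ein",
    ("m", "acc"): "einen", ("f", "acc"): "eine", ("n", "acc"): "ein",
}
VERB_INFLECTION = {("essen", "present", 3, "sg"): "isst"}


def parse(tokens):
    """Stage 3: identify subject, verb and object of a 'Det N V Det N' sentence."""
    tags = [POS[t] for t in tokens]
    if tags != ["ind.art", "n", "v", "ind.art", "n"]:
        raise ValueError("sentence pattern not covered by this sketch")
    return {"subject": tokens[1], "verb": tokens[2], "object": tokens[4]}


def noun_phrase(noun, case):
    """Translate a noun and attach the indefinite article in the given case."""
    entry = DICTIONARY[noun]
    return f"{INDEF_ARTICLE[(entry['gender'], case)]} {entry['lemma']}"


def translate(sentence):
    tokens = sentence.rstrip(".").lower().split()
    roles = parse(tokens)
    verb = VERB_FORMS[roles["verb"]]
    german_lemma = DICTIONARY[verb["lemma"]]["lemma"]
    german_verb = VERB_INFLECTION[
        (german_lemma, verb["tense"], verb["person"], verb["number"])
    ]
    subject = noun_phrase(roles["subject"], "nom")  # subject takes the nominative
    obj = noun_phrase(roles["object"], "acc")       # direct object takes the accusative
    result = f"{subject} {german_verb} {obj}."
    return result[0].upper() + result[1:]


print(translate("A girl eats an apple."))  # Ein Mädchen isst einen Apfel.
```

The accusative form "einen" comes out only because the parser has marked "apple" as the object; a plain dictionary lookup would have produced "ein Apfel".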

Statistical translation

  • Translations are generated according to a probability distribution, on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora
  • Benefits:
    • Better use of resources
    • More natural translations
    • No programmers or linguists* involved
  • Shortcomings:
    • Corpus creation can be costly for users with limited resources
    • The results can be unpredictable; superficial fluency can be deceiving
    • Statistical machine translation does not work well between languages that have significantly different word orders

Statistical translation

  • Basis: a parallel corpus
  • Probabilities are assigned by counting; the most probable translation variant is selected
  • Probability estimates depend on the size and quality of the training corpus
  • Linguistic information: sentence segmentation, graphematic analysis, morphology
  • Given a corpus, the simplest translation system can be built in 2 weeks
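
As an illustration of how word translation probabilities can be estimated from a parallel corpus by counting, here is a minimal sketch of IBM Model 1 training with expectation-maximisation. The slides do not name a specific model, so the choice of IBM Model 1 is an assumption, and the three-sentence corpus and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical English-German parallel corpus (sentence-aligned).
CORPUS = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]


def train_model1(corpus, iterations=20):
    """Estimate t[(tgt, src)] = P(tgt word | src word) with EM (no NULL word)."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in pairs:
            for g in tgt:
                # Spread one unit of count for g over the source words,
                # in proportion to the current translation probabilities.
                norm = sum(t[(g, e)] for e in src)
                for e in src:
                    fraction = t[(g, e)] / norm
                    count[(g, e)] += fraction
                    total[e] += fraction
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {(g, e): c / total[e] for (g, e), c in count.items()})
    return t


def best_translation(t, word):
    """Return the target word with the highest probability for a source word."""
    candidates = {g: p for (g, e), p in t.items() if e == word}
    return max(candidates, key=candidates.get)


model = train_model1(CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

Even in this tiny corpus "house" co-occurs equally often with "das" and "Haus"; the model prefers "Haus" only because "das" is better explained by "the", which appears in two sentences. With fewer or noisier sentence pairs the estimates degrade, which is why the estimates depend on corpus size and quality.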

Rule-based vs. statistical


news:
document:

Rule-based translation

Types

  • Dictionary-based (direct)
  • Transfer-based
  • Interlingual

Dictionary-based (direct)

  • Word-by-word translation
  • With or without morphological analysis or lemmatisation
  • Application: translation of long lists of phrases at the subsentential (i.e., not a full sentence) level, e.g. lists, inventories or simple catalogs of products and services

Direct translation example
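
A minimal sketch of direct translation, assuming a tiny hypothetical English-German dictionary and a deliberately crude lemmatiser that only strips a plural "-s". The names `translate_direct` and `lemmatise` are illustrative, not part of any real library.

```python
# Hypothetical English-German dictionary keyed by lemma.
DICTIONARY = {"fresh": "frisch", "apple": "Apfel", "green": "grün", "pear": "Birne"}


def lemmatise(word):
    """Strip a plural '-s' if that yields a dictionary entry (very rough)."""
    if word not in DICTIONARY and word.endswith("s") and word[:-1] in DICTIONARY:
        return word[:-1]
    return word


def translate_direct(phrase, use_lemmas=True):
    """Translate word by word; unknown words are passed through unchanged."""
    output = []
    for word in phrase.lower().split():
        key = lemmatise(word) if use_lemmas else word
        output.append(DICTIONARY.get(key, word))
    return " ".join(output)


print(translate_direct("fresh apples"))                   # frisch Apfel
print(translate_direct("green pears", use_lemmas=False))  # grün pears
```

The output "frisch Apfel" is not grammatical German: there is no agreement and no plural, because nothing beyond the dictionary lookup is done. This is why the direct approach suits item lists and catalogs better than full sentences.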

Transfer-based machine translation

1. Analyzing the input text for morphology and syntax (and sometimes semantics)

2. Creating an internal representation

3. Generating translation using both bilingual dictionaries and grammatical rules


Sentence in a source language => [analysis] => Source language structure => [transfer] => Target language structure => [synthesis] => Sentence in a target language
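
The analysis, transfer and synthesis steps can be sketched for English-to-French noun phrases, where a structural transfer rule is needed because colour adjectives follow the noun in French. This is a sketch under stated assumptions: the lexicons and function names are hypothetical, and only phrases of the form "the ADJ N" are handled.

```python
# Hypothetical lexicons for an English-to-French noun-phrase sketch.
EN_POS = {"the": "DET", "black": "ADJ", "red": "ADJ", "car": "N", "cat": "N"}
NOUNS = {"car": ("voiture", "f"), "cat": ("chat", "m")}
ADJECTIVES = {
    "black": {"m": "noir", "f": "noire"},
    "red": {"m": "rouge", "f": "rouge"},
}
DEF_ARTICLE = {"m": "le", "f": "la"}


def analyse(phrase):
    """Analysis: build a source-language structure from a 'DET ADJ N' phrase."""
    words = phrase.lower().split()
    if [EN_POS.get(w) for w in words] != ["DET", "ADJ", "N"]:
        raise ValueError("phrase pattern not covered by this sketch")
    return {"det": words[0], "adj": words[1], "noun": words[2]}


def transfer(source):
    """Transfer: replace the noun lexically and reorder ADJ N into N ADJ."""
    lemma, gender = NOUNS[source["noun"]]
    return {
        "order": ["det", "noun", "adj"],
        "gender": gender,
        "noun": lemma,
        "adj": source["adj"],
    }


def synthesise(target):
    """Synthesis: pick article and adjective forms that agree with the noun."""
    forms = {
        "det": DEF_ARTICLE[target["gender"]],
        "noun": target["noun"],
        "adj": ADJECTIVES[target["adj"]][target["gender"]],
    }
    return " ".join(forms[slot] for slot in target["order"])


def translate_np(phrase):
    return synthesise(transfer(analyse(phrase)))


print(translate_np("the black car"))  # la voiture noire
print(translate_np("the red cat"))    # le chat rouge
```

The transfer rules are specific to one language pair: a different pair needs its own reordering and lexical mapping, while the analysis of the source language can be reused.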

Interlingua machine translation

  • The source language text is transformed into an interlingua, i.e., an abstract language-independent representation
  • The target language text is generated from the interlingua
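
A minimal sketch of the interlingua idea, assuming a simple predicate-argument frame as the language-independent representation. The frame layout, the concept names (GIRL, APPLE, EAT) and the generation lexicons are all hypothetical; real interlinguas are far richer.

```python
# Hypothetical analysis lexicon: English word -> language-independent concept.
ANALYSIS_LEXICON_EN = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

# Hypothetical generation lexicons: concept -> ready-made target forms.
GENERATORS = {
    "de": {
        "GIRL": {"subject": "ein Mädchen", "object": "ein Mädchen"},
        "APPLE": {"subject": "ein Apfel", "object": "einen Apfel"},
        "EAT": "isst",
    },
    "fr": {
        "GIRL": {"subject": "une fille", "object": "une fille"},
        "APPLE": {"subject": "une pomme", "object": "une pomme"},
        "EAT": "mange",
    },
}


def analyse_english(sentence):
    """Map an 'a N V a N' English sentence to an interlingua frame."""
    words = [w for w in sentence.rstrip(".").lower().split() if w not in {"a", "an"}]
    agent, predicate, patient = (ANALYSIS_LEXICON_EN[w] for w in words)
    return {"predicate": predicate, "agent": agent, "patient": patient}


def generate(frame, language):
    """Generate a target sentence from the frame alone, without the source text."""
    lexicon = GENERATORS[language]
    subject = lexicon[frame["agent"]]["subject"]
    verb = lexicon[frame["predicate"]]
    obj = lexicon[frame["patient"]]["object"]
    sentence = f"{subject} {verb} {obj}."
    return sentence[0].upper() + sentence[1:]


frame = analyse_english("A girl eats an apple.")
print(frame)
print(generate(frame, "de"))  # Ein Mädchen isst einen Apfel.
print(generate(frame, "fr"))  # Une fille mange une pomme.
```

Adding a new target language here means adding one generator, with no changes to the English analysis. That is the main attraction of the interlingual design compared with pairwise transfer rules.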

Transfer vs. interlingua

Hybrid machine translation 

  • Method of machine translation characterized by the use of multiple approaches within a single machine translation system
  • Types:
    • RBMT guided by statistics
    • Statistical method guided by RBMT
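
One possible reading of "RBMT guided by statistics" is that the rule-based dictionary proposes all candidate translations and corpus statistics pick among them. The sketch below assumes exactly that; the candidate list, the co-occurrence counts and the function name are hypothetical and not taken from any real system.

```python
# Rule-based component: the dictionary lists every German sense of "bank".
CANDIDATES = {"bank": ["Bank", "Ufer"]}

# Statistical component: hypothetical counts of how often an English context
# word appeared in sentences where "bank" was translated by each candidate.
COOCCURRENCE = {
    ("money", "Bank"): 12,
    ("account", "Bank"): 7,
    ("river", "Ufer"): 9,
    ("water", "Ufer"): 4,
}


def choose_translation(word, context):
    """Pick the dictionary candidate best supported by the context counts."""
    def score(candidate):
        return sum(COOCCURRENCE.get((w, candidate), 0) for w in context)

    # With no supporting evidence, max() falls back to the first listed candidate.
    return max(CANDIDATES[word], key=score)


print(choose_translation("bank", ["the", "river", "bank"]))             # Ufer
print(choose_translation("bank", ["money", "in", "the", "bank"]))       # Bank
```

The rules still decide which translations are possible at all; the statistics only rank them. The opposite type would use rules to pre-process or post-process the output of a statistical system.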

MT software


| Name | Platform | Freeware/commercial | Type |
|---|---|---|---|
| Google Translate | Cross-platform (Web application) | Freeware | Statistical |
| SYSTRAN | Cross-platform (Web application) | Commercial | Hybrid rule-based and SMT |
| Promt | Cross-platform | Commercial | Hybrid rule-based and SMT |
