MT Introduction - sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another (http://en.wikipedia.org/)
- Use: translation of large amounts of data in the shortest possible time
- Standard documents
- Instructions and manuals
- Web sites, multilingual search
- Reference information (addresses, recipes, etc.)
- Aim: to understand the main content of a document written in a foreign language the user does not know
- NOT to be used instead of human translation !!!
- Rule-based approach
- Statistical
- Example-based approach
- Hybrid machine translation
Rule-based translation Stages - Morphological analysis of source language
- Parsing source language (syntactic groups)
- Getting syntactic information about each word
- Dictionary based translation
example: - A girl eats an apple. (Eng.-Ger.)
- stages of translation:
- 1st: getting basic part-of-speech information of each source word: a = ind.art.; girl = n.; eats = v.; an = ind.art.; apple = n.
- 2nd: getting syntactic information about the verb “to eat”: here: eat – Pr. Simple, 3rd Pers. Sing., Act. V.
- 3rd: parsing the source sentence: (an apple) = the object of eat
- 4th: translating English words into German: a (category = indef. article) => ein (category = indef. article); girl (category = noun) => Mädchen…
- 5th: finding appropriate inflected forms: A girl eats an apple. => Ein Mädchen isst einen Apfel.
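The five stages above can be sketched as a toy program. This is only an illustration under strong assumptions: the tables POS, VERB_FORMS, NOUNS, VERBS and ARTICLES are hypothetical, hand-written, and cover just the example sentence pattern; a real rule-based system relies on full morphological analysers, parsers and dictionaries.

```python
# Toy rule-based English-to-German pipeline for one sentence pattern.
# All tables are hypothetical and cover only the slide's example.

# Stage 1: part-of-speech lexicon for the source words.
POS = {"a": "ART", "an": "ART", "girl": "N", "apple": "N", "eats": "V"}

# Stage 2: verb analysis (surface form -> lemma and grammatical features).
VERB_FORMS = {"eats": {"lemma": "eat", "tense": "present", "person": 3, "number": "sg"}}

# Stage 4: bilingual dictionary; German nouns carry their grammatical gender.
NOUNS = {"girl": ("Mädchen", "neut"), "apple": ("Apfel", "masc")}
VERBS = {"eat": {("present", 3, "sg"): "isst"}}

# Stage 5: German indefinite article selected by (gender, case).
ARTICLES = {
    ("masc", "nom"): "ein", ("fem", "nom"): "eine", ("neut", "nom"): "ein",
    ("masc", "acc"): "einen", ("fem", "acc"): "eine", ("neut", "acc"): "ein",
}


def translate(sentence):
    words = sentence.rstrip(".").lower().split()
    tags = [POS[w] for w in words]                       # stage 1
    if tags != ["ART", "N", "V", "ART", "N"]:
        raise ValueError("only the pattern ART N V ART N is supported")
    _, subject, verb, _, obj = words
    feats = VERB_FORMS[verb]                             # stage 2
    # Stage 3: the noun before the verb is the subject (nominative),
    # the noun after the verb is the object (accusative).
    roles = ((subject, "nom"), (obj, "acc"))
    phrases = []
    for noun, case in roles:
        german, gender = NOUNS[noun]                     # stage 4
        phrases.append(f"{ARTICLES[(gender, case)]} {german}")  # stage 5
    key = (feats["tense"], feats["person"], feats["number"])
    german_verb = VERBS[feats["lemma"]][key]
    result = f"{phrases[0]} {german_verb} {phrases[1]}."
    return result[0].upper() + result[1:]


print(translate("A girl eats an apple."))
```

The accusative article "einen" comes out only because the object role from stage 3 is combined with the masculine gender of "Apfel" in stage 5; a plain word-for-word lookup would not produce it.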
Statistical translation - Translations are generated according to probability distribution on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora
Benefits - Better use of resources
- More natural translations
- No programmers or linguists* involved
Shortcomings - Corpus creation can be costly for users with limited resources.
- The results can be unpredictable. Superficial fluency can be deceiving.
- Statistical machine translation does not work well between languages that have significantly different word orders.
Statistical translation - Basis: a parallel corpus
- Probabilities are assigned by counting, and the most probable translation variant is chosen
- Probability estimates depend on the size and quality of the training corpus
- Linguistic information: sentence splitting, graphematic analysis, morphology
- Given a corpus, the simplest translation system can be built in 2 weeks
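The idea of assigning probabilities by counting can be sketched as follows. This is a simplified illustration, not a full statistical MT system: it assumes the word pairs have already been aligned (ALIGNED_PAIRS is hypothetical toy data), whereas real systems learn alignments from the parallel corpus and also use a target-language model.

```python
from collections import Counter, defaultdict

# Hypothetical word pairs, as if extracted from an aligned parallel corpus.
ALIGNED_PAIRS = [
    ("bank", "Bank"), ("bank", "Bank"), ("bank", "Bank"), ("bank", "Ufer"),
    ("girl", "Mädchen"), ("girl", "Mädchen"),
]


def train(pairs):
    """Estimate P(target | source) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {t: n / total for t, n in target_counts.items()}
    return table


def best_translation(table, word):
    """Pick the most probable translation variant for a source word."""
    options = table[word]
    return max(options, key=options.get)


table = train(ALIGNED_PAIRS)
print(table["bank"])
print(best_translation(table, "bank"))
```

With so few observations the estimates are unreliable, which is the point made above about the size and quality of the training corpus.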
Rule-based vs. statistical (comparison examples: news, document)
Rule-based translation Types - Dictionary-based (direct)
- Transfer-based
- Interlingual
Dictionary-based (direct) - word by word translation
- with or without morphological analysis or lemmatisation
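A minimal sketch of word-by-word translation, with and without lemmatisation. DICTIONARY and LEMMAS are hypothetical toy tables for an English-German pair; unknown words are passed through in angle brackets as an assumed convention.

```python
# Hypothetical bilingual dictionary keyed by English lemmas.
DICTIONARY = {"a": "ein", "an": "ein", "girl": "Mädchen", "eat": "essen", "apple": "Apfel"}

# Hypothetical lemma table mapping inflected forms to dictionary entries.
LEMMAS = {"eats": "eat", "girls": "girl", "apples": "apple"}


def translate_direct(text, lemmatise=True):
    """Translate word by word; no syntax, no agreement, no reordering."""
    output = []
    for word in text.lower().split():
        key = LEMMAS.get(word, word) if lemmatise else word
        # Words missing from the dictionary are marked instead of translated.
        output.append(DICTIONARY.get(key, f"<{word}>"))
    return " ".join(output)


print(translate_direct("a girl eats an apple"))
print(translate_direct("a girl eats an apple", lemmatise=False))
```

With lemmatisation every word is found, but the verb stays in its dictionary form and the article is not inflected; without it, "eats" is not found at all. Both outputs show why the direct approach suits lists of phrases better than full sentences.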
Application - translation of long lists of phrases on the subsentential (i.e., not a full sentence) level, e.g. lists, inventories or simple catalogs of products and services.
Direct translation example
Transfer-based machine translation
1. Analyzing the input text for morphology and syntax (and sometimes semantics)
2. Creating an internal representation
3. Generating translation using both bilingual dictionaries and grammatical rules
Sentence in a source language => (analysis) => Source language structure => (transfer) => Target language structure => (synthesis) => Sentence in a target language
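The analysis, transfer and synthesis steps of transfer-based translation can be sketched for a single fixed sentence pattern. Everything here is an assumption made for illustration: the example sentence, the tables WORDS, AUX, NOUNS and ACC_ARTICLE, and the dictionary-shaped internal representation are hypothetical and much simpler than the structures a real system builds.

```python
# Hypothetical English-to-German tables for one sentence pattern.
WORDS = {"i": "ich", "eaten": "gegessen"}
AUX = {("have", "i"): "habe"}            # auxiliary form depends on the subject
NOUNS = {"apple": ("Apfel", "masc")}
ACC_ARTICLE = {"masc": "einen", "fem": "eine", "neut": "ein"}


def analyse(sentence):
    """Analysis: source sentence -> source-language structure."""
    # Assumes the pattern: subject, auxiliary, participle, indefinite article, noun.
    subj, aux, verb, _det, noun = sentence.rstrip(".").lower().split()
    return {"subj": subj, "aux": aux, "verb": verb, "obj": noun}


def transfer(src):
    """Transfer: source structure -> target structure (lexical and structural)."""
    noun, gender = NOUNS[src["obj"]]
    return {
        "subj": WORDS[src["subj"]],
        "aux": AUX[(src["aux"], src["subj"])],
        "obj": {"noun": noun, "gender": gender},
        "verb": WORDS[src["verb"]],
        # Structural rule: German places the participle at the end of the clause.
        "order": ["subj", "aux", "obj", "verb"],
    }


def synthesise(tgt):
    """Synthesis: target structure -> target sentence with inflected forms."""
    obj = tgt["obj"]
    parts = {
        "subj": tgt["subj"],
        "aux": tgt["aux"],
        "verb": tgt["verb"],
        "obj": f"{ACC_ARTICLE[obj['gender']]} {obj['noun']}",  # accusative object
    }
    text = " ".join(parts[slot] for slot in tgt["order"])
    return text[0].upper() + text[1:] + "."


print(synthesise(transfer(analyse("I have eaten an apple."))))
```

Unlike direct translation, the word order changes here because the transfer step operates on a structure rather than on the word sequence.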
Interlingua machine translation - the source language is transformed into an interlingua, i.e., an abstract language-independent representation
- the target language is generated from the interlingua.
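A small sketch of the interlingua idea: the source sentence is mapped to a language-independent frame, and each target language is generated from that frame alone. The frame layout (predicate, agent, patient), the concept labels and the tables CONCEPTS_EN and GENERATORS are hypothetical and handle only one sentence pattern.

```python
# Hypothetical mapping from English words to language-independent concepts.
CONCEPTS_EN = {"girl": "GIRL", "apple": "APPLE", "eats": "EAT"}

# Hypothetical generation tables: nouns as (subject form, object form).
GENERATORS = {
    "de": {"GIRL": ("ein Mädchen", "ein Mädchen"),
           "APPLE": ("ein Apfel", "einen Apfel"),
           "EAT": "isst"},
    "fr": {"GIRL": ("une fille", "une fille"),
           "APPLE": ("une pomme", "une pomme"),
           "EAT": "mange"},
}


def to_interlingua(sentence):
    """Analyse an English ART N V ART N sentence into an abstract frame."""
    _, agent, verb, _, patient = sentence.rstrip(".").lower().split()
    return {
        "predicate": CONCEPTS_EN[verb],
        "agent": CONCEPTS_EN[agent],
        "patient": CONCEPTS_EN[patient],
    }


def generate(frame, lang):
    """Generate a target sentence from the frame only, never from the source text."""
    lexicon = GENERATORS[lang]
    subject = lexicon[frame["agent"]][0]
    obj = lexicon[frame["patient"]][1]
    text = f"{subject} {lexicon[frame['predicate']]} {obj}."
    return text[0].upper() + text[1:]


frame = to_interlingua("A girl eats an apple.")
print(frame)
print(generate(frame, "de"))
print(generate(frame, "fr"))
```

Adding another target language in this sketch means adding one more entry to GENERATORS; the analysis step is left unchanged.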
Transfer vs. interlingua
Hybrid machine translation - a method of machine translation characterized by the use of multiple approaches within a single machine translation system.
Types: - RBMT guided by statistics
- Statistical method guided by RBMT
MT software
Name | Platform | Freeware/commercial | Type
Google Translate | Cross-platform (Web application) | Freeware | Statistical
SYSTRAN | Cross-platform (Web application) | Commercial | Hybrid rules-based and SMT
Promt | Cross-platform | Commercial | Hybrid rules-based and SMT