Plan. 1 Morphological analyzers of Russian Morphology of East Slavonic languages


Date: 14.08.2018



Plan.1

  • Morphological analyzers of Russian

  • Morphology of East Slavonic languages

  • Multilingual information-retrieval thesauri

  • Electronic bilingual dictionaries

  • Russian and bilingual text collections



Plan.2

  • Machine Translation systems

  • Example-Based Machine Translation system and conceptual information retrieval

  • Bilingual ontologies

    • Russian WordNet
    • Sociopolitical Thesaurus for automatic text processing


Morphological analysis of the Russian language

  • Not a problem: many high-quality morphological analyzers exist for Russian

  • Based on the classification in the “Grammatical Dictionary of the Russian Language” by A.A. Zalizniak (the first edition was published in 1977)
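As a rough illustration of dictionary-based analysis, the sketch below stores each stem with an inflection class and recognizes a word form by splitting it into a known stem and an ending allowed by that class. The class names, endings, and lexicon entries are hypothetical toy data, not Zalizniak's actual codes, and cover only a few forms.

```python
# Toy inflection classes: ending -> possible grammatical tags.
# Hypothetical and incomplete; real classes cover full paradigms.
PARADIGMS = {
    "noun_fem_a": {
        "а": ["nom.sg"],
        "ы": ["gen.sg", "nom.pl"],
        "е": ["dat.sg", "loc.sg"],
        "у": ["acc.sg"],
    },
    "noun_masc_hard": {
        "": ["nom.sg", "acc.sg"],
        "а": ["gen.sg"],
        "у": ["dat.sg"],
        "ом": ["ins.sg"],
    },
}

# Toy lexicon: stem -> (lemma, inflection class).
STEMS = {
    "карт": ("карта", "noun_fem_a"),
    "стол": ("стол", "noun_masc_hard"),
}


def analyze(word_form):
    """Return every (lemma, tag) reading found by trying all stem/ending splits."""
    readings = []
    for cut in range(len(word_form), -1, -1):
        stem, ending = word_form[:cut], word_form[cut:]
        entry = STEMS.get(stem)
        if entry is None:
            continue
        lemma, paradigm = entry
        for tag in PARADIGMS[paradigm].get(ending, []):
            readings.append((lemma, tag))
    return readings


print(analyze("карты"))  # ambiguous form: two readings for one lemma
print(analyze("стола"))
```

A form outside the toy lexicon simply yields an empty list; production analyzers add guessing rules for unknown words, which this sketch omits.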



Morphological analyzers:

  • Zalizniak dictionary and a morphological analyzer http://starling.rinet.ru/morpho.htm

  • LGPL-licensed analyzer: http://www.aot.ru/download.html (site in Russian)

  • http://linguist.nm.ru/index.htm (Russian and Ukrainian) - commercial resources, used in several well-known commercial Russian systems



Russian Internet search engines that use Russian morphological analysis

  • Yandex – www.yandex.ru

  • Rambler – www.rambler.ru

  • Aport – www.aport.ru



Morphology of East Slavonic Languages in Search Engines

  • Ukrainian Internet search engine Meta (www.meta.ua)

    • Russian, English and Ukrainian morphology
  • Byelorussian search engine (www.akavita.by)

    • Russian, English and Byelorussian morphology (will be added)


Traditional multilingual information-retrieval thesauri



Thesaurus of the European Union: EUROVOC

  • Translated into 9 languages

  • Translated into Russian by specialists of the Parliamentary Library

  • Extended with Russia-specific terms (9,646 descriptors in the Russian version)

  • Used for manual indexing of documents in the library



Electronic dictionaries



MultiLex dictionaries

  • www.medialingua.com

  • English, French, Spanish, German, Italian

  • Licensed versions of dictionaries from publishers

  • Each product usually includes a general dictionary and several domain-specific dictionaries



Lingvo dictionaries

  • www.abbyy.co.uk

  • Abbyy Lingvo 8.0 Multilingual edition: Eight translation directions – 41 general and specialised dictionaries

  • FineReader – the best Russian OCR system. Supports more than 100 languages. Winner of 70 comparative tests worldwide



Polyglossum dictionaries

  • ETS publishing house

  • www.ets.ru

  • Electronic (plain text format is possible) versions and traditional printed versions

  • Bilingual dictionaries for English, German, French and Spanish, plus Finnish



Russian Text Collections



Internet Library of Moshkov

  • www.lib.ru

  • Fiction in Russian including classic works

  • 3,300 MB of text files and 300 MB of other files

  • Free access

  • No copyright



Internet library - www.public.ru

  • More than 1,000 periodical titles published since 1990

  • Free access

  • No copyright

  • Licensed for library activity



Morphologically tagged corpus of Russian “Russian Standard”

  • Work on a morphologically tagged corpus of Russian has begun in Russia

  • Russian fiction: 583,814 words

  • Serge Sharoff http://corpus.leeds.ac.uk/



Parallel collections



Parallel translation of news reports

  • ITAR-TASS agency: news reports in 6 languages (http://corp.itar-tass.com/english/about/)

  • RIA-Novosti agency: news reports in 12 languages (http://en.rian.ru/rian/index.cfm)

  • Internet newspaper PRAVDA On-Line http://english.pravda.ru/ - translation into English



Translation of Russian Legislation

  • GARANT company – legal information systems

  • http://www.garant.ru/nav.php?pid=286&ssid=89

  • More than 25 thousand Russian legal acts translated into English

  • Disseminated via the network of the American company LEXIS/NEXIS



Machine translation systems



ETAP machine translation system

  • Based on the Meaning-Text Theory of I. Melchuk and Y. Apresyan. Detailed rule-based syntactic analysis.

  • English-Russian

  • http://cl.iitp.ru/etap/index.html



Best-known commercial machine translation system: PROMT

  • www.e-prompt.com

  • Russian - English, French, German, Spanish, Italian

  • English-German

  • Development of domain-specific systems

  • Online translation: www.translate.ru



Example-Based Machine Translation: ETRANS, RTRANS

  • Gerold Belonogov

  • The idea was published in 1975

  • VINITI - All-Russian Scientific and Technical Information Institute of the Russian Academy of Sciences (www.viniti.ru)



Example-based machine translation in VINITI (2)

  • VINITI: manually created search profiles (index records) of technical literature and abstracts, collected over many years

  • 900 thousand Russian terms were extracted (1 to 13 words long)

  • Parallel collection of English abstracts and their translation into Russian => 800 thousand English terms
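One way such a term base could be applied is greedy longest-match lookup: at each position, try the longest candidate phrase first and fall back to shorter ones. The sketch below is only an illustration under that assumption; it is not the ETRANS/RTRANS implementation, the sample entries are hypothetical, and it ignores word reordering and inflection in the target language.

```python
MAX_TERM_LEN = 13  # the slides report terms of 1 to 13 words

# Hypothetical sample entries standing in for the extracted term pairs.
TERM_BASE = {
    ("machine", "translation"): "машинный перевод",
    ("information", "retrieval"): "информационный поиск",
    ("system",): "система",
}


def translate_terms(tokens, term_base=TERM_BASE, max_len=MAX_TERM_LEN):
    """Replace the longest known term at each position; keep unknown words as-is."""
    output = []
    i = 0
    while i < len(tokens):
        longest = min(max_len, len(tokens) - i)
        for n in range(longest, 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in term_base:
                output.append(term_base[key])
                i += n
                break
        else:
            # No term of any length starts here: copy the word unchanged.
            output.append(tokens[i])
            i += 1
    return output


print(translate_terms("Machine translation system".split()))
```

Preferring the longest match means a multiword term such as "machine translation" is translated as a unit rather than word by word.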



Conceptual indexing in VINITI

  • Bilingual base of terms can serve as a resource for bilingual search

  • It is not an ontology, only bilingual pairs

  • An important tool for VINITI: it would give foreign researchers access to Russian technical literature, but (as far as I know) it is not implemented yet



Multilingual ontologies



Russian WordNet - RussNet

  • Saint-Petersburg State University

  • 2003: 15000 words – 5000 synsets – 8000 relations

  • Several new relation types added, such as derivational synonyms and derivational semantic roles





UIS RUSSIA

  • Collections of documents in English

    • RePEc (Research Papers in Economics, www.repec.org) abstracts and full texts
    • Collection of Council of Europe documents
  • Access to parallel collections of legislation. Harmonization of legislation



Approach to Organization of Bilingual Search in UIS RUSSIA

  • Development of a bilingual ontology in the sociopolitical domain, based on the Russian Sociopolitical Thesaurus for automatic text processing





Use of Thesaurus in Information Retrieval applications

  • Flexible knowledge-based categorization systems (9 systems)

    • Automatic text categorization of Russian legislation (200,000 documents) into 3,000 categories
  • Knowledge-based text summarization system (SUMMAC conference)

  • Thesaurus-based information retrieval

    • A specially constructed thesaurus can significantly improve the effectiveness of information retrieval (measured by 3-point average precision)
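For reference, 3-point average precision can be computed as in the sketch below. It assumes the common definition, the mean of interpolated precision at recall levels 0.2, 0.5 and 0.8; the slides do not state which recall levels were used, so the `levels` default is an assumption.

```python
def interpolated_precision(ranked_relevance, total_relevant, recall_level):
    """Highest precision at any rank where recall has reached recall_level."""
    best = 0.0
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        if hits / total_relevant >= recall_level:
            best = max(best, hits / rank)
    return best


def three_point_average_precision(ranked_relevance, total_relevant,
                                  levels=(0.2, 0.5, 0.8)):
    """Mean interpolated precision over the given recall levels."""
    if total_relevant == 0:
        return 0.0
    values = [
        interpolated_precision(ranked_relevance, total_relevant, level)
        for level in levels
    ]
    return sum(values) / len(values)


# Ranked result list: True marks a relevant document; 2 relevant documents exist.
print(three_point_average_precision([True, False, True, False, False], 2))
```

Here `ranked_relevance` is the list of relevance judgments in ranked order and `total_relevant` is the number of relevant documents in the whole collection for the query.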


English-Russian Sociopolitical Thesaurus

  • Hierarchical conceptual net of 63 thousand English terms

  • Manual work

    • Use of general and special English-Russian dictionaries
    • Study of conventional American and British dictionaries and information-retrieval thesauri.
    • Cross-checking of translations. Addition of multiword variants. Internet checks.
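A hierarchical bilingual net of this kind can be pictured as concepts that carry terms in both languages plus links to narrower concepts. The sketch below is a hypothetical miniature, not the actual thesaurus format: it maps an English query term to the Russian terms of its concept and of all narrower concepts, which is one plausible way to support bilingual search.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    terms_en: list
    terms_ru: list
    narrower: list = field(default_factory=list)  # names of child concepts


# Hypothetical two-concept thesaurus used only for illustration.
THESAURUS = {
    concept.name: concept
    for concept in [
        Concept("TAX", ["tax", "taxation"], ["налог", "налогообложение"],
                narrower=["INCOME TAX"]),
        Concept("INCOME TAX", ["income tax"], ["подоходный налог"]),
    ]
}


def expand_query(term_en, thesaurus=THESAURUS):
    """Return Russian terms for the concept of term_en and all narrower concepts."""
    index = {t: c.name for c in thesaurus.values() for t in c.terms_en}
    start = index.get(term_en.lower())
    if start is None:
        return []
    result, stack, seen = [], [start], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guard against cycles or shared children
        seen.add(name)
        concept = thesaurus[name]
        result.extend(concept.terms_ru)
        stack.extend(concept.narrower)
    return result


print(expand_query("tax"))
```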


Bilingual Search in UIS RUSSIA





English-Russian Sociopolitical Thesaurus: testing and use in new applications

  • Automatic text categorization of economic papers and abstracts using JEL subject headings (700 categories) (supported by Ford Foundation, USA)

  • Automatic text processing of statistical tables (in cooperation with the University of California, Berkeley, USA)

  • Automatic text processing of European documents (European Court of Human Rights, Council of Europe, European Union) – problems of harmonization of Russian Legislation



Adding languages to Sociopolitical Thesaurus

  • It is a challenge to develop a multilingual sociopolitical thesaurus that describes sociopolitical terms from different languages in the same hierarchical net.

  • A project under discussion: adding the Tatar language to the bilingual thesaurus. Tatars are the second-largest ethnic group in Russia



Russian Information Retrieval Evaluation Seminar 2003

  • Web collection – 7 GB

    • (www.narod.yandex.ru)
  • Thematic classification of Web-sites

  • Web Search

  • 8 Russian participants


