Plan

  • Morphological analyzers of Russian

  • Morphology of East Slavonic languages

  • Multilingual information-retrieval thesauri

  • Electronic bilingual dictionaries

  • Russian and bilingual text collections


  • Machine Translation systems

  • Example-Based Machine Translation system and conceptual information retrieval

  • Bilingual ontologies

    • Russian WordNet
    • Sociopolitical Thesaurus for automatic text processing

Morphological analysis of Russian language

  • Not a problem: many high-quality morphological analyzers of Russian exist

  • Based on the classification in the “Grammatical Dictionary of the Russian Language” by A.A. Zalizniak (first edition published in 1977)
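
Dictionary-based analysis of this kind can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the paradigm name `masc_hard`, the two-word lexicon, and the six singular endings are a toy stand-in, not Zalizniak's actual inflection indices or the data of any analyzer listed here.

```python
# Toy paradigm table: hypothetical class name -> singular case endings.
# Real dictionaries in this tradition use far richer inflection classes.
PARADIGMS = {
    "masc_hard": {
        "nom": "", "gen": "а", "dat": "у", "acc": "", "ins": "ом", "loc": "е",
    },
}

# Toy lexicon: stem -> paradigm class (illustrative entries only).
LEXICON = {"стол": "masc_hard", "завод": "masc_hard"}


def generate(stem):
    """Return every (case, word form) pair for a stem in the lexicon."""
    endings = PARADIGMS[LEXICON[stem]]
    return [(case, stem + ending) for case, ending in endings.items()]


def analyze(word):
    """Return all (lemma, case) readings of a word form, sorted."""
    readings = []
    for stem, paradigm in LEXICON.items():
        if not word.startswith(stem):
            continue
        suffix = word[len(stem):]
        for case, ending in PARADIGMS[paradigm].items():
            if ending == suffix:
                # The lemma is the nominative singular form.
                readings.append((stem + PARADIGMS[paradigm]["nom"], case))
    return sorted(readings)


print(analyze("столом"))  # one reading: instrumental of "стол"
print(analyze("завод"))   # ambiguous: nominative or accusative
```

The same table serves both generation and analysis, which is why one dictionary with paradigm classes can support many analyzers.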

Morphological analyzers:

  • Zalizniak dictionary and a morphological analyzer

  • Under the LGPL license (site in Russian)

  • (Russian and Ukrainian): commercial resources, used in several well-known commercial Russian systems

Russian Internet search engines with Russian morphological analysis

  • Yandex

  • Rambler

  • Aport

Morphology of East Slavonic Languages in Search Engines

  • Ukrainian Internet search engine Meta

    • Russian, English and Ukrainian morphology
  • Byelorussian search engine

    • Russian, English and Byelorussian morphology (will be added)

Traditional multilingual information-retrieval thesauri

Thesaurus of European Union: EUROVOC

  • Translated into 9 languages

  • Translated into Russian by specialists of the Parliamentary Library

  • Supplemented with Russia-specific terms (9646 descriptors in the Russian version)

  • Used for manual indexing of documents in the library

Electronic dictionaries

MultiLex dictionaries


  • English, French, Spanish, German, Italian

  • Licensed versions of dictionaries from publishers

  • Usually include a general dictionary and several domain-specific dictionaries

Lingvo dictionaries


  • ABBYY Lingvo 8.0 Multilingual Edition: eight translation directions, 41 general and specialised dictionaries

  • FineReader: the best Russian OCR system. Supports more than 100 languages. Winner of 70 comparative tests worldwide

Polyglossum dictionaries

  • ETS publishing house


  • Electronic (plain text format is possible) versions and traditional printed versions

  • Bilingual dictionaries for English, German, French, Spanish, and Finnish

Russian Text Collections

Internet Library of Moshkov


  • Fiction in Russian including classic works

  • 3300 MB of text files and 300 MB of other files

  • Free access

  • No copyright

Internet library

  • More than 1000 titles of periodicals published after 1990

  • Free access

  • No copyright

  • Holds a license for library activity

Morphologically tagged corpus of Russian “Russian Standard”

  • Creation of a morphologically tagged corpus of Russian has begun in Russia

  • Russian fiction: 583,814 words

  • Serge Sharoff

Parallel collections

Parallel translation of news reports

  • ITAR-TASS agency: news reports in 6 languages

  • RIA-Novosti agency: news reports in 12 languages

  • Internet newspaper PRAVDA On-Line: translation into English

Translation of Russian Legislation

  • GARANT company – legal information systems


  • More than 25 thousand Russian legal acts translated into English

  • The translations are disseminated via the network of the American company LEXIS/NEXIS

Machine translation systems

ETAP machine translation system

  • Based on the Meaning-Text Theory of I. Melchuk and Y. Apresyan. Detailed rule-based syntactic analysis.

  • English-Russian


Best-known commercial machine translation system: PROMT


  • Russian - English, French, German, Spanish, Italian

  • English-German

  • Development of domain-specific systems

  • Online translation:

Example-Based Machine Translation: ETRANS, RTRANS

  • Gerold Belonogov

  • The idea was published in 1975

  • VINITI: All-Russian Scientific and Technical Information Institute of the Russian Academy of Sciences

Example-based machine translation in VINITI (2)

  • VINITI: manually assigned index records ("search images") of technical literature and abstracts, collected over many years

  • 900 thousand Russian terms were extracted (length 1-13 words)

  • Parallel collection of English abstracts and their translations into Russian => 800 thousand English terms

Conceptual indexing in VINITI

  • Bilingual base of terms can serve as a resource for bilingual search

  • It is not an ontology, only bilingual pairs

  • An important tool for VINITI: access for foreign researchers to Russian technical literature, but (as far as I know) not implemented yet
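
Since the base holds only term pairs and no concept hierarchy, bilingual search over it reduces to term lookup. The sketch below is one hedged way to picture that, not a description of any VINITI software: the three entries in `TERM_PAIRS` and the function `translate_query` are hypothetical, and greedy longest-match is an assumed strategy. Only the 13-word upper bound comes from the figures above.

```python
# Hypothetical bilingual term pairs (English -> Russian); not taken from the VINITI base.
TERM_PAIRS = {
    "machine translation": "машинный перевод",
    "translation": "перевод",
    "information retrieval": "информационный поиск",
}
MAX_TERM_LEN = 13  # extracted terms are reported to be 1 to 13 words long


def translate_query(query):
    """Greedy longest-match translation of known terms; unknown words pass through."""
    words = query.lower().split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so multiword terms win over their parts.
        for n in range(min(MAX_TERM_LEN, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in TERM_PAIRS:
                result.append(TERM_PAIRS[candidate])
                i += n
                break
        else:
            # No term starts at this position: keep the word untranslated.
            result.append(words[i])
            i += 1
    return result


print(translate_query("Machine translation evaluation"))
```

Longest-match matters here because a multiword term such as "machine translation" should not be split into the separate entry for "translation".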

Multilingual ontologies

Russian WordNet - RussNet

  • Saint-Petersburg State University

  • 2003: 15000 words – 5000 synsets – 8000 relations

  • Addition of several new types of relations, such as derivational synonyms and derivational semantic roles


  • Collections of documents in English

    • RePEc (Research Papers in Economics, abstracts and full texts)
    • Collection of Council of Europe documents

  • Access to parallel collections of legislation. Harmonization of legislation

Approach to Organization of Bilingual Search in UIS RUSSIA

  • Development of a bilingual ontology in the sociopolitical domain, based on the Russian Sociopolitical Thesaurus for automatic text processing

Use of Thesaurus in Information Retrieval applications

  • Flexible knowledge-based categorization systems (9 systems)

    • Automatic text categorization of Russian legislation (200,000 documents): 3000 categories

  • Knowledge-based text summarization system (SUMMAC conference)

  • Thesaurus-based information retrieval

    • A specially constructed thesaurus can significantly improve the effectiveness of information retrieval (3-point average precision)
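
The slides do not define the 3-point average precision measure. A common reading, assumed here, is the mean of interpolated precision at recall levels 0.2, 0.5 and 0.8. The function name and its parameters are hypothetical, and the sketch is not the evaluation code behind the reported result.

```python
def three_point_average_precision(ranked_relevance, total_relevant,
                                  recall_points=(0.2, 0.5, 0.8)):
    """Mean interpolated precision at the given recall levels.

    ranked_relevance: booleans for the ranked result list, best hit first.
    total_relevant: number of relevant documents in the whole collection.
    """
    # (recall, precision) measured after each relevant document in the ranking.
    curve = []
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            curve.append((hits / total_relevant, hits / rank))

    def interpolated(level):
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in curve if r >= level]
        return max(candidates, default=0.0)

    return sum(interpolated(level) for level in recall_points) / len(recall_points)


# Relevant documents at ranks 1, 3 and 5; four relevant documents exist in total.
score = three_point_average_precision([True, False, True, False, True], 4)
print(round(score, 3))
```

A recall level the ranking never reaches contributes zero, so the measure penalizes both low-ranked and missing relevant documents.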

English-Russian Sociopolitical Thesaurus

  • Hierarchical conceptual net of 63 thousand English terms

  • Manual work

    • Use of general and special English-Russian dictionaries
    • Study of conventional American and British dictionaries and information-retrieval thesauri.
    • Cross-checking of translations. Addition of multiword variants. Internet checks.

Bilingual Search in UIS RUSSIA

English-Russian Sociopolitical Thesaurus: testing and use in new applications

  • Automatic text categorization of economic papers and abstracts using JEL subject headings (700 categories) (supported by Ford Foundation, USA)

  • Automatic text processing of statistical tables (in cooperation with Berkeley University, USA)

  • Automatic text processing of European documents (European Court of Human Rights, Council of Europe, European Union) – problems of harmonization of Russian Legislation

Adding languages to Sociopolitical Thesaurus

  • It is a challenge to develop a multilingual sociopolitical thesaurus, that is, to describe terms of the sociopolitical domain from different languages in the same hierarchical net.

  • A project under discussion: adding the Tatar language to the bilingual thesaurus. Tatars are the second-largest ethnic group in Russia
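
One way to picture a single hierarchical net shared by several languages is a concept record that carries labels per language. This is a sketch under stated assumptions only: the `Concept` class, the identifiers `C1` and `C2`, and the sample terms are hypothetical and do not reflect the thesaurus's real format or content.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    labels: dict                                  # language code -> list of terms
    broader: list = field(default_factory=list)   # ids of parent concepts


# Hypothetical entries; identifiers and terms are illustrative only.
NET = {
    "C1": Concept("C1", {"en": ["election"], "ru": ["выборы"]}),
    "C2": Concept("C2", {"en": ["presidential election"],
                         "ru": ["президентские выборы"]}, broader=["C1"]),
}


def find_concept(term, lang):
    """Return the id of the concept that carries this term in this language."""
    for concept in NET.values():
        if term in concept.labels.get(lang, []):
            return concept.concept_id
    return None


def narrower(concept_id):
    """Return the concept and all concepts below it in the hierarchy."""
    found = [concept_id]
    for concept in NET.values():
        if concept_id in concept.broader:
            found.extend(narrower(concept.concept_id))
    return found


def expand_query(term, source_lang, target_lang):
    """Map a term to its concept, then collect target-language terms for the subtree."""
    concept_id = find_concept(term, source_lang)
    if concept_id is None:
        return []
    terms = []
    for cid in narrower(concept_id):
        terms.extend(NET[cid].labels.get(target_lang, []))
    return terms


print(expand_query("выборы", "ru", "en"))
```

In this layout, adding a language means adding one more key to each concept's `labels`; the hierarchy itself stays untouched, which is also where the difficulty lies when languages divide the domain differently.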

Russian Information Retrieval Evaluation Seminar 2003

  • Web Collection: 7 GB

  • Thematic classification of Web-sites

  • Web Search

  • 8 Russian participants

