Microsoft Word Tezis-Salayeva-ict



 
Related Work 
In recent years, natural language processing (NLP) resources for the Uzbek language 
have grown rapidly. These include sentiment analysis [1] and semantic analysis 
datasets [2], as well as NLP tools such as a transliterator [3] and part-of-speech 
taggers [4].
For sentiment analysis, researchers have created datasets of Uzbek text labeled with 
sentiment polarity. Models trained on these datasets can be used in applications such 
as social media analysis and opinion mining. 
Among NLP tools, transliteration systems have been developed that convert Uzbek text 
written in the Cyrillic script to the Latin script. Part-of-speech taggers for Uzbek, 
which automatically assign grammatical tags to Uzbek text, are also available. 
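To make the transliteration task concrete, the following is a minimal sketch of a character-level Cyrillic-to-Latin converter. It is not the transliterator cited as [3]; the mapping table, the function name transliterate, and the use of a plain apostrophe in o' and g' are assumptions made for illustration. Context-dependent rules (for example, the treatment of word-initial е) are deliberately ignored.

```python
# Simplified Uzbek Cyrillic -> Latin table (illustrative assumption only;
# context-dependent spelling rules are not handled).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "ch", "ш": "sh", "ъ": "'", "ь": "",
    "э": "e", "ю": "yu", "я": "ya", "ў": "o'", "қ": "q", "ғ": "g'", "ҳ": "h",
}


def transliterate(text: str) -> str:
    """Convert Cyrillic Uzbek text to Latin, one character at a time."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)  # spaces, digits, punctuation, Latin letters pass through
            continue
        latin = CYR_TO_LAT[lower]
        if ch != lower:  # the input letter was uppercase
            latin = latin[:1].upper() + latin[1:]
        out.append(latin)
    return "".join(out)


print(transliterate("Ўзбекистон"))
print(transliterate("салом"))
```

A production transliterator would need dictionary lookups and positional rules on top of such a table; the sketch only shows the basic mapping step.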
Despite these advances in Uzbek NLP, the development of ASR models for Uzbek has 
lagged behind, primarily because training data and other resources are scarce. In 
addition, research on ASR for low-resource languages is limited, which makes it 
difficult to apply existing techniques to the Uzbek language. 
The present work aims to fill this gap in the literature by studying the state of ASR 
models for Uzbek and investigating methods to improve their performance. The study will 
also leverage recent advances in Uzbek NLP resources and tools when creating ASR 
models for Uzbek, a low-resource language. 
 
Methodology 
The study proceeds through the following steps for building and evaluating ASR models 
for the Uzbek language: 
• Evaluation of pre-existing ASR models for Uzbek: 
  o Comparison of the performance of different pre-existing ASR models for Uzbek, 
    such as those developed by Google, Microsoft, or other companies [5]. 
  o Evaluation of the accuracy and speed of these models in recognizing Uzbek 
    speech [6,7]. 
• Training of an ASR model using a dataset of Uzbek speech: 
  o Collection of a dataset of Uzbek speech, which will be used to train the ASR 
    model [8]. 
  o Training of an ASR model on this dataset and comparison of its performance with 
    that of the pre-existing models. 
• Fine-tuning of a pre-trained model on a smaller dataset of Uzbek speech: 
  o Use of transfer learning techniques to fine-tune a pre-trained model on a smaller 
    dataset of Uzbek speech. 
  o Comparison of the performance of the fine-tuned model with the pre-existing 
    models. 
• Data augmentation for increasing the size of the dataset: 
  o Use of data augmentation techniques such as adding background noise or 
    varying the speed of the speech to increase the size of the dataset. 
  o Comparison of the performance of the model trained on augmented data with 
    the pre-existing models. 
• Investigation of unsupervised learning algorithms for training ASR models on 
  low-resource languages: 


TATU, 2023
  o Implementation of unsupervised learning algorithms, such as autoencoders and 
    generative models, to train ASR models on low-resource languages. 
  o Comparison of the performance of the models trained with unsupervised 
    algorithms with the pre-existing models. 
• Evaluation: 
  o Evaluation of the models with metrics such as word error rate (WER) and 
    character error rate (CER). 
  o Comparison of the performance of the different models and discussion of the 
    results. 
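The data augmentation step listed above (adding noise and varying speed) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the study: white Gaussian noise stands in for recorded background noise, speed is changed by simple linear-interpolation resampling (which also shifts pitch), and the function names, the 20 dB signal-to-noise ratio, and the 0.9 and 1.1 speed factors are all assumed values.

```python
import math
import random


def add_noise(samples, snr_db, seed=0):
    """Return a copy of `samples` with white Gaussian noise at `snr_db` dB SNR."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 gives faster, shorter audio."""
    last = len(samples) - 1
    n_out = max(1, round(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = min(i * rate, last)  # fractional position in the input signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


# Hypothetical input: one second of a 440 Hz tone at 16 kHz, standing in for speech.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Each original utterance yields three additional training examples.
augmented = [
    add_noise(tone, snr_db=20),
    change_speed(tone, rate=0.9),
    change_speed(tone, rate=1.1),
]
print([len(a) for a in augmented])
```

The transcript of each augmented copy is the same as that of the original utterance, which is why these transformations enlarge the training set without new labeling effort.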
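The evaluation metrics named above can be computed from the edit distance between a reference transcript and a model's output. The sketch below assumes the common definitions: WER is the number of word substitutions, deletions, and insertions divided by the number of reference words, and CER is the same quantity over characters (here counting spaces as characters, which is one of several conventions). The function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted word out of four.
reference = "men bugun maktabga bordim"
hypothesis = "men bugun maktabda bordim"
print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.2f}")
```

Because Uzbek is agglutinative, a single wrong suffix counts as a whole word error in WER but only as one or two character errors in CER, so reporting both gives a fuller picture of model quality.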
