Parallel processing of big data using Hadoop MapReduce
Kh. Sh. Kuzibaev, T. K. Urazmatov


Date: 11.03.2023

Introduction: The amount of digitized data in the world is growing rapidly. This growth, in turn, creates problems such as storing digital data, sorting and processing it, and drawing conclusions from it. To study these problems and offer solutions, the field of information technology has developed directions such as Big Data, data science, data mining, machine learning, deep learning, and artificial neural networks. The problem studied in this article belongs to the field of Big Data. Against the background of the sharp increase in data, the questions of how to store it and how to process it quickly are urgent. As the object of research, we chose a large volume of text: the novel "War and Peace" by the Russian writer Leo Tolstoy. As the subject of research, we chose Apache Hadoop HDFS, which is used to store large volumes of data, and Hadoop MapReduce, which processes data in parallel. The goal of our research is to show that large volumes of data cannot be processed efficiently by traditional computing methods, and that processing them with parallel computing is effective and fast.
We have identified the following tasks for our research:

  • Store a large volume of data in a distributed file system
  • Process the large volume of data in the traditional (sequential) way and record the results
  • Process the same data using parallel computing and record the results
  • Compare the obtained results and draw conclusions
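The "traditional way" in the second task amounts to a single-threaded word-frequency count. A minimal Java sketch of such a counter (the class and method names here are illustrative, not taken from the article's actual program) could look like this:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    // Count how often each word occurs in the text, single-threaded.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on any run of non-letter characters; lower-case so that
        // "War" and "war" are counted as the same word.
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("War and peace and war");
        System.out.println(counts); // e.g. {peace=1, war=2, and=2}
    }
}
```

On a 2300-page text the same loop runs over every word in sequence, which is what makes the sequential baseline slow compared to the parallel version.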

The object of our research is an electronic version of Leo Tolstoy's "War and Peace". We convert the roughly 2300-page electronic text into .txt format. We then write a program in the Java programming language that determines the frequency of words in the text, that is, the number of repetitions of each word. First, we process the selected text in the traditional sequential way and record the result and the processing time. Then we process the same large volume of data in parallel using the Hadoop MapReduce model, and again record the result and the processing time. Comparing the recorded results and processing times, we observed a clear difference, and on the basis of this comparison we drew the appropriate conclusions.
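The MapReduce model used in the parallel step can be illustrated without a cluster. A real Hadoop job implements the `Mapper` and `Reducer` classes from the `org.apache.hadoop.mapreduce` package; the in-memory sketch below only mimics the model's three phases (map, shuffle, reduce) with Java parallel streams, and is an assumption for illustration, not the authors' actual job:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceWordCount {
    // In-memory sketch of the MapReduce word count:
    //   map:            each line is split into (word, 1) pairs, in parallel
    //   shuffle+reduce: pairs are grouped by word and their 1s are summed
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.parallelStream()
                // map phase: emit a (word, 1) pair for every word in the line
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[^\\p{L}]+"))
                        .filter(w -> !w.isEmpty())
                        .map(w -> new SimpleEntry<>(w, 1)))
                // shuffle + reduce phase: group pairs by word, sum the counts
                .collect(Collectors.groupingBy(e -> e.getKey(),
                        Collectors.summingInt(e -> e.getValue())));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("War and peace", "and war");
        System.out.println(wordCount(lines)); // e.g. {peace=1, war=2, and=2}
    }
}
```

In Hadoop itself, each mapper runs on a separate block of the HDFS file and each reducer receives all pairs for a given word, which is what allows the word count to scale across machines rather than across the cores of one machine as in this sketch.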
