A Hidden Markov Model (HMM) Based Speaker Identification System Using a Mobile Phone Database of North Atlantic Treaty Organization (NATO) Words
monophones1 –I dlog dict dict.txt is invoked to generate the list of words in the file called monophones1.
Parameterization: In this step, the recorded data was parameterized into a sequence of feature vectors. The technique used for parameterization is Mel Frequency Cepstral Coefficients (MFCC) [6].

For model preparation, a proto file is first defined, which specifies the model topology. A new version of the proto file is then created, together with a vFloors file. The model saved in proto is copied for each word and stored in the “hmmdefs” file, and the contents of “vFloors” are copied to a file named “macros”. Re-estimation is then carried out, which writes the re-estimated model files to the next directory. Executing the command two or more times generates the “hmmdefs” and “macros” files in consecutive directories.

Next, a one-state short-pause (sp) model is created by copying the contents of the sil model into the sp model [7]. To make the model more robust, an extra transition is added to the sil model so that it can absorb impulsive noise in the training data. This is done using the monophones1 and sil.hed files. The sil.hed file contains:

AT 2 4 0.2 {sil.transP}
AT 4 2 0.3 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}

Since the dictionary contains multiple pronunciations of some words, the phone models created so far can be used to re-align the training data and create new transcriptions. This uses the Viterbi algorithm, which takes the HMMs stored in the directories and transforms the input word-level transcription into a new phone-level transcription using the pronunciations stored in the dictionary.

For recognizing a word, the token passing algorithm is used to perform Viterbi-based speech recognition. When the process is executed, it first measures the speech and background silence levels by prompting the user to speak an arbitrary word. After that, it repeatedly recognizes each spoken word and prints the result to the terminal.
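
The Viterbi search mentioned above can be illustrated with a small sketch. This is not the toolkit's own decoder, and it is not the token passing formulation itself; it is the plain dynamic-programming form of Viterbi, which finds the same best state path. To keep it short, it assumes discrete observation symbols rather than MFCC vectors with Gaussian output densities. The two states, the "low"/"high" energy symbols, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete-output HMM.

    All probabilities are given as natural logarithms so that long
    observation sequences do not underflow.
    """
    # scores[s]: best log-probability of any state path that ends in s
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []  # one dict per time step after the first

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: silence vs. speech, observed as frame energy.
states = ["sil", "speech"]
log_start = to_log({"sil": 0.8, "speech": 0.2})
log_trans = to_log({
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
})
log_emit = to_log({
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
})

best_log_prob, best_path = viterbi(
    ["low", "high", "high", "low"], states, log_start, log_trans, log_emit
)
print(best_path, math.exp(best_log_prob))
```

In a real recognizer the emission term would come from the trained output densities in “hmmdefs”, and the state space would be the network of word or phone models rather than two hand-written states.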