A Hidden Markov Model (HMM) based speaker identification system using mobile phone database of North Atlantic Treaty Organization (NATO) words


Date	28.12.2022


The command ending in monophones1 -l dlog dict dict.txt is invoked to generate the list of phones, which is written to the file called monophones1.
Parameterization:
During this step, the recorded data was parameterized into a sequence of feature vectors.
The technique used to parameterize the data is Mel Frequency Cepstral Coefficients (MFCC) [6].
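To make the parameterization step concrete, the following is a minimal sketch of how MFCCs can be computed for a single speech frame using NumPy. It is not the HTK implementation: the 8 kHz sampling rate, the 26 filters, the 12 coefficients, the FFT size, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=12, n_fft=512):
    # 1. Window the frame and take its power spectrum.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft
    # 2. Apply the mel filterbank and take logs (small constant avoids log(0)).
    log_energies = np.log(mel_filterbank(n_filters, n_fft, sample_rate) @ power + 1e-10)
    # 3. Decorrelate with a DCT-II, keeping coefficients 1..n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return basis @ log_energies

# Hypothetical 25 ms frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(200) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_frame(frame, sample_rate)
print(coeffs.shape)
```

In a full front end this would be repeated for overlapping frames across each recording, giving one feature vector per frame.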
For model preparation, a proto file is first defined, which specifies the model topology. A new
version of the proto file and a “vFloors” file are then created. The model saved in the file proto is copied
for each phone in the list and placed in the “hmmdefs” file. The content of “vFloors” is also copied to a file named
“macros”. Re-estimation is then performed, which generates new “hmmdefs” and “proto” files in the successive
directory. By executing the command two or more times, the files “hmmdefs” and “macros” are generated in
consecutive directories. Next, a one-state short-pause (sp) model is created by copying the contents of the sil
model into the sp model [7]. To make the model more robust, an extra
transition is added to the sil model, which absorbs the various impulsive noises in the training data. This is done using
the monophones1 and sil.hed files. The sil.hed file contains the following:
AT 2 4 0.2 {sil.transP}
AT 4 2 0.3 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
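The effect of the AT lines above on a transition matrix can be illustrated with a short sketch. It assumes that adding a transition with a given probability rescales the other transitions leaving the same state so that the row still sums to one. The 5-state matrix values and the function name add_transition are hypothetical and are not taken from the trained models.

```python
import numpy as np

def add_transition(trans, i, j, prob):
    """Add a transition i -> j with probability prob (states are 1-indexed).

    The other transitions out of state i are rescaled so that the row
    still sums to one (assumed behaviour of the AT edit command).
    """
    out = trans.astype(float).copy()
    row = out[i - 1]              # view into the copied matrix
    row[j - 1] = 0.0              # drop any existing i -> j entry
    remaining = row.sum()
    if remaining > 0.0:
        row *= (1.0 - prob) / remaining
    row[j - 1] = prob
    return out

# Hypothetical left-to-right sil model: entry state, 3 emitting states, exit state.
sil = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])

sil_new = add_transition(sil, 2, 4, 0.2)      # AT 2 4 0.2 {sil.transP}
sil_new = add_transition(sil_new, 4, 2, 0.3)  # AT 4 2 0.3 {sil.transP}
print(sil_new)
```

The forward skip from state 2 to state 4 and the backward transition from state 4 to state 2 let the silence model pass through or loop over short noise bursts instead of forcing them into a neighbouring word model.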
Since the dictionary contains multiple pronunciations for some words, the phone models created so far can be
used to re-align the training data and create new transcriptions. This is done with the Viterbi algorithm, which uses the HMMs
stored in the model directories to transform the input word-level transcription into a new phone-level transcription using
the pronunciations stored in the dictionary. For recognizing words, the token passing algorithm is then used
to perform Viterbi-based speech recognition.
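The core of Viterbi decoding can be sketched in a few lines. The example below is a generic log-domain implementation for a small HMM with discrete observations. The recognizer described here works on continuous MFCC features and uses token passing, so the two-state model, its probabilities, and the function name viterbi are purely illustrative assumptions.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit, observations):
    """Return the most likely state path and its log probability."""
    n_states = log_start.shape[0]
    n_obs = len(observations)
    score = np.full((n_obs, n_states), -np.inf)
    backptr = np.zeros((n_obs, n_states), dtype=int)

    score[0] = log_start + log_emit[:, observations[0]]
    for t in range(1, n_obs):
        # cand[i, j]: best score of being in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score[t] = cand[backptr[t], np.arange(n_states)] + log_emit[:, observations[t]]

    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(n_obs - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(np.max(score[-1]))

# Toy two-state model with three discrete observation symbols (made-up values).
start = np.log(np.array([0.6, 0.4]))
trans = np.log(np.array([[0.7, 0.3],
                         [0.4, 0.6]]))
emit = np.log(np.array([[0.5, 0.4, 0.1],
                        [0.1, 0.3, 0.6]]))

path, best_log_prob = viterbi(start, trans, emit, [0, 1, 2])
print(path, best_log_prob)
```

Token passing computes the same best-path score, but organizes the search as tokens moving through a network of word and phone models, which makes it convenient to record word boundaries along the way.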
When the recognition process is executed, it first measures the speech and background silence levels by prompting the
user to speak an arbitrary word. After that, it repeatedly recognizes each spoken word and prints the result to the terminal.
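How the recognizer measures these levels internally is not described here. As a rough illustration only, one simple way to compare speech and background levels is to compute the RMS energy of each in decibels and place a detection threshold between them. The function names, the midpoint threshold, and the synthetic signals below are all assumptions.

```python
import numpy as np

def level_db(samples):
    # RMS level in dB relative to full scale (small constant avoids log(0)).
    rms = np.sqrt(np.mean(np.square(np.asarray(samples, dtype=float))))
    return 20.0 * np.log10(rms + 1e-12)

def is_speech(frame, silence_db, speech_db):
    # Hypothetical rule: threshold halfway between the two measured levels.
    threshold = (silence_db + speech_db) / 2.0
    return level_db(frame) > threshold

# Synthetic stand-ins for a calibration recording (not real data).
rng = np.random.default_rng(0)
sample_rate = 8000
background = 0.01 * rng.standard_normal(sample_rate)   # low-level noise
t = np.arange(sample_rate) / sample_rate
spoken = 0.5 * np.sin(2 * np.pi * 200.0 * t)            # loud tone as "speech"

silence_db = level_db(background)
speech_db = level_db(spoken)
print(silence_db, speech_db, is_speech(spoken[:200], silence_db, speech_db))
```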
