A Hidden Markov Model (HMM) based speaker identification system using mobile phone database of North Atlantic Treaty Organization (NATO) words


Agrawal et al.
© 2013 Acoustical Society of America [DOI: 10.1121/1.4800721]
Received 29 Jan 2013; published 2 Jun 2013
Proceedings of Meetings on Acoustics, Vol. 19, 060019 (2013) Page 1


Introduction:
Speaker recognition, which can be classified into identification and verification, is the
process of automatically recognizing a speaker on the basis of individual information embedded in speech
waves. This technique makes it possible to use a speaker's voice to verify their identity and to control access
to services such as voice dialing, banking by telephone, telephone shopping, database access services,
information services, voice mail, security control for confidential information areas, and remote access to
computers. It is useful to distinguish between text-dependent speaker verification, where the decision is
made using speech corresponding to known text, and text-independent speaker verification, where the speech is
unconstrained [1].
In this work, a text-dependent speaker identification technique is considered, and the Hidden Markov
Model (HMM) is used as the classification technique. The HTK toolkit [2], with its HMM tools, was used
to build the Hidden Markov Models (HMMs). Twenty-three individual NATO words (Appendix [1]) spoken by a
corpus of 100 speakers were used to identify the speakers.
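As a rough illustration of how HMMs can serve as a classifier for speaker identification, the sketch below scores an observation sequence against one model per speaker with the forward algorithm and picks the highest-scoring speaker. It is a toy under stated assumptions, not the HTK pipeline used in this work: the models are discrete-output HMMs with hand-picked probabilities, whereas real systems typically use continuous acoustic features, and the names `log_likelihood`, `identify`, `speaker_A` and `speaker_B` are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete-output HMM."""
    n = len(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def identify(obs, models):
    """Return the name of the speaker model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over two symbols (0 and 1).
# Each entry is (start probabilities, transition matrix, emission matrix).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "speaker_A": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "speaker_B": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

best = identify([0, 0, 1, 1], models)
```

In a text-dependent setting such as this one, a separate model of this kind would plausibly be trained per speaker and per word, and the test utterance of a known word would be scored only against the models for that word.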
Data Preparation:
Training and testing a speaker recognition system requires a collection of utterances from
different speakers. The present system uses a data-set of 23 North Atlantic Treaty Organization (NATO)
words [3]. The data were recorded from 100 speakers using three channels, i.e. a lapel microphone, a
head-held microphone and a cell phone. Recording was carried out in a sound-treated room environment
with a signal-to-noise ratio (S/N) of 40 dB. The 100 speakers (both male and female), aged between 23 and
60 years, were recorded at a sampling rate of 16 kHz. Each speaker uttered each word twenty times. In total,
46,000 (23 × 20 × 100) words were used to conduct the experiment. Seventy speakers were used for training the
system and the other 30 speakers for testing the system.
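The corpus figures above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the per-speaker and per-split utterance counts are derived here for illustration and are not reported in the paper, and the text does not say whether the total is counted per channel or across channels.

```python
# Figures stated in the text.
N_WORDS = 23            # NATO words in the data-set
N_REPETITIONS = 20      # times each speaker uttered each word
N_SPEAKERS = 100        # speakers in the corpus
N_TRAIN_SPEAKERS = 70   # speakers used for training
N_TEST_SPEAKERS = N_SPEAKERS - N_TRAIN_SPEAKERS  # remaining 30 for testing

# Total number of word utterances: 23 * 20 * 100.
total_utterances = N_WORDS * N_REPETITIONS * N_SPEAKERS

# Derived counts (an assumption of this sketch, not stated in the paper).
utterances_per_speaker = N_WORDS * N_REPETITIONS
train_utterances = utterances_per_speaker * N_TRAIN_SPEAKERS
test_utterances = utterances_per_speaker * N_TEST_SPEAKERS

print(total_utterances, train_utterances, test_utterances)
```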
