Speaker Identification and Speaker Verification: Hands-on Practice
(2019 LG SpeakerRecognition tutorial)



configure.py

  • Defines the global settings used by the other scripts (e.g., the feature directory paths such as TEST_FEAT_DIR referenced below)

DB_wav_reader.py

  • Uses the pandas library's DataFrame type to organize the list of feature files

  • Provides the train / enroll / test DataFrames through which the other scripts load their data



train.py

  • Loads the training data (using "DB_wav_reader.py")

  • Splits it into 90% training data / 10% validation data

  • Trains the background model

enroll.py

  • Loads the enrollment data (using "DB_wav_reader.py")

  • Uses the trained model to extract an embedding (d-vector) for each enrollment speaker

  • Saves the embeddings as a dictionary in the "enroll_embeddings" folder



identification.py

  • Loads the saved embeddings of the enrolled speakers (10 speakers)

  • Loads the test data (using "DB_wav_reader.py")

  • Extracts the embedding of the test utterance

  • Computes the cosine similarity between the test embedding and each enrolled speaker's embedding and picks the speaker with the highest score

verification.py

  • Loads the saved embeddings of the enrolled speakers

  • Loads the test data (using "DB_wav_reader.py")

  • Extracts the embedding of the test utterance

  • Compares the cosine similarity between the two embeddings against a threshold and accepts or rejects the claimed identity (speaker verification)

("train.ру")
def main() :

  • Set hyperparameters

use cuda = True # use gpm or cpu
valratio = 10 # Percentage of validation set embedding_size = 128 start = 1 # Start epoch
|n epochs = 30 # How many epochs: end = start + n_epochs # Last epoch
Ir = le-1 # Initial learning rate
wd = le-4 # Weight decay (L2 penalty)
optimizertype = ' sgd 1 # ex) sgd, ad am, adagrad
batchsize = 64 # Batch size for training
validbatchsize = 16 # Batch size for validation use_shuffle = True # Shuffle for training or not

  • Load dataset

train_dataset, valid_dataset, nclasses = load_dataset(valratio)



# instantiate model and initialize weights
model = background_resnet(embedding_size=embedding_size, num_classes=n_classes)

  • learning rate# 2§^tu scheduler

if usecuda:



  • Loss^f plateau ohEHS learning rate#
    S^—^ validation set£| loss# S^#

# define loss function (criterion) criterion = nn.CrossEntropyLoss() optimizer = create_optinnizer (optim
[scheduler = optim.Irscheduler.ReduceLROnPlateau(optimizer trainloader = torch.utils.data.DataLoader(dataset=train_dataset?
batch_size=batch_size, shuffle=use_shuffle) validloader = torch.utils.data.DataLoader(dataset=valid_dataset?
batch_size=valid_batch_size, shuffle=False, collate_fn = collatefn_feat_padded)
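The create_optimizer helper is called above but not shown on the slide. As a minimal sketch (not the tutorial's actual implementation), such a helper could simply map the optimizer_type string onto the corresponding torch.optim class:

import torch.optim as optim

def create_optimizer(optimizer_type, model, lr, wd):
    # Map the optimizer name string onto the matching torch.optim constructor.
    if optimizer_type == 'sgd':
        return optim.SGD(model.parameters(), lr=lr, weight_decay=wd)
    elif optimizer_type == 'adam':
        return optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    elif optimizer_type == 'adagrad':
        return optim.Adagrad(model.parameters(), lr=lr, weight_decay=wd)
    raise ValueError('Unknown optimizer type: %s' % optimizer_type)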




The loss curve produced during training is saved as "loss_plot.png".
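The plotting code itself is not shown on the slide. A minimal sketch of how such a figure could be produced with matplotlib, assuming per-epoch loss lists train_losses and valid_losses (names assumed here) are collected inside the training loop:

import matplotlib.pyplot as plt

def save_loss_plot(train_losses, valid_losses, path='loss_plot.png'):
    # Plot per-epoch training and validation loss and write the figure to disk.
    epochs = range(1, len(train_losses) + 1)
    plt.figure()
    plt.plot(epochs, train_losses, label='train loss')
    plt.plot(epochs, valid_losses, label='validation loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend()
    plt.savefig(path)
    plt.close()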




(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python train.py
Training set 21600 utts (90.0%)
Validation set 2400 utts (10.0%)
Total 24000 utts
Number of classes (speakers): 240

Train Epoch: 1 [    0/21600 ( 0%)]  Time 0.134 (0.134)  Loss 5.5429  Acc  0.0000
Train Epoch: 1 [ 5376/21600 (25%)]  Time 0.064 (0.070)  Loss 5.1952  Acc  1.0659
Train Epoch: 1 [10752/21600 (50%)]  Time 0.075 (0.069)  Loss 4.7067  Acc  1.9083
Train Epoch: 1 [16128/21600 (75%)]  Time 0.084 (0.069)  Loss 4.1807  Acc  3.7277
Train Epoch: 1 [21504/21600 (99%)]  Time 0.063 (0.068)  Loss 3.7057  Acc  6.3190
* Validation: Loss 2.1903  Acc 41.9541

Train Epoch: 2 [    0/21600 ( 0%)]  Time 0.037 (0.037)  Loss 1.8758  Acc 50.0000
Train Epoch: 2 [ 5376/21600 (25%)]  Time 0.032 (0.031)  Loss 1.6363  Acc 51.0857
Train Epoch: 2 [10752/21600 (50%)]  Time 0.033 (0.031)  Loss 1.5071  Acc 53.3294
Train Epoch: 2 [16128/21600 (75%)]  Time 0.031 (0.031)  Loss 1.3909  Acc 55.1330
Train Epoch: 2 [21504/21600 (99%)]  Time 0.028 (0.031)  Loss 1.2962  Acc 56.7118
* Validation: Loss 1.1327  Acc 68.6079




























("enroll.py")

def main():
    log_dir = 'model_saved'                      # Where the checkpoints are saved
    embedding_dir = 'enroll_embeddings'          # Where embeddings are saved
    test_dir = 'feat_logfbank_nfilt40/test/'     # Where test features are saved

    # Settings
    use_cuda = True            # Use cuda or not
    embedding_size = 128       # Dimension of speaker embeddings
    cp_num = 24                # Which checkpoint to use?
    n_classes = 240            # How many speakers in training data?
    test_frames = 100          # Split the test utterance

    # Load model from checkpoint
    model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

    # Get the dataframe for enroll DB
    enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

  • cp_num selects which of the saved checkpoints to load

  • The data under c.TEST_FEAT_DIR is split into enrollment data and test data

  • The resulting embeddings are saved under the "enroll_embeddings" folder




The DataFrame data type from the pandas library

  • Stores data in an organized, tabular form

  • Structured in terms of columns, rows, and an index

  • Used in DB_wav_reader.py

import pandas as pd
DB = pd.DataFrame()
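As a concrete illustration, here is a hedged sketch of how such a DataFrame of feature files could be built; the column names (filename, speaker_id), the one-folder-per-speaker layout, and the feature file extension are assumptions for illustration, not the tutorial's exact implementation:

import os
import pandas as pd

def build_db(feat_dir, ext='.p'):
    # Walk the feature directory and record one row per feature file.
    rows = []
    for root, _, files in os.walk(feat_dir):
        for f in files:
            if f.endswith(ext):
                # Assume the parent folder name is the speaker ID.
                rows.append({'filename': os.path.join(root, f),
                             'speaker_id': os.path.basename(root)})
    return pd.DataFrame(rows, columns=['filename', 'speaker_id'])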

Д ("enroll.py”)

  • Stores the embeddings of the 10 enrollment speakers in a dictionary named "embeddings" (key: speaker name, value: embedding)
def enroll_per_spk(use_cuda, test_frames, model, DB, embedding_dir):
    # Output the averaged d-vector for each speaker (enrollment)
    # Return the dictionary (length of n_spk)
    n_files = len(DB)  # 10
    enroll_speaker_list = sorted(set(DB['speaker_id']))
    embeddings = {}
    # Aggregates all the activations
    print("Start to aggregate all the d-vectors per enroll speaker")
    for spk in enroll_speaker_list:
        print("Aggregates the activation (spk : %s)" % (spk))
        ...





(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python enroll.py
=> loading checkpoint
Start to aggregate all the d-vectors per enroll speaker

Aggregates the activation (spk : 225M4062)
Aggregates the activation (spk : 230M4087)
Aggregates the activation (spk : 240M3063)
Aggregates the activation (spk : 229M2031)
Aggregates the activation (spk : 213F5100)
Aggregates the activation (spk : 233F4013)
Aggregates the activation (spk : 217F3038)
Aggregates the activation (spk : 207F2088)
Aggregates the activation (spk : 236M3043)
Aggregates the activation (spk : 103F3021)

Save the embeddings for 103F3021
Save the embeddings for 207F2088
Save the embeddings for 213F5100
Save the embeddings for 217F3038
Save the embeddings for 225M4062
Save the embeddings for 229M2031
Save the embeddings for 230M4087
Save the embeddings for 233F4013
Save the embeddings for 236M3043
Save the embeddings for 240M3063

("identification.py")

def main():
    log_dir = 'model_saved'                      # Where the checkpoints are saved
    embedding_dir = 'enroll_embeddings'          # Where embeddings are saved
    test_dir = 'feat_logfbank_nfilt40/test/'     # Where test features are saved

    # Settings
    use_cuda = True            # Use cuda or not
    embedding_size = 128       # Dimension of speaker embeddings
    cp_num = 24                # Which checkpoint to use?
    n_classes = 240            # How many speakers in training data?
    test_frames = 100          # Split the test utterance

    # Load model from checkpoint
    model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

    # Get the dataframe for test DB
    enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

    # Load enroll embeddings
    embeddings = load_enroll_embeddings(embedding_dir)

  • Loads the enrollment speaker embeddings saved in the "enroll_embeddings" folder
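load_enroll_embeddings is called above but not shown on the slide. A minimal sketch, assuming the enrollment step saved one .pth tensor file per speaker in embedding_dir (as in the enrollment sketch earlier):

import os
import torch

def load_enroll_embeddings(embedding_dir):
    # Load every saved speaker embedding into a dictionary keyed by
    # speaker ID (the file name without its extension).
    embeddings = {}
    for fname in os.listdir(embedding_dir):
        if fname.endswith('.pth'):
            spk = os.path.splitext(fname)[0]
            embeddings[spk] = torch.load(os.path.join(embedding_dir, fname))
    return embeddings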


Speaker identification function:

  • Computes the cosine similarity between the test utterance embedding and each enrolled speaker's embedding
  • Keeps the speaker with the highest score

def perform_identification(use_cuda, model, embeddings, test_filename, test_frames, spk_list):
    test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)
    max_score = -10**8
    best_spk = None
    for spk in spk_list:
        score = F.cosine_similarity(test_embedding, embeddings[spk])
        score = score.data.cpu().numpy()
        if score > max_score:
            max_score = score
            best_spk = spk
    # print("Speaker identification result : %s" % best_spk)
    true_spk = test_filename.split('/')[-2].split()[0]
    print("\n=== Speaker identification ===")
    print("True speaker : %s\nPredicted speaker : %s\nResult : %s\n"
          % (true_spk, best_spk, true_spk == best_spk))
    return best_spk
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python identification.py
=> loading checkpoint
=== Speaker identification ===
True speaker : 230M4087
Predicted speaker : 230M4087
Result : True
("verification.py")

def main():
    log_dir = 'model_saved'                      # Where the checkpoints are saved
    embedding_dir = 'enroll_embeddings'          # Where embeddings are saved
    test_dir = 'feat_logfbank_nfilt40/test/'     # Where test features are saved

    # Settings
    use_cuda = True            # Use cuda or not
    embedding_size = 128       # Dimension of speaker embeddings
    cp_num = 24                # Which checkpoint to use?
    n_classes = 240            # How many speakers in training data?
    test_frames = 100          # Split the test utterance

    # Load model from checkpoint
    model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

    # Get the dataframe for test DB
    enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

    # Load enroll embeddings
    embeddings = load_enroll_embeddings(embedding_dir)

  • Loads the enrollment speaker embeddings saved in the "enroll_embeddings" folder




Speaker verification function:

  • Extracts the embedding of the test utterance
  • Computes the cosine similarity (score) between the test embedding and the claimed speaker's enrolled embedding
  • Compares the score with the threshold and accepts or rejects the claim:

if score > thres:
    result = 'Accept'
else:
    result = 'Reject'


def perform_verification(use_cuda, model, embeddings, enroll_speaker, test_filename, test_frames, thres):
    enroll_embedding = embeddings[enroll_speaker]
    test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)
    score = F.cosine_similarity(test_embedding, enroll_embedding)
    score = score.data.cpu().numpy()
    if score > thres:
        result = 'Accept'
    else:
        result = 'Reject'
    test_spk = test_filename.split('/')[-2].split()[0]
    print("\n=== Speaker verification ===")
    print("True speaker: %s\nClaimed speaker : %s\n\nResult : %s\n"
          % (enroll_speaker, test_spk, result))
    print("Score : %0.4f\nThreshold : %0.2f\n" % (score, thres))
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
=> loading checkpoint
=== Speaker verification ===
True speaker: 230M4087
Claimed speaker : 230M4087
Result : Accept
Score : 0.9556
Threshold : 0.95
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
=> loading checkpoint
=== Speaker verification ===
True speaker: 230M4087
Claimed speaker : 207F2088
Result : Reject
Score : 0.8026
Threshold : 0.95

  • Practice: run the four scripts in order

  1. train.py
  2. enroll.py
  3. identification.py
  4. verification.py


  • ^ед&

  1. £AW ^^ ОД 2C S^^l

  2. hyperparameter Uffi^l^ S0°H-S-7l

  • Loss function й^ 49

  • ^S. Э4 49

  1. 4i44 4x2 34^T< 249 #4£4

££X| 49

  1. threshold M ОД 443§ ^4 HR
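For exercise 4, a small sketch of how the accept/reject decision behaves at different thresholds; the two scores below are taken from the example verification runs above, and the threshold values are arbitrary choices for illustration:

# Scores from the two example runs above (genuine 0.9556, impostor 0.8026).
scores = {'genuine (230M4087 vs 230M4087)': 0.9556,
          'impostor (230M4087 vs 207F2088)': 0.8026}

for thres in (0.80, 0.85, 0.90, 0.95):
    for name, score in scores.items():
        result = 'Accept' if score > thres else 'Reject'
        print("threshold %.2f : %s -> %s" % (thres, name, result))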

  • Advanced exercises

  1. Train with the VoxCeleb DB

http://www.robots.ox.ac.uk/~vgg/data/voxceleb/

  2. Try other loss functions

ex) center loss, angular softmax loss, ...

  3. Try other model architectures

ex) VGGNet, LSTM, ...
