Coursework: Speaker Identification, Speaker Verification, Individual Practice

2019 LG SpeakerRecognition tutorial

configure.py

Shared settings, such as the feature directory that other scripts reference as c.TEST_FEAT_DIR

DB_wav_reader.py

Uses the pandas library to build a dataframe describing the data
The other scripts read their data through this dataframe

train.py

Reads the training data (using "DB_wav_reader.py")
Splits it into 90% training (train) data / 10% validation data
Trains the background model

enroll.py

Reads the enrollment data (using "DB_wav_reader.py")
Extracts each enrollment speaker's embedding (d-vector) with the trained model
Saves the result as a dictionary in the "enroll_embeddings" folder

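identification.py

Identifies which of the enrolled speakers (10) produced the test utterance
Reads the test data (using "DB_wav_reader.py")
Extracts the test embedding
Picks the enrolled speaker whose embedding scores highest against the test embedding (speaker identification)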

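verification.py

Checks whether the test utterance belongs to the claimed enrolled speaker
Reads the test data (using "DB_wav_reader.py")
Extracts the test embedding
Compares the score between the two embeddings with a threshold to accept or reject the claim (speaker verification)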
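
The 90% / 10% split listed for train.py can be illustrated with a small stand-alone sketch. This is not the tutorial's load_dataset: the helper name split_train_valid, the shuffling, and the fixed seed are assumptions made only for illustration, and the utterance names are placeholders.

```python
import random


def split_train_valid(items, val_ratio, seed=0):
    """Split items into (train, valid), with val_ratio given in percent.

    Hypothetical helper: the tutorial's own load_dataset may split differently.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = len(shuffled) * val_ratio // 100
    return shuffled[n_valid:], shuffled[:n_valid]


# 24000 placeholder utterance names (the total reported in the training log)
utterances = ["utt_%05d" % i for i in range(24000)]
train, valid = split_train_valid(utterances, val_ratio=10)
print(len(train), len(valid))  # prints "21600 2400"
```

With val_ratio = 10 the sizes agree with the "Training set 21600 utts" and "Validation set 2400 utts" lines printed by train.py.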

("train.py")
def main():

    # Set hyperparameters
    use_cuda = True  # use gpu or cpu
    val_ratio = 10  # Percentage of validation set
    embedding_size = 128
    start = 1  # Start epoch
    n_epochs = 30  # How many epochs?
    end = start + n_epochs  # Last epoch
    lr = 1e-1  # Initial learning rate
    wd = 1e-4  # Weight decay (L2 penalty)
    optimizer_type = 'sgd'  # ex) sgd, adam, adagrad
    batch_size = 64  # Batch size for training
    valid_batch_size = 16  # Batch size for validation
    use_shuffle = True  # Shuffle for training or not

    # Load dataset
    train_dataset, valid_dataset, n_classes = load_dataset(val_ratio)

    # instantiate model and initialize weights
    model = background_resnet(embedding_size=embedding_size, num_classes=n_classes)

    if use_cuda:
        ...

    # define loss function (criterion)
    criterion = nn.CrossEntropyLoss()
    optimizer = create_optimizer(...)

    # Scheduler that adjusts the learning rate:
    # lowers the learning rate when the loss reaches a plateau,
    # judged by the loss on the validation set
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, ...)

    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size,
                                               shuffle=use_shuffle)
    valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset,
                                               batch_size=valid_batch_size,
                                               shuffle=False,
                                               collate_fn=collate_fn_feat_padded)
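
The reduce-on-plateau idea can be sketched in plain Python. This is a simplified stand-in, not PyTorch's ReduceLROnPlateau. The class name PlateauScheduler, the factor, patience and min_lr values, and the loss sequence are all assumptions chosen for illustration.

```python
class PlateauScheduler:
    """Simplified stand-in for the reduce-on-plateau idea (not PyTorch's code)."""

    def __init__(self, lr, factor=0.1, patience=2, min_lr=1e-4):
        self.lr = lr
        self.factor = factor        # assumed multiplicative decay
        self.patience = patience    # assumed number of tolerated bad epochs
        self.min_lr = min_lr        # assumed lower bound
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss):
        """Update the learning rate from one epoch's validation loss."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


scheduler = PlateauScheduler(lr=1e-1)
# Made-up validation losses: two improvements, then three epochs without one
history = [scheduler.step(loss) for loss in [2.19, 1.13, 1.20, 1.18, 1.15, 1.10]]
print(history)
```

The learning rate stays at 0.1 while the validation loss keeps improving and drops by the assumed factor once three consecutive epochs fail to beat the best loss so far.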

“loss_plot.png”

(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python train.py
Training set 21600 utts (90.0%)
Validation set 2400 utts (10.0%)
Total 24000 utts
Number of classes (speakers): 240

Train Epoch: 1 [    0/21600 (  0%)]  Time 0.134 (0.134)  Loss 5.5429  Acc 0.0000
Train Epoch: 1 [ 5376/21600 ( 25%)]  Time 0.064 (0.070)  Loss 5.1952  Acc 1.0659
Train Epoch: 1 [10752/21600 ( 50%)]  Time 0.075 (0.069)  Loss 4.7067  Acc 1.9083
Train Epoch: 1 [16128/21600 ( 75%)]  Time 0.084 (0.069)  Loss 4.1807  Acc 3.7277
Train Epoch: 1 [21504/21600 ( 99%)]  Time 0.063 (0.068)  Loss 3.7057  Acc 6.3190
* Validation: Loss 2.1903  Acc 41.9541
Train Epoch: 2 [    0/21600 (  0%)]  Time 0.037 (0.037)  Loss 1.8758  Acc 50.0000
Train Epoch: 2 [ 5376/21600 ( 25%)]  Time 0.032 (0.031)  Loss 1.6363  Acc 51.0857
Train Epoch: 2 [10752/21600 ( 50%)]  Time 0.033 (0.031)  Loss 1.5071  Acc 53.3294
Train Epoch: 2 [16128/21600 ( 75%)]  Time 0.031 (0.031)  Loss 1.3909  Acc 55.1330
Train Epoch: 2 [21504/21600 ( 99%)]  Time 0.028 (0.031)  Loss 1.2962  Acc 56.7118
* Validation: Loss 1.1327  Acc 68.6079
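
A quick sanity check on the first log line: a classifier that spreads its probability evenly over 240 speakers has a cross-entropy loss of ln(240), so the initial loss of 5.5429 is about what an untrained model should give. The sketch below only evaluates that reference value. It is an added illustration and not part of the tutorial code.

```python
import math

n_classes = 240  # number of speakers reported in the log

# Cross-entropy of a uniform prediction over n_classes classes
chance_loss = math.log(n_classes)
# Accuracy (in percent) of uniform random guessing
chance_acc = 100.0 / n_classes

print("chance-level loss: %.4f" % chance_loss)       # 5.4806
print("chance-level accuracy: %.4f%%" % chance_acc)  # 0.4167%
```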

def main():

    # Settings
    use_cuda = ...
    log_dir = 'model_saved'
    embedding_size = 128
    cp_num = 24  # Which checkpoint to use?
    n_classes = ...
    test_frames = ...

    # Load model from checkpoint
    model = load_model(use_cuda, ...)

    # Get the dataframe for enroll DB
    ...

    # Where to save embeddings
    embedding_dir = 'enroll_embeddings'

Loads the model from the selected checkpoint

The data under c.TEST_FEAT_DIR is split into enrollment data and test data

The data is held in the dataframe type of the pandas library
A table-like structure with columns, rows, and an index
Defined in DB_wav_reader.py

import pandas as pd
DB = pd.DataFrame()

("enroll.py")

The "embeddings" dictionary stores one entry per speaker (key: speaker, value: embedding)

def enroll_per_spk(use_cuda, test_frames, model, DB, embedding_dir):
    """
    Output the averaged d-vector for each speaker (enrollment)
    Return the dictionary (length of n_spk)
    """
    n_files = len(DB)  # 10
    enroll_speaker_list = sorted(set(DB['speaker_id']))

    embeddings = {}

    # Aggregates all the activations
    print("Start to aggregate all the d-vectors per enroll speaker")
    ...
        print("Aggregates the activation (spk : %s)" % (spk))
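
The per-speaker averaging that the docstring describes can be sketched without PyTorch. The function name average_per_speaker and the toy two-dimensional vectors are assumptions for illustration only. The tutorial works on 128-dimensional model activations, and its actual aggregation code is not shown in full here.

```python
from collections import defaultdict


def average_per_speaker(utterance_embeddings):
    """Average a list of (speaker, vector) pairs into one vector per speaker."""
    grouped = defaultdict(list)
    for spk, vec in utterance_embeddings:
        grouped[spk].append(vec)

    embeddings = {}
    for spk, vecs in grouped.items():
        dim = len(vecs[0])
        # Element-wise mean over all vectors of this speaker
        embeddings[spk] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return embeddings


# Toy vectors (made-up values); speaker IDs taken from the enrollment log
data = [
    ("230M4087", [1.0, 0.0]),
    ("230M4087", [0.0, 1.0]),
    ("207F2088", [2.0, 4.0]),
]
embeddings = average_per_speaker(data)
print(embeddings["230M4087"])  # [0.5, 0.5]
```

The result has the same shape as the "embeddings" dictionary: one key per speaker and one averaged vector as the value.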

(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python enroll.py
=> loading checkpoint
Start to aggregate all the d-vectors per enroll speaker

Aggregates the activation (spk : 225M4062)
Aggregates the activation (spk : 230M4087)
Aggregates the activation (spk : 240M3063)
Aggregates the activation (spk : 229M2031)
Aggregates the activation (spk : 213F5100)
Aggregates the activation (spk : 233F4013)
Aggregates the activation (spk : 217F3038)
Aggregates the activation (spk : 207F2088)
Aggregates the activation (spk : 236M3043)
Aggregates the activation (spk : 103F3021)

Save the embeddings for 103F3021
Save the embeddings for 207F2088
Save the embeddings for 213F5100
Save the embeddings for 217F3038
Save the embeddings for 225M4062
Save the embeddings for 229M2031
Save the embeddings for 230M4087
Save the embeddings for 233F4013
Save the embeddings for 236M3043
Save the embeddings for 240M3063

def main():
    log_dir = 'model_saved'  # Where the checkpoints are saved
    embedding_dir = 'enroll_embeddings'  # Where embeddings are saved
    test_dir = 'feat_logfbank_nfilt40/test/'  # Where test features are saved

    # Settings
    use_cuda = True  # Use cuda or not
    embedding_size = 128  # Dimension of speaker embeddings
    cp_num = 24  # Which checkpoint to use?
    n_classes = 240  # How many speakers in training data?
    test_frames = 100  # Split the test utterance

    # Load model from checkpoint
    model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

    # Get the dataframe for test DB
    enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

    # Load enroll embeddings
    # (loads each speaker's embedding saved in the "enroll_embeddings" folder)
    embeddings = load_enroll_embeddings(embedding_dir)

Speaker identification

Computes the cosine similarity between the test embedding and each enrolled speaker's embedding
Keeps the speaker with the highest score

def perform_identification(use_cuda, model, embeddings, test_filename, test_frames, spk_list):
    test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)
    max_score = -10**8
    best_spk = None

    for spk in spk_list:
        score = F.cosine_similarity(test_embedding, embeddings[spk])
        score = score.data.cpu().numpy()
        if score > max_score:
            max_score = score
            best_spk = spk
    #print("Speaker identification result : %s" %best_spk)
    true_spk = test_filename.split('/')[-2].split()[0]
    print("\n=== Speaker identification ===")
    print("True speaker : %s\nPredicted speaker : %s\nResult : %s\n" %(true_spk, best_spk, true_spk==best_spk))
    return best_spk
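
The arg-max over cosine similarities can be tried out without a trained model. The following is a minimal plain-Python sketch, not the tutorial's PyTorch code. The helper names cosine_similarity and identify and the three-dimensional toy embeddings are assumptions for illustration, and only the speaker IDs come from the logs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_embedding, embeddings):
    """Return (best_speaker, best_score) over all enrolled speakers."""
    best_spk, max_score = None, float("-inf")
    for spk, enrolled in embeddings.items():
        score = cosine_similarity(test_embedding, enrolled)
        if score > max_score:
            max_score, best_spk = score, spk
    return best_spk, max_score


# Toy enrolled embeddings (made-up values, 3 dimensions instead of 128)
enrolled = {
    "230M4087": [0.9, 0.1, 0.0],
    "207F2088": [0.0, 1.0, 0.2],
    "213F5100": [0.1, 0.0, 1.0],
}
best_spk, score = identify([1.0, 0.2, 0.1], enrolled)
print(best_spk)  # 230M4087
```

Because cosine similarity ignores vector length, only the direction of the test embedding relative to each enrolled embedding decides the winner.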
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python identification.py
=> loading checkpoint
=== Speaker identification ===
True speaker : 230M4087
Predicted speaker : 230M4087
Result : True
def main():
    log_dir = 'model_saved'  # Where the checkpoints are saved
    embedding_dir = 'enroll_embeddings'  # Where embeddings are saved
    test_dir = 'feat_logfbank_nfilt40/test/'  # Where test features are saved

    # Settings
    ...

    # Load model from checkpoint
    model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

    # Get the dataframe for test DB
    enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

    # Load enroll embeddings
    # (loads each speaker's embedding saved in the "enroll_embeddings" folder)
    embeddings = load_enroll_embeddings(embedding_dir)

("verification.py")

Speaker verification

Computes the similarity (score) between the test embedding and the enrolled embedding of the selected speaker
Compares the score with the threshold to accept or reject

def perform_verification(use_cuda, model, embeddings, enroll_speaker, test_filename, test_frames, thres):
    enroll_embedding = embeddings[enroll_speaker]
    test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)
    score = F.cosine_similarity(test_embedding, enroll_embedding)
    score = score.data.cpu().numpy()

    # Compare the score with the threshold
    if score > thres:
        result = 'Accept'
    else:
        result = 'Reject'

    test_spk = test_filename.split('/')[-2].split()[0]
    print("\n=== Speaker verification ===")
    print("True speaker: %s\nClaimed speaker : %s\n\nResult : %s\n" %(enroll_speaker, test_spk, result))
    print("Score : %0.4f\nThreshold : %0.2f\n" %(score, thres))
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
=> loading checkpoint
=== Speaker verification ===
True speaker: 230M4087
Claimed speaker : 230M4087
Result : Accept
Score : 0.9556
Threshold : 0.95
(pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
=> loading checkpoint
=== Speaker verification ===
True speaker: 230M4087
Claimed speaker : 207F2088
Result : Reject
Score : 0.8026
Threshold : 0.95
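
The accept/reject rule can be replayed on the two scores logged above. The sketch below is a stand-alone illustration: the function name verify is an assumption, and the thresholds 0.75 and 0.99 are made-up values added to show how the decision shifts. Only the scores 0.9556 and 0.8026 and the threshold 0.95 come from the logs.

```python
def verify(score, thres):
    """Accept the claim only if the score exceeds the threshold."""
    return "Accept" if score > thres else "Reject"


same_speaker_score = 0.9556       # logged run where both speakers are 230M4087
different_speaker_score = 0.8026  # logged run with 230M4087 and 207F2088

for thres in (0.75, 0.95, 0.99):
    print(thres,
          verify(same_speaker_score, thres),
          verify(different_speaker_score, thres))
```

On these two scores, the lower threshold accepts the mismatched pair as well, and the higher one rejects even the matching pair. Only the logged value 0.95 separates the two runs.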