Practice session: speaker identification, speaker verification, individual practice
Related document: 2019 LG SpeakerRecognition tutorial
configure.py
    Configuration values used by the other scripts, such as c.TEST_FEAT_DIR.

DB_wav_reader.py
    Reads the data into a dataframe from the pandas library.

train.py
    Reads the training data (using "DB_wav_reader.py").
    90% training (train) data / 10% validation data.
    Trains the background model.

enroll.py
    Reads the enrollment data (using "DB_wav_reader.py").
    Extracts the embeddings of the enrollment speakers.
    Saves them in dictionary form in the "enroll_embeddings" folder.

identification.py
    Speaker identification.
    Reads the test data (using "DB_wav_reader.py").
    Extracts the embedding of the test utterance and finds the enrolled speaker whose embedding is most similar.

verification.py
    Speaker verification.
    Reads the test data (using "DB_wav_reader.py").
    Computes the similarity between the test utterance and the claimed speaker, then compares it with the threshold (accept/reject).

("train.py")

    def main():
        # Set hyperparameters
        use_cuda = True           # use gpu or cpu
        val_ratio = 10            # Percentage of validation set
        embedding_size = 128
        start = 1                 # Start epoch
        n_epochs = 30             # How many epochs?
        end = start + n_epochs    # Last epoch
        lr = 1e-1                 # Initial learning rate
        wd = 1e-4                 # Weight decay (L2 penalty)
        optimizer_type = 'sgd'    # ex) sgd, adam, adagrad
        batch_size = 64           # Batch size for training
        valid_batch_size = 16     # Batch size for validation
        use_shuffle = True        # Shuffle for training or not

        # Load dataset
        train_dataset, valid_dataset, n_classes = load_dataset(val_ratio)

        # instantiate model and initialize weights
        model = background_resnet(embedding_size=embedding_size, num_classes=n_classes)

        if use_cuda:
            ...  # body cut off on the slide

        # define loss function (criterion)
        criterion = nn.CrossEntropyLoss()
        optimizer = create_optimizer(...)  # arguments cut off on the slide
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, ...)  # remaining arguments cut off on the slide

        train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                                   batch_size=batch_size,
                                                   shuffle=use_shuffle)
        valid_loader = torch.utils.data.DataLoader(dataset=valid_dataset,
                                                   batch_size=valid_batch_size,
                                                   shuffle=False,
                                                   collate_fn=collate_fn_feat_padded)

The scheduler adjusts the learning rate: when the loss reaches a plateau, the learning rate is reduced. The loss on the validation set is the reference.

Output figure: "loss_plot.png"

    (pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python train.py
    Training set 21600 utts (90.0%)
    Validation set 2400 utts (10.0%)
    Total 24000 utts
    Number of classes (speakers): 240
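The reduce-on-plateau rule described for train.py can be illustrated without PyTorch. The sketch below is a simplified stand-in, not the code of optim.lr_scheduler.ReduceLROnPlateau: the class name PlateauScheduler, the helper run_schedule, and the factor and patience values are all assumptions chosen for illustration. It only shows the idea that the learning rate is multiplied by a factor once the validation loss has stopped improving for more than a set number of epochs.

```python
class PlateauScheduler:
    """Toy reduce-on-plateau rule (illustrative only, not PyTorch's implementation)."""

    def __init__(self, lr, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor        # assumed value: lr is multiplied by this on a plateau
        self.patience = patience    # assumed value: epochs without improvement to tolerate
        self.best = float("inf")    # lowest validation loss seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, val_loss):
        """Update the state with one epoch's validation loss and return the current lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


def run_schedule(val_losses, lr=1e-1, factor=0.1, patience=2):
    """Return the learning rate in effect after each epoch's validation loss."""
    scheduler = PlateauScheduler(lr, factor, patience)
    return [scheduler.step(loss) for loss in val_losses]
```

With a validation loss that stalls at 0.9 for three epochs in a row, the sketch keeps the initial rate of 1e-1 through the stall and then cuts it by the factor; a later improvement leaves the reduced rate unchanged.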
("enroll.py")

    def main():
        # Settings
        use_cuda = ...            # value not legible on the slide
        log_dir = 'model_saved'
        embedding_size = 128
        cp_num = 24               # Which checkpoint to use?
        n_classes = ...           # value not legible on the slide
        test_frames = ...         # value not legible on the slide

        # Load model from checkpoint
        model = load_model(use_cuda, ...)  # remaining arguments not legible on the slide

        # Get the dataframe for enroll DB
        ...

        # Where to save embeddings
        embedding_dir = 'enroll_embeddings'

The model is loaded from the checkpoint. The data in the c.TEST_FEAT_DIR folder is read into a dataframe from the pandas library, a structure made up of columns, rows, and an index. This is defined in DB_wav_reader.py:

    import pandas as pd
    DB = pd.DataFrame()

The enrolled embeddings are stored in a dictionary named "embeddings" (key: speaker, value: embedding).

    def enroll_per_spk(use_cuda, test_frames, model, DB, embedding_dir):
        """
        Output the averaged d-vector for each speaker (enrollment)
        Return the dictionary (length of n_spk)
        """
        n_files = len(DB)  # 10
        enroll_speaker_list = sorted(set(DB['speaker_id']))

        embeddings = {}

        # Aggregates all the activations
        print("Start to aggregate all the d-vectors per enroll speaker")
        ...  # per-file loop not shown on the slide
        print("Aggregates the activation (spk : %s)" % (spk))

    (pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python enroll.py
    => loading checkpoint
    Start to aggregate all the d-vectors per enroll speaker
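The docstring of enroll_per_spk says it outputs the averaged d-vector for each speaker. A minimal sketch of that averaging step follows, using plain Python lists instead of model activations. The function name average_embeddings and the (speaker_id, embedding) input format are hypothetical, and the tutorial's own aggregation code is not shown on the slide, so this is only one plausible reading of it.

```python
from collections import defaultdict


def average_embeddings(utterances):
    """Average per-utterance embeddings into one vector per speaker.

    utterances: iterable of (speaker_id, embedding) pairs, where each
    embedding is a list of floats of the same length (hypothetical format).
    Returns a dictionary {speaker_id: averaged embedding}.
    """
    grouped = defaultdict(list)
    for speaker_id, embedding in utterances:
        grouped[speaker_id].append(embedding)

    averaged = {}
    for speaker_id, vectors in grouped.items():
        n = len(vectors)
        # zip(*vectors) walks the vectors dimension by dimension
        averaged[speaker_id] = [sum(dim) / n for dim in zip(*vectors)]
    return averaged
```

The result has the same shape as the "embeddings" dictionary described above: one key per enrolled speaker, one vector per key.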
("identification.py")

    def main():
        log_dir = 'model_saved'                     # Where the checkpoints are saved
        embedding_dir = 'enroll_embeddings'         # Where embeddings are saved
        test_dir = 'feat_logfbank_nfilt40/test/'    # Where test features are saved

        # Settings
        use_cuda = True        # Use cuda or not
        embedding_size = 128   # Dimension of speaker embeddings
        cp_num = 24            # Which checkpoint to use?
        n_classes = 240        # How many speakers in training data?
        test_frames = 100      # Split the test utterance

        # Load model from checkpoint
        model = load_model(use_cuda, log_dir, cp_num, embedding_size, n_classes)

        # Get the dataframe for test DB
        enroll_DB, test_DB = split_enroll_and_test(c.TEST_FEAT_DIR)

        # Load enroll embeddings
        embeddings = load_enroll_embeddings(embedding_dir)

The enrolled speakers' embeddings are loaded from the "enroll_embeddings" folder. The embedding of the test utterance is then scored against each enrolled speaker's embedding with cosine similarity, and the speaker with the highest score is returned.

    def perform_identification(use_cuda, model, embeddings, test_filename, test_frames, spk_list):
        test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)
        max_score = -10**8
        best_spk = None
        for spk in spk_list:
            score = F.cosine_similarity(test_embedding, embeddings[spk])
            score = score.data.cpu().numpy()
            if score > max_score:
                max_score = score
                best_spk = spk
        #print("Speaker identification result : %s" %best_spk)
        true_spk = test_filename.split('/')[-2].split()[0]
        print("\n=== Speaker identification ===")
        print("True speaker : %s\nPredicted speaker : %s\nResult : %s\n" % (true_spk, best_spk, true_spk == best_spk))
        return best_spk

    (pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python identification.py
    => loading checkpoint

    === Speaker identification ===
    True speaker : 230M4087
    Predicted speaker : 230M4087
    Result : True

("verification.py")

The main() shown for verification.py has the same settings and loading steps as the one for identification.py above. The similarity (score) between the embedding of the test utterance and the embedding of the claimed speaker is computed, and the score is compared with the threshold.

    def perform_verification(use_cuda, model, embeddings, enroll_speaker, test_filename, test_frames, thres):
        enroll_embedding = embeddings[enroll_speaker]
        test_embedding = get_embeddings(use_cuda, test_filename, model, test_frames)

        score = F.cosine_similarity(test_embedding, enroll_embedding)
        score = score.data.cpu().numpy()

        ...  # score is compared with thres to set result (code not shown on the slide)

        test_spk = test_filename.split('/')[-2].split()[0]
        print("\n=== Speaker verification ===")
        print("True speaker: %s\nClaimed speaker : %s\n\nResult : %s\n" % (enroll_speaker, test_spk, result))
        print("Score : %0.4f\nThreshold : %0.2f\n" % (score, thres))

    (pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
    => loading checkpoint

    === Speaker verification ===
    True speaker: 230M4087
    Claimed speaker : 230M4087

    Result : Accept

    Score : 0.9556
    Threshold : 0.95

    (pytorch3) admin@administrator-8:~/Desktop/LG_SR$ python verification.py
    => loading checkpoint

    === Speaker verification ===
    True speaker: 230M4087
    Claimed speaker : 207F2088

    Result : Reject

    Score : 0.8026
    Threshold : 0.95

Order of execution:
    1. train.py
    2. enroll.py
    3. identification.py
    4. verification.py
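The two decisions above, picking the best-scoring enrolled speaker and accepting or rejecting a claimed speaker, can be sketched with plain Python lists in place of model embeddings. The function names cosine_similarity, identify, and verify are hypothetical. The strict "greater than" comparison in verify is also an assumption, because the slide does not show how the score is compared with the threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_embedding, enrolled):
    """Return (best_speaker, best_score) over a {speaker: embedding} dictionary."""
    best_spk, max_score = None, float("-inf")
    for spk, embedding in enrolled.items():
        score = cosine_similarity(test_embedding, embedding)
        if score > max_score:
            best_spk, max_score = spk, score
    return best_spk, max_score


def verify(test_embedding, enrolled, claimed_spk, thres):
    """Return ("Accept" or "Reject", score) for a claimed speaker.

    Assumption: a score strictly above thres is accepted.
    """
    score = cosine_similarity(test_embedding, enrolled[claimed_spk])
    decision = "Accept" if score > thres else "Reject"
    return decision, score
```

Under this assumed rule, the two runs in the terminal output are consistent with a threshold of 0.95: a score of 0.9556 is accepted and a score of 0.8026 is rejected.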
Individual practice

    - Tune the hyperparameters [...]
    - Loss function [...]
    - Threshold [...]

Further tasks

    - Use the voxceleb DB: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
    - Use a different loss function, ex) center loss, angular softmax loss, ...
    - Use a different model structure, ex) VGGNet, LSTM, ...
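For the threshold item, one common way to explore the effect of a threshold is to sweep candidate values and compare the false acceptance rate (FAR) with the false rejection rate (FRR). The sketch below is not part of the tutorial: the function names error_rates and sweep_thresholds are hypothetical, the acceptance rule (score strictly above the threshold) is an assumption, and any score lists passed to it would have to come from your own verification trials.

```python
def error_rates(target_scores, impostor_scores, thres):
    """Return (FAR, FRR) when trials scoring strictly above thres are accepted.

    target_scores: scores of trials where the claimed speaker is the true speaker.
    impostor_scores: scores of trials where the claimed speaker is someone else.
    """
    false_accepts = sum(1 for s in impostor_scores if s > thres)
    false_rejects = sum(1 for s in target_scores if s <= thres)
    return false_accepts / len(impostor_scores), false_rejects / len(target_scores)


def sweep_thresholds(target_scores, impostor_scores, candidates):
    """Return (threshold, FAR, FRR) for the candidate where FAR and FRR are closest."""
    best = None
    for thres in candidates:
        far, frr = error_rates(target_scores, impostor_scores, thres)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, thres, far, frr)
    return best[1], best[2], best[3]
```

Raising the threshold lowers FAR and raises FRR, so the candidate where the two rates meet is a reasonable starting point before tuning toward whichever error matters more for the application.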