Testing, Speaker Identification, Speaker Verification, Individual Practice


2019 LG SpeakerRecognition tutorial


LG Electronics
Speech Intelligence: Speaker Recognition
[Recognition Practice]
TA: Young-Moon Jung
Advisor: Prof. Hoi-Rin Kim

[Contact] Young-Moon Jung: dudans@kaist.ac.kr
Practice outline

  • Concept overview
  • Speaker recognition
  • d-vector method
  • Practice overview
  • Preparation
  • Enrollment
  • Testing
  • Speaker identification
  • Speaker verification
  • Individual practice

"Is this the voice of speaker A?"

"No (reject)"
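The accept/reject question above can be sketched as a score compared against a threshold. This is a minimal illustration, assuming cosine similarity as the score; the embeddings, the threshold of 0.5, and the function names are hypothetical and not taken from the tutorial code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(claimed_embedding, test_embedding, threshold=0.5):
    """Accept the identity claim only if the score reaches the threshold.

    The threshold value is an assumption; in practice it is tuned on data.
    """
    score = cosine_similarity(claimed_embedding, test_embedding)
    decision = "Yes (accept)" if score >= threshold else "No (reject)"
    return decision, score


# Toy embeddings (hypothetical values).
speaker_a = [0.9, 0.1, 0.2]      # enrolled embedding of speaker A
same_voice = [0.8, 0.2, 0.1]     # test utterance close to speaker A
other_voice = [-0.1, 0.9, -0.3]  # test utterance far from speaker A

decision_same, score_same = verify(speaker_a, same_voice)
decision_other, score_other = verify(speaker_a, other_voice)
print(decision_same, round(score_same, 3))
print(decision_other, round(score_other, 3))
```

Raising the threshold makes the system reject more impostors at the cost of rejecting more genuine speakers, so the value is a trade-off rather than a constant.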


Speaker recognition
Speaker identification and speaker verification
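Where verification answers a yes/no question about one claimed speaker, identification picks the best match among all enrolled speakers. A minimal sketch, assuming cosine similarity as the score; the speaker names and embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(enrolled, test_embedding):
    """Return the enrolled speaker with the highest score, plus all scores."""
    scores = {
        speaker: cosine_similarity(embedding, test_embedding)
        for speaker, embedding in enrolled.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy enrolled embeddings (hypothetical values).
enrolled = {
    "speaker_A": [1.0, 0.0, 0.0],
    "speaker_B": [0.0, 1.0, 0.0],
    "speaker_C": [0.0, 0.0, 1.0],
}
unknown = [0.1, 0.9, 0.2]  # embedding of the unknown input voice

best_speaker, all_scores = identify(enrolled, unknown)
print(best_speaker)
```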
Unknown speaker → input voice
Speaker recognition process
<Enrollment step>

d-vector method

  • Uses a deep neural network (DNN, CNN, LSTM)

  • Extracts a speaker embedding

  • Trains a speaker classifier

After training, a hidden layer activation of the speaker classifier is used as the d-vector.
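The idea of reading the d-vector from a hidden layer can be sketched with a toy network. Everything here is an assumption for illustration: the layer sizes, the random (untrained) weights, and the averaging of frame-level activations over the utterance, which is one common way to get a single vector per utterance.

```python
import random


def relu(vector):
    return [max(0.0, value) for value in vector]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM, NUM_SPEAKERS = 4, 3, 5  # toy sizes (assumed)

hidden_w, hidden_b = make_layer(HIDDEN_DIM, FEAT_DIM, rng)
# The output layer classifies training speakers; it is only needed for training.
output_w, output_b = make_layer(NUM_SPEAKERS, HIDDEN_DIM, rng)


def extract_d_vector(frames):
    """Average the hidden-layer activations over all frames of an utterance."""
    hidden = [relu(linear(hidden_w, hidden_b, frame)) for frame in frames]
    return [sum(h[i] for h in hidden) / len(hidden) for i in range(HIDDEN_DIM)]


utterance = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(10)]
d_vector = extract_d_vector(utterance)
speaker_logits = linear(output_w, output_b, d_vector)  # used during training only
print(len(d_vector), len(speaker_logits))
```

Because the d-vector is taken before the output layer, its size does not depend on the number of training speakers, so it can represent speakers the classifier never saw.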


  • Average pooling

  • Spatial pyramid pooling

  • Learnable dictionary encoding

  • LSTM

CNN-based models: VGGNet, ResNet

Input features:

  • MFCC

  • Fbank

  • Spectrogram

Fully-connected layer with




git clone https://github.com/jymsuper/SpeakerRecognition_tutorial

  • PyTorch-based (v1.0.0, pytho

  • Data is managed with pandas DataFrames

Required libraries

  • python 3.5+, pytorch, pandas, numpy, pickle, matplotlib

  • Install with pip or a package manager (anaconda, ...)


DB
A DB of clean speech is used

  • Speech DB collected in an ETRI project

  • Recorded at a distance of 1 m and a direction of 0°; 16 kHz, 16 bits

  • Train: 240 speakers, 100 utterances per speaker

  • Test: 10 speakers, 2 utterances per speaker

Features: log mel filterbank energy features

  • Extracted with the python_speech_features library
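To give a feel for what a log mel filterbank feature measures, the sketch below computes the center frequencies of a mel-spaced filterbank. It assumes 40 filters (as the folder name feat_logfbank_nfilt40 suggests) and a 16 kHz sample rate, and uses the common 2595·log10(1 + f/700) mel formula. It is not the library's code; in practice the `logfbank` function of python_speech_features with `nfilt=40` would perform the full extraction.

```python
import math


def hz_to_mel(hz):
    """Common mel-scale formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters=40, sample_rate=16000, low_hz=0.0):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(sample_rate / 2)  # Nyquist frequency
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


centers = mel_center_frequencies()
print(round(centers[0], 1), round(centers[-1], 1))
```

The filters are narrow and dense at low frequencies and wide at high frequencies, which is why the spacing between neighboring centers grows along the list.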

Folder structure

  • Test features (10 speakers): one folder per speaker (103F3021, 207F2088, ...), each containing enroll.p and test.p (one for enrollment, one for testing)

  • Training features: 100 utterances per speaker, stored as files such as SNR166M2MIC035051_ch01.p through SNR166M2MIC035055_ch01.p

  • Test wav files





"resnet.py"

  • Defines the ResNet used as the base of the custom model

  • Based on the ResNet code provided with PyTorch

  • Variants: resnet-18, 34, 50, 101, 152


import torch.nn as nn
import torch.utils.model_zoo as model_zoo

# BasicBlock, model_urls and ResNet._make_layer are defined elsewhere in resnet.py.


def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
    return model


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, in_channels=1):
        self.inplanes = 16
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=7, stride=1, padding=3,
                               bias=False)  # ori : stride = 2
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Number of channels of the conv layers in each stage: 16, 32, 64, 128
        self.layer1 = self._make_layer(block, 16, layers[0])
        self.layer2 = self._make_layer(block, 32, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 64, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 128, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(1, stride=1)
        self.fc = nn.Linear(128 * block.expansion, num_classes)
Composition of layer1, 2, 3, 4

  • Number of residual blocks: layer1 has 2 blocks, layer2 has 2 blocks, layer3 has 2 blocks, layer4 has 2 blocks

  • Each block contains 2 conv layers

  • Total number of layers: conv1 (1) + layer1 (4) + layer2 (4) + layer3 (4) + layer4 (4) + fc layer (1) = 18
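The layer count above is simple arithmetic and can be checked in a few lines. The helper name is hypothetical; the block list [3, 4, 6, 3] for ResNet-34 is the standard configuration for that depth and is not shown in the slides.

```python
def resnet_depth(blocks_per_stage, convs_per_block=2):
    """Count weighted layers: first conv + all block convs + final fc layer."""
    stem_conv = 1
    fc_layer = 1
    return stem_conv + convs_per_block * sum(blocks_per_stage) + fc_layer


depth_18 = resnet_depth([2, 2, 2, 2])  # 1 + 2 * 8 + 1
depth_34 = resnet_depth([3, 4, 6, 3])  # 1 + 2 * 16 + 1
print(depth_18, depth_34)
```

The deeper variants (50, 101, 152) use a three-conv bottleneck block instead, so they would need `convs_per_block=3` and their own block lists.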



"BasicBlock"


"resnet.py"
2 conv layers + residual connection

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Match the shape of the shortcut when the block changes size or channels
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual  # residual connection
        out = self.relu(out)
        return out
"model.ру"
class background_resnet(nn.Module):
    def __init__(self, embedding_size, num_classes, backbone='resnet18'):
        super(background_resnet, self).__init__()
        self.backbone = backbone
        # copying modules from pretrained models
        if backbone == 'resnet50':
            self.pretrained = resnet.resnet50(pretrained=False)
        elif backbone == 'resnet101':
            self.pretrained = resnet.resnet101(pretrained=False)
        elif backbone == 'resnet152':
            self.pretrained = resnet.resnet152(pretrained=False)
        elif backbone == 'resnet18':
            self.pretrained = resnet.resnet18(pretrained=False)
        elif backbone == 'resnet34':
            self.pretrained = resnet.resnet34(pretrained=False)
        else:
            raise RuntimeError('unknown backbone: {}'.format(backbone))
        # Embedding layer followed by the speaker classification layer
        self.fc0 = nn.Linear(128, embedding_size)
        self.bn0 = nn.BatchNorm1d(embedding_size)
        self.relu = nn.ReLU()
        self.last = nn.Linear(embedding_size, num_classes)

Input: 100 frames × 40 dim
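To see how a 100 frames × 40 dim input shrinks through the ResNet shown earlier, the sketch below applies the usual convolution output-size formula, floor((n + 2p − k) / s) + 1, stage by stage. It assumes the 3x3 convolutions use padding 1 (the standard `conv3x3` helper, not shown in the slides) and that only the first conv of each stage applies the stride; the function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(frames=100, dims=40):
    """Follow the (frames, dims) size through each size-changing stage."""
    # (name, kernel, stride, padding) of the operation that sets the stage size
    stages = [
        ("conv1", 7, 1, 3),
        ("maxpool", 3, 2, 1),
        ("layer1", 3, 1, 1),
        ("layer2", 3, 2, 1),
        ("layer3", 3, 2, 1),
        ("layer4", 3, 2, 1),
    ]
    shape = (frames, dims)
    trace = []
    for name, kernel, stride, padding in stages:
        shape = (conv_out(shape[0], kernel, stride, padding),
                 conv_out(shape[1], kernel, stride, padding))
        trace.append((name, shape))
    return trace


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)
```

Under these assumptions layer4 outputs 128 channels on a 7 × 3 map. Since `fc0` in model.py takes 128 inputs, the map is presumably averaged over its spatial positions before the embedding layer, though that step is not shown in the slides.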




Project files

  • enroll_embeddings
  • feat_logfbank_nfilt40
  • model
  • model_saved
      checkpoint_24.pth: checkpoint of the trained model; loaded at test time
  • test_wavs
  • DB_wav_reader.py
  • README.md
  • SRDataset.py
  • configure.py
  • enroll.py
  • identification.py
  • loss_plot.png
  • train.py
  • verification.py

enroll_embeddings

  • 103F3021.pth
  • 207F2088.pth
  • 213F5100.pth
  • 217F3038.pth
  • 225M4062.pth
  • 229M2031.pth
  • 230M4087.pth
  • 233F4013.pth
  • 236M3043.pth
  • 240M3063.pth

Running "enroll.py" extracts and saves the embeddings of the 10 enrolled speakers.
"identification.py" (speaker identification) and "verification.py" (speaker verification) then use these embeddings.
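The enrollment step boils down to writing one embedding file per speaker and reading them all back at test time. A minimal sketch of that pattern, assuming `pickle` and a `.pkl` extension as stand-ins for the `.pth` files the tutorial saves with PyTorch; the function names and embedding values are hypothetical.

```python
import os
import pickle
import tempfile


def save_embedding(directory, speaker_id, embedding):
    """Write one speaker's embedding to <directory>/<speaker_id>.pkl."""
    path = os.path.join(directory, speaker_id + ".pkl")
    with open(path, "wb") as handle:
        pickle.dump(embedding, handle)
    return path


def load_enrolled(directory):
    """Read every .pkl file back into a {speaker_id: embedding} dict."""
    enrolled = {}
    for name in sorted(os.listdir(directory)):
        speaker_id, extension = os.path.splitext(name)
        if extension != ".pkl":
            continue
        with open(os.path.join(directory, name), "rb") as handle:
            enrolled[speaker_id] = pickle.load(handle)
    return enrolled


# Enrollment: one file per speaker (toy embeddings, hypothetical values).
with tempfile.TemporaryDirectory() as folder:
    save_embedding(folder, "103F3021", [0.1, 0.9])
    save_embedding(folder, "207F2088", [0.8, 0.2])
    # Test time: load all enrolled speakers before scoring.
    enrolled = load_enrolled(folder)

print(sorted(enrolled))
```

Keeping enrollment separate from scoring means the network runs once per enrolled speaker, and identification or verification only has to compare vectors.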






