Testing: Speaker Identification, Speaker Verification, Individual Practice
Related: 2019 LG SpeakerRecognition tutorial
LG Electronics Speech Intelligence: Speaker Recognition Practice
TA: Young-Moon Jung. Advisor: Prof. Hoirin Kim.
Contact: Young-Moon Jung, dudans@kaist.ac.kr

Practice outline
• Concept overview: speaker recognition, the d-vector method
• Practice overview: preparation, enrollment, testing (speaker identification, speaker verification)
• Individual practice

Speaker recognition
• Speaker recognition covers speaker identification and speaker verification.
• Speaker identification: decide which enrolled speaker the input voice of an unknown speaker belongs to.
• Speaker verification: accept or reject a claimed identity. Example: "Is this the voice of speaker A?" "No (reject)".

Speaker recognition process
• Enrollment step: a d-vector is extracted from the speech of each speaker to be enrolled.

The d-vector method
• A neural network (DNN, CNN, LSTM) is used to extract a speaker embedding.
• The network is trained as a speaker classifier. After training, the activation of a hidden layer is used as the d-vector.
• Pooling over frames: average pooling, spatial pyramid pooling, learnable dictionary encoding.
• Networks: LSTM, or CNN-based models such as VGGNet and ResNet.
• Input features: MFCC, Fbank, spectrogram.
• Output: fully-connected layer with cross entropy loss, center loss, or angular softmax loss.

Preparation
• git clone https://github.com/jymsuper/SpeakerRecognition_tutorial
• The code is PyTorch-based (v1.0.0).
• Data is handled through dataframes from the pandas library.
• Required libraries: python 3.5+, pytorch, pandas, numpy, pickle, matplotlib. Install them with pip or a package manager (anaconda, ...).

DB
• A clean-speech DB (ETRI project) recorded at a distance of 1 m, 16 kHz, 16 bits.
• Train: 240 speakers, 100 utterances per speaker.
• The features (log mel filterbank energy features) are provided.
• Test: 10 speakers, 2 utterances per speaker.
• The wav files and the feature files are provided.
• The features were extracted with the python_speech_features library.

Feature directory (figure): one folder per speaker (10 speakers), for example 103F3021 and 207F2088. The test features extracted from the wav files are stored as enroll.p and test.p. The training features are stored as files such as SNR166M2MIC035051_ch01.p through SNR166M2MIC035055_ch01.p (100 utterances per speaker).

Repository contents (figure): enroll_embeddings, feat_logfbank_nfilt40, configure.py, enroll.py, identification.py, loss_plot.png, train.py, verification.py.
• "resnet.py" defines the ResNet model, following the official ResNet model provided by PyTorch.
• The custom model is built on top of this ResNet.
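Returning to the d-vector method outlined above, the step from frame-level hidden activations to one utterance-level embedding can be sketched with average pooling. This is a minimal pure-Python illustration, not the tutorial's code: the function names `average_pool` and `l2_normalize` and the toy activations are hypothetical, and the L2 normalization is an assumed (though common) extra step that the slides do not mention.

```python
import math

def average_pool(frame_activations):
    """Average frame-level activations (T frames x D dims) into one D-dim vector."""
    num_frames = len(frame_activations)
    dim = len(frame_activations[0])
    return [sum(frame[d] for frame in frame_activations) / num_frames
            for d in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length (assumed step, not taken from the slides)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy hidden-layer activations: 3 frames, 3 dimensions (made-up numbers).
frames = [
    [1.0, 2.0, 0.0],
    [3.0, 2.0, 0.0],
    [2.0, 2.0, 0.0],
]

d_vector = l2_normalize(average_pool(frames))
print(d_vector)
```

In a real system the activations would come from the trained speaker classifier, and the pooling would typically run on tensors rather than Python lists.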
ResNet in "resnet.py"
• Available variants: resnet-18, 34, 50, 101, 152.

    def resnet18(pretrained=False, **kwargs):
        """Constructs a ResNet-18 model.

        Args:
            pretrained (bool): If True, returns a model pre-trained on ImageNet
        """
        model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
        if pretrained:
            model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
        return model

    class ResNet(nn.Module):
        def __init__(self, block, layers, num_classes=1000, in_channels=1):
            self.inplanes = 16
            super(ResNet, self).__init__()
            # ... first conv layer (padding=3), layer1 to layer4 and the fc layer
            # are defined in the rest of resnet.py

• self.inplanes = 16 sets the number of channels of the conv layer.
• [2, 2, 2, 2] is the number of residual blocks in layer1, 2, 3 and 4: layer1 has 2 blocks, layer2 has 2 blocks, layer3 has 2 blocks, layer4 has 2 blocks.
• Each block consists of 2 conv layers, so the number of layers is: conv1 1 + layer1 4 + layer2 4 + layer3 4 + layer4 4 + fc layer 1 = 18.

"BasicBlock" in "resnet.py": 2 conv layers + residual connection

    import torch.nn as nn

    # conv3x3 is the 3x3 convolution helper defined in resnet.py
    class BasicBlock(nn.Module):
        expansion = 1

        def __init__(self, inplanes, planes, stride=1, downsample=None):
            super(BasicBlock, self).__init__()
            self.conv1 = conv3x3(inplanes, planes, stride)
            self.bn1 = nn.BatchNorm2d(planes)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = conv3x3(planes, planes)
            self.bn2 = nn.BatchNorm2d(planes)
            self.downsample = downsample
            self.stride = stride

        def forward(self, x):
            residual = x

            # First conv layer
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)

            # Second conv layer
            out = self.conv2(out)
            out = self.bn2(out)

            # Match the shape of the shortcut when the block downsamples
            if self.downsample is not None:
                residual = self.downsample(x)

            # Residual connection
            out += residual
            out = self.relu(out)
            return out

"model.py"

    import torch.nn as nn

    # resnet is the module defined in resnet.py
    class background_resnet(nn.Module):
        def __init__(self, embedding_size, num_classes, backbone='resnet18'):
            super(background_resnet, self).__init__()
            self.backbone = backbone
            # copying modules from pretrained models
            if backbone == 'resnet50':
                self.pretrained = resnet.resnet50(pretrained=False)
            elif backbone == 'resnet101':
                self.pretrained = resnet.resnet101(pretrained=False)
            elif backbone == 'resnet152':
                self.pretrained = resnet.resnet152(pretrained=False)
            elif backbone == 'resnet18':
                self.pretrained = resnet.resnet18(pretrained=False)
            elif backbone == 'resnet34':
                self.pretrained = resnet.resnet34(pretrained=False)
            else:
                raise RuntimeError('unknown backbone: {}'.format(backbone))

            self.fc0 = nn.Linear(128, embedding_size)
            self.bn0 = nn.BatchNorm1d(embedding_size)
            self.relu = nn.ReLU()
            self.last = nn.Linear(embedding_size, num_classes)

• Input: 100 frames, 40 dim.

Training (figure): the repository contains enroll_embeddings, feat_logfbank_nfilt40, model, model_saved (checkpoint_24.pth), test_wavs, DB_wav_reader.py, README.md, SRDataset.py, configure.py, enroll.py, identification.py, loss_plot.png, train.py, verification.py.
• model_saved holds the checkpoint of the trained model. This checkpoint is loaded at test time.

Enrollment (figure): enroll_embeddings contains 103F3021.pth, 207F2088.pth, 213F5100.pth, 217F3038.pth, 225M4062.pth, 229M2031.pth, 230M4087.pth, 233F4013.pth, 236M3043.pth, 240M3063.pth.
• "enroll.py" extracts and saves the embeddings of the 10 speakers. They are used in "identification.py" (speaker identification) and "verification.py" (speaker verification).
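The way enrolled embeddings can serve both identification and verification may be easier to see in a small sketch. The following is an illustration under stated assumptions, not the contents of enroll.py, identification.py or verification.py: it assumes that enrollment averages a speaker's utterance embeddings and that scoring uses cosine similarity with a fixed threshold. The function names, the 3-dimensional toy embeddings and the threshold value 0.8 are all hypothetical; only the speaker IDs are taken from the file listing above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(utterance_embeddings):
    """Average one speaker's utterance embeddings into a single enrolled embedding."""
    count = len(utterance_embeddings)
    return [sum(column) / count for column in zip(*utterance_embeddings)]

def identify(test_embedding, enrolled):
    """Identification: return the enrolled speaker ID with the highest similarity."""
    return max(enrolled,
               key=lambda spk: cosine_similarity(test_embedding, enrolled[spk]))

def verify(test_embedding, claimed_embedding, threshold):
    """Verification: accept the claimed identity if the similarity reaches the threshold."""
    return cosine_similarity(test_embedding, claimed_embedding) >= threshold

# Toy enrollment data (made-up 3-dim embeddings, two utterances per speaker).
enrolled = {
    "103F3021": enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]),
    "207F2088": enroll([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}

test_embedding = [0.8, 0.2, 0.1]
best_speaker = identify(test_embedding, enrolled)
accepted = verify(test_embedding, enrolled["207F2088"], threshold=0.8)
print(best_speaker, accepted)
```

Identification compares the test embedding against every enrolled speaker, while verification compares it only against the claimed speaker. In practice the threshold would be tuned on held-out data rather than fixed by hand.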