Speaker Recognition

The BGU 2018 NIST Speaker Recognition Evaluation System Git

Basic knowledge of speaker recognition (additional reading):
 * Speaker recognition by machine and human
 * AUTOMATIC SPEECH RECOGNITION (ASR) lecture about Speaker verification
 * A tutorial on speaker verification

Frontend:

Feature Extraction (MFCC):
 * The dummy’s guide to MFCC
 * Mel Frequency Cepstral Coefficient (MFCC) tutorial

Voice Activity Detection (VAD):
 * Voice Activity Detection (VAD) Tutorial

X-VECTOR and TDNN architecture: in this section it is best to watch each YouTube tutorial before reading the corresponding article, or to read and watch at the same time.
 * A time delay neural network (TDNN) architecture for efficient modeling of long temporal contexts
 * YouTube explanation on TDNN
 * X-VECTORS: ROBUST DNN EMBEDDINGS FOR SPEAKER RECOGNITION
 * YouTube summary of X-VECTORS: ROBUST DNN EMBEDDINGS FOR SPEAKER RECOGNITION article

Factorized TDNN:
 * Using SVD for compressed DNN weight matrix
 * SEMI-ORTHOGONAL LOW-RANK MATRIX FACTORIZATION FOR DEEP NEURAL NETWORKS

Backend:

Linear Discriminant Analysis (LDA): in this section it is best to watch the YouTube tutorial before reading.
 * YouTube explanation of Linear Discriminant Analysis (LDA)
 * LINEAR DISCRIMINANT ANALYSIS - A BRIEF TUTORIAL

Probabilistic Linear Discriminant Analysis (PLDA): the first article is enough to understand PLDA; read the additional articles for more extensive knowledge.

Mandatory reading:
 * Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification

Additional reading:
 * Probabilistic Linear Discriminant Analysis (a good explanation, but most of the examples are for image recognition rather than speaker recognition)
 * Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification
 * From single to multiple enrollment i-vectors: practical PLDA scoring variants for speaker verification

Speaker diarization:
 * Domain Adaptation and Speaker Diarization for Speaker Recognition

Docker for Speaker Recognition
The following steps build a Docker container that supports Kaldi, SoX, Python, and other speaker recognition tools. It is strongly recommended to build the container on a server and mount it to a disk with enough space to hold the Kaldi project, the data, and the automatic speaker recognition components.

1. Download the image:

    docker pull kaldiasr/kaldi

2. Docker initialization - to start a new container, use the following command:

    sudo nvidia-docker run -it --mount type=bind,source=/*path to the disk*,target=/common_space_docker/ -p 1234:22 --name  --runtime=nvidia kaldiasr/kaldi:gpu-latest

 * -p 1234:22 maps port 1234 on the server to port 22 inside the container. Check first that port 1234 is free on the server.
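One way to check that the host port is free before starting the container is to look for it among the listening TCP sockets. The sketch below is an assumption and not part of the original setup: `port_in_use` is a hypothetical helper that reads the output of `ss -ltn` on standard input and succeeds if some socket is listening on the given port.

```shell
# Hypothetical helper (not from the original instructions).
# Reads `ss -ltn` output on stdin; returns 0 if PORT appears as a
# local listening port (4th column ends in ":PORT"), 1 otherwise.
port_in_use() {
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }   # skip the header line
        END { exit !found }
    '
}

# Usage on the server (uncomment to run):
# if ss -ltn | port_in_use 1234; then
#     echo "port 1234 is taken, choose another one"
# else
#     echo "port 1234 is free"
# fi
```

If the port is taken, any other unused host port can be placed on the left side of the `-p` mapping; the right side stays 22 because that is where the ssh server listens inside the container.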

3. Alias to python3 - the "python" command in the container refers to python2; change it to python3:

    alias python='python3'
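An alias typed at the prompt lasts only for the current shell session. The sketch below shows one way to make it persistent, assuming the container's shell is bash and reads `~/.bashrc` (an assumption, not stated in the original steps). `add_python_alias` is a hypothetical helper that appends the alias line to a file only if it is not already there.

```shell
# Hypothetical helper (not from the original instructions).
# Appends the python3 alias to the given rc file exactly once.
add_python_alias() {
    rc_file="$1"
    alias_line="alias python='python3'"
    touch "$rc_file"
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq -- "$alias_line" "$rc_file"; then
        printf '%s\n' "$alias_line" >> "$rc_file"
    fi
}

# Usage inside the container (uncomment to run):
# add_python_alias "$HOME/.bashrc"
```

Note that bash expands aliases only in interactive shells by default, so scripts that call `python` are not affected by this alias.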

4. ssh-server activation

To enable ssh, run the following commands inside the docker container:

    apt update && apt install -y openssh-server
    mkdir /var/run/sshd
    echo 'root:123456' | chpasswd

The last command changes the root password to 123456. Then run:

    sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
    sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
    echo "export VISIBLE=now" >> /etc/profile
    service ssh restart
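The first `sed` command above only rewrites the text `PermitRootLogin prohibit-password` and does not touch a leading `#`. On images where `sshd_config` ships that line commented out (`#PermitRootLogin prohibit-password`), the result is still a comment and root password login may stay disabled. The sketch below is one possible workaround, not part of the original steps: `enable_root_login` is a hypothetical helper that rewrites the line whether or not it is commented.

```shell
# Hypothetical helper (not from the original instructions).
# Replaces any PermitRootLogin line, commented or not, with an explicit "yes".
enable_root_login() {
    config_file="$1"
    tmp_file="${config_file}.tmp"
    sed -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/' \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Usage inside the container (uncomment to run):
# enable_root_login /etc/ssh/sshd_config && service ssh restart
```

A quick check afterwards is `grep PermitRootLogin /etc/ssh/sshd_config`, which should show an uncommented `PermitRootLogin yes` line.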

5. Using SSH to reach the container - connect to the container from your personal computer, directly or via VPN:

    ssh -p  root@

for example:

    ssh -p 1234 root@132.72.48.87
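To avoid retyping the port and address, the connection details can be stored in the ssh client configuration on the personal computer. The sketch below is only an illustration: `print_ssh_host_entry` is a hypothetical helper, `kaldi-docker` is a made-up alias, and the address and port are taken from the example above.

```shell
# Hypothetical helper (not from the original instructions).
# Prints an ssh client config entry for the given alias, host and port.
print_ssh_host_entry() {
    printf 'Host %s\n    HostName %s\n    Port %s\n    User root\n' "$1" "$2" "$3"
}

# Usage on the personal computer (uncomment to run):
# print_ssh_host_entry kaldi-docker 132.72.48.87 1234 >> ~/.ssh/config
# ssh kaldi-docker
```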

6. If the server goes down, repeat the steps starting from step 2.
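Docker keeps stopped containers unless they were created with `--rm`, so after a server restart the container from step 2 may still exist, and a second `docker run` with the same `--name` would be rejected. A possible alternative, sketched under that assumption, is to restart the existing container instead. `container_action` is a hypothetical helper that reads the output of `docker ps -a --format '{{.Names}}'` on standard input, and `my_kaldi` is a placeholder for whatever name was given in step 2.

```shell
# Hypothetical helper (not from the original instructions).
# Reads container names (one per line) on stdin; prints "start" if NAME
# already exists, "run" if it has to be created again as in step 2.
container_action() {
    if grep -Fxq -- "$1"; then
        echo start
    else
        echo run
    fi
}

# Usage on the server (uncomment to run; my_kaldi is a placeholder name):
# name=my_kaldi
# case "$(docker ps -a --format '{{.Names}}' | container_action "$name")" in
#     start) docker start "$name" && docker exec "$name" service ssh start ;;
#     run)   echo "container not found, repeat from step 2" ;;
# esac
```

The ssh service does not come up on its own when the container is restarted, which is why the sketch starts it explicitly with `docker exec`.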