Problem being addressed
While voice recognition has wide application potential, choosing a suitable model is difficult, especially when training resources are scarce.
This work proposes a training technique for speaker identification models and presents a new text-independent, multilingual model, contributing mainly to languages with few available resources. The proposed model generalizes to other datasets and languages.
Advantages of this solution
The results show that embeddings generated by artificial neural networks are competitive with classical approaches for the task, and they suggest that the models can perform language-independent speaker identification. The models also scale beyond their training set: they can handle more speakers than they were trained on, identifying 150% more speakers while still maintaining 55% accuracy.
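As an illustration of why an embedding-based system can handle speakers it was not trained on, the following minimal sketch identifies a speaker by comparing a query embedding against enrolled reference embeddings. It is not the authors' method: the use of cosine similarity, the function names `cosine_similarity` and `identify`, and the toy 3-dimensional vectors are all assumptions made for the example. In practice the embeddings would come from the trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, enrolled):
    """Return (name, score) for the enrolled speaker closest to the query."""
    scored = ((name, cosine_similarity(query, emb)) for name, emb in enrolled.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical enrolled speakers with made-up embeddings (assumption:
# a real system would compute these from audio with the trained model).
enrolled = {
    "speaker_a": [0.9, 0.1, 0.0],
    "speaker_b": [0.0, 1.0, 0.2],
}

# Enrolling a new speaker only adds one more reference embedding;
# no retraining is needed in this sketch.
enrolled["speaker_c"] = [0.1, 0.0, 1.0]

best_name, best_score = identify([0.2, 0.1, 0.9], enrolled)
print(best_name, round(best_score, 3))
```

Because identification here is a nearest-neighbor lookup rather than a fixed-size classification layer, the set of known speakers can grow after training, which is one plausible reading of the scaling result above.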
Possible new applications of the work
Voice recognition is widely used in applications such as intelligent personal assistants, telephone banking systems, and automatic question answering. The model proposed here can also be used in tasks such as speech synthesis, voice cloning, and cross-lingual voice conversion, where embeddings from a speaker identification system are used to represent the speaker. Because the model is language-independent, it is particularly useful for cross-lingual voice conversion.
Source URL: #############