Volume 26 / Issue 4



Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding

Milan Sečujski (University of Novi Sad, Serbia)

Darko Pekar (AlfaNum Speech Technologies Ltd., Serbia)

Siniša Suzić (University of Novi Sad, Serbia)

Anton Smirnov (AlfaNum Speech Technologies Ltd., Serbia)

Tijana Nosek (University of Novi Sad, Serbia)

Abstract: The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method relies on neural network embedding, i.e., the mapping of discrete variables into continuous vectors in a low-dimensional space, which has proven to be a highly successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model, from which speaker/style adaptation was carried out, was a multi-speaker/multi-style model based on 8.5 hours of American English speech data corresponding to 16 different speaker/style combinations. The experimental results show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens up a wide range of applications for speaker/style-dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
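The core idea of the abstract — mapping each discrete speaker/style combination to a point in a low-dimensional continuous space — can be illustrated with a minimal sketch. Everything below is assumed for illustration only (the speaker and style labels, the embedding dimension of 8, and the lookup function are not from the paper; the paper's actual model trains these vectors jointly with the acoustic network):

```python
import numpy as np

# Hypothetical sketch of a speaker/style embedding table. Each of the
# 16 speaker/style combinations mentioned in the abstract is assigned
# a row vector in a low-dimensional space; during training these rows
# would be updated by backpropagation together with the acoustic model.
rng = np.random.default_rng(0)

speakers = ["spk_a", "spk_b", "spk_c", "spk_d"]      # assumed labels
styles = ["neutral", "happy", "sad", "angry"]         # assumed labels
combos = [(s, t) for s in speakers for t in styles]   # 16 combinations

EMB_DIM = 8  # embedding dimensionality (assumed, not from the paper)

# One row per discrete speaker/style combination.
emb_table = rng.normal(scale=0.1, size=(len(combos), EMB_DIM))
index = {c: i for i, c in enumerate(combos)}

def embed(speaker: str, style: str) -> np.ndarray:
    """Map a discrete speaker/style pair to its continuous vector."""
    return emb_table[index[(speaker, style)]]

# At synthesis time, the looked-up vector would be fed to the network
# alongside the linguistic features, selecting the target voice/style.
vec = embed("spk_b", "happy")
print(vec.shape)  # (8,)
```

Because nearby points in this space correspond to similar speaker/style combinations, adapting to a new speaker or style can amount to estimating a new point (or fine-tuning an existing one) from very little target data, which is what makes the 10-minute and 30-second scenarios described in the abstract feasible.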

Keywords: deep neural networks, embedding, speaker adaptation, text-to-speech synthesis

Categories: H.1.2, H.5.2, I.2.4, I.2.6