End-to-End Mandarin Text-to-Speech
Note: Pipeline: acoustic model + vocoder
The following examples use 4 speakers from the AISHELL3 dataset and are intended to test the reconstruction capability of the TTS model.
Model | 较二零一三年同期增长二十一万约百分之零九一 | 并不肯定智能手机战略是否将被用于笔记本 | 黄伟文歌曲有什么 | 请求依法撤销中卫市中级人民法院的民事裁定书 |
---|---|---|---|---|
Ground truth | ||||
Grad-TTS Transformer Encoder | ||||
Grad-TTS Transformer Encoder, 255 epochs | ||||
(M1) Grad-TTS Conformer Encoder | ||||
(M2) Grad-TTS Conformer Encoder w/o Blank | ||||
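The two-stage pipeline named in the note for this section (acoustic model, then vocoder) can be sketched as a simple composition. This is only an illustration of the data flow: `dummy_acoustic_model`, `dummy_vocoder`, `synthesize`, `HOP_LENGTH`, and `N_MELS` are hypothetical names and values, not taken from the models listed above.

```python
from typing import Callable, List

Mel = List[List[float]]  # mel-spectrogram: frames x mel bins
Wave = List[float]       # audio samples

HOP_LENGTH = 256  # assumed hop size; not stated on this page
N_MELS = 80       # assumed number of mel bins; not stated on this page


def dummy_acoustic_model(text: str, speaker_id: int) -> Mel:
    """Stand-in for an acoustic model such as Grad-TTS: text -> mel frames."""
    n_frames = 10 * len(text)  # toy duration rule: 10 frames per character
    return [[0.0] * N_MELS for _ in range(n_frames)]


def dummy_vocoder(mel: Mel) -> Wave:
    """Stand-in for a vocoder: each mel frame becomes HOP_LENGTH samples."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def synthesize(
    text: str,
    speaker_id: int,
    acoustic_model: Callable[[str, int], Mel],
    vocoder: Callable[[Mel], Wave],
) -> Wave:
    """Run the two stages in sequence: text -> mel -> waveform."""
    mel = acoustic_model(text, speaker_id)
    return vocoder(mel)


wave = synthesize("黄伟文歌曲有什么", 0, dummy_acoustic_model, dummy_vocoder)
print(len(wave))  # 8 characters * 10 frames * 256 samples = 20480
```

Keeping the two stages behind separate callables mirrors how the acoustic models and the vocoders on this page are evaluated independently.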
End-to-End Accent Text-to-Speech
Note: Pipeline: acoustic model + vocoder
The following examples use 4 speakers from the SG dataset and are intended to test the reconstruction capability of the accented TTS model.
Model | 甚至是希望,尤其是现在的讲华语行动啊 | 而去断定他是一个怎么样的人 | 没有我的意思是,就假如说 | 我也不知道,我知道乾隆就是是真的啦,但是还珠格格,这个人我就不知道 |
---|---|---|---|---|
Ground truth | ||||
(M1) SG data | ||||
(M2) Mandarin + SG data w/ accent embedding | ||||
(M3) Mandarin + SG data w/ blank + conformer encoder + Global accent token | ||||
(M4) Mandarin + SG data w/ blank + conformer encoder + Global accent token + GAT loss | ||||
The following examples take the SG-style text 我也不知道,我知道乾隆就是是真的啦,但是还珠格格,这个人我就不知道 as input and generate samples conditioned on SG and CN speakers and on SG and CN accents, to test the generative capability and the disentanglement of text, speaker, and accent.
Model | Speaker | SG-text, SG-acc | SG-text, CN-acc | CN-text, SG-acc | CN-text, CN-acc |
---|---|---|---|---|---|
(E8) | SG-G0001 | ||||
(E8) | SG-G0002 | ||||
(E8) | SG-G0004 | ||||
(E8) | CN-SSB0590 | ||||
(E8) | CN-SSB0632 | ||||
(E8) | CN-SSB1837 | ||||
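The table above is a grid over speaker, text style, and accent. As a rough sketch, the conditions can be enumerated as follows. The speaker IDs and column labels are copied from the table; the dictionary layout and the `is_cross_accent` helper are hypothetical, and the helper assumes the `SG`/`CN` prefix of a speaker ID marks the speaker's source dataset.

```python
from itertools import product

# Speaker IDs and labels as they appear in the table above.
speakers = ["SG-G0001", "SG-G0002", "SG-G0004",
            "CN-SSB0590", "CN-SSB0632", "CN-SSB1837"]
texts = ["SG-text", "CN-text"]
accents = ["SG-acc", "CN-acc"]

# One entry per table cell: every speaker with every text/accent pairing.
conditions = [
    {"speaker": spk, "text": txt, "accent": acc}
    for spk, txt, acc in product(speakers, texts, accents)
]


def is_cross_accent(cond):
    """True when the requested accent differs from the speaker's own prefix."""
    return cond["speaker"][:2] != cond["accent"][:2]


cross = [c for c in conditions if is_cross_accent(c)]
print(len(conditions), len(cross))  # 24 12
```

The cross-accent cells are the ones that probe disentanglement, since the model has to render an accent the speaker was not recorded with.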
The following examples take the CN-style text 请求依法撤销中卫市中级人民法院的民事裁定书 as input.
Model | CN-spk SSB1837, SG-acc | SG-spk G0001, SG-acc | CN-spk SSB1837, CN-acc | SG-spk G0001, CN-acc |
---|---|---|---|---|
(M3) Mandarin + SG data w/ accent embedding, timesteps=50 | ||||
(M4) Mandarin + SG data w/ accent embedding, timesteps=10 + gradient reversal (spk, acc) | ||||
Vocoder:
Note: The following examples take the ground-truth (GT) log mel-spectrogram as input and generate audio samples to verify audio quality.
Model | bakefemale | aishell3 | SG |
---|---|---|---|
Ground truth | |||
(D0) bakefemale + aishell3 + SG data training | |||
(D1) bakefemale training | |||
(D2) bakefemale training + low band + feat | |||
(D3) SG finetuning on D1 | |||
(D4) aishell3 training | |||
(D5) SG finetuning on D4 | |||
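The vocoder rows above are driven by ground-truth log mel-spectrograms. The page does not state how those features were extracted, so the following is only a minimal sketch of one common recipe for a single frame: Hann window, naive DFT, triangular mel filters on the HTK-style mel scale, then a natural log with a floor. The sample rate, FFT size, number of mel bands, floor value, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, one row per mel band, over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frame(frame, bank, floor=1e-5):
    """Log mel energies of one frame (periodic Hann window, naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        mags.append(abs(s))
    # Clamp before the log so silent bands stay finite.
    return [math.log(max(floor, sum(w * a for w, a in zip(row, mags))))
            for row in bank]


sample_rate, n_fft, n_mels = 16000, 256, 20  # illustrative values only
bank = mel_filterbank(n_mels, n_fft, sample_rate)
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(n_fft)]
features = log_mel_frame(tone, bank)
peak_band = max(range(n_mels), key=lambda m: features[m])
print(peak_band, features[peak_band])
```

For a 1 kHz test tone, the strongest band should be one whose triangle covers 1 kHz, while bands far from the tone sit at the log floor. A real feature extractor would use an FFT library and process overlapping frames.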