Accent (zh-SG) TTS Demos

End-to-End Mandarin Text-to-Speech

Note: the pipeline is an acoustic model followed by a vocoder.

The following examples use 4 speakers from the AISHELL-3 dataset and test the model's TTS reconstruction capability.

Model | 较二零一三年同期增长二十一万约百分之零九一 | 并不肯定智能手机战略是否将被用于笔记本 | 黄伟文歌曲有什么 | 请求依法撤销中卫市中级人民法院的民事裁定书
Ground truth
Grad-TTS Transformer Encoder
Grad-TTS Transformer Encoder (255 epochs)
(M1) Grad-TTS Conformer Encoder
(M2) Grad-TTS Conformer Encoder w/o Blank

End-to-End Accent Text-to-Speech

Note: the pipeline is an acoustic model followed by a vocoder.

The following examples use 4 speakers from the SG dataset and test the model's accented-TTS reconstruction capability.

Model | 甚至是希望,尤其是现在的讲华语行动啊 | 而去断定他是一个怎么样的人 | 没有我的意思是,就假如说 | 我也不知道,我知道乾隆就是是真的啦,但是还珠格格,这个人我就不知道
Ground truth
(M1) SG data
(M2) Mandarin + SG data w/ accent embedding
(M3) Mandarin + SG data w/ blank + conformer encoder + Global accent token
(M4) Mandarin + SG data w/ blank + conformer encoder + Global accent token + GAT loss

The following examples take the SG-style text 我也不知道,我知道乾隆就是是真的啦,但是还珠格格,这个人我就不知道 as input and generate samples conditioned on SG and CN speakers and on SG and CN accents, to test generation quality and the disentanglement of text, speaker, and accent.

Model | Speaker | SG-text, SG-acc | SG-text, CN-acc | CN-text, SG-acc | CN-text, CN-acc
(E8) | SG-G0001
(E8) | SG-G0002
(E8) | SG-G0004
(E8) | CN-SSB0590
(E8) | CN-SSB0632
(E8) | CN-SSB1837
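
The grid above crosses every speaker with both text styles and both accents. As a rough illustration of how such a set of synthesis conditions could be enumerated, here is a minimal sketch. The speaker IDs and column labels are taken from the table; the function names `build_condition_grid` and `is_cross_accent` are hypothetical and not part of the demo's code.

```python
from itertools import product

# Speaker IDs and labels as they appear in the table above.
SPEAKERS = ["SG-G0001", "SG-G0002", "SG-G0004",
            "CN-SSB0590", "CN-SSB0632", "CN-SSB1837"]
TEXT_STYLES = ["SG-text", "CN-text"]
ACCENTS = ["SG-acc", "CN-acc"]


def build_condition_grid(speakers=SPEAKERS, text_styles=TEXT_STYLES, accents=ACCENTS):
    """Return one dict per sample: every speaker x text style x accent combination."""
    return [
        {"speaker": spk, "text_style": txt, "accent": acc}
        for spk, txt, acc in product(speakers, text_styles, accents)
    ]


def is_cross_accent(condition):
    """True when the requested accent differs from the speaker's own corpus (SG vs CN)."""
    return condition["speaker"][:2] != condition["accent"][:2]


grid = build_condition_grid()
cross = [c for c in grid if is_cross_accent(c)]
print(len(grid), len(cross))
```

The cross-accent cells (for example, a CN speaker rendered with the SG accent) are the ones that probe disentanglement most directly, since no ground-truth recording exists for them.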

The following examples take the CN-style text 请求依法撤销中卫市中级人民法院的民事裁定书 as input.

Model | CN-spk SSB1837, SG-acc | SG-spk G0001, SG-acc | CN-spk SSB1837, CN-acc | SG-spk G0001, CN-acc
(M3) Mandarin + SG data w/ accent embedding, timesteps=50
(M4) Mandarin + SG data w/ accent embedding, timesteps=10 + gradient reversal (spk, acc)
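
The (M4) row mentions gradient reversal on the speaker and accent branches. The page does not describe how it is wired up, so the following is only a framework-free sketch of the general technique: the layer is the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass. The class name `GradientReversal` and the scale `lam` are illustrative assumptions; in a real training setup this would typically be written as a custom autograd function.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical scaling factor, not a value taken from the demo.
        self.lam = lam

    def forward(self, features):
        # Features reach the adversarial classifier unchanged.
        return list(features)

    def backward(self, grad_from_classifier):
        # The encoder receives the negated, scaled classifier gradient, so an
        # update that helps the classifier pushes the encoder the opposite way.
        return [-self.lam * g for g in grad_from_classifier]


grl = GradientReversal(lam=0.5)
out = grl.forward([1.0, -2.0])
grad = grl.backward([0.2, -0.4])
print(out, grad)
```

Under this reading, placing such a layer between an embedding and a classifier for the *other* attribute (for example, a speaker classifier on the accent embedding) would discourage the embedding from carrying that attribute, which is one plausible interpretation of "(spk, acc)".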

Vocoder:

Note: the following examples feed the ground-truth log mel-spectrogram to the vocoder and generate audio samples to verify audio quality.

Text | bakefemale | aishell3 | SG
Ground truth
(D0) bakefemale + aishell3 + SG data training
(D1) bakefemale training
(D2) bakefemale training + low band + feat
(D3) SG finetuning on D1
(D4) aishell3 training
(D5) SG finetuning on D4