PhD student at NUS

Xintong Wang 王心童

I am 28.77 years old and a third-year PhD student at the National University of Singapore, supervised by Prof. Wang Ye. My research spans speech tasks, from understanding to generation.

I earned my B.S. from Beijing Forestry University in 2022. Email: xintongwang9709 at gmail dot com.

CV Scholar GitHub HuggingFace

Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Speech Language Model (Speech LM)

Interactive

Playaround

A small place for interactive demos around speech models.

Running Agents: 1

Whisper-Pinyin Demo: Transcribe Mandarin speech to Pinyin text.

Selected papers

Publications

Conference Articles

Xintong Wang, Mingqian Shi, and Ye Wang, "Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis," Interspeech 2024. Oral PDF
Junchuan Zhao, Xintong Wang, and Ye Wang, "Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning," Interspeech 2025. arXiv

Workshop Articles

Xintong Wang, Chang Zeng, Jun Chen, and Chunhui Wang, "CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers," 2023 IEEE ASRU Workshop. Demo arXiv

Journal Articles

Xintong Wang and Chuangang Zhao, "A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition," Information, 12.4 (2021): 165. PDF

Experience

Work Experience

Oct 2023 - Aug 2024 Research Assistant

Sound and Music Computing Lab, School of Computing, National University of Singapore, Singapore

Jul 2022 - Oct 2023 Machine Learning Engineer

X Studio, Xiaoice, Beijing

May 2021 - Jul 2022 Intern

AI Being BU, Xiaoice, Beijing

Interests

I often boulder at FitBloc in Singapore. Feel free to reach out if you would like to chat about research or climbing.