PhD student at NUS

Xintong Wang 王心童

I am 28.77 years old, a third-year PhD student at the National University of Singapore, supervised by Prof. Wang Ye. My research focuses on real-time and data-efficient methods for audio and audio-language models.

I earned my B.S. from Beijing Forestry University in 2022.

Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Speech Language Model (Speech LM)
Interactive

Playaround

A small place for interactive demos around speech models.

Whisper-Pinyin Demo: transcribe Mandarin speech to Pinyin text. Hugging Face Space by walston.
Selected papers

Publications

Conference Articles

  1. Xintong Wang, Mingqian Shi, and Ye Wang, "Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis," Interspeech 2024 (Oral).
  2. Junchuan Zhao, Xintong Wang, and Ye Wang, "Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning," Interspeech 2025.

Workshop Articles

  1. Xintong Wang, Chang Zeng, Jun Chen, and Chunhui Wang, "CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers," 2023 IEEE ASRU Workshop.

Journal Articles

  1. Xintong Wang and Chuangang Zhao, "A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition," Information, 12.4 (2021): 165.
Experience

Work Experience

Oct 2023 - Aug 2024 Research Assistant

Sound and Music Computing Lab, School of Computing, National University of Singapore, Singapore

Jul 2022 - Oct 2023 Machine Learning Engineer

X Studio, Xiaoice, Beijing

May 2021 - Jul 2022 Intern

AI Being BU, Xiaoice, Beijing

Interests

I often boulder at FitBloc in Singapore. Feel free to reach out if you would like to chat about research or climbing.