Jiang Chen, Yu Jun, Luo Changwei, Li Rui, Wang Zengfu. Speech visualization system based on physiological tongue model[J]. Journal of Image and Graphics, 2015, 20(9): 1237-1246. DOI: 10.11834/jig.20150911.
Speech-synchronized tongue animation remains under-researched. Against this background, this paper proposes a physiology-based tongue animation system. First, an accurate physiology-based tongue model is created, whose deformation can be driven by muscle activations. Second, the model is used to generate a large number of tongue deformation samples from a correspondingly large set of designed muscle activations. With these samples, a neural network that maps muscle activations to tongue deformation is trained. Then, from the 2D tongue deformation results obtained on tongue X-ray data, the corresponding physemes (muscle activation and rigid movement sequences) are estimated with this neural network. Finally, speech-synchronized tongue animation is synthesized by feeding these physemes into the tongue model for simulation. Experimental results demonstrate that the proposed system can produce realistic-sounding voices and visually realistic speech-synchronized tongue animation. The system can be used to build a phoneme-physeme database from 2D tongue movement data collected for Mandarin Chinese or other languages, and can then synthesize highly realistic tongue animation for the corresponding language.
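The abstract's core learning steps are (1) fitting a network that maps muscle activations to tongue deformation from simulation samples, and (2) inverting that mapping to estimate muscle activations from an observed 2D contour. The sketch below illustrates these two steps only in outline; the network size, the number of muscle channels (N_MUSCLES), the contour dimensionality (N_DEFORM), and the synthetic data are all hypothetical stand-ins, not the paper's actual architecture or data.

```python
# A minimal sketch, assuming hypothetical dimensions and synthetic data, of the
# two learning steps described in the abstract:
#   (1) fit a feed-forward network mapping muscle activations -> tongue
#       deformation, using samples produced by the physiological model;
#   (2) invert that network by gradient descent to estimate the muscle
#       activations (part of a "physeme") that best explain an observed
#       2D deformation, e.g. a contour traced from X-ray data.
import torch
import torch.nn as nn

N_MUSCLES = 8          # assumed number of tongue-muscle activation channels
N_DEFORM = 2 * 30      # assumed 2D coordinates of 30 tongue-contour points

# --- Step 1: forward mapping, trained on (activation, deformation) samples ---
net = nn.Sequential(
    nn.Linear(N_MUSCLES, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, N_DEFORM),
)

# Synthetic stand-ins for the samples generated by the physiological model.
activations = torch.rand(2000, N_MUSCLES)
deformations = torch.randn(2000, N_DEFORM)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(net(activations), deformations)
    loss.backward()
    opt.step()

# --- Step 2: estimate activations for an observed 2D tongue contour ---
observed = torch.randn(N_DEFORM)             # placeholder for a traced contour
est = torch.full((N_MUSCLES,), 0.5, requires_grad=True)
opt2 = torch.optim.Adam([est], lr=1e-2)
for step in range(500):
    opt2.zero_grad()
    err = loss_fn(net(est), observed)
    err.backward()
    opt2.step()
    with torch.no_grad():
        est.clamp_(0.0, 1.0)                 # keep activations in [0, 1]

print("estimated muscle activations:", est.detach().numpy())
```

The estimated activations, combined with rigid-movement parameters, would then be fed back into the physiological tongue model to drive the simulation, per the pipeline the abstract describes.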