Abstract
In this paper, we develop a real-time lip-synch system that animates a 2-D avatar's lip motion in synch with an incoming speech utterance. To realize real-time operation, we bound the processing time by invoking a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each stage of the classification, we apply a support vector machine (SVM) to constrain the computational load while attaining the desired accuracy. The coarse-to-fine phoneme classification proceeds in two stages of feature extraction: each speech frame is first classified into three classes of lip opening using MFCC features, and then further refined into a detailed lip shape using formant information. We implemented the system with a 2-D lip animation, demonstrating the effectiveness of the proposed two-stage procedure in accomplishing the real-time lip-synch task.
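The coarse-to-fine scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, class counts, and synthetic data are assumptions made for the example, and scikit-learn's `SVC` stands in for whatever SVM implementation the system actually used.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-frame acoustic features (assumed shapes):
# 13-dim MFCC vectors for the coarse stage, 3 formant frequencies for the fine stage.
n = 300
mfcc = rng.normal(size=(n, 13))
formants = rng.normal(size=(n, 3))
coarse_labels = rng.integers(0, 3, size=n)   # 3 lip-opening classes (from the abstract)
fine_labels = rng.integers(0, 4, size=n)     # finer lip-shape classes (count is illustrative)

# Stage 1: coarse SVM over MFCC features, deciding the lip-opening class.
coarse_svm = SVC(kernel="rbf").fit(mfcc, coarse_labels)

# Stage 2: one refined SVM per coarse class, trained on formant features only.
fine_svms = {}
for c in range(3):
    mask = coarse_labels == c
    fine_svms[c] = SVC(kernel="rbf").fit(formants[mask], fine_labels[mask])

def classify_frame(mfcc_vec, formant_vec):
    """Coarse-to-fine: pick a lip-opening class, then refine the lip shape."""
    c = int(coarse_svm.predict(mfcc_vec.reshape(1, -1))[0])
    s = int(fine_svms[c].predict(formant_vec.reshape(1, -1))[0])
    return c, s

c, s = classify_frame(mfcc[0], formants[0])
```

Running the fine classifier only within the coarse class keeps each SVM small, which is how a coarse-to-fine cascade bounds the per-frame computation for real-time use.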