Abstract
Multimodal speech processing, in which visual facial features are processed jointly with audio features, is a rapidly advancing field. Lip movements and configurations provide useful information for improving speech and speaker recognition. However, exploiting this visual information requires accurate and fast lip tracking algorithms. A new technique is outlined that estimates the outer lip contour directly from a given lip intensity image via linear regression. This estimate can then be refined by an active shape model, enabling a speaker's lips to be tracked without time-consuming iterative energy minimization. Performance results are presented against known tracking algorithms using the M2VTS database.
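The following is a minimal sketch of the regression idea described above, not the paper's exact method: learn a linear map from a vectorized lip-region intensity image to outer-lip contour coordinates, so that a new contour estimate costs only one matrix product at tracking time. All array shapes, variable names, and the synthetic training data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train = 200        # number of labelled training images (hypothetical)
n_pixels = 32 * 16   # size of the vectorized lip-region intensity image
n_points = 2 * 20    # (x, y) coordinates of 20 outer-lip contour points

# Synthetic stand-ins for hand-labelled training data.
X = rng.normal(size=(n_train, n_pixels))   # one intensity image per row
Y = rng.normal(size=(n_train, n_points))   # corresponding contour coordinates

# Append a constant column so the least-squares fit includes a bias term.
X1 = np.hstack([X, np.ones((n_train, 1))])

# Solve for the regression matrix in one shot: W = argmin ||X1 W - Y||^2.
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# Estimating a contour from a new image is a single matrix product, so no
# iterative energy minimization is needed during tracking.
x_new = rng.normal(size=n_pixels)
contour = np.append(x_new, 1.0) @ W        # shape: (n_points,)
print(contour.reshape(-1, 2)[:3])          # first three (x, y) contour points
```

In the full method, such a direct estimate would then serve as the starting shape for an active shape model refinement step, constraining the contour to plausible lip shapes.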