Abstract
Extracting moving objects from a video shot provides a good low-level representation of the video. It yields each object's trajectory, color, and shape characteristics. Combined with domain-specific knowledge, this representation can be a powerful cue as to what is happening in a video shot. This paper proposes an unsupervised moving object extraction/tracking system that attempts to capture salient moving objects from an image sequence. The novelty of the proposed system is that it requires no object initialization and is designed to tolerate noisy segmentations at the individual frame level. A temporal stack structure is used as a memory device to filter and learn salient objects. Moving objects are learned bottom-up, progressing from independent motion segmentation results at each frame to learned characteristics of the whole object.