Abstract
Sensor fusion is a fundamental problem in developing intelligent systems that perceive the scene around them precisely and robustly. Previous approaches to sensor fusion combined different kinds of sensors after feature extraction and abstraction ("task-level fusion"). This paper proposes a new approach that combines sensory signals from different kinds of sensors before abstraction ("signal-level fusion"). Sensor fusion is formalized as an optimization problem that maximizes the mutual information between sensory signals, and a target in a changing scene is detected by a heuristic search algorithm. As an example, experimental results of sound source detection with one video camera and one microphone are shown.
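
To make the idea of signal-level fusion concrete, the following is a minimal sketch, not the method of this paper. It assumes a histogram (plug-in) estimate of mutual information and replaces the heuristic search with an exhaustive scan over pixels. The names `mutual_information` and `locate_source`, the `bins` parameter, and the array layout (video as frames x rows x columns, audio as one value per frame) are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats for two 1-D signals.

    Assumption: a simple binned estimator; the paper's estimator may differ.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def locate_source(video, audio, bins=8):
    """Return the (row, col) whose intensity over time shares the most
    information with the audio signal, plus the full score map.

    video: array of shape (T, H, W); audio: array of shape (T,).
    Assumption: exhaustive scan, standing in for the heuristic search.
    """
    _, height, width = video.shape
    scores = np.array([
        [mutual_information(video[:, r, c], audio, bins) for c in range(width)]
        for r in range(height)
    ])
    return np.unravel_index(np.argmax(scores), scores.shape), scores
```

In this sketch, a pixel whose intensity varies together with the microphone signal receives a high score, while pixels that change independently of the sound score near zero. No features are extracted from either signal before they are compared, which is the sense in which the fusion happens at the signal level.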