Stochastic Systems Group  

There is considerable interest in multimodal signal processing (alternatively, sensor fusion). Mixed-modal processing problems are often formulated within a maximum a posteriori (MAP) or maximum likelihood (ML) estimation framework, with simplifying assumptions about the joint statistics made in order to yield tractable analytic forms. These assumptions may not be appropriate for fusing modalities such as video and audio: the joint statistics of these and many other mixed-modal signals are not well understood and are not well modeled by simple densities such as multivariate exponential distributions. This suggests that a nonparametric statistical approach may be warranted. It can be shown that, in the nonparametric statistical framework, MAP and ML are equivalent to the information-theoretic concepts of mutual information and entropy. We briefly discuss a previously reported nonparametric learning approach which incorporates concepts from information theory. To demonstrate the efficacy of the approach, we present examples of mixed-modal sensor fusion for audio/video data. A consequence of learning a joint statistical model is that we are able both to localize the source within the video and to enhance the associated audio. We conjecture that by fusing the measurements in such a manner one may infer "something" about the independent causes of the measurements.
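As a rough illustration of the nonparametric flavor of this idea (not the authors' actual method), one can estimate the mutual information between a scalar audio feature and a scalar video feature directly from samples using a 2D histogram, with no parametric density assumption. The feature names and the toy data below are hypothetical:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based (nonparametric) estimate of I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probability table
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y (row vector)
    nz = pxy > 0                              # skip empty cells: 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Toy "audio" and "video" features: one pair shares a common cause,
# the other pair is statistically independent.
rng = np.random.default_rng(0)
audio = rng.normal(size=5000)
video_dep = audio + 0.3 * rng.normal(size=5000)
video_ind = rng.normal(size=5000)

print(mutual_information(audio, video_dep))  # large: features are coupled
print(mutual_information(audio, video_ind))  # near zero (small positive bias)
```

In a fusion setting, one would maximize such an MI estimate over candidate pixel regions to localize the audio source in the video; in practice the estimator is typically built on kernel (Parzen) density estimates rather than histograms.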