Stochastic Systems Group
There is considerable interest in multi-modal signal processing (also known as sensor fusion). Mixed-modal processing problems are often formulated within a maximum a posteriori (MAP) or maximum likelihood (ML) estimation framework, with simplifying assumptions about the joint statistics made in order to yield tractable analytic forms. These assumptions may not be appropriate for fusing modalities such as video and audio: the joint statistics of these and many other mixed-modal signals are not well understood and are not well modeled by simple densities such as multivariate exponential distributions. This suggests that a nonparametric statistical approach may be warranted. It can be shown that, in the nonparametric statistical framework, MAP and ML estimation are equivalent to the information-theoretic concepts of mutual information and entropy. We briefly discuss a previously reported nonparametric learning approach that incorporates concepts from information theory. To demonstrate the efficacy of the approach, we present examples of mixed-modal sensor fusion for audio/video data. A consequence of learning a joint statistical model is that we are able both to localize the source within the video and to enhance the associated audio. We conjecture that by fusing the measurements in this manner one may infer "something" about the independent causes of the measurements.
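To make the information-theoretic connection concrete, the sketch below estimates mutual information nonparametrically (here via a simple 2-D histogram; the reported approach may use a different density estimator) between a hypothetical video feature and two audio tracks, one statistically coupled to the video and one independent. All signal names and the synthetic data are illustrative assumptions, not the authors' actual features; the point is only that a nonparametric MI estimate, requiring no assumed joint density, can distinguish the associated audio source from a distractor.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based (nonparametric) estimate of I(X;Y) in nats."""
    # Joint probability mass over a bins x bins grid.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    # Marginals, shaped for broadcasting into the product distribution.
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0  # avoid log(0) on empty cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
n = 20_000
video_feature = rng.standard_normal(n)                          # stand-in pixel-intensity track
audio_related = video_feature + 0.3 * rng.standard_normal(n)    # audio coupled to the video
audio_unrelated = rng.standard_normal(n)                        # independent distractor audio

mi_related = mutual_information(video_feature, audio_related)
mi_unrelated = mutual_information(video_feature, audio_unrelated)
```

In a fusion setting, maximizing such an estimate over candidate image regions is one way to localize the source in the video; the independent track scores near zero (up to the small positive bias of the histogram estimator), while the coupled track scores well above it.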