Stochastic Systems Group  

Prof. Michael I. Jordan
Departments of EECS and Statistics
University of California at Berkeley
Latent variables are both a boon and a bane to the enterprise of probabilistic graphical modeling, providing an essential notion of abstraction, but requiring what are often rather arbitrary choices of distributions and parameterizations. It would be preferable to be able to treat graphical models involving latent variables "semiparametrically," avoiding such arbitrary choices. In the graphical model setting, the semiparametric likelihood generally involves some form of mutual information as its infinite-dimensional component, characterizing departures from (conditional) independence. Thus carrying out a semiparametric programme for graphical models requires us to develop methods for approximating and minimizing mutual information.
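As a reminder of the quantity involved (not part of the original abstract), the mutual information among components x_1, ..., x_m is the Kullback-Leibler divergence between the joint distribution and the product of its marginals, vanishing exactly at independence:

\[
I(x_1, \ldots, x_m) \;=\; D\!\left( p(x_1, \ldots, x_m) \,\Big\|\, \prod_{i=1}^{m} p(x_i) \right) \;\ge\; 0,
\]

so minimizing it over a family of models drives the components toward independence.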
Independent component analysis (ICA) refers to an important class of bipartite graphical models appropriate for problems in which a set of latent "sources" must be recovered from an unknown set of linear mixtures of these sources. ICA is meaningful only for non-Gaussian latent variables, but beyond the non-Gaussianity the distribution of the latent variables is assumed to be unknown. The problem is thus best posed as a semiparametric problem: we wish to estimate the mixing matrix under any (non-Gaussian) distribution for the sources.
We present a novel, semiparametric approach to ICA that is based on the optimization of canonical correlations in a reproducing kernel Hilbert space (RKHS). We show that this approach leads to ICA algorithms that significantly improve the state of the art, handling a wide variety of source distributions, near-Gaussian distributions, and data contamination. We also make the link to mutual information, showing that one of our RKHS contrast functions can be viewed as a general approximation to mutual information "around independence".
[Joint work with Francis Bach]