Stochastic Systems Group  

Learning Deep Boltzmann Machines
Ruslan Salakhutdinov
CSAIL, MIT
Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI-related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires deep architectures that involve many layers of nonlinear processing.
In this talk I will present a new learning algorithm for Deep Boltzmann Machines (Markov Random Fields) that contain many layers of hidden variables. Inferring the states of the hidden variables given some input can be performed using variational approaches, such as mean-field. Learning can then be carried out by applying a stochastic approximation procedure that uses Markov chain Monte Carlo (MCMC) to approximate a model's expected sufficient statistics, which are needed for maximum likelihood learning. The MCMC-based approximation procedure provides asymptotic convergence guarantees and belongs to the general class of stochastic approximation algorithms of Robbins-Monro type. This unusual combination of variational methods and MCMC is essential for creating a fast learning algorithm for Deep Boltzmann Machines. I will further relate Deep Boltzmann Machines to a different class of probabilistic generative models called Deep Belief Networks.
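As a rough illustration of how the two ingredients described above might fit together, here is a minimal NumPy sketch for a two-hidden-layer binary model. It is not the speaker's implementation: the function names (`mean_field`, `gibbs_step`, `train_dbm`), the layer sizes, the omission of bias terms, the use of full-batch updates, and the particular decaying step size are all assumptions made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    # Draw independent Bernoulli samples with success probabilities p.
    return (rng.random(p.shape) < p).astype(float)

def mean_field(v, W1, W2, n_iters=10):
    # Variational step: fixed-point updates for the factorized posterior
    # over the two hidden layers, with the visible units clamped to v.
    mu2 = np.full((v.shape[0], W2.shape[1]), 0.5)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

def gibbs_step(v, h2, W1, W2, rng):
    # One Gibbs sweep of the free-running (persistent) chains.
    # h1 depends on both neighbouring layers; v and h2 depend only on h1.
    h1 = sample(sigmoid(v @ W1 + h2 @ W2.T), rng)
    v = sample(sigmoid(h1 @ W1.T), rng)
    h2 = sample(sigmoid(h1 @ W2), rng)
    return v, h1, h2

def train_dbm(data, n_h1=8, n_h2=4, n_chains=20, n_steps=200, lr0=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_v = data.shape[1]
    W1 = 0.01 * rng.standard_normal((n_v, n_h1))
    W2 = 0.01 * rng.standard_normal((n_h1, n_h2))
    # Persistent chains are kept across parameter updates.
    v_c = sample(np.full((n_chains, n_v), 0.5), rng)
    h2_c = sample(np.full((n_chains, n_h2), 0.5), rng)
    for t in range(n_steps):
        # Assumed schedule: step sizes whose sum diverges while the sum of
        # their squares stays finite, as Robbins-Monro schemes require.
        lr = lr0 / (1.0 + t / 50.0)
        # Data-dependent statistics from mean-field inference.
        mu1, mu2 = mean_field(data, W1, W2)
        # Model statistics from the persistent Markov chains.
        v_c, h1_c, h2_c = gibbs_step(v_c, h2_c, W1, W2, rng)
        W1 += lr * (data.T @ mu1 / len(data) - v_c.T @ h1_c / n_chains)
        W2 += lr * (mu1.T @ mu2 / len(data) - h1_c.T @ h2_c / n_chains)
    return W1, W2
```

Calling `train_dbm` on a binary array of shape (number of examples, number of visible units) returns the two weight matrices. The point of the sketch is the division of labour: the data-dependent term comes from deterministic mean-field updates, while the model term comes from chains that are never restarted.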