Stochastic Systems Group  

Dr. Harald Steck
AI Lab, MIT
In this talk, we present a principled approach to structure learning of Bayesian networks. The first part of our presentation focuses on how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter (often interpreted as "equivalent sample size" or "prior strength") leads to a strong regularization of the model structure (a sparse graph), given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing scale parameter. This is diametrically opposite to what one may expect in this limit, namely the complete graph from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the scale parameter balances a tradeoff between regularizing the parameters and regularizing the structure of the model. We demonstrate the benefits of optimizing this tradeoff in the sense of predictive accuracy.
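The effect of a vanishing scale parameter can be illustrated with a minimal sketch. It assumes the standard BDeu form of the marginal likelihood, which is one common instance of a product of independent Dirichlet priors; the abstract does not name the exact score used in the talk. The function names and the 2x2 contingency table are hypothetical and chosen only for illustration.

```python
from math import lgamma


def family_score(counts, alpha):
    """Log marginal likelihood of one node under a BDeu-style Dirichlet prior.

    counts[j][k] is the number of cases with parent configuration j and
    child state k; alpha is the scale parameter (equivalent sample size).
    Assumption: the prior mass alpha is spread uniformly over all cells.
    """
    q = len(counts)          # number of parent configurations
    r = len(counts[0])       # number of child states
    a_j = alpha / q          # prior count per parent configuration
    a_jk = alpha / (q * r)   # prior count per cell
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n in row:
            score += lgamma(a_jk + n) - lgamma(a_jk)
    return score


def log_bayes_factor(table, alpha):
    """log P(D | X -> Y) - log P(D | empty graph) for counts table[x][y].

    The score of X is the same in both graphs, so only the family of Y matters.
    """
    y_marginal = [[sum(col) for col in zip(*table)]]  # Y without parents
    return family_score(table, alpha) - family_score(y_marginal, alpha)


# Hypothetical data with a clear dependence between X and Y.
table = [[40, 10], [10, 40]]
for alpha in (1.0, 1e-12):
    print(alpha, log_bayes_factor(table, alpha))
```

On this toy table, the log Bayes factor is positive for `alpha = 1.0` (the edge is preferred) and negative for `alpha = 1e-12` (the empty graph is preferred), even though the data are strongly dependent. Under the stated assumptions, this matches the limiting behavior described above.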
In the second part of our talk, we develop an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal tradeoff between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, the resulting scoring function depends only on the number of data points in the various discretization levels. Because of that, this scoring function can not only be computed efficiently, but is, surprisingly, also independent of the metric used in the continuous space. In our experiments, we show the crucial impact of discretization on graph structure.
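The metric independence can be made concrete with a small sketch. It is not the scoring function from the talk. It only assumes that discretization boundaries lie on the finest grid, i.e. between consecutive sorted data points, so that the counts per level depend only on the ranks of the data. The helper `level_counts`, the cut ranks, and the sample values are hypothetical.

```python
from bisect import bisect_right
from math import exp


def level_counts(values, cut_ranks):
    """Count the data points that fall into each discretization level.

    A cut at rank r is placed midway between the r-th and (r+1)-th smallest
    value (1-indexed), i.e. on the finest grid implied by the data.
    Assumption: all values are distinct.
    """
    s = sorted(values)
    cuts = [(s[r - 1] + s[r]) / 2 for r in cut_ranks]
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return counts


# Hypothetical one-dimensional sample and the same sample after a strictly
# increasing change of scale, which distorts distances but preserves order.
data = [0.1, 0.4, 0.5, 2.0, 3.5, 9.0]
rescaled = [exp(v) for v in data]

print(level_counts(data, [2, 4]))
print(level_counts(rescaled, [2, 4]))
```

Both calls return the same counts, because a strictly increasing transformation changes distances but not ranks. Any score that is a function of these counts alone is therefore unaffected by the choice of metric.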
Joint work with Tommi Jaakkola.