Describing Objects and Scenes using Transformed Dirichlet Processes
SSG, LIDS, MIT
Object recognition systems use the features composing a visual scene to localize and categorize the objects depicted in an image. In this talk, we describe a family of hierarchical generative models for objects, the parts composing them, and the scenes surrounding them. By augmenting topic models developed for text analysis with spatial transformations, we share information between related recognition tasks while avoiding coarse "bag of words" approximations.
Our models are based on the transformed Dirichlet process (TDP), an extension of the hierarchical DP which shares stochastically transformed mixtures of variable size. For images of isolated objects, TDPs allow the number of parts underlying an object's appearance to be estimated from training data, improving robustness. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions underlying a particular image. Empirical results on several datasets demonstrate the benefits of transferring information between related recognition tasks, and the importance of incorporating spatial structure into models of object and scene appearance.
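To make the central idea concrete, the sketch below illustrates one generative step of a TDP-style model: shared mixture components (part means in an object-centered frame) are combined with image-specific random transformations (here, simple 2-D translations) before emitting observed features. This is a minimal illustrative sketch, not the authors' implementation; the truncation level, Gaussian noise scales, and translation prior are all assumptions chosen for clarity.

```python
import numpy as np

def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    betas = rng.beta(1.0, alpha, size=n_components)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining
    return weights / weights.sum()  # renormalize the finite truncation

def sample_transformed_scene(global_parts, alpha, n_features, rng):
    """Generate 2-D features for one image.

    Each shared part (a mean in object-centered coordinates) is shifted
    by an image-specific translation, modeling where the object happens
    to appear in this image; features are then emitted around the
    transformed part locations.
    """
    n_parts = len(global_parts)
    weights = stick_breaking_weights(alpha, n_parts, rng)
    # Image-specific transformation: one random translation per part
    # (a stand-in for the TDP's stochastic transformations).
    shifts = rng.normal(0.0, 5.0, size=(n_parts, 2))
    assignments = rng.choice(n_parts, size=n_features, p=weights)
    features = (global_parts[assignments] + shifts[assignments]
                + rng.normal(0.0, 0.5, size=(n_features, 2)))
    return features, assignments

rng = np.random.default_rng(0)
# Three shared part means in an object-centered coordinate frame.
parts = np.array([[0.0, 0.0], [3.0, 1.0], [-2.0, 4.0]])
feats, z = sample_transformed_scene(parts, alpha=1.0, n_features=50, rng=rng)
```

Because the transformations act on the whole set of shared parts, information about part appearance transfers across images even though each image places the object differently.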