Stochastic Systems Group
The right measure of similarity between examples is crucial for any example-based approach to learning. Similarity is commonly defined in terms of a distance function applied to two objects: they are deemed similar if the distance between them is below a threshold. Such a definition does not necessarily capture the inherent meaning of similarity, which tends to be application-specific. Moreover, the requirement to compute distances everywhere in the space is unnecessarily demanding: when two points are sufficiently far apart, it may be irrelevant exactly how far apart they are, and under some models it may be irrelevant exactly how similar two similar points are.
We propose an approach to learning similarity that can be tailored to the task at hand. It is based on the intuition that while modeling a distance between objects is difficult, detecting whether two objects are similar is often easier. We pose the problem as a classification task on pairs of examples, and propose an efficient learning algorithm for solving it. The algorithm is inspired by the ideas of boosting and locality-sensitive hashing, and builds the similarity detector as an ensemble of simple classifiers. In contrast to AdaBoost, the greedy selection process is optimized with the similarity search in mind, leading to a different selection criterion and weighting scheme. From the resulting ensemble classifier of example pairs we derive an embedding of individual examples into a space where state-of-the-art randomized algorithms, such as Locality-Sensitive Hashing, can be applied to efficiently find examples similar to a query. We demonstrate the benefits of the proposed framework in experiments on data sets arising in computer vision, where defining the right similarity and the ability to quickly search large, high-dimensional data sets are both important for success.
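To give a sense of the retrieval machinery the abstract refers to, here is a minimal sketch of classic random-hyperplane Locality-Sensitive Hashing, which maps each example to a short bit string so that nearby vectors tend to collide. This is a generic illustration of LSH, not the learned embedding or the boosting-based selection described in the talk; all function names and parameters below are illustrative assumptions.

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    # Illustrative: draw n_bits random Gaussian normal vectors,
    # each defining one hyperplane through the origin.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(x, planes):
    # Embed a vector as a bit tuple: one bit per hyperplane,
    # recording which side of the hyperplane x falls on.
    return tuple(int(sum(w * xi for w, xi in zip(p, x)) >= 0.0)
                 for p in planes)

def hamming(a, b):
    # Hamming distance between two bit tuples.
    return sum(u != v for u, v in zip(a, b))

planes = random_hyperplanes(dim=3, n_bits=16)
# Nearby vectors should agree on most bits; opposite vectors on none.
close = hamming(lsh_hash([1.0, 0.0, 0.0], planes),
                lsh_hash([0.9, 0.1, 0.0], planes))
far = hamming(lsh_hash([1.0, 0.0, 0.0], planes),
              lsh_hash([-1.0, 0.0, 0.0], planes))
```

In the framework described above, the hand-picked random hyperplanes would be replaced by the greedily selected simple classifiers, so that collisions reflect the task-specific notion of similarity rather than raw geometric proximity.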
(Joint work with Trevor Darrell)