Stochastic Systems Group
Labeling Sequential Data Using Local Consistencies
Myung Jin Choi
In this talk, we discuss the problem of extracting entities from semi-structured text such as web pages or paper bibliographies. While Conditional Random Fields (CRFs) offer great flexibility in using many overlapping and dependent features, they can learn only global patterns that occur across a large collection of documents. However, some features have "limited scope" and apply only to a certain subset of the data. For example, in bibliographies, one paper may use quotation marks to indicate paper titles while another may use an italic font. We introduce scoped learning (Blei, Bagnell, and McCallum 2002), which takes advantage of such local consistencies in previously unseen data. Then, we investigate several directions for incorporating local consistencies into CRFs.
Joint work with Mukund Narasimhan and Paul Viola.