Stochastic Systems Group

SSG Seminar Abstract


Informal Group Discussion:
Issues with Nonparametric Density Estimation

John Fisher, Alex Ihler, Junmo Kim, and Andrew Kim
SSG


Nonparametric methods are rapidly gaining popularity in statistical analysis, in large part because they are free of the strict constraints imposed by parametric models and so allow a more accurate characterization of the distribution underlying a set of data. These methods were once too computationally intensive to be of practical use, but with the advent of faster computers this is, in many cases, no longer a constraint.

Our discussion will focus on issues related to nonparametric density estimation and techniques built around it. The discussion will probably raise as many questions as it answers (perhaps more), since our objective is not merely to summarize published results, but to examine how appropriate their analysis is and what, if anything, is lacking. We will naturally start with the quintessential Parzen-Rosenblatt kernel density estimator, with some discussion of the k-Nearest Neighbor density estimator. As is usually the case with nonparametric techniques, these estimators do not easily lend themselves to finite-sample analysis, and most available results hold only asymptotically. We will review these results and comment on the suitability of such criteria for characterizing estimator performance.
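As a concrete point of reference for the discussion, here is a minimal one-dimensional sketch of both estimators in Python/NumPy (our own illustration, not code from the papers under discussion; the sample size, bandwidth, and number of neighbors k are arbitrary choices):

    import numpy as np

    def parzen_kde(x_query, samples, bandwidth):
        # Parzen-Rosenblatt estimate p_hat(x) = (1/(N*h)) * sum_i K((x - x_i)/h),
        # here with a Gaussian kernel K.
        u = (np.atleast_1d(x_query)[:, None] - samples[None, :]) / bandwidth
        kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        return kernel.mean(axis=1) / bandwidth

    def knn_density(x_query, samples, k):
        # k-NN estimate in one dimension: p_hat(x) = k / (N * 2*r_k(x)),
        # where r_k(x) is the distance from x to its k-th nearest sample.
        diffs = np.abs(np.atleast_1d(x_query)[:, None] - samples[None, :])
        r_k = np.sort(diffs, axis=1)[:, k - 1]
        return k / (2.0 * len(samples) * r_k)

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=500)          # draws from a standard normal
    grid = np.linspace(-3.0, 3.0, 7)
    print(parzen_kde(grid, data, bandwidth=0.3))   # should roughly track the N(0,1) pdf
    print(knn_density(grid, data, k=25))

The kernel estimate averages a bump placed on each sample, while the k-NN estimate inverts the interval length needed to capture k samples; both improve as N grows, which is exactly why most of their analysis is asymptotic.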

Once we have a set of density estimates (each characterizing a different source), one natural next step is to use them for classification. We will discuss the issues involved in such plug-in methods, including performance considerations and estimation of the Bayes error.
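To make the plug-in idea concrete, the following sketch (again our own, with arbitrary class means, equal priors, and a hand-picked bandwidth) classifies by comparing kernel density estimates built from each class's training data, and forms a plug-in estimate of the Bayes error from the estimated posteriors:

    import numpy as np

    def gauss_kde(x, samples, h):
        # Gaussian Parzen estimate of the density at the points x (one-dimensional).
        u = (np.atleast_1d(x)[:, None] - samples[None, :]) / h
        return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

    rng = np.random.default_rng(1)
    train0 = rng.normal(-1.0, 1.0, 300)            # training samples from class 0
    train1 = rng.normal(+1.0, 1.0, 300)            # training samples from class 1

    labels = rng.integers(0, 2, 1000)              # test set drawn from the mixture
    x_test = np.where(labels == 0,
                      rng.normal(-1.0, 1.0, 1000),
                      rng.normal(+1.0, 1.0, 1000))

    h = 0.4                                        # bandwidth, fixed by hand here
    p0 = gauss_kde(x_test, train0, h)
    p1 = gauss_kde(x_test, train1, h)

    # Plug-in rule with equal priors: choose the class whose estimated
    # density is larger at the test point.
    pred = (p1 > p0).astype(int)
    print("empirical error:", np.mean(pred != labels))

    # A plug-in estimate of the Bayes error: the average of the smaller
    # estimated posterior over the mixture sample.  For these two
    # unit-variance Gaussians at -1 and +1 the true Bayes error is
    # Phi(-1), roughly 0.159.
    post0 = p0 / (p0 + p1)
    print("plug-in Bayes error estimate:", np.mean(np.minimum(post0, 1.0 - post0)))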

A major issue in density estimation is the selection of the bandwidth (regularization) parameter. Historically, the bandwidth was chosen according to a squared-error criterion; for most applications, however, this is not an appropriate choice. We will discuss recent developments based on more suitable criteria, such as the L1 error and maximum likelihood.
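A minimal sketch of maximum-likelihood bandwidth selection, assuming the standard leave-one-out form (each sample is scored under the density built from the remaining points, since the ordinary likelihood is degenerate: it grows without bound as h approaches zero):

    import numpy as np

    def loo_log_likelihood(samples, h):
        # Leave-one-out log-likelihood of a Gaussian Parzen estimate: each
        # sample is scored under the density built from the other N-1 points.
        n = len(samples)
        u = (samples[:, None] - samples[None, :]) / h
        k = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))
        np.fill_diagonal(k, 0.0)                   # drop each point's own kernel
        return np.sum(np.log(k.sum(axis=1) / (n - 1)))

    rng = np.random.default_rng(2)
    data = rng.normal(0.0, 1.0, 400)
    candidates = np.linspace(0.05, 1.0, 40)        # arbitrary search grid
    scores = [loo_log_likelihood(data, h) for h in candidates]
    print("ML (leave-one-out) bandwidth:", candidates[int(np.argmax(scores))])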

Density estimates also lend themselves to estimating information-theoretic quantities such as entropy and mutual information, and these estimates can in turn be used for classification. In addition to presenting entropy estimators, we will discuss their performance and the effect of the density estimator's parameters on the entropy estimate.
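As one illustration of that dependence, the following sketch (our own, using the same leave-one-out Gaussian Parzen estimate as above) computes the sample-average entropy estimate at several bandwidths; the estimate drifts noticeably with h:

    import numpy as np

    def loo_entropy(samples, h):
        # Entropy estimate H_hat = -(1/N) * sum_i log p_hat_{-i}(x_i), where
        # p_hat_{-i} is the leave-one-out Gaussian Parzen estimate.
        n = len(samples)
        u = (samples[:, None] - samples[None, :]) / h
        k = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))
        np.fill_diagonal(k, 0.0)
        return -np.mean(np.log(k.sum(axis=1) / (n - 1)))

    rng = np.random.default_rng(3)
    data = rng.normal(0.0, 1.0, 500)   # true entropy: 0.5*log(2*pi*e), about 1.419 nats
    for h in (0.1, 0.3, 0.9):
        print("h =", h, "-> H_hat =", round(loo_entropy(data, h), 3))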

A bibliography of some of the papers covered can be found here.


