Importance Sampling Estimate for POMDPs

Christian Shelton
Doctoral Student, AI Lab and CBCL, MIT

In this talk, I will present an importance sampling estimator for partially observable Markov decision processes (POMDPs). The estimator makes no assumptions about the structure of the POMDP and allows training data to be collected under any sequence of policies. From that data, it estimates the return of any untried policy. The policies can be either reactive (memoryless) or finite-state controllers. The talk will conclude with a discussion of the estimator's bias and variance properties and its performance in some simulated environments.
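
As a rough illustration of the general idea, and not necessarily the estimator presented in the talk, the sketch below shows a generic importance sampling estimate of the return of a reactive policy. It relies on the standard observation that the unknown POMDP dynamics cancel in the trajectory likelihood ratio, leaving only the policies' action probabilities. The trajectory format, the function names, and the `normalize` option are assumptions made for this example; each step is assumed to record the probability with which the data-collecting policy chose its action.

```python
# Minimal sketch of importance sampling for off-policy return estimation
# with reactive (memoryless) policies. All names here are hypothetical.
# A trajectory is a list of steps: (observation, action, behavior_prob, reward),
# where behavior_prob is the probability the data-collecting policy assigned
# to the action it took at that step.

def trajectory_weight(trajectory, target_policy):
    """Likelihood ratio of a trajectory under the target vs. behavior policy.

    The environment's transition and observation probabilities appear in
    both numerator and denominator, so only action probabilities remain.
    """
    weight = 1.0
    for obs, action, behavior_prob, _reward in trajectory:
        weight *= target_policy(obs, action) / behavior_prob
    return weight


def estimate_return(trajectories, target_policy, normalize=True):
    """Estimate the expected undiscounted return of target_policy.

    normalize=True divides by the sum of the weights (a weighted, biased
    but typically lower-variance estimate); normalize=False divides by the
    number of trajectories (the plain unbiased estimate).
    """
    weights = [trajectory_weight(t, target_policy) for t in trajectories]
    returns = [sum(step[3] for step in t) for t in trajectories]
    weighted_sum = sum(w * r for w, r in zip(weights, returns))
    denom = sum(weights) if normalize else len(trajectories)
    return weighted_sum / denom if denom > 0 else 0.0


# Example: data gathered by a uniform random policy over two actions,
# evaluated for a target policy that always chooses action 0.
def always_zero(obs, action):
    return 1.0 if action == 0 else 0.0

data = [
    [("o", 0, 0.5, 1.0)],  # took action 0, received reward 1
    [("o", 1, 0.5, 0.0)],  # took action 1, received reward 0
]
estimate = estimate_return(data, always_zero)
```

The choice between the normalized and unnormalized forms is one place where the bias and variance trade-off of such estimators shows up.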

