|Stochastic Systems Group|
Doctoral Student, AI Lab and CBCL, MIT
In this talk, I will present an importance sampling estimator for partially observable Markov decision processes (POMDPs). The estimator makes no assumptions about the structure of the POMDP and allows training data to be collected under any sequence of policies. From that data, it estimates the expected return of any untried policy, which may be either a reactive (memoryless) policy or a finite-state controller. The talk will conclude with an analysis of the estimator's bias and variance properties and its performance in some simulated environments.
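To make the setting concrete, here is a minimal sketch of ordinary importance sampling for off-policy return estimation with a reactive (memoryless) policy. The trajectory format, the policy representation, and the function names are illustrative assumptions for this sketch, not the estimator presented in the talk.

```python
def is_return_estimate(trajectories, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of the expected return of
    `target_policy`, using trajectories collected under `behavior_policy`.

    Each trajectory is a (observations, actions, total_return) triple.
    A policy here is a reactive (memoryless) one: a function mapping an
    observation to a dict of action probabilities.  This is a hypothetical
    interface chosen for illustration only.
    """
    total = 0.0
    for observations, actions, trajectory_return in trajectories:
        # Likelihood ratio of the trajectory under the two policies;
        # the POMDP's own dynamics cancel out of this ratio.
        weight = 1.0
        for obs, act in zip(observations, actions):
            weight *= target_policy(obs)[act] / behavior_policy(obs)[act]
        total += weight * trajectory_return
    return total / len(trajectories)


# Toy example: one observation "o", two actions; data gathered under a
# uniform behavior policy, evaluated for a policy that always picks action 0.
behavior = lambda obs: {0: 0.5, 1: 0.5}
target = lambda obs: {0: 1.0, 1: 0.0}
data = [(["o"], [0], 1.0),   # action 0 was taken, return 1.0
        (["o"], [1], 0.0)]   # action 1 was taken, return 0.0
estimate = is_return_estimate(data, target, behavior)  # (2*1.0 + 0*0.0)/2 = 1.0
```

Because the environment's transition and observation probabilities appear in both the numerator and denominator of each trajectory's likelihood, only the policies' action probabilities enter the weight, which is what lets the estimator avoid assumptions about the POMDP's structure.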