Simultaneous estimation of rewards and dynamics from noisy expert demonstrations
Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard
24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2016), Bruges, Belgium, April 27–29, 2016
Abstract:
Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from demonstrations of an expert. Current approaches typically require the system dynamics to be known, or additional demonstrations of state transitions to be available, in order to solve the inverse problem accurately. If these assumptions are not satisfied, heuristics can be used to compensate for the lack of a model of the system dynamics. However, heuristics can add bias to the solution. To overcome this, we present a gradient-based approach that simultaneously estimates the rewards, the dynamics, and the parameterizable stochastic policy of an expert from demonstrations, where the stochastic policy is a function of optimal Q-values.
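The sketch below illustrates the general idea of jointly estimating rewards and dynamics from demonstrations; it is not the authors' exact formulation. It assumes a hypothetical tabular toy MDP, a per-state reward vector, softmax-parameterized transition logits, a Boltzmann policy over Q-values with temperature beta, hypothetical (state, action, next_state) demonstration triples, and simple finite-difference gradient ascent on the demonstration log-likelihood.

```python
# Minimal, hypothetical sketch of joint reward/dynamics estimation from
# demonstrations (not the paper's exact algorithm): tabular toy MDP,
# per-state reward, softmax-parameterized transitions, Boltzmann policy
# over Q-values, finite-difference gradient ascent on the log-likelihood.
import numpy as np

n_states, n_actions, gamma, beta = 3, 2, 0.9, 2.0
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def q_values(reward, trans_logits, n_iter=100):
    """Value iteration under the current reward/dynamics estimate."""
    P = softmax(trans_logits, axis=-1)            # P[s, a, s']
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iter):
        V = Q.max(axis=1)
        Q = reward[:, None] + gamma * P @ V
    return Q

def log_likelihood(params, demos):
    """Log-likelihood of (s, a, s') triples under a Boltzmann policy
    pi(a|s) ~ exp(beta * Q(s, a)) and the learned transition model."""
    reward = params[:n_states]
    trans_logits = params[n_states:].reshape(n_states, n_actions, n_states)
    Q = q_values(reward, trans_logits)
    log_pi = np.log(softmax(beta * Q, axis=1))
    log_P = np.log(softmax(trans_logits, axis=-1))
    return sum(log_pi[s, a] + log_P[s, a, s2] for s, a, s2 in demos)

def numerical_grad(f, x, eps=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

# Hypothetical noisy demonstrations: (state, action, next_state) triples.
demos = [(0, 1, 1), (1, 1, 2), (2, 0, 2), (0, 1, 1), (1, 1, 2)]

params = 0.01 * rng.standard_normal(n_states + n_states * n_actions * n_states)
for step in range(200):      # gradient ascent on the demonstration likelihood
    params += 0.1 * numerical_grad(lambda p: log_likelihood(p, demos), params)

print("estimated reward:", np.round(params[:n_states], 2))
```

The key point the sketch shares with the abstract is that the policy is tied to the Q-values induced by the current reward and dynamics estimates, so a single likelihood objective couples both unknowns; the finite-difference gradients are only a stand-in for the analytic gradients a practical implementation would use.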
@INPROCEEDINGS{Herman2016ESANN,
  author    = {Michael Herman and Tobias Gindele and Jörg Wagner and Felix Schmitt and Wolfram Burgard},
  booktitle = {24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)},
  title     = {Simultaneous estimation of rewards and dynamics from noisy expert demonstrations},
  year      = {2016},
  pages     = {677-682},
  month     = {April},
}