Ian Osband
DeepMind
Verified email at google.com - Homepage
Deep exploration via bootstrapped DQN
I Osband, C Blundell, A Pritzel, B Van Roy
Advances in neural information processing systems, 4026-4034, 2016
Cited by 366, 2016
Deep Q-learning from demonstrations
T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, D Horgan, ...
Thirty-Second AAAI Conference on Artificial Intelligence, 2018
Cited by 250, 2018
Noisy networks for exploration
M Fortunato, MG Azar, B Piot, J Menick, I Osband, A Graves, V Mnih, ...
arXiv preprint arXiv:1706.10295, 2017
Cited by 194, 2017
A tutorial on Thompson sampling
DJ Russo, B Van Roy, A Kazerouni, I Osband, Z Wen
Foundations and Trends® in Machine Learning 11 (1), 1-96, 2018
Cited by 135, 2018
Generalization and exploration via randomized value functions
I Osband, B Van Roy, Z Wen
arXiv preprint arXiv:1402.0635, 2014
Cited by 103, 2014
Minimax regret bounds for reinforcement learning
MG Azar, I Osband, R Munos
Proceedings of the 34th International Conference on Machine Learning-Volume …, 2017
Cited by 92, 2017
Deep learning for time series modeling
E Busseti, I Osband, S Wong
Technical report, Stanford University, 1-5, 2012
Cited by 81, 2012
Why is posterior sampling better than optimism for reinforcement learning?
I Osband, B Van Roy
Proceedings of the 34th International Conference on Machine Learning-Volume …, 2017
Cited by 61, 2017
Deep Exploration via Randomized Value Functions
I Osband
https://searchworks.stanford.edu/view/11891201, 2016
Cited by 52, 2016
Randomized prior functions for deep reinforcement learning
I Osband, J Aslanides, A Cassirer
Advances in Neural Information Processing Systems, 8617-8629, 2018
Cited by 44, 2018
Near-optimal reinforcement learning in factored MDPs
I Osband, B Van Roy
Advances in Neural Information Processing Systems, 604-612, 2014
Cited by 43, 2014
Model-based reinforcement learning and the eluder dimension
I Osband, B Van Roy
Advances in Neural Information Processing Systems, 1466-1474, 2014
Cited by 40, 2014
The uncertainty Bellman equation and exploration
B O'Donoghue, I Osband, R Munos, V Mnih
arXiv preprint arXiv:1709.05380, 2017
Cited by 39, 2017
Bootstrapped Thompson sampling and deep exploration
I Osband, B Van Roy
arXiv preprint arXiv:1507.00300, 2015
Cited by 30, 2015
(More) efficient reinforcement learning via posterior sampling
I Osband, D Russo, B Van Roy
Advances in Neural Information Processing Systems, 3003-3011, 2013
Cited by 26, 2013
On lower bounds for regret in reinforcement learning
I Osband, B Van Roy
arXiv preprint arXiv:1608.02732, 2016
Cited by 22, 2016
Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout
I Osband
http://bayesiandeeplearning.org/papers/BDL_4.pdf
Cited by 22*
Posterior sampling for reinforcement learning without episodes
I Osband, B Van Roy
arXiv preprint arXiv:1608.02731, 2016
Cited by 14, 2016
On optimistic versus randomized exploration in reinforcement learning
I Osband, B Van Roy
arXiv preprint arXiv:1706.04241, 2017
Cited by 8, 2017
Scalable coordinated exploration in concurrent reinforcement learning
M Dimakopoulou, I Osband, B Van Roy
Advances in Neural Information Processing Systems, 4219-4227, 2018
Cited by 5, 2018