| Emergence of locomotion behaviours in rich environments N Heess, D TB, S Sriram, J Lemmon, J Merel, G Wayne, Y Tassa, T Erez, ... arXiv preprint arXiv:1707.02286, 2017 | 930 | 2017 |
| Learning to reinforcement learn JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ... arXiv preprint arXiv:1611.05763, 2016 | 847 | 2016 |
| Prefrontal cortex as a meta-reinforcement learning system JX Wang, Z Kurth-Nelson, D Kumaran, D Tirumala, H Soyer, JZ Leibo, ... Nature neuroscience 21 (6), 860-868, 2018 | 500 | 2018 |
| Distributed distributional deterministic policy gradients G Barth-Maron, MW Hoffman, D Budden, W Dabney, D Horgan, D TB, ... arXiv preprint arXiv:1804.08617, 2018 | 484 | 2018 |
| Learning human behaviors from motion capture by adversarial imitation J Merel, Y Tassa, D TB, S Srinivasan, J Lemmon, Z Wang, G Wayne, ... arXiv preprint arXiv:1707.02201, 2017 | 201 | 2017 |
| Hierarchical visuomotor control of humanoids J Merel, A Ahuja, V Pham, S Tunyasuvunakool, S Liu, D Tirumala, ... arXiv preprint arXiv:1811.09656, 2018 | 102 | 2018 |
| Information asymmetry in KL-regularized RL A Galashov, SM Jayakumar, L Hasenclever, D Tirumala, J Schwarz, ... arXiv preprint arXiv:1905.01240, 2019 | 89 | 2019 |
| V-MPO: On-policy maximum a posteriori policy optimization for discrete and continuous control HF Song, A Abdolmaleki, JT Springenberg, A Clark, H Soyer, JW Rae, ... arXiv preprint arXiv:1909.12238, 2019 | 80 | 2019 |
| Exploiting hierarchy for learning and transfer in KL-regularized RL D Tirumala, H Noh, A Galashov, L Hasenclever, A Ahuja, G Wayne, ... arXiv preprint arXiv:1903.07438, 2019 | 36 | 2019 |
| Data-efficient hindsight off-policy option learning M Wulfmeier, D Rao, R Hafner, T Lampe, A Abdolmaleki, T Hertweck, ... International Conference on Machine Learning, 11340-11350, 2021 | 29 | 2021 |
| Learning to reinforcement learn JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ... arXiv preprint arXiv:1611.05763 | 28 | 2017 |
| Probing physics knowledge using tools from developmental psychology L Piloto, A Weinstein, D TB, A Ahuja, M Mirza, G Wayne, D Amos, C Hung, ... arXiv preprint arXiv:1804.01128, 2018 | 25 | 2018 |
| Behavior priors for efficient reinforcement learning D Tirumala, A Galashov, H Noh, L Hasenclever, R Pascanu, J Schwarz, ... The Journal of Machine Learning Research 23 (1), 9989-10056, 2022 | 24 | 2022 |
| Pick your battles: Interaction graphs as population-level objectives for strategic diversity M Garnelo, WM Czarnecki, S Liu, D Tirumala, J Oh, G Gidel, ... arXiv preprint arXiv:2110.04041, 2021 | 17 | 2021 |
| Learning transferable motor skills with hierarchical latent mixture policies D Rao, F Sadeghi, L Hasenclever, M Wulfmeier, M Zambelli, G Vezzani, ... arXiv preprint arXiv:2112.05062, 2021 | 12 | 2021 |
| Learning to reinforcement learn (2016) JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ... arXiv preprint arXiv:1611.05763, 2016 | 7 | 2016 |
| Meta-reinforcement learning: a bridge between prefrontal and dopaminergic function JX Wang, Z Kurth-Nelson, D Tirumala, J Leibo, H Soyer, D Kumaran, ... Cosyne abstracts, 2017 | 4 | 2017 |
| MO2: Model-based offline options S Salter, M Wulfmeier, D Tirumala, N Heess, M Riedmiller, R Hadsell, ... Conference on Lifelong Learning Agents, 902-919, 2022 | 3 | 2022 |
| Flexible support for fast parallel commutative updates V Balaji, D Tirumala, B Lucia arXiv preprint arXiv:1709.09491, 2017 | 3 | 2017 |
| Learning agile soccer skills for a bipedal robot with deep reinforcement learning T Haarnoja, B Moran, G Lever, SH Huang, D Tirumala, M Wulfmeier, ... arXiv preprint arXiv:2304.13653, 2023 | 2 | 2023 |