Lior Shani
Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs
L Shani, Y Efroni, S Mannor
Thirty-Fourth AAAI Conference on Artificial Intelligence, 5668-5675, 2020
Optimistic Policy Optimization with Bandit Feedback
Y Efroni, L Shani, A Rosenberg, S Mannor
Proceedings of the 37th International Conference on Machine Learning 119 …, 2020
Mirror Descent Policy Optimization
M Tomar, L Shani, Y Efroni, M Ghavamzadeh
The Tenth International Conference on Learning Representations, 2022
Factually consistent summarization via reinforcement learning with textual entailment feedback
P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, ...
arXiv preprint arXiv:2306.00186, 2023
Online apprenticeship learning
L Shani, T Zahavy, S Mannor
Proceedings of the AAAI conference on artificial intelligence 36 (8), 8240-8248, 2022
Exploration Conscious Reinforcement Learning Revisited
L Shani, Y Efroni, S Mannor
Proceedings of the 36th International Conference on Machine Learning, 5680-5689, 2019
Demystifying embedding spaces using large language models
G Tennenholtz, Y Chow, CW Hsu, J Jeong, L Shani, A Tulepbergenov, ...
arXiv preprint arXiv:2310.04475, 2023
Reinforcement learning with history dependent dynamic contexts
G Tennenholtz, N Merlis, L Shani, M Mladenov, C Boutilier
International Conference on Machine Learning, 34011-34053, 2023
Reinforcement learning with a terminator
G Tennenholtz, N Merlis, L Shani, S Mannor, U Shalit, G Chechik, ...
Advances in Neural Information Processing Systems 35, 35696-35709, 2022
Multi instance learning for unbalanced data
M Kozdoba, E Moroshko, L Shani, T Takagi, T Katoh, S Mannor, ...
arXiv preprint arXiv:1812.07010, 2018
Offline Regularised Reinforcement Learning for Large Language Models Alignment
PH Richemond, Y Tang, D Guo, D Calandriello, MG Azar, R Rafailov, ...
arXiv preprint arXiv:2405.19107, 2024
Embedding-Aligned Language Models
G Tennenholtz, Y Chow, CW Hsu, L Shani, E Liang, C Boutilier
arXiv preprint arXiv:2406.00024, 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
L Shani, A Rosenberg, A Cassel, O Lang, D Calandriello, A Zipori, ...
arXiv preprint arXiv:2405.14655, 2024