Stella Biderman
Multitask prompted training enables zero-shot task generalization
V Sanh, A Webson, C Raffel, SH Bach, L Sutawika, Z Alyafeai, A Chaffin, ...
The Tenth International Conference on Learning Representations (ICLR), 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
L Gao, S Biderman, S Black, L Golding, T Hoppe, C Foster, J Phang, H He, ...
arXiv preprint arXiv:2101.00027, 2020
BLOOM: A 176B-parameter open-access multilingual language model
TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Transactions on Machine Learning Research, 2022
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
Transactions on Machine Learning Research (TMLR), 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
S Black, S Biderman, E Hallahan, Q Anthony, L Gao, L Golding, H He, ...
ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022
The Language Model Evaluation Harness
L Gao, J Tow, S Biderman, S Black, A DiPofi, C Foster, L Golding, J Hsu, ...
GitHub Repository, 2021
Pythia: A suite for analyzing large language models across training and scaling
S Biderman, H Schoelkopf, Q Anthony, H Bradley, K O'Brien, E Hallahan, ...
International Conference on Machine Learning (ICML), 2023
GPT-Neo: Large scale autoregressive language modeling with Mesh-TensorFlow
S Black, L Gao, P Wang, C Leahy, S Biderman
GitHub Repository, 2021
Crosslingual generalization through multitask finetuning
N Muennighoff, T Wang, L Sutawika, A Roberts, S Biderman, TL Scao, ...
61st Annual Meeting of the Association for Computational Linguistics, 2023
VQGAN-CLIP: Open domain image generation and editing with natural language guidance
K Crowson, S Biderman, D Kornis, D Stander, E Hallahan, L Castricato, ...
European Conference on Computer Vision (ECCV), 2022
RWKV: Reinventing RNNs for the Transformer Era
B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, H Cao, X Cheng, ...
Findings of the Association for Computational Linguistics: EMNLP, 2023
Quality at a glance: An audit of web-crawled multilingual datasets
J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ...
Transactions of the Association for Computational Linguistics 10, 50-72, 2022
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset
H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ...
Advances in Neural Information Processing Systems 35, 31809-31826, 2022
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
G Ahdritz, N Bouatta, C Floristean, S Kadyan, Q Xia, W Gerecke, ...
Nature Methods, 1-11, 2024
Llemma: An open language model for mathematics
Z Azerbayev, H Schoelkopf, K Paster, MD Santos, S McAleer, AQ Jiang, ...
NeurIPS Workshop on Math and AI, 2023
trlX: A framework for large scale reinforcement learning from human feedback
A Havrilla, M Zhuravinskyi, D Phung, A Tiwari, J Tow, S Biderman, ...
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
The Annotated Transformer
S Rush, A Huang, S Subramanian, J Sum, K Almubarak, S Biderman
Workshop for NLP Open Source Software (NLP-OSS), 2022
Emergent and predictable memorization in large language models
S Biderman, US Prashanth, L Sutawika, H Schoelkopf, Q Anthony, ...
Advances in Neural Information Processing Systems, 2023
What Language Model to Train if You Have One Million GPU Hours?
T Le Scao, T Wang, D Hesslow, L Saulnier, S Bekman, MS Bari, ...
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2022
Eliciting latent predictions from transformers with the tuned lens
N Belrose, Z Furman, L Smith, D Halawi, I Ostrovsky, L McKinney, ...
arXiv preprint arXiv:2303.08112, 2023