Hannah Rose Kirk
Title · Cited by · Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models
HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...
Advances in neural information processing systems 34, 2611-2624, 2021
Cited by 195 · 2021
Auditing large language models: a three-layered approach
J Mökander, J Schuett, HR Kirk, L Floridi
AI and Ethics, 1-31, 2023
Cited by 183 · 2023
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
HR Kirk, B Vidgen, P Röttger, SA Hale
Nature Machine Intelligence, 1-10, 2024
Cited by 138* · 2024
DataPerf: Benchmarks for data-centric AI development
M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by 119 · 2024
SemEval-2023 Task 10: Explainable detection of online sexism
HR Kirk, W Yin, B Vidgen, P Röttger
arXiv preprint arXiv:2303.04222, 2023
Cited by 118 · 2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning
H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022
Cited by 96 · 2022
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
arXiv preprint arXiv:2308.01263, 2023
Cited by 87 · 2023
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
Proceedings of the 2022 Conference of the North American Chapter of the …, 2021
Cited by 56 · 2021
Handling and Presenting Harmful Text in NLP
HR Kirk, A Birhane, B Vidgen, L Derczynski
EMNLP Findings, 2022
Cited by 43* · 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk
Proceedings of the 4th workshop on gender bias in natural language …, 2022
Cited by 39 · 2022
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, ...
arXiv preprint arXiv:2404.16019, 2024
Cited by 32 · 2024
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021
Cited by 32 · 2021
The past, present and better future of feedback learning in large language models for subjective human preferences and values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2310.07629, 2023
Cited by 29 · 2023
Assessing language model deployment with risk cards
L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...
arXiv preprint arXiv:2303.18190, 2023
Cited by 28 · 2023
Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
arXiv preprint arXiv:2402.16786, 2024
Cited by 27 · 2024
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk
Advances in Neural Information Processing Systems 36, 2024
Cited by 21 · 2024
Casteist but not racist? Quantifying disparities in large language model bias between India and the West
K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale
arXiv preprint arXiv:2309.08573, 2023
Cited by 20 · 2023
Introducing v0.5 of the AI Safety Benchmark from MLCommons
B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...
arXiv preprint arXiv:2404.12241, 2024
Cited by 19 · 2024
SimpleSafetyTests: a test suite for identifying critical safety risks in large language models
B Vidgen, N Scherrer, HR Kirk, R Qian, A Kannappan, SA Hale, P Röttger
arXiv preprint arXiv:2311.08370, 2023
Cited by 19 · 2023
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets
B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain
arXiv preprint arXiv:2305.15407, 2023
Cited by 18 · 2023