Improved training of end-to-end attention models for speech recognition A Zeyer, K Irie, R Schlüter, H Ney Interspeech 2018, 2018 | 309 | 2018 |

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation C Lüscher, E Beck, K Irie, M Kitza, W Michel, A Zeyer, R Schlüter, H Ney Interspeech 2019, 2019 | 303 | 2019 |

A Comparison of Transformer and LSTM Encoder Decoder Models for ASR A Zeyer, P Bahar, K Irie, R Schlüter, H Ney ASRU 2019, 2019 | 261 | 2019 |

Linear transformers are secretly fast weight programmers I Schlag*, K Irie*, J Schmidhuber ICML 2021, 2021 | 232* | 2021 |

Language modeling with deep transformers K Irie, A Zeyer, R Schlüter, H Ney Interspeech 2019, 2019 | 216 | 2019 |

Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... Preprint arXiv:1902.08295, 2019 | 212 | 2019 |

LSTM, GRU, highway and a bit of attention: an empirical overview for language modeling in speech recognition K Irie, Z Tüske, T Alkhouli, R Schlüter, H Ney Interspeech 2016, 2016 | 127 | 2016 |

The devil is in the detail: Simple tricks improve systematic generalization of transformers R Csordás, K Irie, J Schmidhuber EMNLP 2021, 2021 | 125 | 2021 |

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition K Irie, R Prabhavalkar, A Kannan, A Bruguier, D Rybach, P Nguyen Interspeech 2019, 2019 | 75* | 2019 |

Going beyond linear transformers with recurrent fast weight programmers K Irie*, I Schlag*, R Csordás, J Schmidhuber NeurIPS 2021, 2021 | 67 | 2021 |

Mindstorms in Natural Language-Based Societies of Mind M Zhuge*, H Liu*, F Faccio*, DR Ashley*, R Csordás, A Gopalakrishnan, ... NeurIPS 2023 Workshop on Robustness of Few-shot and Zero-shot Learning in …, 2023 | 59 | 2023 |

The Neural Data Router: Adaptive control flow in Transformers improves systematic generalization R Csordás, K Irie, J Schmidhuber ICLR 2022, 2021 | 56 | 2021 |

The RWTH/UPB/FORTH system combination for the 4th CHiME challenge evaluation T Menne, J Heymann, A Alexandridis, K Irie, A Zeyer, M Kitza, P Golik, ... CHiME 2016, 2016 | 53 | 2016 |

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment W Zhou, W Michel, K Irie, M Kitza, R Schlüter, H Ney ICASSP 2020, 2020 | 51 | 2020 |

Training language models for long-span cross-sentence evaluation K Irie, A Zeyer, R Schlüter, H Ney ASRU 2019, 2019 | 49 | 2019 |

RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling K Irie, S Kumar, M Nirschl, H Liao ICASSP 2018, 2018 | 40 | 2018 |

A Modern Self-Referential Weight Matrix That Learns to Modify Itself K Irie, I Schlag, R Csordás, J Schmidhuber ICML 2022, 2022 | 37 | 2022 |

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention K Irie*, R Csordás*, J Schmidhuber ICML 2022, 2022 | 33 | 2022 |

On efficient training of word classes and their application to recurrent neural network language models R Botros, K Irie, M Sundermeyer, H Ney Interspeech 2015, 2015 | 22 | 2015 |

Investigation on log-linear interpolation of multi-domain neural network language model Z Tüske, K Irie, R Schlüter, H Ney ICASSP 2016, 2016 | 21 | 2016 |