How do transformers learn in-context beyond simple functions? A case study on learning with representations T Guo, W Hu, S Mei, H Wang, C Xiong, S Savarese, Y Bai arXiv preprint arXiv:2310.10616, 2023 | 42 | 2023 |
What can a single attention layer learn? A study through the random features lens H Fu, T Guo, Y Bai, S Mei Advances in Neural Information Processing Systems 36, 2024 | 24 | 2024 |
Posterior predictive propensity scores and p-values P Ding, T Guo Observational Studies 9 (1), 3-18, 2023 | 3 | 2023 |
Active-dormant attention heads: Mechanistically demystifying extreme-token phenomena in LLMs T Guo, D Pai, Y Bai, J Jiao, MI Jordan, S Mei arXiv preprint arXiv:2410.13835, 2024 | 1 | 2024 |
Collaborative Heterogeneous Causal Inference Beyond Meta-analysis T Guo, SP Karimireddy, MI Jordan arXiv preprint arXiv:2404.15746, 2024 | 1 | 2024 |