Fast vision transformers with HiLo attention Z Pan, J Cai, B Zhuang NeurIPS 2022 (Spotlight), 2022 | 180 | 2022 |
Scalable vision transformers with hierarchical pooling Z Pan, B Zhuang, J Liu, H He, J Cai Proceedings of the IEEE/CVF International Conference on Computer Vision, 377-386, 2021 | 173 | 2021 |
Object-and-action aware model for visual language navigation Y Qi, Z Pan, S Zhang, A van den Hengel, Q Wu European Conference on Computer Vision, 303-317, 2020 | 127 | 2020 |
Less is more: Pay less attention in vision transformers Z Pan, B Zhuang, H He, J Liu, J Cai Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2035-2043, 2022 | 98 | 2022 |
The road to know-where: An object-and-room informed sequential BERT for indoor vision-language navigation Y Qi, Z Pan, Y Hong, MH Yang, A van den Hengel, Q Wu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 91 | 2021 |
A Survey on Efficient Training of Transformers B Zhuang, J Liu, Z Pan, H He, Y Weng, C Shen IJCAI 2023, 2023 | 55 | 2023 |
Pruning self-attentions into convolutional layers in single path H He, J Cai, J Liu, Z Pan, J Zhang, D Tao, B Zhuang IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 46 | 2024 |
An efficient spatio-temporal pyramid transformer for action detection Y Weng, Z Pan, M Han, X Chang, B Zhuang European Conference on Computer Vision, 358-375, 2022 | 36 | 2022 |
Stitchable Neural Networks Z Pan, J Cai, B Zhuang CVPR 2023 (Highlight), 2023 | 31 | 2023 |
EcoFormer: Energy-saving attention with linear complexity J Liu, Z Pan, H He, J Cai, B Zhuang NeurIPS 2022 (Spotlight), 2022 | 30 | 2022 |
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models A Liu, J Liu, Z Pan, Y He, G Haffari, B Zhuang arXiv preprint arXiv:2405.14366, 2024 | 24 | 2024 |
Mesa: A memory-saving training framework for transformers Z Pan, P Chen, H He, J Liu, J Cai, B Zhuang arXiv preprint arXiv:2111.11124, 2021 | 20 | 2021 |
Janus: Decoupling visual encoding for unified multimodal understanding and generation C Wu, X Chen, Z Wu, Y Ma, X Liu, Z Pan, W Liu, Z Xie, X Yu, C Ruan, ... arXiv preprint arXiv:2410.13848, 2024 | 19 | 2024 |
Dynamic Focus-aware Positional Queries for Semantic Segmentation H He, J Cai, Z Pan, J Liu, J Zhang, D Tao, B Zhuang CVPR 2023, 2022 | 18 | 2022 |
T-Stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching Z Pan, B Zhuang, DA Huang, W Nie, Z Yu, C Xiao, J Cai, A Anandkumar arXiv preprint arXiv:2402.14167, 2024 | 13 | 2024 |
Efficient Stitchable Task Adaptation H He, Z Pan, J Liu, J Cai, B Zhuang CVPR 2024, 2023 | 4 | 2023 |
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Z Wu, X Chen, Z Pan, X Liu, W Liu, D Dai, H Gao, Y Ma, C Wu, B Wang, ... arXiv preprint arXiv:2412.10302, 2024 | 2 | 2024 |
JanusFlow: Harmonizing autoregression and rectified flow for unified multimodal understanding and generation Y Ma, X Liu, X Chen, W Liu, C Wu, Z Wu, Z Pan, Z Xie, H Zhang, L Zhao, ... arXiv preprint arXiv:2411.07975, 2024 | 2 | 2024 |
DeepSeek-V3 Technical Report A Liu, B Feng, B Xue, B Wang, B Wu, C Lu, C Zhao, C Deng, C Zhang, ... arXiv preprint arXiv:2412.19437, 2024 | 1 | 2024 |
Stitched ViTs are Flexible Vision Backbones Z Pan, J Liu, H He, J Cai, B Zhuang ECCV 2024, 2023 | 1 | 2023 |