In-network aggregation with transport transparency for distributed training S Liu, Q Wang, J Zhang, W Wu, Q Lin, Y Liu, M Xu, M Canini, ... Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 27 | 2023 |
NetReduce: RDMA-compatible in-network reduction for distributed DNN training acceleration S Liu, Q Wang, J Zhang, Q Lin, Y Liu, M Xu, RCC Cheung, J He arXiv preprint arXiv:2009.09736, 2020 | 13 | 2020 |
Scalable fully pipelined hardware architecture for in-network aggregated AllReduce communication Y Liu, J Zhang, S Liu, Q Wang, W Dai, RCC Cheung IEEE Transactions on Circuits and Systems I: Regular Papers 68 (10), 4194-4206, 2021 | 10 | 2021 |
Cepheus: accelerating datacenter applications with high-performance RoCE-capable multicast W Li, J Zhang, Y Liu, G Zeng, Z Wang, C Zeng, P Zhou, Q Wang, K Chen 2024 IEEE International Symposium on High-Performance Computer Architecture …, 2024 | 4 | 2024 |