Sun Peng
Shanghai Artificial Intelligence Laboratory
Verified email at pjlab.org.cn
Title
Cited by
Year
InternLM2 Technical Report
Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ...
arXiv preprint arXiv:2403.17297, 2024
Cited by 197 · 2024
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Q Hu, P Sun, S Yan, Y Wen, T Zhang
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by 135 · 2021
GradientFlow: Optimizing network performance for large-scale distributed DNN training
P Sun, Y Wen, R Han, W Feng, S Yan
IEEE Transactions on Big Data 8 (2), 495-507, 2019
Cited by 115* · 2019
A chunk caching location and searching scheme in content centric networking
Y Li, T Lin, H Tang, P Sun
2012 IEEE International Conference on Communications (ICC), 2655-2659, 2012
Cited by 93 · 2012
InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output
P Zhang, X Dong, Y Zang, Y Cao, R Qian, L Chen, Q Guo, H Duan, ...
arXiv preprint arXiv:2407.03320, 2024
Cited by 61 · 2024
Deep Learning Workload Scheduling in GPU Datacenters: A Survey
Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo, T Zhang, Y Wen
ACM Computing Surveys 56 (6), 1-38, 2024
Cited by 53* · 2024
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
P Sun, Y Wen, NBD Ta, S Yan
2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017
Cited by 50 · 2017
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs
W Gao, Z Ye, P Sun, Y Wen, T Zhang
Proceedings of the ACM Symposium on Cloud Computing, 609-623, 2021
Cited by 41 · 2021
Characterization of large language model development in the datacenter
Q Hu, Z Ye, Z Wang, G Wang, M Zhang, Q Chen, P Sun, D Lin, X Wang, ...
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by 28 · 2024
Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Q Hu, M Zhang, P Sun, Y Wen, T Zhang
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by 25 · 2023
Timed dataflow: Reducing communication overhead for distributed machine learning systems
P Sun, Y Wen, TNB Duong, S Yan
2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016
Cited by 20 · 2016
Elan: Towards Generic and Efficient Elastic Training for Deep Learning
L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan
2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020
Cited by 19 · 2020
Cloud3DView: An interactive tool for cloud data center operations
J Yin, P Sun, Y Wen, H Gong, M Liu, X Li, H You, J Gao, C Lin
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, 499-500, 2013
Cited by 19 · 2013
LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024
Cited by 17 · 2024
Astraea: A fair deep learning scheduler for multi-tenant GPU clusters
Z Ye, P Sun, W Gao, T Zhang, X Wang, S Yan, Y Luo
IEEE Transactions on Parallel and Distributed Systems 33 (11), 2781-2793, 2021
Cited by 16 · 2021
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine
P Sun, Y Wen, TNB Duong, X Xiao
2017 IEEE 23rd International Conference on Parallel and Distributed Systems …, 2017
Cited by 15 · 2017
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang
Proceedings of the 29th ACM International Conference on Architectural …, 2024
Cited by 13 · 2024
GraphH: High performance big graph analytics in small clusters
P Sun, Y Wen, TNB Duong, X Xiao
2017 IEEE International Conference on Cluster Computing (CLUSTER), 256-266, 2017
Cited by 13 · 2017
ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems
Y Huang, H Zhang, Y Wen, P Sun, NBD Ta
arXiv preprint arXiv:2106.03122, 2021
Cited by 12 · 2021
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by 11 · 2024