Siyuan Huang
Shanghai AI Lab, SJTU, MMLab CUHK
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models
P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ...
arXiv preprint arXiv:2306.09265, 2023
SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models
Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, C Lin, W Shao, ...
arXiv preprint arXiv:2311.07575, 2023
Multi-modal sensor fusion for auto driving perception: A survey
K Huang, B Shi, X Li, X Li, S Huang, Y Li
arXiv preprint arXiv:2202.02703, 2022
Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model
S Huang, Z Jiang, H Dong, Y Qiao, P Gao, H Li
arXiv preprint arXiv:2305.11176, 2023
SPHINX-X: Scaling data and parameters for a family of multi-modal large language models
P Gao, R Zhang, C Liu, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, ...
arXiv preprint arXiv:2402.05935, 2024
Tiny LVLM-eHub: Early multimodal experiments with Bard
W Shao, Y Hu, P Gao, M Lei, K Zhang, F Meng, P Xu, S Huang, H Li, ...
arXiv preprint arXiv:2308.03729, 2023
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill
W Cai, S Huang, G Cheng, Y Long, P Gao, C Sun, H Dong
ICRA 2024, 2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
S Huang, B Zhang, B Shi, H Li, Y Li, P Gao
Proceedings of the 31st ACM International Conference on Multimedia, 8644-8652, 2023
Adas: A simple active-and-adaptive baseline for cross-domain 3D semantic segmentation
B Fei, S Huang, J Yuan, B Shi, B Zhang, T Chen, M Dou, Y Qiao
arXiv preprint arXiv:2212.10390, 2022
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
X Lu, Q Liu, Y Xu, A Zhou, S Huang, B Zhang, J Yan, H Li
arXiv preprint arXiv:2402.14800, 2024
ManipVQA: Injecting robotic affordance and physically grounded information into multi-modal large language models
S Huang, I Ponomarenko, Z Jiang, X Li, X Hu, P Gao, H Li, H Dong
IROS 2024, 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
W Lin, X Wei, R An, P Gao, B Zou, Y Luo, S Huang, S Zhang, H Li
arXiv preprint arXiv:2403.20271, 2024
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Q Lu, W Shao, Z Liu, F Meng, B Li, B Chen, S Huang, K Zhang, Y Qiao, ...
arXiv preprint arXiv:2406.08451, 2024
A3VLM: Actionable Articulation-Aware Vision Language Model
S Huang, H Chang, Y Liu, Y Zhu, H Dong, P Gao, A Boularias, H Li
arXiv preprint arXiv:2406.07549, 2024