Qingyu Zhang
Large Language Models
Inference Acceleration for the 70B LLaMA-2 Large Language Model
As a coaching assistant for the ASC24 Student Supercomputer Challenge, I helped the team optimize the inference performance of the LLaMA-2-70B model using efficient serving frameworks such as vLLM, and designed data-parallelism strategies that significantly reduced latency.
Last updated on Aug 5, 2025
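As a rough sketch of the kind of setup described above: the snippet below shows offline batched inference with vLLM, where `tensor_parallel_size` shards the 70B weights across the GPUs of one node and data parallelism would run several such engines over a split request stream. The model path, GPU count, and prompts are illustrative assumptions, not the actual competition configuration.

```python
# Hypothetical vLLM offline-inference sketch (illustrative settings,
# not the ASC24 configuration).
from vllm import LLM, SamplingParams

# Shard the 70B weights across 8 GPUs on one node; a data-parallel
# deployment would launch one such engine per replica and divide the
# incoming requests between them.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=8)

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
prompts = [
    "Explain tensor parallelism in one sentence.",
    "What does continuous batching do for inference latency?",
]

# vLLM batches these requests internally (continuous batching + PagedAttention).
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)
```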
Training the 10-Billion Parameter Yuan-1.0 LLM
As a key member of the ASC23 team, I trained a 10B-scale large language model using the DeepSpeed-Megatron framework, combining tensor, pipeline, and data parallelism. Our work won the First Prize.
Last updated on Aug 5, 2025
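To illustrate how the three parallelism degrees compose in a DeepSpeed-Megatron-style launch: each model replica occupies tensor-parallel × pipeline-parallel GPUs, and whatever factor of the cluster remains becomes the data-parallel degree. The GPU counts in this sketch are hypothetical, not the team's actual configuration.

```python
# Sketch of 3D-parallelism bookkeeping (illustrative numbers only).

def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Number of data-parallel replicas given tensor-parallel (tp) and
    pipeline-parallel (pp) degrees on a cluster of world_size GPUs."""
    model_parallel = tp * pp  # GPUs occupied by one model replica
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    return world_size // model_parallel

# e.g. 64 GPUs with 8-way tensor parallelism inside each layer and
# 4-way pipeline parallelism across layer stages leave 2 replicas,
# each training on a different shard of the data.
print(data_parallel_size(world_size=64, tp=8, pp=4))  # -> 2
```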