张清宇
Distributed Training
Training the 10-Billion Parameter Yuan-1.0 LLM
As a key member of the ASC23 team, I trained the 10-billion-parameter Yuan-1.0 large language model using the DeepSpeed-Megatron framework, combining tensor, pipeline, and data parallelism. Our work won the First Prize.
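To make the parallelism layout concrete, here is a minimal sketch of how the three degrees combine in a Megatron/DeepSpeed-style 3D-parallel job. The specific sizes (TP=8, PP=4, 64 GPUs) are illustrative assumptions, not the actual ASC23 configuration.

```python
# Hypothetical sketch of 3D parallelism accounting: the tensor-parallel,
# pipeline-parallel, and data-parallel degrees must multiply to the total
# GPU count. All sizes below are assumed values for illustration only.
tensor_parallel = 8       # assumed: shards each layer's weights within a node
pipeline_parallel = 4     # assumed: splits the layer stack into pipeline stages
world_size = 64           # assumed: total number of GPUs in the job

# The data-parallel degree is whatever remains after the tensor and
# pipeline groups are carved out of the world size.
assert world_size % (tensor_parallel * pipeline_parallel) == 0
data_parallel = world_size // (tensor_parallel * pipeline_parallel)

print(f"TP={tensor_parallel} x PP={pipeline_parallel} x DP={data_parallel} "
      f"= {tensor_parallel * pipeline_parallel * data_parallel} GPUs")
```

In Megatron-style launchers the first two degrees are set explicitly (e.g. via `--tensor-model-parallel-size` and `--pipeline-model-parallel-size`), and the data-parallel degree falls out of the world size as above.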