VLLM

Inference Acceleration for the 70B LLaMA-2 Large Language Model

As a coaching assistant for the ASC24 Student Supercomputer Challenge, I helped the team optimize the inference performance of the LLaMA-2-70B model using efficient frameworks like vLLM and designed data parallelism strategies to significantly reduce latency.