Inference Acceleration for the 70B LLaMA-2 Large Language Model

Last updated on Aug 4, 2025

Large Language Models, Inference Acceleration, VLLM, ASC24

Qingyu Zhang
Master Student of Computer Science and Technology

I work on AI sales and customer-service agents, with experience across LLM pretraining, post-training, evaluation, and model efficiency.