Model Efficiency

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

We propose ShortV, a training-free method that reduces the computational cost of multimodal large language models (MLLMs) by freezing visual tokens in ineffective layers.

Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun
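The following is a minimal sketch of what "freezing visual tokens" in a given set of layers could look like. It uses a toy setting: token states are plain Python lists, `toy_layer` is a hypothetical stand-in for a decoder layer (not the paper's architecture), and the set of ineffective layers is supplied by hand rather than identified. Inside a frozen layer, visual tokens keep their incoming states and skip the update. Text tokens are still updated and can still read the unchanged visual states; this is an assumption of the sketch. The sketch does not reproduce how ShortV itself selects layers or treats visual tokens inside them.

```python
from typing import Callable, List, Sequence, Set, Tuple

Vector = List[float]
Layer = Callable[[Sequence[Vector], int], Vector]


def toy_layer(states: Sequence[Vector], idx: int) -> Vector:
    """Hypothetical stand-in for one decoder layer updating token `idx`.

    The token adds the mean of all current states to its own state, a crude
    proxy for attending over the whole sequence (visual and text tokens).
    """
    dim = len(states[0])
    mean = [sum(s[d] for s in states) / len(states) for d in range(dim)]
    return [states[idx][d] + mean[d] for d in range(dim)]


def forward(
    states: Sequence[Vector],
    visual_idx: Set[int],
    frozen_layers: Set[int],
    num_layers: int,
    layer: Layer = toy_layer,
) -> Tuple[List[Vector], int]:
    """Run `num_layers` toy layers, freezing visual tokens in `frozen_layers`.

    Returns the final states and the number of per-token updates computed.
    """
    current = [list(s) for s in states]
    computed = 0
    for depth in range(num_layers):
        nxt: List[Vector] = []
        for i, s in enumerate(current):
            if depth in frozen_layers and i in visual_idx:
                # Frozen: the visual token skips this layer and is carried over.
                nxt.append(s)
            else:
                # Updated token; it still reads every token's current state,
                # including frozen visual tokens (assumption of this sketch).
                nxt.append(layer(current, i))
                computed += 1
        current = nxt
    return current, computed


if __name__ == "__main__":
    tokens = [[0.1], [0.2], [0.3], [1.0], [2.0]]  # 3 visual, then 2 text tokens
    _, n = forward(tokens, visual_idx={0, 1, 2}, frozen_layers={1, 2}, num_layers=4)
    print(n, "of", len(tokens) * 4, "per-token updates computed")
```

With 3 visual tokens, 2 text tokens, 4 layers, and layers {1, 2} frozen, the example computes 14 of the 20 per-token updates. In general, this toy performs N·L − |V|·|F| updates for N tokens, L layers, visual-token set V, and frozen-layer set F. The actual FLOP savings in a real MLLM depend on its attention and MLP costs and are not modeled here.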
