PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision–Language Models
Abstract
In recent years, personalization, which leverages user-specific data to generate tailored responses, has been increasingly adopted in user-centric domains. However, while the personalization of Large Language Models (LLMs) has been actively studied, the personalization capabilities of Large Vision-Language Models (LVLMs) remain largely unexplored. To systematically evaluate the personalization ability of LVLMs, we introduce PerVL-Bench, a synthetic benchmark designed specifically for this purpose. PerVL-Bench incorporates user-specific data, including multiple images and long-form text, and provides two types of question-answer (QA) pairs. Furthermore, we use PerVL-Bench to comprehensively evaluate the capabilities essential for personalization in current state-of-the-art LVLMs. Through this evaluation, we reveal the limitations of current models in multimodal personalization and provide insights for the development of personalized LVLMs. We release PerVL-Bench and our code to advance future research: {link}