ScoreNet: Netting Lightweight Quality Scores for Better Visual Assessment with Large Multi-Modality Models
Abstract
The advancement of general large multi-modal models (LMMs) has transformed many computer vision tasks, shifting image quality assessment (IQA) from specialized algorithms to models built on pre-trained LMM backbones. This evolution raises the question of whether dedicated IQA metrics remain relevant or are becoming obsolete in the age of LMMs. In this paper, we address this challenge by introducing ScoreNet, a novel framework that fuses the strengths of traditional metrics to elevate the IQA capabilities of LMMs. ScoreNet employs a soft prompting mechanism, learning prompts from a curated set of lightweight IQA scores and image embeddings. This context-driven learning strategy enhances the adaptability of LMMs to IQA tasks at a small additional computational cost. We show that ScoreNet serves as a general-purpose extension applicable to modern LMM-based IQA models. We integrate ScoreNet into two high-performing methods—CLIP-IQA and Q-Align—and observe consistent improvements. Experimental results show that ScoreNet not only boosts both models but also surpasses other state-of-the-art IQA approaches. The source code for ScoreNet will be released.
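The soft-prompting idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: all dimensions, the linear projection, and the choice of lightweight scores are assumptions. The sketch concatenates a vector of lightweight IQA scores with an image embedding and projects the result into a sequence of soft-prompt tokens that would be prepended to the LMM's input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 4 lightweight IQA scores,
# a 512-d image embedding, 8 soft-prompt tokens of LMM hidden size 768.
NUM_SCORES, IMG_DIM, NUM_TOKENS, HIDDEN = 4, 512, 8, 768

# Learnable projection mapping [scores ; image embedding] -> soft-prompt tokens.
# In practice these weights would be trained jointly with (or on top of) the LMM.
W = rng.standard_normal((NUM_SCORES + IMG_DIM, NUM_TOKENS * HIDDEN)) * 0.02
b = np.zeros(NUM_TOKENS * HIDDEN)

def soft_prompts(iqa_scores, image_embedding):
    """Fuse lightweight IQA scores with an image embedding into prompt tokens."""
    ctx = np.concatenate([iqa_scores, image_embedding])
    return (ctx @ W + b).reshape(NUM_TOKENS, HIDDEN)

# Example: normalized scores from hypothetical lightweight metrics.
scores = np.array([0.71, 0.63, 0.80, 0.55])
img_emb = rng.standard_normal(IMG_DIM)
tokens = soft_prompts(scores, img_emb)
print(tokens.shape)  # (8, 768), prepended to the LMM input sequence
```

The key design point this illustrates is that only the small projection is learned, which is why the extra computational cost over the frozen LMM backbone stays small.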