CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Abstract
Multi-shot generation requires preserving the identity of characters and settings across frames. Cinematic scene composition goes beyond standard multi-shot generation: it must also express complex interactions among multiple characters and use visual effects to convey creative narratives, challenges that existing datasets cannot fully address. We present CineVerse, a large-scale dataset of diverse movie scenes with shot-level annotations tailored for filmmaking. CineVerse includes refined scene descriptions, shot-type information, and newly extracted shot, character, and setting descriptions. We validate our dataset by developing a baseline framework that first generates a scene plan containing detailed information about the overall scene and each individual shot, and then produces a set of coherent keyframes. Our results show significant improvements in controlling and synthesizing cinematic content, enabled by the added context that CineVerse provides.
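The abstract does not specify the format of the scene plan. As an illustration only, the following minimal Python sketch assumes a hypothetical schema (`ScenePlan`, `ShotPlan`, and `shot_prompts` are names invented here, not part of CineVerse) showing how scene-level and shot-level annotations could be combined so that every keyframe prompt carries the same character and setting context:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShotPlan:
    # Hypothetical per-shot fields mirroring the annotation types named in the abstract.
    shot_type: str            # e.g. "wide shot", "close-up"
    shot_description: str     # what happens in this shot
    characters: List[str]     # names of the characters visible in this shot


@dataclass
class ScenePlan:
    # Hypothetical scene-level fields shared by every shot in the scene.
    scene_description: str
    character_descriptions: Dict[str, str]  # character name -> appearance description
    setting_description: str
    shots: List[ShotPlan] = field(default_factory=list)


def shot_prompts(plan: ScenePlan) -> List[str]:
    """Build one text prompt per shot by attaching the shared scene-level context.

    Reusing the same character and setting descriptions in every prompt is one
    simple way to encourage identity consistency across keyframes.
    """
    prompts = []
    for shot in plan.shots:
        # Only describe the characters that actually appear in this shot.
        chars = "; ".join(
            f"{name}: {plan.character_descriptions[name]}" for name in shot.characters
        )
        prompts.append(
            f"[{shot.shot_type}] {shot.shot_description} | "
            f"Scene: {plan.scene_description} | "
            f"Setting: {plan.setting_description} | "
            f"Characters: {chars}"
        )
    return prompts
```

Each prompt could then be passed to any text-to-image keyframe generator; the sketch deliberately leaves that second stage out, since the abstract does not describe the generator.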