SeaClips: A Video Dataset for Maritime Object Detection
Abstract
Maritime computer vision is a prerequisite for autonomous surface vehicles and can improve maritime safety if a high level of robustness is achieved. As deep learning dominates the computer vision community, domain-specific datasets are required to obtain well-generalizing and reliable models. However, maritime datasets, especially those containing videos with temporally dense annotations, remain small compared to those in other domains, such as autonomous driving or generic computer vision. This paper introduces SeaClips, a new maritime video dataset containing 74 videos with an average duration of 14 seconds and 31k frames in total. The videos were recorded under varying conditions with three cameras mounted on shore and on boats. SeaClips provides frame-by-frame annotations comprising 129k bounding boxes across seven categories, covering vessel and non-vessel classes. SeaClips thus contributes to a broader coverage of maritime scenarios and, ultimately, to more robust computer vision models. Baseline results on the dataset are established by evaluating six image-based models and three models using temporal context, ranging from lightweight YOLO-based detectors to heavy transformer architectures. We find that the varying scales and shapes at which objects appear in SeaClips pose a challenge to state-of-the-art detectors. SeaClips will be made accessible for research on maritime obstacle detection upon paper acceptance.
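The reported statistics imply some derived quantities that are useful for sizing the dataset. A minimal sketch, assuming only the rounded figures from the abstract (the implied frame rate is an inference, not a stated property of the dataset):

```python
# Back-of-envelope consistency check on the statistics reported in the
# abstract: 74 videos, ~14 s average duration, 31k frames, 129k boxes.
# All inputs are rounded, so the derived values are approximate.

num_videos = 74
avg_duration_s = 14
total_frames = 31_000
total_boxes = 129_000

frames_per_video = total_frames / num_videos      # ~419 frames per video
implied_fps = frames_per_video / avg_duration_s   # ~30 fps (assumed, not stated)
boxes_per_frame = total_boxes / total_frames      # ~4.2 annotated objects per frame

print(round(implied_fps), round(boxes_per_frame, 1))  # → 30 4.2
```

These numbers suggest roughly 30 fps recordings with about four annotated objects per frame on average, i.e. temporally dense rather than sparsely keyframed annotations.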