Pre-Training Helps When Capacity Allows: Evidence from Ultra-Small ConvNets
Srikanth Muralidharan · Heitor Medeiros · Masih Aminbeidokhti · Eric Granger · Marco Pedersoli
Abstract
Robust visual recognition on embedded platforms requires models that both generalize out-of-distribution (OOD) and fit into tiny compute/memory budgets. While pre-training is a standard route to robustness for mid- and large-sized backbones, its value in the ultra-small regime remains unclear. We present a capacity-aware study of pre-training for two efficient ConvNet families (EfficientNet and MobileNetV3) scaled from "small" to "ultra-small" via a simple, reproducible recipe. We compare three initializations (COCO detection pre-training, ImageNet classification pre-training, and training from scratch) on two axes of distribution shift: (i) cross-dataset RGB$\rightarrow$RGB transfer between LLVIP and FLIR, and (ii) cross-modality detection, where models are fine-tuned on RGB and evaluated on infrared (IR). A complementary classification study on DomainNet probes whether the trends extend beyond detection. Across settings, we find that the benefit of pre-training is conditional on both backbone capacity and shift difficulty. Task-aligned COCO detection pre-training is the most reliable starting point at moderate sizes and for the easier transfer direction. In low-capacity regimes, differences are typically within run-to-run variation, and training from scratch can match or surpass pre-training. Classification mirrors this capacity gating. Our results challenge the premise that "pre-training always helps" and instead quantify when task-aligned pre-training pays off for ultra-small backbones and when it likely does not\footnote{The code will be available online after acceptance.}.