Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis
Eliott THOMAS · Mickael Coustaty · Aurélie JOSEPH · Tri-Cong Pham · Gaspar DELOIN · Elodie CAREL · Vincent d'Andecy · Jean-marc Ogier
Abstract
Tables in business documents power analytics and compliance, yet task-specific datasets are costly to build. Practitioners therefore turn to zero-shot vision--language models (VLMs). We study zero-shot realism for table detection (TD) and table structure recognition (TSR) under a unified protocol on DocILE-QUEST and a private STM154 corpus. We report TD with GIoU, Purity, and Completeness, and TSR with TEDS and TEDS-S, evaluating commercial VLMs (GPT-4o, GPT-5-mini), compact detectors, and supervised YOLO/DETR baselines. Zero-shot VLMs are strong for TSR and competitive for TD, while fine-tuned or from-scratch detectors lead when box quality and robustness to clutter matter. We add an automated error taxonomy that isolates actionable failures (missed, merged/split tables, header--body confusions, cell topology). Finally, we quantify emissions, finding a $10^4$ gap between the lightest and heaviest systems.
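For readers unfamiliar with the table-detection metric named above, the following is a minimal sketch of the standard Generalized IoU (GIoU) for two axis-aligned boxes. It is not the authors' evaluation code: the `(x_min, y_min, x_max, y_max)` box format and the helper names `box_area` and `giou` are assumptions made for illustration, and the abstract does not specify how predictions are matched to ground-truth tables. Purity and Completeness are not sketched here because their exact definitions are not given in this text.

```python
from typing import Tuple

# Assumed box format (not specified in the abstract): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of an axis-aligned box; empty or inverted boxes count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Standard GIoU: IoU minus the fraction of the enclosing box
    that is not covered by the union. Ranges from -1 to 1."""
    # Intersection rectangle (has zero area when the boxes do not overlap)
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter

    # Smallest axis-aligned box enclosing both inputs
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    if hull == 0.0:
        return 0.0  # both boxes degenerate; a convention chosen for this sketch

    iou = inter / union if union > 0.0 else 0.0
    return iou - (hull - union) / hull


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes give a negative score
```

Unlike plain IoU, which is 0 for every pair of non-overlapping boxes, GIoU becomes more negative as the boxes move further apart, so it still distinguishes a near miss from a detection placed far from the table.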