FujiView: Multimodal Late-Fusion for Predicting Scenic Visibility
Bryce Bible · Shah Hasnaeen · Hairong Qi
Abstract
Visibility of natural landmarks such as Mount Fuji is a defining factor in both tourism planning and visitor experience, yet it remains difficult to predict due to rapidly changing atmospheric conditions. We present **FujiView**, a multimodal learning framework and dataset for predicting scenic visibility by fusing webcam imagery with structured meteorological data. Our late-fusion approach combines image-derived class probabilities with numerical weather features to classify visibility into five categories. The dataset currently comprises over 100,000 webcam images from more than 40 cameras around Mount Fuji, each paired with concurrent and forecasted weather conditions; it continues to expand and will be released to support further research in environmental forecasting. Experiments show that YOLO-based vision features dominate short-term horizons such as "nowcasting" and "samedaycasting", while weather-driven forecasts become the primary predictive signal beyond the $+1$-day horizon. Late fusion consistently yields the highest overall accuracy, achieving $\mathrm{ACC} \approx 0.89$ for same-day prediction and up to $0.84$ for next-day forecasts. These results position **Scenic Visibility Forecasting (SVF)** as a new benchmark task for multimodal learning.
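To make the late-fusion idea concrete, here is a minimal sketch of decision-level fusion as described above: per-image class probabilities from a vision model are concatenated with numerical weather features, and a shallow meta-classifier predicts the five visibility categories. The synthetic arrays, the logistic-regression fusion head, and all dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative dimensions: N samples, C = 5 visibility classes, D weather features.
N, C, D = 2000, 5, 8
rng = np.random.default_rng(0)

# Stand-ins for the two modalities: per-image class probabilities from a
# vision model (e.g., a YOLO classification head) and numeric weather features.
p_vision = rng.dirichlet(np.ones(C), size=N)
x_weather = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)  # five-category visibility labels

# Late fusion: concatenate decision-level vision outputs with weather
# features and train a shallow meta-classifier on top.
x_fused = np.concatenate([p_vision, x_weather], axis=1)
x_tr, x_te, y_tr, y_te = train_test_split(x_fused, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
print("fused accuracy:", clf.score(x_te, y_te))
```

Because fusion happens at the decision level, the vision and weather branches can be trained or updated independently; only the small fusion head needs retraining when either branch changes.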