Paper details

Title: An Efficient System for Automatic Map Storytelling: A Case Study on Historical Maps

Authors: Ziyi Liu, Claudio Affolter, Sidi Wu, Yizi Chen, Lorenz Hurni

Abstract (obtained from CrossRef):

Historical maps provide valuable information and knowledge about the past. However, as they often feature non-standard projections, hand-drawn styles, and artistic elements, it is challenging for non-experts to identify and interpret them. While existing image captioning methods have achieved remarkable success on natural images, their performance on maps is suboptimal as maps are underrepresented in their pre-training process. Despite the recent advance of vision-enabled GPT models in text recognition and map captioning, they still have a limited understanding of maps, as their performance wanes when texts (e.g., titles and legends) in maps are missing or inaccurate. Besides, it is inefficient or even impractical to fine-tune these models with users’ own datasets. To address these problems, we propose a novel and lightweight map-captioning counterpart. Specifically, we fine-tune the state-of-the-art vision-language model CLIP to generate captions relevant to historical maps and enrich the captions with GPT models to tell a brief story regarding where, what, when and why of a given map. We propose a novel decision tree architecture to only generate captions relevant to the specified map type. Our system shows invariance to text alterations in maps. The system can be easily adapted and extended to other map types and scaled to a larger map captioning system.
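The pipeline described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it uses the off-the-shelf `openai/clip-vit-base-patch32` checkpoint from Hugging Face `transformers` (the paper fine-tunes CLIP on historical maps), and the map types and caption candidates below are hypothetical placeholders that only mirror the decision-tree idea of scoring captions relevant to the detected map type.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP; the paper fine-tunes this model on historical maps.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical decision tree: first decide the map type, then score only
# the caption candidates that belong to that type.
MAP_TYPES = ["a topographic map", "a pictorial map", "a nautical chart"]
CAPTIONS_BY_TYPE = {
    "a topographic map": ["a map showing the terrain of a region",
                          "a map with contour lines and elevations"],
    "a pictorial map": ["an illustrated map with artistic drawings",
                        "a decorative map of a city"],
    "a nautical chart": ["a chart of coastlines and sea depths",
                         "a map made for maritime navigation"],
}

def rank(image: Image.Image, texts: list[str]) -> str:
    """Return the text with the highest CLIP image-text similarity."""
    inputs = processor(text=texts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(texts))
    return texts[logits.argmax().item()]

def caption_map(path: str) -> str:
    image = Image.open(path).convert("RGB")
    map_type = rank(image, MAP_TYPES)               # root of the tree
    return rank(image, CAPTIONS_BY_TYPE[map_type])  # type-specific leaf
```

In the published system the selected caption is then enriched by a GPT model into a brief story covering the where, what, when and why of the map; this sketch only reproduces the CLIP-based routing step.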

CODECHECK details

Certificate identifier: 2025-011

Codechecker name: Sophie Teichmann

Time of check: 2025-06-12 12:00:00

Repository: https://osf.io/GT5BW

Full certificate: https://doi.org/10.17605/OSF.IO/GT5BW

Summary:

The main challenge during this reproduction was the combination of a closed-source model (ChatGPT) and openly available models (such as CLIP). Using the models provided by the authors, the results were reproducible (Table 2 was recreated). When retraining the models, the results were only partly recreatable on Ubuntu, whereas training on Windows was successful; reproducibility when retraining therefore depends on the operating system. Another challenge was the large number of models and the corresponding inference runs, so I retrained and ran inference on only a subset of the models. Part of the inference in the paper was performed manually, which I was not able to recreate. I was able to fully recreate Tables 1, 3, 5, 6 and 7 from the original paper. Overall, I would count the reproduction as a success.
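The system dependence observed when retraining is a common reproducibility pitfall. As a hedged illustration (not taken from the authors' code), the PyTorch snippet below pins the usual sources of randomness; note that even with these settings, bit-exact results across operating systems are not guaranteed, because cuDNN and BLAS builds differ between platforms, which is consistent with the Windows/Ubuntu discrepancy seen here.

```python
import os
import random

import numpy as np
import torch

def set_determinism(seed: int = 42) -> None:
    """Pin the RNGs a typical PyTorch training loop touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Required by some CUDA ops for deterministic behaviour.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Fail loudly if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False

set_determinism(42)
# Even so, results are deterministic only per platform/library build;
# identical numbers on Windows and Ubuntu are not guaranteed.
```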



CODECHECK is a process for independent execution of computations underlying scholarly research articles.