Authors: Enkhbold Nyamsuren, Haiqi Xu, Eric J. Top, Simon Scheider, Niels Steenbergen
Abstract (obtained from OpenAlex): There is an increasing trend of applying AI-based automated methods to geoscience problems. An important example is geographic question answering (geoQA) focused on answer generation via GIS workflows rather than retrieval of a factual answer. However, a representative question corpus is necessary for developing, testing, and validating such generative geoQA systems. We compare five manually constructed geographical question corpora, GeoAnQu, Giki, GeoCLEF, GeoQuestions201, and Geoquery, by applying a conceptual transformation parser. The parser infers geo-analytical concepts and their transformations from a geographical question, akin to an abstract GIS workflow. Transformations thus represent the complexity of geo-analytical operations necessary to answer a question. By estimating the variety of concepts and the number of transformations for each corpus, the five corpora can be compared on the level of geo-analytical complexity, which cannot be done with purely NLP-based methods. Results indicate that the questions in GeoAnQu, which were compiled from GIS literature, require a higher number of, as well as more diverse, geo-analytical operations than questions from the four other corpora. Furthermore, constructing a corpus with a sufficient representation (including GIS) may require an approach targeting a uniquely qualified group of users as a source. In contrast, sampling questions from large-scale online repositories like Google, Microsoft, and Yahoo may not provide the quality necessary for testing generative geoQA systems.
Certificate identifier: 2023-006
Codechecker names: Philipp A. Friese, Jakub Krukar
Time of check: 2023-06-13 12:00:00
Repository: https://osf.io/d2shf
Full certificate: https://doi.org/10.17605/OSF.IO/D2SHF
Type: conference
Venue: AGILEGIS
Summary:
The data and software of the paper under reproduction are published on GitHub under an MIT license. All figures, tables, and embedded data points were reproduced. The authors showed dedication and care in supporting the reproducibility of their work. Reproduction was successful.