Paper details

Title: Semantic complexity of geographic questions - A comparison in terms of conceptual transformations of answers

Authors: Enkhbold Nyamsuren, Haiqi Xu, Eric J. Top, Simon Scheider, Niels Steenbergen

Abstract: Obtained from CrossRef

There is an increasing trend of applying AI-based automated methods to geoscience problems. An important example is geographic question answering (geoQA) focused on answer generation via GIS workflows rather than retrieval of a factual answer. However, a representative question corpus is necessary for developing, testing, and validating such generative geoQA systems. We compare five manually constructed geographical question corpora, GeoAnQu, Giki, GeoCLEF, GeoQuestions201, and Geoquery, by applying a conceptual transformation parser. The parser infers geo-analytical concepts and their transformations from a geographical question, akin to an abstract GIS workflow. Transformations thus represent the complexity of geo-analytical operations necessary to answer a question. By estimating the variety of concepts and the number of transformations for each corpus, the five corpora can be compared on the level of geo-analytical complexity, which cannot be done with purely NLP-based methods. Results indicate that the questions in GeoAnQu, which were compiled from GIS literature, require more geo-analytical operations, and more diverse ones, than questions from the four other corpora. Furthermore, constructing a corpus with sufficient representation (including GIS) may require an approach targeting a uniquely qualified group of users as a source. In contrast, sampling questions from large-scale online repositories like Google, Microsoft, and Yahoo may not provide the quality necessary for testing generative geoQA systems.

Codecheck details

Certificate identifier: 2023-006

Codechecker names: Philipp A. Friese, Jakub Krukar

Time of codecheck: 2023-06-13 12:00:00

Repository: https://osf.io/d2shf

Codecheck report: https://doi.org/10.17605/OSF.IO/D2SHF

Summary:

The data and software of the paper under reproduction are published on GitHub under an MIT license. All figures, tables, and embedded data points have been reproduced. The authors showed dedication and care in supporting the reproducibility of their work. Reproduction was successful.


https://codecheck.org.uk/ | GitHub codecheckers

© Stephen Eglen & Daniel Nüst

Published under CC BY-SA 4.0


CODECHECK is a process for independent execution of computations underlying scholarly research articles.