The CODECHECK process describes a workflow for reproducing computations as part of scientific peer review. CODECHECK follows a set of principles that allow for many different concrete implementations. The requirements for a successful CODECHECK are intentionally kept to a minimum, as are the requirements on how codechecking is conducted and how the procedure is documented. A CODECHECK concludes with a CODECHECK report, written by the codechecker and understandable to a person with some expertise in the scientific field of the related article. Besides the human-readable information in the CODECHECK report, a small set of metadata elements that are part of a CODECHECK procedure is worth capturing in a more structured format.
This metadata is saved in the CODECHECK configuration file, which this document specifies in version 1.0. The CODECHECK configuration file can serve as the identifier of a CODECHECK bundle, i.e., all the files that are part of a CODECHECK. The CODECHECK bundle is not formally specified, as its contents are largely at the discretion of the codechecker. The CODECHECK configuration file, however, is formally specified to enable automated extraction and the development of tools to support codechecking. Both the author and the codechecker contribute information to the configuration file.
In the future, this information can enable both meta-research about code in peer review and more user-friendly assistance systems for authors, codecheckers, and publishers’ staff.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119.
tl;dr for authors
If you are an author and just want to get a minimal codecheck.yml file prepared for the community review process without reading the whole technical specification, then please use the following template. The template has a few more fields than are strictly mandatory, but these extra fields are important for you as a creator to get credit. Please validate your final configuration file with a YAML validator.
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: name of output file 1 (e.g. figure.pdf)
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
  - file: result.csv
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
  reference: https://doi.org/preprint.1
Format, name and encoding
Format: YAML 1.1 or later
Filename: codecheck.yml
The file MUST be encoded in UTF-8.
This document specifies version 1.0. The specification uses a major.minor semantic versioning scheme. Non-breaking changes can be introduced in a minor version release. The latest version of the specification can be found at https://codecheck.org.uk/spec/config/latest.
The codecheck.yml file is stored at the root of the project folder where all files related to the CODECHECK are saved. The folder where the codecheck.yml file is stored is called the CODECHECK bundle. It also includes a directory (.codecheck) for all files created during codechecking; see the CODECHECK bundle documentation.
Explicit document and directive
The file MUST include three dashes (---), the document start marker, to separate the directive from the document content.
The file SHOULD define the YAML version in the directive. While YAML supports bare (YAML 1.2)/implicit (YAML 1.1) documents, an explicit indication of the format is preferable for the CODECHECK use case. A codecheck.yml file is therefore an explicit document.
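The encoding and explicit-document rules can be checked on the raw file content before any YAML parsing. The following is a minimal sketch, not part of the specification: the function name `check_explicit_document` and the "error"/"hint" message prefixes are hypothetical, and the line-based search for `---` is a simplification that does not replace a real YAML parser.

```python
def check_explicit_document(raw: bytes) -> list:
    """Return a list of problems found in the raw bytes of a codecheck.yml file.

    Hypothetical helper: "error" marks a violated MUST, "hint" a missed SHOULD.
    """
    try:
        text = raw.decode("utf-8")  # the file MUST be encoded in UTF-8
    except UnicodeDecodeError as err:
        return [f"error: not valid UTF-8 ({err.reason})"]

    problems = []
    lines = text.splitlines()

    # Find the first line that consists only of the document start marker.
    # Simplification: a "---" line inside a block scalar would also match.
    marker = next(
        (i for i, line in enumerate(lines) if line.rstrip() == "---"), None
    )
    if marker is None:
        problems.append("error: missing document start marker '---'")
        directives = []
    else:
        directives = lines[:marker]

    # The %YAML directive is only a SHOULD, so it is reported as a hint.
    if not any(line.startswith("%YAML ") for line in directives):
        problems.append("hint: no %YAML directive before the document start marker")
    return problems


if __name__ == "__main__":
    sample = b"%YAML 1.1\n---\nversion: https://codecheck.org.uk/spec/config/1.0/\n"
    print(check_explicit_document(sample))  # prints []
```

Working on bytes rather than on a decoded string keeps the UTF-8 check and the marker check in one place; a tool that already relies on a YAML library would likely inspect the parser's own events instead.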
The file SHOULD include a root-level node version with a URL denoting the version of the CODECHECK configuration file specification that is used. If no version is provided, software tools SHOULD assume the latest version, but they MAY instead abort processing the codecheck.yml with an informative message.
%YAML 1.1
---
version: https://codecheck.org.uk/spec/config/1.0/
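A tool reading the configuration has to decide what to do when the version node is absent. The sketch below illustrates both behaviours described above (fall back to the latest version, or abort with an informative message). It assumes the file has already been parsed into a Python dict by a YAML library of your choice; the function name `resolve_spec_version` and the `strict` flag are hypothetical.

```python
SPEC_PREFIX = "https://codecheck.org.uk/spec/config/"
LATEST = "latest"


def resolve_spec_version(config: dict, strict: bool = False) -> str:
    """Return the specification version named in a parsed codecheck.yml.

    Hypothetical helper. With strict=False a missing version falls back to
    the latest specification; with strict=True processing is aborted.
    """
    version_url = config.get("version")
    if version_url is None:
        if strict:
            raise ValueError(
                "codecheck.yml has no 'version' node; add for example "
                "'version: https://codecheck.org.uk/spec/config/1.0/'"
            )
        return LATEST

    if not isinstance(version_url, str) or not version_url.startswith(SPEC_PREFIX):
        raise ValueError(f"unrecognised specification URL: {version_url!r}")

    # Keep only the trailing path segment, e.g. "1.0" or "latest".
    version = version_url[len(SPEC_PREFIX):].strip("/")
    if not version:
        raise ValueError(f"specification URL names no version: {version_url!r}")
    return version


if __name__ == "__main__":
    parsed = {"version": "https://codecheck.org.uk/spec/config/1.0/"}
    print(resolve_spec_version(parsed))  # prints 1.0
    print(resolve_spec_version({}))      # prints latest
```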
The configuration file MUST have a root-level sequence (i.e., a list) of files called manifest. All files that are part of the manifest must be recreated during a CODECHECK.
Each manifest sequence item MUST have a node file providing the relative path to a file that is part of the computational workflow. The relative paths MUST be relative to the location of the codecheck.yml. Each manifest sequence item MAY have a node comment with human-readable information about said file.
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: outputData.csv
    comment: data/output/one.csv
  - file: fig1.pdf
  - file: resultVectors.txt
  - file: appendix_figures.pdf
    comment: "appendix of paper, starting at page 12"
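The manifest rules lend themselves to a small automated check. The following sketch assumes the configuration has already been parsed into a Python dict; the function names `validate_manifest` and `missing_files` are hypothetical, and the absolute-path test only recognises POSIX-style paths.

```python
from pathlib import Path, PurePosixPath


def validate_manifest(config: dict) -> list:
    """Check the structural manifest rules; return a list of error messages."""
    manifest = config.get("manifest")
    if not isinstance(manifest, list):
        return ["'manifest' MUST be a root-level sequence"]

    errors = []
    for index, item in enumerate(manifest):
        if not isinstance(item, dict) or "file" not in item:
            errors.append(f"manifest[{index}]: missing 'file' node")
            continue
        path = PurePosixPath(str(item["file"]))
        # Paths are relative to the location of codecheck.yml.
        # Assumption: only POSIX-style absolute paths are detected here.
        if path.is_absolute():
            errors.append(f"manifest[{index}]: path is not relative: {path}")
    return errors


def missing_files(config: dict, bundle_dir: str) -> list:
    """List manifest files that do not (yet) exist below the bundle folder."""
    root = Path(bundle_dir)
    return [
        str(item["file"])
        for item in config.get("manifest", [])
        if not (root / str(item["file"])).is_file()
    ]


if __name__ == "__main__":
    parsed = {
        "manifest": [
            {"file": "outputData.csv", "comment": "data/output/one.csv"},
            {"file": "fig1.pdf"},
        ]
    }
    print(validate_manifest(parsed))  # prints []
```

A codechecker could run `missing_files` after re-executing the workflow to see which manifest entries were not recreated.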
Author and submission metadata
The configuration file SHOULD include minimal metadata about the paper, i.e., the title and the author(s) of the paper whose workflow is submitted to the CODECHECK. For this information, the configuration file SHOULD have a root-level node paper. This information might be added or edited after the CODECHECK, e.g., after publication; therefore all sub-elements are optional.
paper SHOULD have a child item title with the title of the submission or publication.
paper SHOULD have a child sequence authors. The child nodes of the authors sequence are called “author items”. There MUST be at least one author item, which is the corresponding author of the workflow under review. The corresponding author MUST be the first author item in the authors sequence. However, “authors” may be interpreted very broadly: the sequence is not limited to the authors of the paper and can include all types of contributors, e.g., software engineers, infrastructure service staff, etc.
Each author item MUST have a child name with the author’s name. Each author item SHOULD have a child ORCID with the author’s ORCID identifier. The value of ORCID MUST be the plain ORCID, e.g., 0000-0000-0000-0000, without a URL prefix.
If the workflow accompanies a preprinted article or concerns an article under review, a reference to the article SHOULD be put in the node reference under the root-level node paper. Ideally the identifier is a DOI in the form of a resolvable URL, an identifiable text string such as arXiv:2001.10641, or a short text such as “Under review at X” or “Paper to appear in Y”.
---
# [...]
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
    - name: John Doe
  reference: https://doi.org/preprint.1
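The author rules can be sketched as a check on the parsed paper node. The code below is an illustration under several assumptions: the configuration is already available as a Python dict, the names `validate_paper` and `orcid_checksum_ok` are hypothetical, a URL-prefixed ORCID is treated as an error, and the ISO 7064 MOD 11-2 checksum test comes from ORCID's own identifier structure rather than from this specification.

```python
import re

# Plain ORCID: four blocks of four characters, the last may end in "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """ISO 7064 MOD 11-2 check digit test (an extra, not required by the spec)."""
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    expected = (12 - total % 11) % 11
    return digits[-1] == ("X" if expected == 10 else str(expected))


def validate_paper(config: dict) -> list:
    """Return notes on the 'paper' node: "error" for MUST, "hint" for SHOULD."""
    paper = config.get("paper")
    if not isinstance(paper, dict):
        return ["hint: 'paper' SHOULD be present"]

    notes = []
    if "title" not in paper:
        notes.append("hint: paper.title SHOULD be present")

    authors = paper.get("authors")
    if authors is None:
        notes.append("hint: paper.authors SHOULD be present")
        return notes
    if not isinstance(authors, list) or not authors:
        notes.append("error: paper.authors MUST contain at least one author item")
        return notes

    for i, author in enumerate(authors):
        if not isinstance(author, dict) or not author.get("name"):
            notes.append(f"error: authors[{i}] MUST have a name")
            continue
        orcid = author.get("ORCID")
        if orcid is None:
            notes.append(f"hint: authors[{i}] SHOULD have an ORCID")
        elif not ORCID_PATTERN.match(str(orcid)):
            notes.append(f"error: authors[{i}] ORCID is not a plain ORCID")
        elif not orcid_checksum_ok(str(orcid)):
            notes.append(f"error: authors[{i}] ORCID has an invalid check digit")
    return notes


if __name__ == "__main__":
    parsed = {
        "paper": {
            "title": "A good paper",
            "authors": [
                {"name": "Josiah Carberry", "ORCID": "0000-0002-1825-0097"},
                {"name": "John Doe"},
            ],
            "reference": "https://doi.org/preprint.1",
        }
    }
    print(validate_paper(parsed))  # prints ['hint: authors[1] SHOULD have an ORCID']
```

The check cannot tell whether the first author item really is the corresponding author; that rule remains a matter of convention between author and codechecker.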
The configuration file MAY have a root-level node source with a textual description or a single URL to describe the source of the checked material. The field MUST be used if the material used for the check is drawn from multiple sources, so that the repository node (see Codecheck metadata) and the metadata accessible via that URL cannot sufficiently describe the provenance of code or data files.
---
# [...]
source: Data is available at https://download.url/dataset/123456/v2 and code can be found in an attachment to the submitted manuscript.
Codecheck metadata
Further important metadata is created during the CODECHECK process. The codecheck.yml started by the author is extended with this information by the codechecker. If a codechecker changes the meaning of any content provided by the author in the configuration file, they SHOULD clearly mark these changes in the form of a comment, in addition to a transparent record through the file being under version control.
The configuration file MUST include minimal metadata about the codechecker in a root-level sequence codechecker with at least one child element. Each item in the codechecker sequence MUST have one node name with the codechecker’s name. Each item in the codechecker sequence SHOULD have a child ORCID as defined in Author and submission metadata.
The configuration file MUST have a root-level node report with a unique identifier for the published CODECHECK report, such as a URL or DOI, ideally in a resolvable format.
The codechecker MAY add further fields with the following names and semantics:
summary: Short textual summary of the CODECHECK report.
repository: A URL to the code repository where more files and a version history of the checked workflow are available.
check_time: A date or timestamp when the CODECHECK was completed. If no time is provided, it should be assumed that codechecking was completed at the publication date of the CODECHECK report.
certificate: A unique identifier for the certificate as awarded in the CODECHECK register.
---
manifest:
  - file: outputData.csv
    comment: data/output/one.csv
  - file: fig1.pdf
codechecker:
  - name: S. Eglen
    ORCID: 0000-0001-8607-8025
  - name: Daniel N.
    ORCID: 0000-0002-0024-5046
report: https://doi.org/10.5281/zenodo.3674056
summary: |
  The check was straightforward as all material was provided and
  documented well, but computations took about 3 hours to run.
repository: https://github.com/codecheckers/Piccolo-2020
check_time: "2019-01-01 13:00:00"
certificate: 2020-001
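The mandatory codechecker fields can be verified in the same style. This sketch assumes a configuration that has already been parsed into a Python dict; the function name `validate_codecheck_metadata` is hypothetical, and interpreting check_time with `datetime.fromisoformat` is an assumption, since the text above only asks for "a date or timestamp".

```python
from datetime import datetime


def validate_codecheck_metadata(config: dict) -> list:
    """Return error messages for the metadata added by the codechecker."""
    errors = []

    checkers = config.get("codechecker")
    if not isinstance(checkers, list) or not checkers:
        errors.append("'codechecker' MUST be a sequence with at least one item")
    else:
        for i, checker in enumerate(checkers):
            if not isinstance(checker, dict) or not checker.get("name"):
                errors.append(f"codechecker[{i}] MUST have a 'name'")

    if not config.get("report"):
        errors.append("'report' MUST identify the published CODECHECK report")

    # check_time is optional; the ISO-like format is an assumption of this sketch.
    check_time = config.get("check_time")
    if check_time is not None:
        try:
            datetime.fromisoformat(str(check_time))
        except ValueError:
            errors.append(f"'check_time' is not a date or timestamp: {check_time!r}")
    return errors


if __name__ == "__main__":
    parsed = {
        "codechecker": [
            {"name": "S. Eglen", "ORCID": "0000-0001-8607-8025"},
            {"name": "Daniel N.", "ORCID": "0000-0002-0024-5046"},
        ],
        "report": "https://doi.org/10.5281/zenodo.3674056",
        "check_time": "2019-01-01 13:00:00",
        "certificate": "2020-001",
    }
    print(validate_codecheck_metadata(parsed))  # prints []
```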
codecheck.yml may include any number of other nodes or sequences to support specific instances of a CODECHECK process. For clarity these SHOULD be named in a way that clearly identifies the origin and use case, e.g. by prepending a common prefix to node names or using a single parent node.
---
# [...]
publishing_inc_identifier: 12345
publishing_inc_handler: Ed Editor
TheBestRepository:
  recordId: 1a2b3c
  checksum: cdce90c878462d073b31aec21ccee48e3366250a6baafd215fa73d1c6bc0357b
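A tool may want to separate such extension nodes from the nodes defined in this document, for example to pass them through untouched. The following sketch assumes an already parsed configuration; the name `extension_nodes` is hypothetical, and the set of known keys is simply collected from the node names used in this document.

```python
# Root-level node names described in this document.
SPEC_KEYS = {
    "version", "manifest", "paper", "source", "codechecker",
    "report", "summary", "repository", "check_time", "certificate",
}


def extension_nodes(config: dict) -> list:
    """Return the sorted root-level keys that are not defined by the spec."""
    return sorted(key for key in config if key not in SPEC_KEYS)


if __name__ == "__main__":
    parsed = {
        "manifest": [{"file": "fig1.pdf"}],
        "publishing_inc_identifier": 12345,
        "publishing_inc_handler": "Ed Editor",
        "TheBestRepository": {"recordId": "1a2b3c"},
    }
    for key in extension_nodes(parsed):
        print(key)
```

Reporting unknown keys rather than rejecting them keeps the tool compatible with the extension mechanism while still surfacing likely typos in standard node names.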
---
manifest:
  - file: fig1.pdf
%YAML 1.1
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: outputData.csv
    comment: originally stored at data/output/one.csv
  - file: fig1.pdf
    comment: Figure 1
  - file: resultVectors.txt
    comment: output vectors in plain text format
  - file: appendix_figures.pdf
    comment: "appendix of paper, starting at page 12"
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
    - name: John Doe
  reference: https://doi.org/preprint.1
codechecker:
  - name: S. Eglen
    ORCID: 0000-0001-8607-8025
report: https://doi.org/abcde.12345
summary: |
  The check was straightforward as all material was provided and
  documented well, but computations took about 3 hours to run.
  The created figures seem to match the ones provided in the article.
  The content of other output files was not checked.
repository: https://github.com/codecheckers/example-workflow
check_time: "2019-01-01 13:00:00"
certificate: 2020-999
More examples can be found in the repositories of the codecheckers organisation on GitHub: https://github.com/codecheckers/.