Introduction
The CODECHECK process describes a workflow for reproducing computations as part of scientific peer review. CODECHECK follows a set of principles that allow many different variations in concrete implementations. The requirements for a successful CODECHECK are intentionally kept to a minimum, as are the requirements on how codechecking is conducted and how the procedure is documented. The outcome of a CODECHECK is a CODECHECK report, written by the codechecker and understandable to a person with some expertise in the scientific field of the related article. Besides the human-readable information in the CODECHECK report, a small set of metadata elements are part of a CODECHECK procedure and are worth capturing in a more structured format.
This metadata is saved in the CODECHECK configuration file, which this document specifies in version 1.0. The CODECHECK configuration file can serve as the identifier of a CODECHECK bundle, i.e., all the files that are part of a CODECHECK. The CODECHECK bundle is not formally specified, as its contents are largely at the discretion of the codechecker. The CODECHECK configuration file, however, is formally specified to enable automated extraction and the development of tools that support codechecking. Both the author and the codechecker contribute information to the configuration file.
In the future, this information can enable both meta-research about code within peer review and more user-friendly assistance systems for authors, codecheckers, and publishers' staff.
Note
This specification is the result of a collaborative scientific project. Please help improve it by providing your feedback.
Notational conventions
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119.
tl;dr for authors
If you are an author and just want to prepare a minimal codecheck.yml file for the community workflow without reading the whole technical specification, please use the following template. The template contains slightly more than the strictly mandatory fields, but these extra fields are important for you as a creator to receive credit. Please validate your final configuration file with a YAML validator.
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: name of output file 1 (e.g. figure.pdf)
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
  - file: result.csv
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
  reference: https://doi.org/preprint.1
Format, name and encoding
Format: YAML 1.1 or later
Name: codecheck.yml
The file MUST be encoded in UTF-8.
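A tool that consumes codecheck.yml could enforce the name and encoding rules before parsing the YAML. The following is a minimal Python sketch using only the standard library; the function name read_config_text is hypothetical and not part of any existing CODECHECK tooling. It decodes strictly, so a file that is not valid UTF-8 is reported instead of having its bytes silently replaced.

```python
from pathlib import Path

# File name required by the specification.
CONFIG_NAME = "codecheck.yml"


def read_config_text(path: Path) -> str:
    """Return the text of a CODECHECK configuration file (hypothetical helper).

    Raises ValueError if the file is not named codecheck.yml or if its
    bytes are not valid UTF-8.
    """
    if path.name != CONFIG_NAME:
        raise ValueError(f"expected a file named {CONFIG_NAME}, got {path.name}")
    raw = path.read_bytes()
    try:
        # Strict decoding: invalid byte sequences raise instead of being replaced.
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError(f"{path} is not valid UTF-8: {err}") from err
```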
Versioning
This document specifies version 1.0. The specification uses a major.minor semantic versioning scheme. Non-breaking changes can be introduced in a minor version release. The latest version of the specification can be found at https://codecheck.org.uk/spec/config/latest.
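Because only non-breaking changes go into minor releases, a tool could base its compatibility check on the major version alone. The Python below is an illustration under two assumptions: that numbered version URLs follow the pattern of https://codecheck.org.uk/spec/config/1.0/, and that a tool accepts any file whose major version matches the one it supports. Both function names are hypothetical. The latest URL is not a numbered version, so parse_spec_version rejects it; a tool would have to resolve it separately.

```python
import re

# Assumption: numbered version URLs look like
# https://codecheck.org.uk/spec/config/1.0/ (trailing slash optional).
VERSION_URL = re.compile(r"^https://codecheck\.org\.uk/spec/config/([0-9]+)\.([0-9]+)/?$")


def parse_spec_version(url: str) -> tuple[int, int]:
    """Extract (major, minor) from a specification version URL."""
    match = VERSION_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a numbered CODECHECK specification URL: {url!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(file_version: tuple[int, int], supported: tuple[int, int]) -> bool:
    """Assumption: minor releases are non-breaking, so equal majors are compatible."""
    return file_version[0] == supported[0]
```

For example, a tool written for version 1.0 would accept a file declaring version 1.0 and refuse one declaring a hypothetical 2.0.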
Storage location
The codecheck.yml file is stored at the root of the project folder where all files related to the CODECHECK are saved. This folder, in which the codecheck.yml file is stored, is called the CODECHECK bundle. It also includes a directory codecheck (or .codecheck) for all files created during codechecking; see the CODECHECK bundle documentation.
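Tools that run from inside a project subdirectory may need to locate the bundle first. A hypothetical Python sketch: it walks up from a starting directory until it finds a codecheck.yml, then looks for the codecheck or .codecheck directory in that bundle. Walking up the directory tree is a convenience assumption of this sketch, not a requirement of this specification.

```python
from pathlib import Path
from typing import Optional


def find_bundle_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains codecheck.yml."""
    current = start.resolve()
    # Assumption: search upwards so the tool also works from subdirectories.
    for candidate in [current, *current.parents]:
        if (candidate / "codecheck.yml").is_file():
            return candidate
    return None


def codecheck_dir(bundle_root: Path) -> Optional[Path]:
    """Return the codecheck/ or .codecheck/ directory of a bundle, if present."""
    for name in ("codecheck", ".codecheck"):
        candidate = bundle_root / name
        if candidate.is_dir():
            return candidate
    return None
```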
Content
Explicit document and directive
The file MUST include three dashes (---), the document start marker, to separate the directive from the document content.
The file SHOULD define the YAML version in the directive. While YAML supports bare (YAML 1.2) or implicit (YAML 1.1) documents, an explicit indication of the format is preferable for the CODECHECK use case because it is clearer. A codecheck.yml file is therefore an explicit document.
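The following Python function is a simplified, line-based sketch (not a YAML parser) of how a tool might check these two rules: it skips blank lines and comments, records directives such as %YAML 1.1, and expects the first remaining line to be the --- marker. The function name and message texts are hypothetical; a production tool would more likely rely on the events reported by its YAML library.

```python
def check_document_header(text: str) -> list[str]:
    """Return problems with the YAML header of a codecheck.yml text (empty if none).

    Simplified sketch: works line by line and does not parse YAML itself.
    """
    problems: list[str] = []
    has_yaml_directive = False
    for line in text.splitlines():
        content = line.rstrip()
        if not content or content.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if content.startswith("%"):
            # Directive line, e.g. "%YAML 1.1".
            has_yaml_directive = has_yaml_directive or content.startswith("%YAML ")
            continue
        # First content line after any directives must be the start marker.
        if content != "---" and not content.startswith("--- "):
            problems.append("missing document start marker '---' (MUST)")
        break
    else:
        # No content line at all, so no start marker either.
        problems.append("missing document start marker '---' (MUST)")
    if not has_yaml_directive:
        problems.append("no %YAML directive (SHOULD)")
    return problems
```

For a file starting with %YAML 1.1 followed by --- the function returns an empty list; a file that starts directly with manifest: yields both problems.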
Version
The file SHOULD include a root-level node version with a URL denoting the version of the CODECHECK configuration file specification that is used. If no version is provided, software tools SHOULD assume the latest version, but they MAY also abort processing the codecheck.yml with an informative message.
Example
%YAML 1.1
---
version: https://codecheck.org.uk/spec/config/1.0/
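Here is a sketch of how a tool could apply the rule for a missing version, assuming the file has already been parsed into a Python dictionary (for example with PyYAML's yaml.safe_load). The strict flag, the function name resolve_version, and the use of the latest URL from the Versioning section as the fallback value are illustrative assumptions.

```python
from typing import Any

# Assumption: the "latest" URL from the Versioning section serves as the fallback.
LATEST_SPEC = "https://codecheck.org.uk/spec/config/latest"


def resolve_version(config: dict[str, Any], strict: bool = False) -> str:
    """Return the specification version URL to apply to a parsed codecheck.yml.

    strict=False: fall back to the latest specification (the SHOULD behaviour).
    strict=True: abort with an informative message instead (the allowed alternative).
    """
    version = config.get("version")
    if version is None:
        if strict:
            raise ValueError(
                "codecheck.yml has no 'version' node; add for example "
                "version: https://codecheck.org.uk/spec/config/1.0/"
            )
        return LATEST_SPEC
    if not isinstance(version, str):
        raise ValueError(f"'version' must be a URL, got {type(version).__name__}")
    return version
```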
Manifest list
The configuration file MUST have a root-level sequence (i.e., a list) of files called manifest. All files that are part of the manifest must be recreated during a CODECHECK.
Each manifest sequence item MUST have a node file providing the relative path to a file that is part of the computational workflow. The paths MUST be relative to the location of the codecheck.yml file. Each manifest sequence item MAY have a node comment with human-readable information about that file.
Example
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: outputData.csv
    comment: data/output/one.csv
  - file: fig1.pdf
  - file: resultVectors.txt
  - file: appendix_figures.pdf
    comment: "appendix of paper, starting at page 12"
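A validation sketch for the manifest rules in Python, again assuming codecheck.yml has already been parsed into dictionaries and lists. check_manifest is a hypothetical name; it only checks the structure and that paths are relative, and does not verify that the files exist or were recreated.

```python
from pathlib import PurePosixPath
from typing import Any


def check_manifest(config: dict[str, Any]) -> list[str]:
    """Return problems with the manifest of a parsed codecheck.yml (empty if none)."""
    manifest = config.get("manifest")
    if not isinstance(manifest, list):
        return ["'manifest' MUST be a root-level sequence"]
    problems: list[str] = []
    for index, item in enumerate(manifest):
        if not isinstance(item, dict) or not isinstance(item.get("file"), str):
            problems.append(f"manifest item {index} MUST have a 'file' node")
            continue
        # Paths are interpreted relative to the directory holding codecheck.yml.
        if PurePosixPath(item["file"]).is_absolute():
            problems.append(f"manifest item {index}: {item['file']} MUST be a relative path")
        # 'comment' is optional; assumption: when present it should be text.
        if "comment" in item and not isinstance(item["comment"], str):
            problems.append(f"manifest item {index}: 'comment' should be text")
    return problems
```

Applied to the example above, check_manifest returns an empty list.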
Author and submission metadata
The configuration file SHOULD include minimal metadata about the paper, i.e., the title and the author(s) of the paper whose workflow is submitted to the CODECHECK. For this information, the configuration file SHOULD have a root-level node paper. Because this information might be added or edited after the CODECHECK, e.g., after publication, all sub-elements are optional.
The element paper SHOULD have a child item title with the title of the submission or publication.
The element paper SHOULD have a child sequence authors. The child nodes of the authors sequence are called “author items”. There MUST be at least one author item, which is the corresponding author of the workflow under review. The corresponding author MUST be the first author item in the authors sequence. However, “authors” may be interpreted broadly: the sequence need not be limited to the authors of the paper but can include all types of contributors, e.g., software engineers or infrastructure service staff.
Each author item MUST have a child name with the author’s name. Each author item SHOULD have a child ORCID with the author’s ORCID identifier. The value of the ORCID node MUST be the plain ORCID, e.g., 0000-0000-0000-0000, without a URL prefix (i.e., without https://orcid.org/...).
If the workflow accompanies a preprint or concerns an article under review, a reference to the article SHOULD be put in the node reference under the root-level node paper. Ideally, the identifier is a DOI in the form of a resolvable URL, an identifiable text string such as arXiv:2001.10641, or a short text such as “Under review at X” or “Paper to appear in Y”.
Example
---
# [...]
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
    - name: John Doe
  reference: https://doi.org/preprint.1
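The checks below sketch the author rules in Python for an already-parsed file. The ORCID check digit is verified with the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers; checking the digit goes beyond what this specification requires and is included only as an illustration. All function names are hypothetical.

```python
import re
from typing import Any

# Plain ORCID form: four blocks of four characters, the last one a digit or X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the ORCID check digit with ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def check_orcid(value: Any) -> list[str]:
    """Return problems with a single ORCID value (empty if none)."""
    if not isinstance(value, str):
        return [f"ORCID {value!r} should be written as text"]
    if value.startswith(("http://", "https://")):
        return [f"{value}: use the plain ORCID without a URL prefix"]
    if not ORCID_PATTERN.match(value):
        return [f"{value}: expected the form 0000-0000-0000-0000"]
    if not orcid_checksum_ok(value):
        return [f"{value}: check digit does not match"]
    return []


def check_paper(config: dict[str, Any]) -> list[str]:
    """Check the optional 'paper' node of a parsed codecheck.yml."""
    paper = config.get("paper")
    if paper is None:
        return []  # 'paper' is recommended (SHOULD), not required
    if not isinstance(paper, dict):
        return ["'paper' should be a mapping"]
    authors = paper.get("authors")
    if authors is None:
        return []  # sub-elements of 'paper' are optional
    if not isinstance(authors, list) or not authors:
        return ["'authors' MUST contain at least one author item"]
    problems: list[str] = []
    for index, author in enumerate(authors):
        if not isinstance(author, dict) or not isinstance(author.get("name"), str):
            problems.append(f"author item {index} MUST have a 'name'")
            continue
        if "ORCID" in author:
            problems.extend(check_orcid(author["ORCID"]))
    return problems
```

For the example above, check_paper returns an empty list; writing the ORCID with its https://orcid.org/ prefix would be reported.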
The configuration file MAY have a root-level node source with a textual description or a single URL describing the source of the checked material. The field SHOULD be used if the material used for the check is drawn from multiple sources, so that the repository node (see Codecheck metadata) and the metadata accessible via that URL cannot sufficiently describe the provenance of code or data files.
Example
---
# [...]
source: Data is available at https://download.url/dataset/123456/v2 and code can be found in an attachment to the submitted manuscript.
Codecheck metadata
Further important metadata is created during the CODECHECK process. The codechecker extends the codecheck.yml started by the author with this information. If a codechecker changes the meaning of any content provided by the author in the configuration file, they SHOULD clearly mark these changes with a comment, in addition to the transparent record provided by keeping the file under version control.
The configuration file MUST include minimal metadata about the codechecker in a root-level sequence codechecker with at least one child element. Each item in the codechecker sequence MUST have one node name with the codechecker’s name. Each item in the codechecker sequence SHOULD have a child ORCID as defined in Author and submission metadata.
The configuration file MUST have a root-level node report
with a unique identifier for the published CODECHECK report, such as a URL or DOI, ideally in a resolvable format.
The codechecker MAY add further fields with the following names and semantics:
summary: Short textual summary of the CODECHECK report.
repository: A URL or a list of URLs to the code or data repositories where more files and a version history of the checked workflow are available.
source: See source in Author and submission metadata.
check_time: A date or timestamp when the CODECHECK was completed. If no time is provided, it should be assumed that codechecking was completed at the publication date of the CODECHECK report.
certificate: A unique identifier for the certificate as awarded in the CODECHECK register.
Example
---
manifest:
  - file: outputData.csv
    comment: data/output/one.csv
  - file: fig1.pdf
codechecker:
  - name: S. Eglen
    ORCID: 0000-0001-8607-8025
  - name: Daniel N.
    ORCID: 0000-0002-0024-5046
report: https://doi.org/10.5281/zenodo.3674056
summary: |
  The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.
repository: https://github.com/codecheckers/Piccolo-2020
check_time: "2019-01-01 13:00:00"
certificate: 2020-001
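A Python sketch of the codechecker metadata checks, assuming a parsed file. Since the specification does not fix a date format for check_time, accepting ISO 8601-style strings via datetime.fromisoformat (or date objects, which YAML loaders may produce for unquoted dates) is an assumption of this sketch. The function name is hypothetical.

```python
from datetime import date, datetime
from typing import Any


def check_codecheck_metadata(config: dict[str, Any]) -> list[str]:
    """Return problems with the codechecker-provided metadata (empty if none)."""
    problems: list[str] = []
    checkers = config.get("codechecker")
    if not isinstance(checkers, list) or not checkers:
        problems.append("'codechecker' MUST be a sequence with at least one item")
    else:
        for index, checker in enumerate(checkers):
            if not isinstance(checker, dict) or not isinstance(checker.get("name"), str):
                problems.append(f"codechecker item {index} MUST have a 'name'")
    if not isinstance(config.get("report"), str):
        problems.append("'report' MUST identify the published report (URL or DOI)")
    repository = config.get("repository")
    if repository is not None and not isinstance(repository, (str, list)):
        problems.append("'repository' should be a URL or a list of URLs")
    check_time = config.get("check_time")
    # Assumption: accept date/datetime objects or ISO 8601-like strings.
    if check_time is not None and not isinstance(check_time, date):
        try:
            datetime.fromisoformat(str(check_time))
        except ValueError:
            problems.append(f"'check_time' {check_time!r} is not a valid date or timestamp")
    return problems
```

For the example above, check_codecheck_metadata returns an empty list.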
Additional content
The file codecheck.yml may include any number of other nodes or sequences to support specific instances of a CODECHECK process. For clarity, these SHOULD be named in a way that clearly identifies their origin and use case, e.g., by prepending a common prefix to node names or by using a single parent node.
Example
---
# [...]
publishing_inc_identifier: 12345
publishing_inc_handler: Ed Editor
TheBestRepository:
  recordId: 1a2b3c
  checksum: cdce90c878462d073b31aec21ccee48e3366250a6baafd215fa73d1c6bc0357b
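A tool could tolerate such extensions by separating them from the nodes defined in this version of the specification instead of rejecting the file. A minimal Python sketch, assuming a parsed file; the set of known root-level nodes is compiled from the sections of this document, and the function name is hypothetical.

```python
from typing import Any

# Root-level nodes named in version 1.0 of this specification.
SPEC_ROOT_NODES = {
    "version", "manifest", "paper", "source", "codechecker", "report",
    "summary", "repository", "check_time", "certificate",
}


def extension_nodes(config: dict[str, Any]) -> dict[str, Any]:
    """Return the root-level nodes of a parsed codecheck.yml that the spec does not define.

    A tool can keep and pass these through unchanged rather than treating them as errors.
    """
    return {key: value for key, value in config.items() if key not in SPEC_ROOT_NODES}
```

For the example above, extension_nodes returns the three nodes publishing_inc_identifier, publishing_inc_handler, and TheBestRepository.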
Minimal example
---
manifest:
- file: fig1.pdf
Full example
%YAML 1.1
---
version: https://codecheck.org.uk/spec/config/1.0/
manifest:
  - file: outputData.csv
    comment: originally stored at data/output/one.csv
  - file: fig1.pdf
    comment: Figure 1
  - file: resultVectors.txt
    comment: output vectors in plain text format
  - file: appendix_figures.pdf
    comment: "appendix of paper, starting at page 12"
paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
    - name: John Doe
  reference: https://doi.org/preprint.1
codechecker:
  - name: S. Eglen
    ORCID: 0000-0001-8607-8025
report: https://doi.org/abcde.12345
summary: |
  The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.
  The created figures seem to match the ones provided in the article. The content of other output files was not checked.
repository:
  - https://github.com/codecheckers/example-workflow
  - https://github.com/codecheckers/example-data
check_time: "2019-01-01 13:00:00"
certificate: 2020-999
More examples can be found in the repositories of the codecheckers organisation on GitHub: https://github.com/codecheckers/.