This page documents the default concrete implementation of a CODE CHECK process developed by the CODE CHECK core team. See the CODE CHECK process page for alternative approaches and the principles for the foundations of a CODE CHECK.
The author must provide a preproducible workflow: all data and code files necessary to reproduce the results are provided in a way that allows fellow researchers to follow and execute the steps. Often this workflow includes the generation of key figures from the article based on data.
A typical measure for documentation is to provide at least so much information as the author would themselves need after a long period of time, e.g. 1 year, to run the analysis again. Any researcher, even if not familiar with the software stack, should be able to run the workflow and find out if the code works.
Common sense shall be applied to decide about the suitable amount of data and to handle big datasets and privacy concerns. For example, data may be deposited depending on community practices in remote repositories, synthetic data may be used, subsets or preprocessed data may be included, or protected access to information may be provided (e.g. cloud-based data enclaves).
The minimal set of files, besides all required data and code, to implement a CODE CHECK process are the following (
/ is the project root directory):
/READMEfile with instructions how to execute the workflow (must be plain text, may have suitable extension, e.g.,
/MANIFESTfile with a list of output files created by the workflow - these must be created by the workflow and are the basis for validating a successful check
/LICENSEfile with information about licenses for all submitted material, e.g. code license for scripts and data licenses for used datasets
These files and the worklow are published in a dedicated self-contained GitHub repository in the codecheckers organisation. After the check, authors may use a pull request to transfer the improvements and certificate back to their own repository.
Beyond that, any additional configuration and information is extremely helpful, of course. Concrete guidelines as to what this can entail are given by the codechecker’s tasks below and the concept of a research compendium.
The codechecker in general is not there to fix things, but to document how far they go and give feedback to the author until a check could be completed or is found impossible (see principle 2). However, a codechecker may, for example out of personal interest in the research at hand, invest additional efforts. In any case, the overall goal is to leave the workflow repository in the same or better condition.
Prerequisite: familiarity with
make, https://the-turing-way.netlify.com/make/make.html & https://swcarpentry.github.io/make-novice/reference.
- Fork the repository to the codecheckers org
- Create a directory
codecheckto not interfere with original files (may use
codecheckexists in submission). All files that are by no chance created by an author go into this directory. Other files that an author could have used must not be hidden in this check-related directory, but just be added to the appropriate location (see “leave in a better condition” above).
- Re-run the workflow based on provided documentation, i.e. recreate all files listed in
MANIFEST, based on a
- template found at XXXX
- Create a notebook?
- Document the used computing environment using suitable configuration files, see REES specification, for example by creating one or several of the following files or tools:
DESCRIPTIONfile (R package)
- Add a report
- WHO checked WHAT, and HOW?
- template for report at XXXX
- Optional report content depending on interest, time, and skills:
- Do the generated outputs match the ones in the original manuscript? Are the differences relevant or not?
- Are used pieces of software and data properly CITED and publicly DEPOSITED und suitable LICENSES?
- Are open formats (text-based) included?
- Is data and software FAIR?
- Deposit the original material along with the reproduced files on Zenodo under you name, adding the original authors as co-authors
Every code check is unique, just as the associated research article. The codechecker can thereby rely on the following examples and future published code checks for good practices and approaches.
- seed not set, but human can judge the code works
- codechecker added
pip freezeafter workflow could be executed
- codechecker added text to README about the computing environment
Draft check report: https://github.com/sje30/codecheck/blob/master/cert/eglen2016/eglen2016-crc.Rmd
Hathway Goodman, 2018
Barto Sutten Anderson, 1983
Are the CODE CHECK principles not a quite low bar?
- A little bit is more than nothing.
- The lower the bar, the earlier in the process a CODE CHECK can occure (e.g. pre-review)
- Exposure to reproducibility practices is key.
- Nudging works.
- Better get a certificate for small parts of the workflow than not doing any code checks - not everything needs to be checked.
- It’s old, works, and is available everywhere.
- Use the caching mechanism.
How should I best package my code and data?
So that the codechecker can make sense of it. Beyond that, follow community practices or create a research compendium.
What if I used proprietary software X?
The codechecker must be able to conduct the check. If they have access to he software, there is no problem.
Can’t researchers cheat by just creating the
But I work with huge/privacy data!
- synthetic data
Why does a research have to provide licensing information?
In this first implementation of the CODE CHECK principles described here, we imagine an Open process, and there is no Openness without proper handling of copyright.
Why don’t you require containers?
The codechecker put’s their name under the check. That is more important than technical details, which will vary on a case by case basis.
What if Docker/
make/software X is not available anymore?
Software stacks are volatile, and CODE CHECK acknowledges this by focussing on the idea that a workflow was once reproducible by a person other than the author. This is a huge step forward, and we are excited for the times when even more will be commonplace. The check largely increases the chances that the checked code works for third parties and is accessible, e.g. using text-based file formats, and portable, e.g. using containers. The files and archives (a Docker image is a tarball of a directory system) will be readable much longer than tools will be there to easily work with them.