
Code execution during peer review

with CODECHECK and at the AGILE conference
https://codecheck.org.uk/ | https://reproducible-agile.github.io/

Daniel Nüst @ Collaborations Workshop 2022 (CW22), 2022-04-04

Institute for Geoinformatics, University of Münster | http://nüst.de | @nordholmen

Slides: https://bit.ly/cw22-keynote-daniel

DOI:10.6084/m9.figshare.19487573

CC-BY-SA 4.0

1 / 39

Declarations and acknowledgements

Declarations

Reproducibility Chair AGILE conference

CODECHECK paper: https://f1000research.com/articles/10-253/v2

Acknowledgements

CODECHECK: Mozilla mini science grant, UK SSI; editors @ Gigascience, eLife, Scientific Data

Reproducible AGILE received funding as an AGILE Initiative

All work was supported by the project Opening Reproducible Research (o2r) with funding by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under project numbers PE 1632/10-1 and PE 1632/17-1.

3 / 39

CODECHECK: Evaluating the reproducibility of computational results reported in scientific journals

Stephen J Eglen Cambridge Computational Biology Institute
https://sje30.github.io University of Cambridge
sje30@cam.ac.uk @StephenEglen
Daniel Nüst Institute for Geoinformatics
https://nordholmen.net University of Münster
daniel.nuest@uni-muenster.de @nordholmen

https://codecheck.org.uk/

4 / 39


CODECHECK in one slide

Premise: paper submitted to peer review.

  1. We take your paper, code and datasets.

  2. We run your code on your data.

  3. If our results match your results, go to step 5.

  4. Else we talk to you to find out where code broke. If you fix your code or data, we return to step 2 and try again.

  5. We write a report summarising that we could reproduce your outputs (document error messages, possibly mismatches we see)

  6. We work with you to freely share your paper, code, data and our reproduction.

5 / 39
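The six steps above can be sketched as a small loop. This is purely illustrative; every function here is a hypothetical placeholder for a human activity, not a real API.

```python
# Illustrative sketch of the CODECHECK steps. All names are made up.

def run(code, data):
    # Step 2: run the authors' code on the authors' data
    return code(data)

def outputs_match(outputs, expected):
    # Step 3: compare our results with the reported results
    return outputs == expected

def codecheck(code, data, expected, max_attempts=3):
    """Steps 2-6: run, compare, iterate with the authors, then report."""
    outputs = None
    for _ in range(max_attempts):
        outputs = run(code, data)
        if outputs_match(outputs, expected):
            break
        # Step 4: talk to the authors; they fix code or data, we try again
    # Steps 5 and 6: write a report and freely share paper, code, data, report
    return {"reproduced": outputs_match(outputs, expected), "outputs": outputs}
```

Note that the loop always ends in a report, whether or not the results matched; a failed reproduction is documented, not hidden.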

Premise

Figure 1 of https://doi.org/10.12688/f1000research.51738.2

We should be sharing material on the left, not the right.

"Paper as advert for Scholarship" (Buckheit & Donoho, 1995)

6 / 39


The left half of the diagram shows a diverse range of materials used within a laboratory. These materials are often then condensed for sharing with the outside world via the research paper, a static PDF document. Working backwards from the PDF to the underlying materials is impossible. This prohibits reuse and is not only non-transparent for a specific paper but is also ineffective for science as a whole. By sharing the materials on the left, others outside the lab can enhance this work.

As we all know from advertising, there is a big disconnect between what is advertised, and what the actual experience is like.

Approaches to code sharing

7 / 39

CODECHECK takes a different approach...


The CODECHECK philosophy

  • Systems like Code Ocean set the bar high by "making code reproducible forever for everyone"

  • CODECHECK simply asks "was the code executable once for someone else?"

  • We check that the code runs and generates the expected number of output files

  • The contents of those output files need not be strictly checked, though in practice to date they have been; in any case, the outputs are available for others (including the authors) to see

  • The validity of the code is not checked; this is a complement to scientific peer review

For more details, see the paper and the CODECHECK principles.

8 / 39
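The "expected outputs" check can be pictured as a simple manifest comparison. A minimal sketch, with invented file names; the real process records the expected files in a MANIFEST inside the check's configuration:

```python
# Minimal sketch of the output check: did the workflow produce the files
# the authors said it would? File names here are invented examples.
from pathlib import Path

def check_outputs(manifest, workdir="."):
    """Split the expected output files into those present and those missing."""
    present, missing = [], []
    for name in manifest:
        (present if (Path(workdir) / name).exists() else missing).append(name)
    return present, missing
```

A codechecker would run the workflow first and then compare; any missing file ends up documented in the certificate's notes.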

The CODECHECK example process implementation

Figure 2 of https://doi.org/10.12688/f1000research.51738.2

9 / 39

Only briefly

Our workflow is just one of many possible CODECHECK workflows. Here we consider several dimensions in the space of possible workflows (Figure 3), touching on timing, responsibilities, and transparency.

Variations in a codecheck

Figure 3 of https://doi.org/10.12688/f1000research.51738.2

10 / 39


Core principles

1. Codecheckers record but don't investigate or fix.

2. Communication between humans is key.

3. Credit is given to codecheckers.

4. Workflows must be auditable.

5. Open by default and transitional by disposition.

11 / 39

Example certificate

Figure 4 of https://doi.org/10.12688/f1000research.51738.2 (click image to scroll)

13 / 39

Figure 4 shows pages 1–4 (of 10) of an example certificate for a check of predictions of COVID-19 spread across the USA. Figure 4A shows the certificate number and its DOI, which points to the certificate and any supplemental files on Zenodo. The CODECHECK logo is added for recognition and to denote successful reproduction. Figure 4B provides the key metadata extracted from codecheck.yml: the paper that was checked (title, DOI), the authors, the codechecker, when the check was performed, and where code and data are available. Figure 4C shows a textual summary of how the CODECHECK was performed and its key findings. Figure 4D (page 2 of the certificate) lists the outputs that were generated, based on the MANIFEST of output files in the CODECHECK; it shows the file name (Output), a description stating which figure or table each file should be compared with in the original paper (Comment), and the file size. Page 3 of the certificate (Figure 4E) gives detailed notes from the codechecker, here documenting which steps were needed to run the code and that the code took about 17 hours to complete. Page 4 of the certificate (Figure 4F) shows the first output generated by the CODECHECK; in this case, the figure matched figure 4 of the checked paper. The remaining pages of the certificate show the other outputs and the computing environment in which the certificate itself was created (not shown here).
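The codecheck.yml file driving this metadata can be sketched as below. This is simplified and abridged, and all values are invented; the authoritative schema is documented on the CODECHECK website.

```yaml
# Simplified, illustrative codecheck.yml (all values invented)
paper:
  title: "Example paper title"
  authors:
    - name: Jane Doe
  reference: https://doi.org/10.xxxx/example
manifest:
  - file: figure1.png
    comment: Compare with Figure 1 of the paper
  - file: table2.csv
    comment: Compare with Table 2 of the paper
codechecker:
  - name: John Checker
report: https://doi.org/10.xxxx/certificate
check_time: "2022-04-04"
```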

Limitations

  1. CODECHECKER time is valuable, so needs credit.

  2. Very easy to cheat the system, but who cares?

  3. Authors' code/data must be freely available.

  4. Deliberately low threshold for gaining a certificate.

  5. High-performance compute is a resource drain.

  6. Cannot (yet) support all thinkable/existing workflows and languages.

14 / 39

Next steps

  1. Embedding into journals' workflows.

  2. Training a community of codecheckers (❤️ ReproHack).

  3. Funding for a codecheck editor.

  4. Come and get involved

For more information please see: http://codecheck.org.uk and #CODECHECK

15 / 39

16 / 39


Reproducible AGILE

https://reproducible-agile.github.io/

2017, ‘18 & ‘19: Workshops on reproducibility

2019: Reproducible publications at AGILE conferences (initiative)

2020: AGILE Reproducible Paper Guidelines v1

2020: First AGILE reproducibility review; guidelines v2

2021: Guidelines mandatory; repro reviews linked from papers: https://agile-giss.copernicus.org/articles/2/index.html

17 / 39

AGILE Reproducible Paper Guidelines

https://doi.org/10.17605/OSF.IO/CB7Z8

  • Promotion, not exclusion
  • Data and software availability section
  • Author & reviewer guidelines
  • Reproducibility checklist

14 successful reproductions in 2020 & '21

18 / 39

Review process

Reproducibility review after accept/reject decisions

Reproducibility review & communication

Community conference & volunteers

Badges on proceedings website, article website with link, and first article page (💖 Copernicus!)


19 / 39


🙌

How to put your community on a path towards more reproducibility in 5 easy hard steps

1️⃣ Build a team of enthusiasts (workshop, social events) 💪🧠

2️⃣ Assess the current state and raise awareness (workshop, paper) 🔬

3️⃣ Institutional support (🙏 AGILE Council 🙏 + committee chairs)

4️⃣ Positive encouragement (no reproduction != bad science)

5️⃣ Keep at it! 🤗


(Next) steps

Reproducibility reviews 2022+

Grow reproducibility reviewer team

Continue community discourse

Re-assess new papers > impact?

Towards open scholarship: Open review if tenured? Format-free first submission? CRediT?

Phase out when standard practice...

20 / 39


What is RSE(ng) about this?

🖥️

🤹 ⛏️

T ➡️ π

🤓

👶

🧰

📦📦📦📦📦📦📦📦 … (and many, many more)

21 / 39

The skillset of an RSE is great for just trying to run somebody else's workflow, and the task can be a fun "riddle" to solve.

The feedback that an RSE with development experience can give is valuable, and a rare kind of help for many researchers. RSEs can make a great contribution to scholarly communication.

π-shaped scientists, traditional vs. modern: they need both deep topical and deep technical skills.


The kind community that we as RSEs live in also fosters the positive attitude needed when reproducing work and interacting with authors.

The whole exercise is excellent for early career researchers: an entry point into peer review that adds a missing set of skills to the process.

And you always learn something!


And you learn to use all of the package managers that exist.
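Beyond the package managers, documenting the computing environment itself is part of the job; recall that the final pages of a certificate list the environment the check ran in. A minimal sketch in Python, not any tooling CODECHECK prescribes:

```python
# Minimal "session info" sketch: record the environment a check ran in,
# for the certificate's description of the computing environment.
import platform
import sys

def session_info():
    return {
        "python": sys.version.split()[0],   # e.g. "3.11.4"
        "platform": platform.platform(),    # OS name and version string
        "machine": platform.machine(),      # CPU architecture
    }

for key, value in session_info().items():
    print(f"{key}: {value}")
```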

Code review != reproducibility review

22 / 39


Why was the talk announced as being about code review, when you did not talk about reviewing code but about two processes that explicitly do not care about code quality?

Well, I hope by now you don't think it was merely a trick to get all of you interested!

Instead, I would like to think of what I showed today as a precursor, hopefully, of more actual code review as part of scholarly communication and discourse.

https://researchcodereviewcommunity.github.io/dev-review/

23 / 39

Computational reproducibility is still perceived as hard, much too rarely taught or checked, and if achieved it does not get enough credit. Irreproducibility is not a technological problem, but a social and systemic one. CODECHECK and Reproducible AGILE try to tackle a small part of all the bigger problems that science has, and be kind in the process. Cultural change takes time.

With the few seconds left, I want to answer two more questions:

[Many problems: Publish or perish, Broken metrics (citations, JIF), Structural change not considering , senior academics, Publication bias, Long-term funding for tools & infrastructure, HARKing, p-Hacking, Scholarly communication 1.0, Lack of reusability, Lack of transparency, Lack of reproducibility, Reinventing the wheel, Retraction practices, Not invented here syndrome, Fraud, Imposter syndrome, No “negative” citation, ...]


One thing on more reproducible research publications:

Have a README: all else is details.

Inspired by Greg Wilson's Teaching Tech Together, Rule 1 - http://teachtogether.tech/en/index.html








Thank you! Questions?

HTML slides: https://bit.ly/cw22-keynote-daniel | PDF slides: https://doi.org/10.6084/m9.figshare.19487573

24 / 39

If you remember ONE thing from this talk, it should be this:

I'm going to leave that up there: Thank you for your attention - I look forward to your questions.

Encore

25 / 39

"It ain't pretty, but it works" (H. Bastian)

(The most prominent check until today!)

26 / 39


Who does the work?

  1. AUTHOR provides code/data and instructions on how to run.

  2. CODECHECKER runs code and writes certificate.

  3. PUBLISHER oversees process, helps depositing artifacts, and persistently publishes certificate.

Who benefits?

  1. AUTHOR gets early check that "code works"; gets snapshot of code archived and increased trust in stability of results.

  2. CODECHECKER gets insight in latest research and methods, credit from community, and citable object.

  3. PUBLISHER gets a citable certificate with a code/data bundle to share, and increases the reputation of published articles.

  4. PEER REVIEWERS can see certificate rather than check code themselves.

  5. READER can check the certificate and build upon the work immediately.

27 / 39
28 / 39

Definition

How the Turing Way defines reproducible research

CC-BY 4.0 | © The Turing Way Community | https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions.html

29 / 39

Learn more about code execution practices at journals and conferences

https://osf.io/x32nc

Daniel Nüst, Heidi Seibold, Stephen Eglen, Lea Schulz-Vanheyden, Limor Peer, Josef Spillner

30 / 39

Deep dive

Chiarelli, Andrea, Loffreda, Lucia, & Johnson, Rob. (2021). The Art of Publishing Reproducible Research Outputs: Supporting emerging practices through cultural and technological innovation. Zenodo. https://doi.org/10.5281/zenodo.5521077

Chiarelli, Andrea, Loffreda, Lucia, & Johnson, Rob. (2021). Executive Summary: The Art of Publishing Reproducible Research Outputs: Supporting emerging practices through cultural and technological innovation. Zenodo. https://doi.org/10.5281/zenodo.5639384

31 / 39

Reproducible AGILE and CODECHECK: Highlights of Lessons learned

Spectrum or layers of reproducibility very apparent

Effect of guidelines at AGILE: improved reproducibility, community discourse

Reproducibility reports/CODECHECK certificates full of recommendations for improvement, often well received by authors, many included in revised submission

Good practices spread slowly, establishing a process is tedious, needs time until familiarity

Challenges for reproducibility reviewer: Inconsistencies and disconnects (figures), lack of documentation, unknown runtimes vs. no subsets of data, lack of reprod. guidance

Reproductions are rewarding and educational, but matching expertise is tricky

There is no alternative to communication

Safety net (👀), not security

32 / 39

What can communities & institutions do?

Introduce reproducibility reviews - CODECHECK (or not) - at your journals, labs, collaborations!

Workshops on RCR, ReproHacks

Provide support (R2S2, PhD edu.)

Rewards and incentives

Community discourse

Awareness > Change

Throw technology at it

33 / 39

Digital information lasts forever, or five years - whichever comes first.

Rothenberg, Jeff. 1995. “Ensuring the Longevity of Digital Documents.” Scientific American 272 (1): 42–47. JSTOR via https://twitter.com/snet_jklump/status/1141934045820887040?s=09

34 / 39

"Preproducibility" - Philip Stark

"Science should be 'show me', not 'trust me'; it should be 'help me if you can', not 'catch me if you can'."

[...]

"If you and I get different results, preproducibility can help us to identify why — and the answer might be fascinating."

35 / 39

Reproducibility spectrum

Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226–1227. https://doi.org/10.1126/science.1213847

36 / 39

Five selfish reasons to work reproducibly

  1. reproducibility helps to avoid disaster

  2. reproducibility makes it easier to write papers

  3. reproducibility helps reviewers see it your way

  4. reproducibility enables continuity of your work

  5. reproducibility helps to build your reputation

Markowetz, F. Five selfish reasons to work reproducibly. Genome Biol 16, 274 (2015). https://doi.org/10.1186/s13059-015-0850-7

37 / 39

Reproducibility is "more work"

Quintana, D. S. (2020, November 28). Five things about open and reproducible science that every early career researcher should know. https://doi.org/10.17605/OSF.IO/DZTVQ

38 / 39

GIScience assessment

Nüst, Daniel. 2021. Infrastructures and Practices for Reproducible Research in Geography, Geosciences, and GIScience. Doctoral dissertation, University of Münster, Germany. https://doi.org/10.5281/zenodo.4768096

39 / 39

Everybody should do this for their discipline
