Abstract
Background
Classification algorithms
assign observations to groups based on patterns in data. The
machine-learning community has developed myriad classification
algorithms, which are used in diverse life science research domains.
Algorithm choice can affect classification accuracy dramatically, so it
is crucial that researchers optimize the choice of which algorithm(s) to
apply in a given research domain on the basis of empirical evidence. In
benchmark studies, multiple algorithms are applied to multiple datasets,
and the researcher examines overall trends. In addition, the researcher
may evaluate multiple hyperparameter combinations for each algorithm and
use feature selection to reduce data dimensionality. Although software
implementations of classification algorithms are widely available,
robust benchmark comparisons are difficult to perform when researchers
wish to compare algorithms that span multiple software packages.
Programming interfaces, data formats, and evaluation procedures differ
across software packages, and dependency conflicts may arise during
installation.
Findings
To address these challenges,
we created ShinyLearner, an open-source project for integrating
machine-learning packages into software containers. ShinyLearner
provides a uniform interface for performing classification, irrespective
of the library that implements each algorithm, thus facilitating
benchmark comparisons. In addition, ShinyLearner enables researchers to
optimize hyperparameters and select features via nested
cross-validation; it tracks all nested operations and generates output
files that make these steps transparent. ShinyLearner includes a Web
interface to help users more easily construct the commands necessary to
perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner.
Conclusions
This
software is a resource for researchers who wish to benchmark multiple
classification or feature-selection algorithms on a given dataset. We
hope it will serve as an example of combining the benefits of software
containerization with a user-friendly approach.