The batchtools package provides a parallel implementation of Map for high-performance computing systems. Older versions of this package were released under the name BatchJobs (Bischl et al. 2015). batchtools supports both multi-core and multi-node computations with and without schedulers. By making use of cluster template files, most schedulers and queueing systems are supported (e.g. Torque, Sun Grid Engine, Slurm). The BiocParallel package provides similar functionality to batchtools, but is tailored to Bioconductor objects. This topic is covered in more detail in other tutorials; the following provides only a very brief overview of this submission method.
1. Create a Slurm submission script, here called script_name.sh, with the following content:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1-00:15:00 # 1 day and 15 minutes
#SBATCH --mail-user=useremail@address.com
#SBATCH --mail-type=ALL
#SBATCH --job-name="some_test"
#SBATCH -p short # Choose queue/partition from: intel, batch, highmem, gpu, short
Rscript my_script.R
2. Submit the R script my_script.R via the above Slurm script with:
sbatch script_name.sh
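The submission script above runs an R script named my_script.R whose content is not shown in this tutorial. A minimal sketch of what such a script could contain is given below; the actual computation is a hypothetical placeholder, since any non-interactive R code works here:

```r
## Hypothetical content of my_script.R: any non-interactive R code can go here
result <- colMeans(iris[, 1:4]) # Small example computation
print(result)                   # Printed output is captured in the Slurm log file
```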
The following example shows how to use batchtools on a computer cluster with Slurm as the scheduler (workload manager). Slurm is the scheduler used by the HPCC at UCR. The template and configuration files required by batchtools are downloaded in the next step.
First log in to your cluster account, open R and execute the following lines. This will create a test directory (here mytestdir), change the R session into this directory, and then download the required files:
dir.create("mytestdir")
setwd("mytestdir")
download.file("https://bit.ly/3Oh9dRO", "slurm.tmpl")
download.file("https://bit.ly/3KPBwou", ".batchtools.conf.R")
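To confirm that both files arrived, one can check for them from within R; if the downloads succeeded, both values should be TRUE:

```r
file.exists(c("slurm.tmpl", ".batchtools.conf.R")) # Should return TRUE TRUE
```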
The following code defines a test function (here myFct) that will be run on the cluster for demonstration purposes. The test function subsets the iris data frame by rows, and appends the host name and R version of the node where the function was executed. The R version to be used on each node can be specified in the slurm.tmpl file (under module load).
library('RenvModule')
module('load','slurm') # Loads slurm among other modules
library(batchtools)
myFct <- function(x) {
    Sys.sleep(10) # Pause for 10 sec so the job is visible in the queue
    result <- cbind(iris[x, 1:4],
                    Node=system("hostname", intern=TRUE),
                    Rversion=paste(R.Version()[6:7], collapse="."))
    return(result)
}
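Before submitting, the function can be sanity-checked interactively. Note that an interactive call runs on the login node, so the Node column will show the login node's host name rather than a compute node:

```r
myFct(1:2) # Returns a 2-row data.frame with Node and Rversion columns appended
```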
The following creates a batchtools registry, defines the number of jobs and resource requests, and then submits the jobs to the cluster via Slurm.
reg <- makeRegistry(file.dir="myregdir", conf.file=".batchtools.conf.R")
Njobs <- 1:4 # Define number of jobs (here 4)
ids <- batchMap(fun=myFct, x=Njobs)
done <- submitJobs(ids, reg=reg, resources=list(partition="short", walltime=120, ntasks=1, ncpus=1, memory=1024))
waitForJobs() # Wait until jobs are completed
After the jobs are completed one can inspect their status as follows.
getStatus() # Summarize job status
showLog(Njobs[1]) # Show log file of first job
# killJobs(Njobs) # Kill jobs; also possible outside of R with scancel
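Two further batchtools convenience functions are useful at this stage: getJobTable returns a tabular overview of all jobs in the registry, and findErrors returns the IDs of jobs that failed:

```r
getJobTable(reg=reg) # Overview table with submit, start and done time of each job
findErrors(reg=reg)  # IDs of jobs that terminated with an error
```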
The results are stored as .rds files in the registry directory (here myregdir). One can access them manually via readRDS or use various convenience utilities provided by the batchtools package.
readRDS("myregdir/results/1.rds") # Read first result chunk directly from its rds file
loadResult(1) # Load result of first job
lapply(Njobs, loadResult) # Load all results in a list
reduceResults(rbind) # Assemble result chunks in single data.frame
do.call("rbind", lapply(Njobs, loadResult)) # Same as previous line
By default, existing registries will not be overwritten. If required, one can explicitly clear and delete them with the following functions.
clearRegistry() # Clear registry in R session
removeRegistry(wait=0, reg=reg) # Delete registry directory
# unlink("myregdir", recursive=TRUE) # Same as previous line
Loading a registry can be useful when accessing the results at a later time or after moving them to a local system.
from_file <- loadRegistry("myregdir", conf.file=".batchtools.conf.R")
reduceResults(rbind)
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 11 (bullseye)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Los_Angeles
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.35 R6_2.5.1 fastmap_1.1.1 xfun_0.43 cachem_1.0.8 knitr_1.46 htmltools_0.5.8.1 rmarkdown_2.26 lifecycle_1.0.4 cli_3.6.2 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.4.0 tools_4.4.0 evaluate_0.23 bslib_0.7.0 yaml_2.3.8 rlang_1.1.3 jsonlite_1.8.8
Bischl, Bernd, Michel Lang, Olaf Mersmann, Jörg Rahnenführer, and Claus Weihs. 2015. “BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments.” Journal of Statistical Software 64 (11). http://www.jstatsoft.org/v64/i11/.