GEN242: Data Analysis in Genome Biology
2026-04-30
Topics covered in this slide show:
- nvim-R-Tmux: a terminal-based working environment for R on the cluster
- Software management with environment modules
- bigdata directories for large research data
- Job submission and monitoring with Slurm
- batchtools — cluster-aware job management from R
Note
Full tutorials: Linux/HPC Tutorial · Parallel R Tutorial
The core problem on remote systems: when your SSH connection drops, any running process in that terminal — your R session, a running script, an interactive job — is killed immediately.
Tmux solves this by running your terminal session inside a persistent server process on the remote machine. The session keeps running after you disconnect and you can reattach to it from any location, on any computer.
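A typical disconnect-and-resume cycle looks like this (the session name analysis is only an example):
tmux new -s analysis # on the remote machine: start a named session and work inside it
# ... detach with Ctrl-a d, or simply lose the SSH connection ...
tmux attach -t analysis # after logging in again: reattach and continue where you left off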
nvim-R: replicates the RStudio “script + console” workflow entirely in the terminal.
nvim-R-Tmux in action
Install nvim-R-Tmux once in your account:
git clone https://github.com/tgirke/nvim-R-Tmux.git
cd nvim-R-Tmux
module load neovim/0.11.4 tmux R && bash install_nvim_r_tmux.sh
# Log out and back in to activate
Start or reattach to a session:
tmux a # reattach to existing session (or start new default layout)
tmux new -s mywork # start a new named session
tmux ls # list all active sessions
Warning
Always start tmux from a head node (skylark or bluejay), not a compute node. Tmux sessions can only be reattached from the same head node where they were started — note which one you are on.
Step 1 — Start or reattach to a tmux session (from the head node)
Switch between the five default windows with Ctrl-a 1 through Ctrl-a 5.
Step 2 — Log in to a compute node with srun (from inside tmux)
srun --partition=gen242 --account=gen242 --mem=2gb \
--cpus-per-task 4 --ntasks 1 --time 1:00:00 --pty bash -l
Step 3 — Open your R script in nvim and start the R console
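For example, assuming your analysis code lives in a script named my_script.R in the current directory:
module load neovim/0.11.4 tmux R # load editor and R on the compute node (if not already loaded)
nvim my_script.R # open the script in nvim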
Inside nvim: press \rf to open a connected R session in a split pane.
Step 4 — Send code to R
| Action | Key |
|---|---|
| Send current line | Enter (normal mode) |
| Send visual selection | Enter (visual mode — press v to start) |
| Send entire code chunk (Rmd/qmd) | \cc |
| Start R console | \rf |
| Quit R | \rq |
| Key | Action |
|---|---|
| \rf | open connected R session |
| Enter | send line/selection to R |
| \cc | send code chunk |
| Ctrl-w w | switch between nvim and R pane |
| gz | maximize current viewport |
| Ctrl-w = | equalize split sizes |
| Ctrl-w H / K | toggle horizontal/vertical split |
| Ctrl-Space | omni-completion for R objects and functions |
| :Rhelp fct_name | open R help from nvim command mode |
Prefix key: Ctrl-a — hold Ctrl, press a, release both, then press the next key.
| Key | Action |
|---|---|
| Ctrl-a \| | split pane vertically |
| Ctrl-a - | split pane horizontally |
| Ctrl-a + arrow | move cursor between panes |
| Alt + arrow | resize pane (no prefix needed) |
| Ctrl-a z | zoom/unzoom active pane (maximize) |
| Ctrl-a o | rotate pane arrangement |
| Ctrl-a x | close current pane |
| Ctrl-a m | toggle mouse support on/off |
| Key | Action |
|---|---|
| Ctrl-a c | create new window |
| Ctrl-a n / Ctrl-a p | next / previous window |
| Ctrl-a 1…5 | jump to window by number |
| Ctrl-a , | rename current window |
| Key / Command | Action |
|---|---|
| Ctrl-a d | detach — session keeps running in background |
| Ctrl-a s | switch between sessions |
| tmux a | reattach to existing session |
| tmux a -t NAME | reattach to named session |
| tmux ls | list active sessions |
| Ctrl-a : kill-session | kill current session |
| Ctrl-a r | reload tmux config |
Tip
Mouse support is enabled by default. Use Ctrl-a m to toggle it off when you need to select text for terminal copy/paste. On most terminals, Shift+click selects text even when mouse support is active.
The HPCC cluster has over 2,000 software tools installed, including multiple versions of the same tool. A module system manages these so that users can load exactly the version they need without conflicts.
Software is loaded with module load; new installs can be requested from HPCC support (itsupport@hpcc.ucr.edu).
module avail # list all available modules
module avail R # list all modules starting with "R"
module load R # load the default version of R
module load R/4.5.2 # load a specific R version
module list # show currently loaded modules
module unload R # unload R
module unload R/4.5.2 # unload a specific version
Each HPCC user account includes only 20 GB of home directory space. For research data, much larger storage is available via the bigdata filesystem.
| Path | Purpose |
|---|---|
| ~/ (home) | scripts, config files, small outputs — 20 GB limit |
| /bigdata/labname/username | your personal large data |
| /bigdata/labname/shared | shared space within your lab group |
For GEN242 users, labname = gen242:
ls /bigdata/gen242/ # list course bigdata directory
ls /bigdata/gen242/shared/ # shared data for the course
Check your quota on the HPCC Cluster Dashboard or from the command line:
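A quick sketch with standard Linux tools (the cluster dashboard is the authoritative source; HPCC may also provide its own quota command):
df -h /bigdata/gen242 # overall usage of the lab's bigdata volume
du -sh /bigdata/gen242/<username> # disk space used by your own directory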
Warning
All members of a lab group share the same bigdata quota. Coordinate with your group before storing very large datasets. Always clean up intermediate files that are no longer needed.
Note
Additional project data details for GEN242 are on the Project Data page.
HPCC uses Slurm as its workload manager and job scheduler. All compute-intensive jobs must be submitted through Slurm — running heavy jobs directly on the head node is not permitted, and such processes will be killed.
| Mode | Command | Use case |
|---|---|---|
| Batch job | sbatch script.sh | non-interactive, production runs |
| Interactive session | srun --pty bash -l | testing, debugging, short tasks |
| Partition | Time limit | Notes |
|---|---|---|
| gen242 | varies | course partition — use for homework |
| short | 2 hours | quick testing |
| intel / batch | longer | general compute |
| highmem | longer | large memory jobs |
| gpu | varies | GPU-accelerated jobs |
Slurm cluster overview
sbatch
Create a submission script script_name.sh:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1-00:15:00 # 1 day and 15 minutes
#SBATCH --mail-user=user@ucr.edu
#SBATCH --mail-type=ALL
#SBATCH --job-name="my_analysis"
#SBATCH --partition=gen242
#SBATCH --account=gen242
Rscript my_script.R # the R script to run
Submit it:
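sbatch script_name.sh # submits the job and prints its job ID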
Output (STDOUT and STDERR) is written to slurm-<jobid>.out by default.
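To follow a job's output while it runs (the job ID below is illustrative):
tail -f slurm-123456.out # live view of STDOUT/STDERR; press Ctrl-c to stop following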
srun
srun --pty bash -l # minimal interactive session
# With specific resources:
srun --x11 --partition=gen242 --account=gen242 \
--mem=2gb --cpus-per-task 4 --ntasks 1 \
--time 1:00:00 --pty bash -l
squeue # all jobs in queue
squeue -u <username> # your jobs only
scontrol show job <JOBID> # detailed job info
jobMonitor # custom HPCC cluster activity view
scancel -i <JOBID> # cancel one job
scancel -u <username> # cancel all your jobs
scancel --name <myJobName> # cancel by job name
scontrol update jobid=<JOBID> TimeLimit=<NEW_TIME> # change walltime
R provides many options for parallel computation — from single-node multi-core parallelism to full cluster-scale job arrays.
| Package | Scope | Notes |
|---|---|---|
| parallel | multi-core (single node) | built into R base |
| foreach + doParallel | multi-core (single node) | simple foreach loops |
| batchtools | multi-node cluster | most comprehensive, Slurm-aware |
| BiocParallel | multi-core + cluster | Bioconductor-oriented |
| crew + crew.cluster | multi-node cluster | most comprehensive, Slurm-aware |
Full list: CRAN High Performance Computing Task View
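As a quick illustration of the single-node option, here is a minimal sketch using the base parallel package (no Slurm involved; run it inside an interactive srun session):
library(parallel)
ncores <- max(1, detectCores() - 1) # leave one core free
res <- mclapply(1:8, function(i) i^2, mc.cores = ncores) # fork-based parallel lapply
unlist(res)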
sbatch
The simplest method: write an R script and submit it with a Slurm bash script (here script_name.sh).
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1-00:15:00
#SBATCH --partition=gen242
#SBATCH --account=gen242
Rscript my_script.R
Limitation: managing many jobs (e.g. 100s of parameter combinations) manually becomes error-prone. This is where batchtools excels.
batchtools
batchtools — Setup and Demo
batchtools orchestrates cluster job arrays from within an R session. All job management — submission, monitoring, result collection — happens in R. Note: the R script for the following demo is here.
From within R on the cluster (after logging in and starting an R session):
Two required files:
- slurm.tmpl — Slurm submission template (specifies partition, R version, resources)
- .batchtools.conf.R — tells batchtools to use the Slurm template
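A minimal .batchtools.conf.R can be a single line that points batchtools at the Slurm template (a sketch; the slurm.tmpl file holds the actual #SBATCH placeholders):
# .batchtools.conf.R
cluster.functions <- makeClusterFunctionsSlurm(template = "slurm.tmpl")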
library(RenvModule)
module("load", "slurm") # loads Slurm environment modules
library(batchtools)
# Define the function that will run on each compute node
myFct <- function(x) {
Sys.sleep(10) # pause 10s so you can see the job in the queue
result <- cbind(
iris[x, 1:4],
Node = system("hostname", intern=TRUE), # which node ran this?
Rversion = paste(R.Version()[6:7], collapse=".")
)
return(result)
}
reg <- makeRegistry(file.dir="myregdir", conf.file=".batchtools.conf.R")
Njobs <- 1:4 # run 4 jobs (rows 1–4 of iris)
ids <- batchMap(fun=myFct, x=Njobs) # map function over job IDs
done <- submitJobs(ids, reg=reg,
resources=list(
partition = "gen242",
account = "gen242",
walltime = 120, # seconds
ntasks = 1,
ncpus = 1,
memory = 1024 # MB
))
waitForJobs() # block R until all jobs finish
getStatus() # summarize: submitted / running / done / error
showLog(Njobs[1]) # inspect log for job 1
# Retrieve results
loadResult(1) # single result
lapply(Njobs, loadResult) # all results as list
reduceResults(rbind) # combine all results into one data.frame
do.call("rbind", lapply(Njobs, loadResult)) # equivalentbatchtools — Registry Management and ConclusionsResults are stored as .rds files in the registry directory (myregdir). The registry persists across R sessions — you can close R, come back later, and reload results.
# Read result files directly
readRDS("myregdir/results/1.rds")
# Reload a registry into a new R session (e.g. after moving to local machine)
from_file <- loadRegistry("myregdir", conf.file=".batchtools.conf.R")
reduceResults(rbind)
# Clean up when done
clearRegistry() # clear registry object in R session
removeRegistry(wait=0, reg=reg) # delete registry directory from disk
# unlink("myregdir", recursive=TRUE) # same as abovebatchtools workflow summaryLogin node → R session → batchtools
↓
makeRegistry() # create job database
batchMap(fun, args) # define one job per argument value
submitJobs(resources) # submit all jobs to Slurm at once
waitForJobs() # wait for completion
getStatus() # inspect job status
reduceResults(rbind) # collect results into R
batchtools over plain sbatch
Tip
For Bioconductor workflows, BiocParallel provides similar functionality with native support for Bioconductor S4 objects. See BiocParallel vignette.
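For comparison, a minimal BiocParallel sketch (multicore on a single node; cluster backends such as BatchtoolsParam follow the same pattern):
library(BiocParallel)
param <- MulticoreParam(workers = 4) # fork-based backend on one node
res <- bplapply(1:8, function(i) i^2, BPPARAM = param) # parallel lapply
unlist(res)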
| Topic | Key commands / concepts |
|---|---|
| Tmux — sessions | tmux a reattach · Ctrl-a d detach · tmux ls list |
| Tmux — panes | Ctrl-a \| split · Ctrl-a + arrow move · Ctrl-a z zoom |
| Tmux — windows | Ctrl-a c new · Ctrl-a 1…5 jump |
| nvim-R | \rf start R · Enter send line · \cc send chunk |
| Module system | module avail R · module load R/4.5.2 · module list |
| Big data | /bigdata/gen242/<username> · monitor at dashboard.hpcc.ucr.edu |
| Slurm — submit | sbatch script.sh · srun --pty bash -l |
| Slurm — monitor | squeue -u <user> · scontrol show job <ID> · jobMonitor |
| Slurm — cancel | scancel -i <ID> · scancel -u <user> |
| batchtools | makeRegistry() → batchMap() → submitJobs() → reduceResults() |
GEN242 · UC Riverside · Linux/HPC Tutorial · Parallel R Tutorial