Load the workflow environment with sample data into your current working directory. The sample data are described here.
In the workflow environments generated by genWorkenvir, all data inputs are stored in a data/ directory and all analysis results are written to a separate results/ directory, while the systemPipeVARseq.Rmd script and the targets file are expected to be located in the parent directory. The R session is expected to run from this parent directory. Additional parameter files are stored under param/.
To work with real data, users should organize their own data in the same way and replace all test data with their own data. To rerun an established workflow on new data, the initial targets file and the corresponding FASTQ files are usually the only inputs the user needs to provide.
library(systemPipeRdata)
genWorkenvir(workflow="varseq")
setwd("varseq")
Alternatively, this can be done from the command line as follows:
$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='varseq')"
$ cd varseq
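Once the environment has been generated, the directory layout described above can be verified from the command line. The following is a minimal sketch, not part of systemPipeR or systemPipeRdata: check_workenv is a hypothetical helper name, and it only tests for the data/, results/ and param/ directories and the systemPipeVARseq.Rmd script. The targets file is not checked because its exact file name is not given here.

```shell
# check_workenv: hypothetical helper (not provided by systemPipeR).
# Reports which expected items are missing from a workflow directory.
# Argument 1: path to the workflow directory (defaults to the current one).
check_workenv() {
    wf_dir="${1:-.}"
    wf_missing=0
    # Directories expected in a genWorkenvir-style environment
    for wf_sub in data results param; do
        if [ ! -d "$wf_dir/$wf_sub" ]; then
            echo "missing directory: $wf_sub/"
            wf_missing=1
        fi
    done
    # The workflow script is expected in the parent (workflow) directory
    if [ ! -f "$wf_dir/systemPipeVARseq.Rmd" ]; then
        echo "missing file: systemPipeVARseq.Rmd"
        wf_missing=1
    fi
    return "$wf_missing"
}

# Usage: run from inside the workflow directory, e.g. varseq/
if check_workenv .; then
    echo "layout looks complete"
else
    echo "layout is incomplete"
fi
```

The function returns a non-zero status when anything is missing, so it can also guard later steps in a script.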
Now download the latest systemPipeVARseq.Rmd script for this course. From within R this can be done as follows.
download.file("https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/13_VARseqWorkflow/systemPipeVARseq.Rmd", "systemPipeVARseq.Rmd")
Alternatively, this can be done from the command line with wget.
$ wget -O systemPipeVARseq.Rmd https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/13_VARseqWorkflow/systemPipeVARseq.Rmd
Now log in to a compute node on the HPCC/biocluster. The following command connects the user from the command line to a compute node on the cluster.
$ srun --x11 --partition=short --mem=2gb --cpus-per-task 1 --ntasks 1 --time 2:00:00 --pty bash -l
Load the desired R version from the module system (here R 3.4.2).
$ module load R/3.4.2
Now open the R markdown script systemPipeVARseq.Rmd in your R IDE (e.g. nvim-r or RStudio) and run the workflow as outlined below.
Note that Tmux sessions should always run on one of the head nodes and never on the compute nodes themselves. This is important because Tmux sessions are persistent, meaning they do not close automatically when a compute job finishes. Thus, they are not controlled by the queueing system.
To check the environment of the R session, one can execute the following commands from R. The first line returns the name of the node the R session is running on.
system("hostname") # should return name of a compute node starting with i or c
getwd() # checks current working directory of R session
dir() # returns content of current working directory
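A similar check can be made from the shell before starting R. The sketch below assumes the cluster is managed by Slurm, as the srun command above suggests, and relies on the SLURM_JOB_ID variable that Slurm sets inside a job. The helper name node_kind is hypothetical, and a missing variable only shows that the shell is running outside a job, which is typically the case on a head node.

```shell
# node_kind: hypothetical helper. Prints "compute" when the shell runs
# inside a Slurm job and "head" otherwise.
# Assumption: Slurm exports SLURM_JOB_ID in shells started by srun/sbatch.
node_kind() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "compute"
    else
        echo "head"
    fi
}

echo "host: $(uname -n)"        # node name, like system("hostname") in R
echo "node type: $(node_kind)"  # compute vs. head, under the assumption above
echo "working dir: $(pwd)"      # like getwd() in R
```

If the node type is reported as head, start an interactive job with srun first and run R from there.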
Required packages and resources
The systemPipeR package needs to be loaded to perform the analysis steps shown in this report (H Backman et al., 2016).
library(systemPipeR)
If applicable, users can load custom functions not provided by systemPipeR. Skip this step if this is not the case.
source("systemPipeVARseq_Fct.R")

