Load the workflow environment with sample data into your current working directory. The sample data are described here.
In the workflow environments generated by
genWorkenvir, all data inputs are stored in a
data/ directory and all analysis results will be written to a separate
results/ directory, while the
systemPipeChIPseq.Rmd script and the
targets file are expected to be located in
the parent directory. The R session is expected to run from this parent
directory. Additional parameter files are stored under
To work with real data, users should organize their own data similarly
and replace all test data with their own. To rerun an established
workflow on new data, the initial
targets file along with the corresponding
FASTQ files are usually the only inputs the user needs to provide.
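The layout described above can be checked from the command line before starting a run. The following is a minimal sketch under stated assumptions, not part of systemPipeR: check_workenv is a hypothetical helper, and the name of the targets file is passed as an argument because it is not fixed in this section.

```sh
# Sketch: verify that a workflow directory follows the expected layout.
# check_workenv is a hypothetical helper, not a systemPipeR function.
# Usage: check_workenv <workflow_dir> <targets_file_name>
check_workenv() {
    local dir="${1:-.}"
    local targets="${2:-}"
    local status=0
    local item

    if [ -z "$targets" ]; then
        echo "usage: check_workenv <workflow_dir> <targets_file_name>" >&2
        return 2
    fi

    # Input data and results live in subdirectories of the workflow directory.
    for item in data results; do
        if [ ! -d "$dir/$item" ]; then
            echo "missing directory: $dir/$item"
            status=1
        fi
    done

    # The Rmd script and the targets file sit in the workflow directory itself.
    for item in systemPipeChIPseq.Rmd "$targets"; do
        if [ ! -f "$dir/$item" ]; then
            echo "missing file: $dir/$item"
            status=1
        fi
    done

    return "$status"
}
```

Calling check_workenv with the workflow directory and the targets file name prints one line per missing item and returns a non-zero status if anything is absent, so it can be used as a guard at the top of a job script.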
library(systemPipeRdata)
genWorkenvir(workflow="chipseq")
setwd("chipseq")
Alternatively, this can be done from the command line as follows:
$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='chipseq')"
$ cd chipseq
Now download the latest
systemPipeChIPseq.Rmd script for this course. From
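within R this can be done as follows.
download.file("https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/12_ChIPseqWorkflow/systemPipeChIPseq.Rmd", "systemPipeChIPseq.Rmd")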
Alternatively, from the command line this can be done with
$ wget -O systemPipeChIPseq.Rmd https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/12_ChIPseqWorkflow/systemPipeChIPseq.Rmd
Now log in to a compute node on the HPCC/biocluster. The following command connects the user from the command line to a compute node on the cluster.
$ srun --x11 --partition=short --mem=2gb --cpus-per-task 1 --ntasks 1 --time 2:00:00 --pty bash -l
Load the desired R version from the module system (here R 3.4.2).
$ module load R/3.4.2
Now open the R markdown script
systemPipeChIPseq.Rmd in your R IDE (e.g. nvim-r or RStudio) and
run the workflow as outlined below.
Note that Tmux sessions should always run on one of the head nodes and never on the compute nodes themselves. This is important because Tmux sessions are persistent, meaning they don’t close automatically when a compute job finishes. Thus, they are not controlled by the queueing system.
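One way to avoid starting Tmux on a compute node by accident is a small guard in the shell startup file. The sketch below assumes that the cluster's Slurm installation exports SLURM_JOB_ID inside job sessions, which is Slurm's standard behavior; in_slurm_job and start_tmux are hypothetical helper names, not part of Tmux or Slurm.

```sh
# Sketch: refuse to launch tmux from inside a Slurm job.
# Assumption: Slurm sets SLURM_JOB_ID in sessions started with srun/sbatch.

# Returns success (0) when the current shell runs inside a Slurm job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical wrapper around tmux; passes all arguments through unchanged.
start_tmux() {
    if in_slurm_job; then
        echo "refusing to start tmux inside Slurm job ${SLURM_JOB_ID}; use a head node" >&2
        return 1
    fi
    tmux "$@"
}
```

With these definitions, start_tmux behaves like tmux on a head node and exits with an error message inside an srun session.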
To check the environment of the R session, one can execute the following commands from R. The first line returns the node name of the R session.
system("hostname") # should return name of a compute node starting with i or c
getwd() # checks current working directory of R session
dir() # returns content of current working directory
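The same check can be made from the shell before R is started. This is a rough sketch that relies only on the naming hint given above (compute node names starting with i or c); that convention is site-specific, and looks_like_compute_node is a hypothetical helper name.

```sh
# Sketch: heuristic test for a compute node, based solely on the naming
# convention mentioned above (names starting with "i" or "c").
# With no argument, the name of the current host is tested.
looks_like_compute_node() {
    case "${1:-$(hostname)}" in
        i*|c*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example, `looks_like_compute_node && pwd && ls` prints the working directory and its content only when the current host name matches the pattern.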
Required packages and resources
The systemPipeR package needs to be loaded to perform the analysis steps shown in
this report (H Backman et al., 2016).
If applicable, users can load custom functions not provided by
Skip this step if this is not the case.