Load the workflow environment with sample data into your current working directory. The sample data are described here.

In the workflow environments generated by genWorkenvir, all data inputs are stored in a data/ directory and all analysis results are written to a separate results/ directory, while the systemPipeChIPseq.Rmd script and the targets file are expected to be located in the parent directory. The R session is expected to run from this parent directory. Additional parameter files are stored under param/.

To work with real data, users should organize their own data in the same way and replace all test data with their own. To rerun an established workflow on new data, the initial targets file along with the corresponding FASTQ files are usually the only inputs the user needs to provide.
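The layout described above can be checked from the command line before starting R. The following is a minimal sketch, not part of systemPipeR: the function name check_workenv is hypothetical, and it only tests for the data/, results/ and param/ directories and the systemPipeChIPseq.Rmd script named above. The targets file is not checked because its file name is not given here.

```shell
# Hypothetical helper (not part of systemPipeR): report which expected
# parts of a workflow directory are missing.
# Usage: check_workenv [workflow_dir]   (defaults to the current directory)
check_workenv() {
    dir="${1:-.}"
    missing=0
    # Directories named in the layout description above
    for d in data results param; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing directory: $d/"
            missing=1
        fi
    done
    # The workflow script is expected in the parent (workflow) directory
    if [ ! -f "$dir/systemPipeChIPseq.Rmd" ]; then
        echo "missing file: systemPipeChIPseq.Rmd"
        missing=1
    fi
    # Exit status 0 if everything was found, 1 otherwise
    return "$missing"
}
```

Running check_workenv chipseq from the directory that contains the workflow folder prints one line per missing item and returns a non-zero status if anything is absent.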

library(systemPipeRdata)
genWorkenvir(workflow="chipseq")
setwd("chipseq")


Alternatively, this can be done from the command-line as follows:

$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='chipseq')"
$ cd chipseq


Now download the latest systemPipeChIPseq.Rmd script for this course. From within R this can be done as follows.

download.file("https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/12_ChIPseqWorkflow/systemPipeChIPseq.Rmd", "systemPipeChIPseq.Rmd")


Or from the command-line one can do this with wget.

$ wget -O systemPipeChIPseq.Rmd https://raw.githubusercontent.com/tgirke/GEN242/gh-pages/_vignettes/12_ChIPseqWorkflow/systemPipeChIPseq.Rmd


Now log in to a compute node on the HPCC/biocluster. The following command sequence connects the user from the command line to a compute node on the cluster.

$ srun --x11 --partition=short --mem=2gb --cpus-per-task 1 --ntasks 1 --time 2:00:00 --pty bash -l


Load the desired R version from the module system (here R-3.4.2).

$ module load R/3.4.2


Now open the R Markdown script systemPipeChIPseq.Rmd in your R IDE (e.g. nvim-r or RStudio) and run the workflow as outlined below.

Note: Tmux sessions should always run on one of the head nodes and never on the compute nodes themselves. This is important because Tmux sessions are persistent, meaning they do not close automatically when a compute job finishes. Thus, they are not controlled by the queueing system.

To check the environment of the R session, one can execute the following commands from R. The first line returns the name of the node the R session is running on.

system("hostname") # should return name of a compute node starting with i or c
getwd() # checks current working directory of R session
dir() # returns content of current working directory
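The same node check can be sketched from the command line before R is started. The function name is_compute_node is hypothetical, and the rule it applies (compute node names start with i or c) is taken only from the comment on the hostname call above. It is specific to this cluster and would misclassify any head node whose name happens to start with one of those letters.

```shell
# Hypothetical helper: succeed if the given host name (default: this host)
# looks like a compute node. The i/c prefix rule is an assumption taken
# from the comment above and applies to this cluster only.
is_compute_node() {
    name="${1:-$(hostname)}"
    case "$name" in
        i*|c*) return 0 ;;   # matches the compute node naming rule
        *)     return 1 ;;   # anything else is treated as a head node
    esac
}
```

For example, is_compute_node && echo "on a compute node" prints the message only when the current host name matches that rule.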


Required packages and resources

The systemPipeR package needs to be loaded to perform the analysis steps shown in this report (H Backman et al., 2016).

library(systemPipeR)


If applicable, users can load custom functions not provided by systemPipeR. Skip this step otherwise.

source("systemPipeChIPseq_Fct.R")

