Read mapping with Bowtie2/Tophat2
The NGS reads of this project will be aligned against the reference genome
sequence using Bowtie2/TopHat2
(Kim et al., 2013; Langmead et al., 2012). The
parameter settings of the aligner are defined in the tophat.param
file.
args <- systemArgs(sysma="param/tophat.param", mytargets="targets_trim.txt")
sysargs(args)[1] # Command-line parameters for first FASTQ file
Submission of alignment jobs to compute cluster, here using 72 CPU cores (18 qsub
processes each with 4 CPU cores).
moduleload(modules(args))
system("bowtie2-build ./data/tair10.fasta ./data/tair10.fasta")
resources <- list(walltime="20:00:00", nodes=paste0("1:ppn=", cores(args)), memory="10gb")
reg <- clusterRun(args, conffile=".BatchJobs.R", template="torque.tmpl", Njobs=18, runid="01",
resourceList=resources)
waitForJobs(reg)
Alignment with Rsubread
The following example shows how one can use within the systemPipeR
environment the R-based
aligner Rsubread
or other R-based functions that read from input files and write to output files.
library(Rsubread)
args <- systemArgs(sysma="param/rsubread.param", mytargets="targets.txt")
buildindex(basename=reference(args), reference=reference(args)) # Build indexed reference genome
align(index=reference(args), readfile1=infile1(args), input_format="FASTQ",
output_file=outfile1(args), output_format="SAM", nthreads=2)
for(i in seq(along=outfile1(args))) asBam(file=outfile1(args)[i], destination=gsub(".sam", "", outfile1(args)[i]), overwrite=TRUE, indexDestination=TRUE)
unlink(outfile1(args)); unlink(paste0(outfile1(args),".indel"))
Read mapping with HISAT2
args <- systemArgs(sysma="param/hisat2.param", mytargets="targets.txt")
# args <- systemArgs(sysma="param/hisat2.param", mytargets="targets_trim.txt")
sysargs(args)[1] # Command-line parameters for first FASTQ file
moduleload(modules(args))
system("hisat2-build ./data/tair10.fasta ./data/tair10.fasta")
runCommandline(args=args)
Check whether all BAM files have been created
file.exists(outpaths(args))
Read and alignment stats
The following provides an overview of the number of reads in each sample and how many of them aligned to the reference.
args <- systemArgs(sysma="param/hisat2.param", mytargets="targets.txt")
read_statsDF <- alignStats(args=args)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, quote=FALSE, sep="\t")
read.table(system.file("extdata", "alignStats.xls", package="systemPipeR"), header=TRUE)[1:4,]
## FileName Nreads2x Nalign Perc_Aligned Nalign_Primary Perc_Aligned_Primary
## 1 M1A 192918 177961 92.24697 177961 92.24697
## 2 M1B 197484 159378 80.70426 159378 80.70426
## 3 A1A 189870 176055 92.72397 176055 92.72397
## 4 A1B 188854 147768 78.24457 147768 78.24457
Create symbolic links for viewing BAM files in IGV
The symLink2bam
function creates symbolic links to view the BAM alignment files in a
genome browser such as IGV. The corresponding URLs are written to a file
with a path specified under urlfile
in the results
directory.
symLink2bam(sysargs=args, htmldir=c("~/.html/", "projects/tests/"),
urlbase="http://biocluster.ucr.edu/~tgirke/",
urlfile="./results/IGVurl.txt")