The function filterVars filters VCF files based on user-definable quality parameters. It sequentially imports each VCF file into R, applies the filter to an internally generated VRanges object, and then writes the results to a new, subsetted VCF file. The filter parameters are passed to the corresponding argument of filterVars as a character string. The function applies this filter to the VRanges object using the standard subsetting syntax for two-dimensional objects, such as vr[filter, ]. The parameter files used in the filtering steps (filter_gatk.param, filter_sambcf.param and filter_vartools.param) define the paths to the input and output VCF files, which are stored in new SYSargs instances.
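To make the idea of a filter passed as a character string more concrete, the following minimal sketch evaluates such a string and uses the result for row subsetting. It is not the implementation of filterVars: the data.frame vr is a hypothetical stand-in for the VRanges object, its column names and values are invented for illustration, and the use of eval(parse()) is an assumption about how a string filter can be applied.

```r
# Toy stand-in for a VRanges object: one row per variant (hypothetical values)
vr <- data.frame(
  totalDepth = c(1, 5, 30, 12),  # reads covering the variant position
  altDepth   = c(1, 2, 28, 11)   # reads supporting the alternative allele
)

# The filter is an ordinary R expression stored as a character string
filter <- "vr$totalDepth >= 2 & (vr$altDepth / vr$totalDepth >= 0.8)"

# Evaluating the string yields one logical value per variant
keep <- eval(parse(text = filter))

# Standard two-dimensional subsetting keeps only the passing rows
vr_filtered <- vr[keep, ]
print(vr_filtered)
```

Because the filter is plain R code, any expression that returns one logical value per variant can be used in the same way.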

Filter variants called by GATK

The example below filters for variants that are supported by at least x reads, with at least 80% of those reads supporting the called variant. In addition, all variants need to pass at least y of the soft filters recorded in the VCF files generated by GATK. Since the toy data used for this workflow are very small, the chosen settings are unreasonably relaxed (x = 2 reads and y = 1 soft filter). A more reasonable filter setting (x = 20 reads and six soft filters passed) is given in the commented-out line that follows it.

library(BBmisc) # Defines suppressAll()
args <- systemArgs(sysma="param/filter_gatk.param", mytargets="targets_gatk.txt")[1:4]
filter <- "totalDepth(vr) >= 2 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))>=1"
# filter <- "totalDepth(vr) >= 20 & (altDepth(vr) / totalDepth(vr) >= 0.8) & rowSums(softFilterMatrix(vr))==6"
suppressAll(filterVars(args, filter, varcaller="gatk", organism="A. thaliana"))
writeTargetsout(x=args, file="targets_gatk_filtered.txt", overwrite=TRUE)
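The soft filter part of the expression above counts, per variant, how many soft filters were passed. The sketch below illustrates that counting step on a small logical matrix in base R. The matrix sfm is a hypothetical stand-in for the result of softFilterMatrix(vr), and the filter names and values are invented for illustration; real GATK output may contain a different set of filters.

```r
# Hypothetical soft filter matrix: rows = variants, columns = soft filters,
# TRUE means the variant passed that filter
sfm <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("var", 1:3), c("filterA", "filterB", "filterC"))
)

# Number of soft filters passed by each variant
passed <- rowSums(sfm)

# Relaxed setting: at least one soft filter passed
relaxed <- passed >= 1

# Strict setting: every soft filter in the matrix passed
strict <- passed == ncol(sfm)

print(data.frame(passed, relaxed, strict))
```

The relaxed setting corresponds to the >=1 condition used for the toy data, while the strict setting mirrors the ==6 condition of the commented-out line, assuming all recorded soft filters must be passed.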

Filter variants called by BCFtools

The following shows how to filter the VCF files generated by BCFtools using similar parameter settings as in the previous filtering of the GATK results.

args <- systemArgs(sysma="param/filter_sambcf.param", mytargets="targets_sambcf.txt")[1:4]
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
# filter <- "rowSums(vr) >= 20 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"
suppressAll(filterVars(args, filter, varcaller="bcftools", organism="A. thaliana"))
writeTargetsout(x=args, file="targets_sambcf_filtered.txt", overwrite=TRUE)
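The BCFtools filter above indexes vr by column, which suggests that vr is a four-column count matrix in this case. The sketch below assumes these columns follow the DP4 layout written by BCFtools (reference forward, reference reverse, alternative forward, alternative reverse); this is an assumption about the internal representation rather than something stated here, and the counts are invented for illustration.

```r
# Hypothetical DP4-style read counts: one row per variant
vr <- matrix(
  c(0, 0,  1,  0,
    5, 5,  1,  1,
    1, 0, 10, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("ref_fwd", "ref_rev", "alt_fwd", "alt_rev"))
)

# Same filter string as used for the toy data above
filter <- "rowSums(vr) >= 2 & (rowSums(vr[,3:4])/rowSums(vr[,1:4]) >= 0.8)"

# Total depth and fraction of reads supporting the alternative allele
total_depth  <- rowSums(vr)
alt_fraction <- rowSums(vr[, 3:4]) / rowSums(vr[, 1:4])

# Evaluate the filter string to get one logical value per variant
keep <- eval(parse(text = filter))
print(data.frame(total_depth, alt_fraction, keep))
```

Under this assumption, the first condition plays the role of the total depth threshold and the second the 80% alternative allele support threshold used in the GATK example.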

Check filtering outcome for one sample

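# Number of variants in the first sample before filtering
length(as(readVcf(infile1(args)[1], genome="Ath"), "VRanges")[,1])
# Number of variants in the same sample after filtering
length(as(readVcf(outpaths(args)[1], genome="Ath"), "VRanges")[,1])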

