VAR-Seq Workflow

  • Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  • Alignments
  • Alignment statistics
  • Variant calling
  • Variant filtering
  • Variant annotation
  • Combine results from many samples
  • Summary statistics of samples
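As an illustration of the variant filtering step, here is a minimal Python sketch of hard-threshold filtering on a plain-text VCF. It assumes records are filtered on the QUAL column and on the read depth stored as DP in the INFO column. The function names (`parse_info`, `filter_vcf_lines`) and the cutoffs (QUAL >= 20, DP >= 10) are hypothetical choices for illustration, not the settings used by this workflow.

```python
"""Minimal sketch of hard-threshold VCF filtering (hypothetical cutoffs)."""


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column (e.g. 'DP=14;AF=0.5;DB') into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO entries carry no value
    return fields


def filter_vcf_lines(lines, min_qual=20.0, min_depth=10):
    """Yield header lines unchanged and data lines passing QUAL and DP cutoffs."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        qual = cols[5]
        if qual == ".":
            continue  # missing QUAL: drop the record
        dp = info_dp = parse_info(cols[7]).get("DP", "0")
        depth = int(dp) if str(info_dp).isdigit() else 0
        if float(qual) >= min_qual and depth >= min_depth:
            yield line


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "Chr1\t1001\t.\tA\tG\t55.0\tPASS\tDP=25;AF=1.0\n",
        "Chr1\t2050\t.\tC\tT\t12.3\tPASS\tDP=30\n",
        "Chr2\t310\t.\tG\tA\t80.1\tPASS\tDP=4\n",
    ]
    # Keeps the two header lines and the Chr1:1001 record.
    for kept in filter_vcf_lines(vcf):
        print(kept, end="")
```

Suitable cutoffs depend on the variant caller and the sequencing depth. Records with a missing QUAL ('.') are dropped here, which is one choice among several.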

Challenge Project: Identification of coding variants affecting conserved protein residues

  • Run the workflow from start to finish (steps 1-8) on the data set from Lu et al. (2012)
  • Challenge project tasks
    • Map all coding variants to one or both of the following protein features:
      • Pfam domains
      • Prosite motifs
    • Rank the variants mapping to the above protein features by the degree of conservation of their amino acid (AA) residues
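The two challenge tasks can be sketched as an interval lookup followed by a sort. The Python sketch below assumes hypothetical in-memory inputs: variant positions already translated to 1-based residue coordinates, feature coordinates (for example from a Pfam or Prosite scan) for the same protein IDs, and one multiple sequence alignment column of homologs per variant residue. It scores conservation as 1 minus the normalized Shannon entropy of that column; this is one common measure, swapped in here for illustration, since the task does not prescribe one. All class, function, protein, and feature names are hypothetical.

```python
"""Sketch: map coding variants to protein features, rank by residue conservation.

Inputs are hypothetical; real feature coordinates would come from a Pfam or
Prosite scan and alignment columns from an MSA of homologous proteins.
"""
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    protein: str
    pos: int  # 1-based residue position in the reference protein
    ref_aa: str
    alt_aa: str


@dataclass(frozen=True)
class Feature:
    protein: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str   # e.g. a Pfam domain or Prosite motif identifier


def conservation(column: str) -> float:
    """Return 1 - normalized Shannon entropy of an alignment column (gaps ignored).

    1.0 means the residue is invariant across homologs; values near 0.0 mean
    the position is highly variable.
    """
    residues = [aa for aa in column.upper() if aa not in "-."]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)  # 20 standard amino acids


def rank_variants(variants, features, columns):
    """Return (variant, feature_name, score) for every variant inside a feature,
    sorted from most to least conserved residue.

    `columns` maps (protein, pos) -> alignment column string for that residue.
    """
    hits = []
    for v in variants:
        for f in features:
            if f.protein == v.protein and f.start <= v.pos <= f.end:
                score = conservation(columns.get((v.protein, v.pos), ""))
                hits.append((v, f.name, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    variants = [
        Variant("protA", 120, "K", "R"),
        Variant("protA", 45, "G", "D"),
        Variant("protA", 300, "S", "L"),  # outside any feature, so not reported
    ]
    features = [Feature("protA", 30, 130, "domain_1")]
    columns = {
        ("protA", 45): "GGGGGGGG",   # invariant column
        ("protA", 120): "KRKQKRNK",  # variable column
        ("protA", 300): "SSSS",
    }
    for v, name, score in rank_variants(variants, features, columns):
        print(f"{v.protein}:{v.ref_aa}{v.pos}{v.alt_aa}\t{name}\t{score:.3f}")
```

A variant falling inside two overlapping features is reported once per feature. Because gaps are ignored, a column with only a few aligned residues can look more conserved than it really is, so the number of non-gap residues may be worth reporting alongside the score.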

References

  • Lu P, Han X, Qi J, Yang J, Wijeratne AJ, Li T, Ma H (2012) Analysis of Arabidopsis genome-wide variations before and after meiosis and meiotic recombination by resequencing Landsberg erecta and all four products of a single meiosis. Genome Res 22: 508–518
  • DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43: 491–498
  • Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2015) An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31: 1536–1543