Motif Enrichment Analysis (MEA)

ChIP-Seq Workflow

  1. Read quality assessment, filtering and trimming
  2. Align reads to reference genome
  3. Compute read coverage across genome
  4. Peak calling with different methods and consensus peak identification
  5. Annotate peaks
  6. Differential binding analysis
  7. Gene set enrichment analysis
  8. Motif prediction to identify putative TF binding sites

Challenge Projects

1. Motif enrichment

  • Run the workflow from start to finish (steps 1-8) on the ChIP-Seq data set from Kaufmann et al. (2010)
  • Challenge project tasks
    • Prioritize/rank peaks by FDR from differential binding analysis
    • Extract the peak sequences from the reference genome
    • Determine which motifs in the JASPAR database (available via the MotifDb package) show the highest enrichment in the peak sequences. The motif enrichment tests can be performed with the PWMEnrich package. Basic starter code for accomplishing these tasks is provided here. The motif mapping can be performed with matchPWM or motifmatchr, and motif identification in databases can be performed with MotIV.
    • To give each of the two students on this project a distinct task, each could use a different peak ranking approach, e.g. one ranks by the FDR from the differential binding analysis and the other by coverage or by the p-values of the peak caller.
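
The logic behind the tasks above (rank peaks, scan their sequences with a motif matrix, test for enrichment against background sequences) can be sketched in a few lines of plain Python. This is only a conceptual illustration under stated assumptions: the peak records, sequences, motif counts, function names, the 80% score cutoff and the hypergeometric test are all invented for the example, and it does not reproduce the statistics used by PWMEnrich or the APIs of the Bioconductor packages named above.

```python
import math

BASES = "ACGT"


def rank_peaks_by_fdr(peaks, top_n=None):
    """Sort peak records (dicts with an 'fdr' key), most significant first."""
    ranked = sorted(peaks, key=lambda peak: peak["fdr"])
    return ranked if top_n is None else ranked[:top_n]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds matrix (uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def best_hit_score(seq, pwm):
    """Highest PWM score over all forward-strand windows; windows with non-ACGT bases are skipped."""
    width = len(pwm)
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if all(base in BASES for base in window):
            best = max(best, sum(pwm[i][base] for i, base in enumerate(window)))
    return best


def count_hits(seqs, pwm, threshold):
    """Number of sequences whose best window reaches the score threshold."""
    return sum(best_hit_score(seq, pwm) >= threshold for seq in seqs)


def enrichment_p_value(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided hypergeometric p-value for seeing >= hits_fg motif-containing peaks."""
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    upper = min(n_fg, total_hits)
    tail = sum(
        math.comb(total_hits, k) * math.comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / math.comb(total, n_fg)


# Toy data, invented for illustration only (not from Kaufmann et al.).
peaks = [
    {"id": "peak1", "fdr": 0.20, "seq": "TTGACCATGGTT"},
    {"id": "peak2", "fdr": 0.001, "seq": "ACCATGGAAAAA"},
    {"id": "peak3", "fdr": 0.03, "seq": "GGGGCCATGGCA"},
]
background = ["TTTTTTTTTTTT", "ACACACACACAC", "GGGTTTGGGTTT"]
motif_counts = [{"C": 10}, {"C": 10}, {"A": 10}, {"T": 10}, {"G": 10}, {"G": 10}]

pwm = log_odds_pwm(motif_counts)
threshold = 0.8 * sum(max(column.values()) for column in pwm)  # arbitrary cutoff

top_peaks = rank_peaks_by_fdr(peaks, top_n=3)
foreground = [peak["seq"] for peak in top_peaks]
p_value = enrichment_p_value(
    count_hits(foreground, pwm, threshold), len(foreground),
    count_hits(background, pwm, threshold), len(background),
)
print([peak["id"] for peak in top_peaks], p_value)
```

Swapping the sort key in `rank_peaks_by_fdr` (for example to a coverage or peak-caller p-value field) is one way the two ranking approaches mentioned above could be compared on the same downstream test. A real analysis would also scan the reverse strand and use a background matched for length and base composition.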

2. Motif discovery

  • Run the workflow from start to finish (steps 1-8) on the ChIP-Seq data set from Kaufmann et al. (2010)
  • Challenge project tasks
    • Use the peaks discovered in the workflow (steps 1-7 above) for motif discovery
    • Run discovery with at least two motif discovery tools (MEME-ChIP and BCRANK)
    • Identify motifs that are reported by at least two discovery tools
    • Identify motifs that are most similar to those reported by Kaufmann et al. (2010)
    • Optional: compare with known motifs in the JASPAR database
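
Deciding whether two tools found "the same" motif requires some similarity measure between motif matrices. The sketch below shows one simple option, assuming each motif is available as a position frequency matrix with columns ordered A, C, G, T: the mean column-wise Pearson correlation, maximized over ungapped offsets and both strands. The function names, the toy consensus strings, the minimum overlap and the 0.8 cutoff are assumptions made for illustration; this is not the comparison method implemented by MEME-ChIP, BCRANK or MotIV.

```python
import math

BASES = "ACGT"


def consensus_to_pfm(consensus, purity=0.85):
    """Hypothetical helper: build a toy frequency matrix from a consensus string."""
    rest = (1.0 - purity) / 3
    return [[purity if b == base else rest for b in BASES] for base in consensus]


def column_correlation(col_a, col_b):
    """Pearson correlation of two 4-element columns; 0.0 if either column is flat."""
    mean_a = sum(col_a) / 4
    mean_b = sum(col_b) / 4
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def reverse_complement(pfm):
    """Reverse column order; reversing each A,C,G,T column swaps A<->T and C<->G."""
    return [column[::-1] for column in reversed(pfm)]


def motif_similarity(pfm_a, pfm_b, min_overlap=4):
    """Best mean column correlation over ungapped offsets and both strands."""
    best = -1.0
    for candidate in (pfm_b, reverse_complement(pfm_b)):
        first = -(len(candidate) - min_overlap)
        last = len(pfm_a) - min_overlap
        for offset in range(first, last + 1):
            pairs = [
                (pfm_a[i], candidate[i - offset])
                for i in range(len(pfm_a))
                if 0 <= i - offset < len(candidate)
            ]
            if len(pairs) >= min_overlap:
                score = sum(column_correlation(a, b) for a, b in pairs) / len(pairs)
                best = max(best, score)
    return best


def shared_motifs(tool_a, tool_b, threshold=0.8):
    """Pairs of motifs (one per tool) whose similarity reaches the threshold."""
    matches = []
    for name_a, pfm_a in tool_a.items():
        for name_b, pfm_b in tool_b.items():
            score = motif_similarity(pfm_a, pfm_b)
            if score >= threshold:
                matches.append((name_a, name_b, score))
    return matches


# Toy motif sets standing in for the output of two discovery tools.
tool_a = {"m1": consensus_to_pfm("GATTACA"), "m2": consensus_to_pfm("CCCGGG")}
tool_b = {"n1": consensus_to_pfm("TGTAATC"), "n2": consensus_to_pfm("ACACAC")}

for name_a, name_b, score in shared_motifs(tool_a, tool_b):
    print(name_a, name_b, round(score, 3))
```

The same `motif_similarity` function could in principle be pointed at a published motif or a database matrix instead of a second tool's output. Averaging over the overlapping columns only tends to favor short overlaps, which is why a minimum overlap is enforced here; dedicated motif comparison tools handle this more carefully.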

References

  • Frith, Martin C., Yutao Fu, Liqun Yu, Jiang‐fan Chen, Ulla Hansen, and Zhiping Weng. 2004. “Detection of Functional DNA Motifs via Statistical Over‐representation.” Nucleic Acids Research 32 (4): 1372–81. PubMed
  • Kaufmann, K, F Wellmer, J M Muiño, T Ferrier, S E Wuest, V Kumar, A Serrano-Mislata, et al. 2010. “Orchestration of Floral Initiation by APETALA1.” Science 328 (5974): 85–89. PubMed
  • Machanick, P, and T L Bailey. 2011. “MEME-ChIP: Motif Analysis of Large DNA Datasets.” Bioinformatics 27: 1696–97. PubMed
  • McLeay, Robert C, and Timothy L Bailey. 2010. “Motif Enrichment Analysis: A Unified Framework and an Evaluation on ChIP Data.” BMC Bioinformatics 11: 165. PubMed
  • Tompa, M, N Li, T L Bailey, G M Church, B De Moor, E Eskin, A V Favorov, et al. 2005. “Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites.” Nature Biotechnology 23 (1): 137–44. PubMed
  • Alipanahi, B, A Delong, M T Weirauch, and B J Frey. 2015. “Predicting the Sequence Specificities of DNA- and RNA-Binding Proteins by Deep Learning.” Nature Biotechnology 33: 831–38. PubMed
Last modified 2023-06-06: some edits (40db28caa)