HW4: Pairwise Alignments
A. Choice of Sequence Type
- Task 1: Which sequence type - amino acid or nucleotide - is more appropriate to search databases for remotely related sequences? Provide at least three reasons for your decision.
B. Dynamic Programming for Pairwise Alignments
- Task 2: Create manually (or write an R script for it) one global and one local alignment for the following two protein sequences using the Needleman-Wunsch and Smith-Waterman algorithms, respectively:
O15528: PFGFGKRSCMGRRLA
P98187: FIPFSAGPRNCIGQK
In each case, use BLOSUM50 as the substitution matrix and a gap extension penalty of 8 (no extra penalty for gap opening). Note that some helper code in R is available for creating the initial matrix programmatically for upload to a spreadsheet program. Alternatively, solve the entire homework by writing an R script. Your answers should contain the following components:
- Manually populated dynamic programming matrices
- The optimal pairwise alignments created by traceback
- The final scores of the alignments
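For those who prefer the R script route, the matrix-filling step can be sketched roughly as follows. This is a minimal illustration, not the course's helper code: the function name dp_matrix, the scoring function toy_score (match +5, mismatch -4) and the two-letter toy sequences are all hypothetical, and the traceback step is left out. Only the linear gap penalty of 8 comes from the task description.

```r
# Minimal sketch of the matrix-filling step with a linear gap penalty.
# dp_matrix(), toy_score() and the toy sequences are hypothetical names/values
# chosen for illustration; they are not part of the assignment.
dp_matrix <- function(s1, s2, score_fun, gap = 8, type = c("global", "local")) {
  type <- match.arg(type)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # One extra row and column for the empty prefix ("-")
  m <- matrix(0, nrow = length(a) + 1, ncol = length(b) + 1,
              dimnames = list(c("-", a), c("-", b)))
  if (type == "global") {
    # Global alignment: leading gaps are penalized
    m[, 1] <- -gap * (0:length(a))
    m[1, ] <- -gap * (0:length(b))
  }
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      best <- max(m[i, j] + score_fun(a[i], b[j]),  # diagonal: align a[i] with b[j]
                  m[i, j + 1] - gap,                # from above: a[i] against a gap
                  m[i + 1, j] - gap)                # from left: b[j] against a gap
      # Local alignment: scores are never allowed to drop below zero
      m[i + 1, j + 1] <- if (type == "local") max(0, best) else best
    }
  }
  m
}

# Toy scoring scheme (assumption): +5 for a match, -4 for a mismatch
toy_score <- function(x, y) if (x == y) 5 else -4

g <- dp_matrix("AC", "A", toy_score, type = "global")
l <- dp_matrix("AC", "A", toy_score, type = "local")
g[nrow(g), ncol(g)]  # global score: bottom-right cell
max(l)               # local score: maximum over all cells
```

For the actual task, score_fun would look up the pair in BLOSUM50 instead, for example function(x, y) BLOSUM50[x, y], assuming the matrix has been loaded into the session from an installed package that provides it.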
C. Alignments with Different Substitution Matrices
- Task 1: Load the Biostrings package in R, import the following two cytochrome P450 sequences O15528 and P98187 from NCBI (save as myseq.fasta), and create a global alignment with the pairwiseAlignment function from Biostrings as follows:
library(Biostrings)
myseq <- readAAStringSet("myseq.fasta", "fasta")
(p <- pairwiseAlignment(myseq[[1]], myseq[[2]], type="global", substitutionMatrix="BLOSUM50"))
writePairwiseAlignments(p)
Your answers should address the following:
- Record the scores for the scoring matrices BLOSUM50, BLOSUM62 and BLOSUM80.
- How and why do the scores differ for the three scoring matrices?
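One possible way to collect the three scores in a single pass is sketched below. The helper name score_by_matrix is hypothetical; it only wraps the pairwiseAlignment call shown above and extracts the result with score(). Gap penalties are left at the function's defaults, and the sketch assumes a Biostrings installation that exposes pairwiseAlignment (in recent Bioconductor releases the function may be provided through the pwalign package instead).

```r
library(Biostrings)

# Hypothetical helper: global alignment score of two sequences under each
# named substitution matrix. Gap penalties are left at the defaults.
score_by_matrix <- function(seq1, seq2,
                            matrices = c("BLOSUM50", "BLOSUM62", "BLOSUM80")) {
  sapply(matrices, function(mat) {
    aln <- pairwiseAlignment(seq1, seq2, type = "global",
                             substitutionMatrix = mat)
    score(aln)  # numeric alignment score
  })
}
```

With myseq imported as above, calling score_by_matrix(myseq[[1]], myseq[[2]]) should return a numeric vector named by matrix, which can be copied directly into the answer.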
Homework submission
Assemble the results from this homework in one PDF file (HW4.pdf) and upload it to your private GitHub repository under Homework/HW4/HW4.pdf.
Due date
This homework is due on Thu, April 27 at 6:00 PM (~10 days).
Homework Solutions
See here.
Last modified 2023-05-02: some edits (f1d1f2852)