A. Choice of Sequence Type
- Task 1: Which sequence type - amino acid or nucleotide - is more appropriate to search databases for remotely related sequences? Provide at least three reasons for your decision.
B. Dynamic Programming for Pairwise Alignments
- Task 2: Manually create (or write an R script to create) one global and one local alignment of the following two protein sequences, using the Needleman-Wunsch and Smith-Waterman algorithms, respectively:
O15528: PFGFGKRSCMGRRLA
P98187: FIPFSAGPRNCIGQK
In both cases, use BLOSUM50 as the substitution matrix and 8 as both the gap opening and the gap extension penalty. Note: some R code is provided to create the initial matrix programmatically for upload to a spreadsheet program. Alternatively, you may solve the entire homework by writing an R script. Your answers should contain the following components:
- Manually populated dynamic programming matrices
- The optimal pairwise alignments created by traceback
- The final scores of the alignments
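The recursion, matrix filling and traceback behind Task 2 can be sketched in R as follows. This is a minimal sketch, not a reference solution, and it assumes: a linear gap penalty (opening = extension = gap, which matches the 8/8 setting of this task), a substitution matrix whose row and column names are residue letters, and a traceback that prefers the diagonal, then up, then left moves, so only one of possibly several co-optimal alignments is returned. The function name dp_align and its arguments are hypothetical.

```r
# Minimal sketch: Needleman-Wunsch (type = "global") and Smith-Waterman
# (type = "local") with a linear gap penalty, i.e. opening = extension = gap.
# S: substitution matrix whose row/column names are residue letters.
dp_align <- function(a, b, S, gap = 8, type = c("global", "local")) {
  type <- match.arg(type)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # DP matrix; the first row/column stand for the empty prefix ("-")
  M <- matrix(0, n + 1, m + 1, dimnames = list(c("-", x), c("-", y)))
  if (type == "global") {
    M[, 1] <- -gap * (0:n)  # leading gaps are penalized in global mode
    M[1, ] <- -gap * (0:m)
  }
  for (i in 2:(n + 1)) {
    for (j in 2:(m + 1)) {
      cand <- c(M[i - 1, j - 1] + S[x[i - 1], y[j - 1]],  # diagonal: (mis)match
                M[i - 1, j] - gap,                        # up: gap in b
                M[i, j - 1] - gap)                        # left: gap in a
      if (type == "local") cand <- c(cand, 0)  # local: scores floor at 0
      M[i, j] <- max(cand)
    }
  }
  # Traceback starts at the bottom-right cell (global) or at one of the
  # highest-scoring cells (local)
  if (type == "global") {
    i <- n + 1
    j <- m + 1
  } else {
    pos <- which(M == max(M), arr.ind = TRUE)[1, ]
    i <- pos[[1]]
    j <- pos[[2]]
  }
  score <- M[i, j]
  al_a <- character(0)
  al_b <- character(0)
  while (i > 1 || j > 1) {
    if (type == "local" && M[i, j] == 0) break  # local alignment stops at 0
    if (i > 1 && j > 1 && M[i, j] == M[i - 1, j - 1] + S[x[i - 1], y[j - 1]]) {
      al_a <- c(x[i - 1], al_a); al_b <- c(y[j - 1], al_b)
      i <- i - 1; j <- j - 1
    } else if (i > 1 && M[i, j] == M[i - 1, j] - gap) {
      al_a <- c(x[i - 1], al_a); al_b <- c("-", al_b)
      i <- i - 1
    } else {
      al_a <- c("-", al_a); al_b <- c(y[j - 1], al_b)
      j <- j - 1
    }
  }
  list(M = M, score = score,
       aligned = c(paste(al_a, collapse = ""), paste(al_b, collapse = "")))
}
```

To apply it to the two sequences above, a BLOSUM50 matrix could be taken from the substitution matrix datasets shipped with Biostrings (assumed here: data(BLOSUM50, package = "Biostrings")) and passed as S with gap = 8 and type = "global" or "local". The filled matrix in the M element could then be exported with write.csv() for inspection in a spreadsheet program. Hand-computed matrices should still be checked against the recursion, since ties in the traceback can legitimately lead to different but equally scoring alignments.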
C. Alignments with Different Substitution Matrices
- Task 1: Load the Biostrings package in R, import the following two cytochrome P450 sequences O15528 and P98187 from NCBI (save as myseq.fasta), and create a global alignment with the pairwiseAlignment function from Biostrings as follows:
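library(Biostrings)
# Read both protein sequences from the FASTA file
myseq <- readAAStringSet("myseq.fasta", "fasta")
# Global alignment scored with BLOSUM50; the outer parentheses also print the result
(p <- pairwiseAlignment(myseq[[1]], myseq[[2]], type="global", substitutionMatrix="BLOSUM50"))
# Print the complete alignment in the "pair" format used by EMBOSS
writePairwiseAlignments(p)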
Your answers should address the following items:
- Record the scores for the scoring matrices BLOSUM50, BLOSUM62 and BLOSUM80.
- How and why do the scores differ for the three scoring matrices?
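The three runs can be collected in one call, as in the following sketch. It assumes the Biostrings pairwiseAlignment and score functions used above (in recent Bioconductor releases these alignment functions may need to be loaded from the pwalign package instead) and, only so the example runs on its own, substitutes the two short fragments from Task 2 for the full sequences in myseq.fasta. Gap penalties are left at their defaults, so only the substitution matrix changes between runs.

```r
library(Biostrings)

# Sketch: the two Task 2 fragments stand in for the full sequences so the
# example runs without myseq.fasta; with the homework data use myseq[[1]]
# and myseq[[2]] instead.
a <- AAString("PFGFGKRSCMGRRLA")
b <- AAString("FIPFSAGPRNCIGQK")

mats <- c("BLOSUM50", "BLOSUM62", "BLOSUM80")
# Repeat the same global alignment (default gap penalties), changing only
# the substitution matrix, and keep the optimal score of each run
scores <- sapply(mats, function(m) {
  score(pairwiseAlignment(a, b, type = "global", substitutionMatrix = m))
})
print(scores)
```

The result is a numeric vector named by matrix, which makes it easy to record the three scores side by side before explaining the differences.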
Homework submission
Assemble the results from this homework in one PDF file (HW4.pdf) and upload it to your private GitHub repository under Homework/HW4/HW4.pdf.
Due date
This homework is due in two weeks on Tue, April 24th at 6:00 PM.