A. Choice of Sequence Type
- Task 1: Which sequence type, amino acid or nucleotide, is more appropriate for searching databases for remotely related sequences? Give at least three reasons for your choice.
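Codon degeneracy is one way to explore this question. The minimal Python sketch below is for illustration only, since the homework itself uses R. It translates two hypothetical DNA fragments that differ only at synonymous sites. Their nucleotide identity falls to about 58%, while the encoded peptides remain identical. The codon table holds only the subset of the standard genetic code needed here. The sequences `dna1` and `dna2` are made up.

```python
# Only the codons needed for this example, taken from the standard genetic code.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine (4-fold degenerate)
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine (4-fold degenerate)
    "AAA": "K", "AAG": "K",                          # lysine
    "TTA": "L", "CTG": "L",                          # two of the six leucine codons
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical fragments that differ only at synonymous sites.
dna1 = "GCTGGTAAATTA"
dna2 = "GCAGGCAAGCTG"

print(translate(dna1), translate(dna2))                     # AGKL AGKL
print(round(percent_identity(dna1, dna2), 1))               # 58.3
print(percent_identity(translate(dna1), translate(dna2)))   # 100.0
```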
B. Dynamic Programming for Pairwise Alignments
- Task 2: Manually create (or write an R script to create) one global and one local alignment for the following two protein sequences, using the Needleman-Wunsch and Smith-Waterman algorithms, respectively:
In both cases, use BLOSUM50 as the substitution matrix and 8 as both the gap opening and gap extension penalty. Note: here is some R code that creates the initial matrix programmatically for upload to a spreadsheet program. Alternatively, solve the entire homework by writing an R script. Your answers should contain the following components:
- Manually populated dynamic programming matrices
- The optimal pairwise alignments created by traceback
- The final scores of the alignments
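The recurrences behind Task 2 can be sketched in Python. This is illustration only, since the homework asks for R or a spreadsheet. The sketch reads "opening and extension penalties of 8" as a linear cost of 8 per gap position, which is an assumption. It takes the substitution matrix as a function `score(x, y)`. The full BLOSUM50 table is not reproduced here, so the usage lines plug in a hypothetical `toy_score` instead. Traceback breaks ties in the order diagonal, up, left, so it returns only one of possibly several optimal alignments.

```python
from typing import Callable, List, Tuple

Score = Callable[[str, str], int]


def needleman_wunsch(a: str, b: str, score: Score, gap: int = 8) -> Tuple[int, str, str]:
    """Global alignment with a linear penalty of `gap` per gap position."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] - gap,                           # gap in b
                          F[i][j - 1] - gap)                           # gap in a
    # Traceback from the bottom-right cell back to the top-left cell.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def smith_waterman(a: str, b: str, score: Score, gap: int = 8) -> Tuple[int, str, str]:
    """Local alignment: same recurrence as above, but every cell is floored at 0."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Traceback from the highest-scoring cell until a zero cell is reached.
    top: List[str] = []
    bottom: List[str] = []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(x: str, y: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


print(needleman_wunsch("ACGT", "AGT", toy_score))     # (7, 'ACGT', 'A-GT')
print(smith_waterman("TTACGT", "GGACGA", toy_score))  # (15, 'ACG', 'ACG')
```

For the actual task, `score` would instead look up each residue pair in the published BLOSUM50 matrix.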
C. Alignments with Different Substitution Matrices
- Task 1: Load the Biostrings package in R, import the following two cytochrome P450 sequences P98187 from NCBI (save as myseq.fasta), and create a global alignment with the
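For readers working outside R, the minimal sketch below parses a FASTA file such as myseq.fasta into a dictionary that maps names to sequences. In R, Biostrings' `readAAStringSet()` covers this step. The record names and residues in the example are hypothetical placeholders, not the P450 sequences.

```python
def parse_fasta(text: str) -> dict:
    """Return {record_id: sequence} from FASTA-formatted text.

    The record id is the first whitespace-separated token after '>'.
    """
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # a sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seqA hypothetical record
MKLV
AAG
>seqB another hypothetical record
MKIVAG
"""
print(parse_fasta(example))  # {'seqA': 'MKLVAAG', 'seqB': 'MKIVAG'}
```

To read the saved file, pass its contents in, for example `parse_fasta(open("myseq.fasta").read())`.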
Your answers should address the following items:
- Record the scores for the scoring matrices BLOSUM50, BLOSUM62 and BLOSUM80.
- How and why do the scores differ for the three scoring matrices?
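Part of the "how" is visible without running an alignment at all. The matrices assign different raw scores even to identical residues, so the same alignment gets a different total under each one. The sketch below scores a hypothetical gap-free self-alignment of the peptide `AWHE`. It uses only the diagonal entries of BLOSUM50 and BLOSUM62 for these four residues. BLOSUM80 is left out because its values are not reproduced here. Check all values against the published tables before reusing them.

```python
# Identity (diagonal) scores for four residues only, from the published
# BLOSUM50 and BLOSUM62 tables; verify against the full matrices before reuse.
BLOSUM50_DIAG = {"A": 5, "E": 6, "H": 10, "W": 15}
BLOSUM62_DIAG = {"A": 4, "E": 5, "H": 8, "W": 11}


def self_alignment_score(seq: str, diag: dict) -> int:
    """Score of aligning seq to itself without gaps: the sum of its diagonal entries."""
    return sum(diag[residue] for residue in seq)


for name, diag in [("BLOSUM50", BLOSUM50_DIAG), ("BLOSUM62", BLOSUM62_DIAG)]:
    print(name, self_alignment_score("AWHE", diag))
# BLOSUM50 36
# BLOSUM62 28
```

This suggests that raw alignment scores obtained with different matrices should not be compared directly.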
Assemble the results from this homework in one PDF file (HW4.pdf) and upload it to your private GitHub repository under
This homework is due in two weeks on Tue, April 24th at 6:00 PM.