# HW4: Pairwise Alignments

## A. Choice of Sequence Type

**Task 1**: Which sequence type - amino acid or nucleotide - is more appropriate to search databases for remotely related sequences? Provide at least three reasons for your decision.

## B. Dynamic Programming for Pairwise Alignments

**Task 2**: Manually (or with an R script) create one global and one local alignment for the following two protein sequences, using the Needleman-Wunsch and Smith-Waterman algorithms, respectively:

```
O15528: PFGFGKRSCMGRRLA
P98187: FIPFSAGPRNCIGQK
```

In each case, use BLOSUM50 as the substitution matrix and a gap extension penalty of 8 (no extra penalty for gap opening). Note: here is some helper code in R that creates the initial matrix programmatically for upload to a spreadsheet program. Alternatively, solve the entire homework by writing an R script. Your answers should contain the following components:

- Manually populated dynamic programming matrices
- The optimal pairwise alignments created by traceback
- The final scores of the alignments
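
For those taking the R script route, the matrix-filling step for both algorithms can be sketched as below. This is a minimal illustration under stated assumptions, not a reference solution: the function name `align_matrix`, the `toy_score` function (+5 match, -4 mismatch) and the short test sequences are hypothetical and stand in for BLOSUM50 and the two protein sequences. Traceback is not included.

```r
# Fill the dynamic programming matrix with a linear gap penalty.
# local = FALSE: Needleman-Wunsch (global); local = TRUE: Smith-Waterman (local).
# 'score' is any function returning the substitution score for two residues.
align_matrix <- function(a, b, score, gap = 8, local = FALSE) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  dp <- matrix(0, nrow = length(x) + 1, ncol = length(y) + 1,
               dimnames = list(c("-", x), c("-", y)))
  if (!local) {
    # Global alignment: leading gaps are penalized in the first row and column
    dp[, 1] <- -gap * (0:length(x))
    dp[1, ] <- -gap * (0:length(y))
  }
  # Local alignment never lets a cell drop below zero
  floor_value <- if (local) 0 else -Inf
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      dp[i + 1, j + 1] <- max(
        dp[i, j] + score(x[i], y[j]),  # diagonal: align x[i] with y[j]
        dp[i, j + 1] - gap,            # from above: gap in b
        dp[i + 1, j] - gap,            # from left: gap in a
        floor_value
      )
    }
  }
  dp
}

# Hypothetical toy scoring function, used only to keep the sketch self-contained
toy_score <- function(p, q) if (p == q) 5 else -4

global_dp <- align_matrix("GAT", "GT", toy_score)
global_dp[nrow(global_dp), ncol(global_dp)]  # global score: bottom-right cell

local_dp <- align_matrix("GAT", "GT", toy_score, local = TRUE)
max(local_dp)                                # local score: largest cell
```

For the homework itself, `score` could be replaced with a lookup such as `function(p, q) BLOSUM50[p, q]`, assuming a BLOSUM50 matrix object has been loaded in the session (for example via `data(BLOSUM50)`, depending on the installed `Biostrings` version).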

## C. Alignments with Different Substitution Matrices

**Task 3**: Load the `Biostrings` package in R, import the following two cytochrome P450 sequences `O15528` and `P98187` from NCBI (save as `myseq.fasta`), and create a global alignment with the `pairwiseAlignment` function from `Biostrings` as follows:

```
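library(Biostrings)
# Read both protein sequences from the FASTA file
myseq <- readAAStringSet("myseq.fasta", "fasta")
# Global alignment; the outer parentheses also print the result
(p <- pairwiseAlignment(myseq[[1]], myseq[[2]], type="global", substitutionMatrix="BLOSUM50"))
# Write the alignment in pairwise text format to the console
writePairwiseAlignments(p)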
```

Your answers should address the following:

- Record the scores for the scoring matrices BLOSUM50, BLOSUM62 and BLOSUM80.
- How and why do the scores differ for the three scoring matrices?
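
One way to collect the three scores in a single step is sketched below. It assumes a `Biostrings` release that still provides `pairwiseAlignment` and `score` (newer Bioconductor releases have moved the pairwise alignment functionality to the `pwalign` package) and uses the default gap penalties of `pairwiseAlignment`. The helper name `score_by_matrix` is hypothetical.

```r
library(Biostrings)

# Align two sequences globally once per substitution matrix and
# return the alignment scores as a named numeric vector.
score_by_matrix <- function(s1, s2,
                            matrices = c("BLOSUM50", "BLOSUM62", "BLOSUM80")) {
  sapply(matrices, function(m) {
    aln <- pairwiseAlignment(s1, s2, type = "global", substitutionMatrix = m)
    score(aln)
  })
}

# Usage with the FASTA file from this task (skipped if the file is absent)
if (file.exists("myseq.fasta")) {
  myseq <- readAAStringSet("myseq.fasta", "fasta")
  print(score_by_matrix(myseq[[1]], myseq[[2]]))
}
```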

## Homework submission

Assemble the results from this homework in one PDF file (`HW4.pdf`) and upload it to your private GitHub repository under `Homework/HW4/HW4.pdf`.

## Due date

This homework is due in two weeks on Thu, April 21 at 6:00 PM.

## Homework Solutions

See here.

Last modified 2022-04-23: some edits (975199d54)