HW5 - Programming in R

A. Reverse and complement of DNA

Task 1: Write a RevComp function that returns the reverse and complement of a DNA sequence string. Include an argument that lets the user return only (i) the reversed sequence, (ii) the complemented sequence, or (iii) the reversed and complemented sequence. The following R functions will be useful for the implementation:

Generate a short test DNA sequence

x <- c("ATGCATTGGACGTTAG")  
x
## [1] "ATGCATTGGACGTTAG"

Vectorize sequence

x <- substring(x, 1:nchar(x), 1:nchar(x)) 
x
##  [1] "A" "T" "G" "C" "A" "T" "T" "G" "G" "A" "C" "G" "T" "T" "A" "G"

Reverse sequence

x <- rev(x) 
x
##  [1] "G" "A" "T" "T" "G" "C" "A" "G" "G" "T" "T" "A" "C" "G" "T" "A"

Collapse sequence back to character string

x <- paste(x, collapse="")
x
## [1] "GATTGCAGGTTACGTA"

Form complement of sequence

chartr("ATGC", "TACG", x) 
## [1] "CTAACGTCCAATGCAT"
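
The steps above can be combined into a single function. The following is a minimal sketch of one possible shape, not the reference solution; the argument name `type` and its three values are assumptions chosen for illustration.

```r
# Sketch only: `type` and its values are hypothetical names
RevComp <- function(x, type = c("revcomp", "reverse", "complement")) {
  type <- match.arg(type)  # validates the choice; the default is the first value
  # Split into single characters, reverse, and collapse back to one string
  reversed <- function(s) {
    paste(rev(substring(s, 1:nchar(s), 1:nchar(s))), collapse = "")
  }
  # Swap each base for its complement
  complemented <- function(s) chartr("ATGC", "TACG", s)
  switch(type,
         reverse    = reversed(x),
         complement = complemented(x),
         revcomp    = complemented(reversed(x)))
}

RevComp("ATGCATTGGACGTTAG", type = "reverse")
## [1] "GATTGCAGGTTACGTA"
RevComp("ATGCATTGGACGTTAG")
## [1] "CTAACGTCCAATGCAT"
```

Using `match.arg` means a misspelled option fails with an error instead of silently returning nothing.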

Task 2: Write a function that applies the RevComp function to many sequences stored in a vector. In addition, write an export function that saves the sequences generated under Tasks 1 and 2 to a file in FASTA format.
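
As a rough illustration of Task 2, the sketch below applies a per-sequence function to a character vector and writes the result as FASTA. The names `rc_one`, `rc_many` and `write_fasta` are hypothetical; `rc_one` is only a stand-in for the Task 1 function, and the sequence IDs are taken from the vector names (or generated as `seq1`, `seq2`, ... when the vector is unnamed). Sequence lines are not wrapped to a fixed width.

```r
# Stand-in for the Task 1 function (reverse complement only)
rc_one <- function(s) {
  chars <- substring(s, 1:nchar(s), 1:nchar(s))
  chartr("ATGC", "TACG", paste(rev(chars), collapse = ""))
}

# Apply rc_one to every element and carry over the original names
rc_many <- function(seqs) {
  out <- vapply(seqs, rc_one, character(1), USE.NAMES = FALSE)
  names(out) <- names(seqs)
  out
}

# Write a character vector to a file in FASTA format
write_fasta <- function(seqs, file) {
  ids <- names(seqs)
  if (is.null(ids)) ids <- paste0("seq", seq_along(seqs))
  # rbind + as.vector interleaves the ">id" lines with the sequence lines
  lines <- as.vector(rbind(paste0(">", ids), seqs))
  writeLines(lines, file)
}

seqs <- c(s1 = "ATGCATTGGACGTTAG", s2 = "GGGTTTAAACCC")
out <- tempfile(fileext = ".fasta")
write_fasta(rc_many(seqs), out)
readLines(out)
## [1] ">s1"              "CTAACGTCCAATGCAT" ">s2"              "GGGTTTAAACCC"
```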

B. Translate DNA into Protein

Task 3: Write a function that will translate one or many DNA sequences in all three reading frames into proteins. The following commands will simplify this task:

Import lookup table of genetic code

AAdf <- read.table(file="http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/AA.txt", header=TRUE, sep="\t") 
AAdf[1:4,]
##   Codon AA_1 AA_3 AA_Full AntiCodon
## 1   TCA    S  Ser  Serine       TGA
## 2   TCG    S  Ser  Serine       CGA
## 3   TCC    S  Ser  Serine       GGA
## 4   TCT    S  Ser  Serine       AGA

Generate a named vector of the relevant components

AAv <- as.character(AAdf[,2]) 
names(AAv) <- AAdf[,1] 
AAv
## TCA TCG TCC TCT TTT TTC TTA TTG TAT TAC TAA TAG TGT TGC TGA TGG CTA CTG CTC CTT CCA CCG CCC CCT CAT 
## "S" "S" "S" "S" "F" "F" "L" "L" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L" "P" "P" "P" "P" "H" 
## CAC CAA CAG CGA CGG CGC CGT ATT ATC ATA ATG ACA ACG ACC ACT AAT AAC AAA AAG AGT AGC AGA AGG GTA GTG 
## "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" 
## GTC GTT GCA GCG GCC GCT GAT GAC GAA GAG GGA GGG GGC GGT 
## "V" "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G"

Split the sequence into triplets (codons) and translate by name-based subsetting of AAv

y <- gsub("(...)", "\\1_", x) 
y <- unlist(strsplit(y, "_")) 
y <- y[grep("^...$", y)] 
AAv[y] 
## GAT TGC AGG TTA CGT 
## "D" "C" "R" "L" "R"
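
The triplet lookup above covers one reading frame of one sequence. The sketch below shows one way it might be extended to three frames and many sequences. To stay self-contained it builds the standard genetic code locally instead of downloading AA.txt; the function names `translate_frame` and `translate_all`, and the choice to report unknown codons as "X", are assumptions made for illustration.

```r
# Standard genetic code, codons enumerated in T, C, A, G order
# (assumption: same Codon-to-AA_1 mapping as the AA.txt lookup table)
b <- c("T", "C", "A", "G")
codons <- paste0(rep(b, each = 16), rep(rep(b, each = 4), times = 4), rep(b, times = 16))
code <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(code) <- codons

# Translate one sequence starting at position `frame` (1, 2 or 3)
translate_frame <- function(s, frame, code) {
  n <- nchar(s)
  if (n - frame + 1 < 3) return("")    # no complete codon in this frame
  starts <- seq(frame, n - 2, by = 3)  # start position of each complete codon
  aa <- code[substring(s, starts, starts + 2)]
  aa[is.na(aa)] <- "X"                 # assumption: unknown codons become "X"
  paste(aa, collapse = "")
}

# Translate every sequence in all three forward frames
translate_all <- function(seqs, code) {
  lapply(seqs, function(s) {
    setNames(vapply(1:3, function(f) translate_frame(s, f, code), character(1)),
             paste0("frame", 1:3))
  })
}

res <- translate_all(c(s1 = "GATTGCAGGTTACGTA"), code)
res$s1  # frame1 = "DCRLR", frame2 = "IAGYV", frame3 = "LQVT"
```

Frame 1 reproduces the D, C, R, L, R result shown above; trailing bases that do not fill a codon are dropped, as in the `grep("^...$", y)` step.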

Homework submission

Submit the three functions in one well-structured and annotated R script to your private GitHub repository under Homework/HW5/HW5.R. The script should include instructions on how to use the functions.

Due date

This homework is due on Thu, April 25th at 6:00 PM.

Homework Solutions

To be posted.

Last modified 2024-03-23: some edits (1178906e4)