HW5 - Programming in R
A. Reverse and complement of DNA
Task 1: Write a RevComp function that returns the reverse and complement of a DNA sequence string. Include an argument that allows the caller to return only (i) the reversed sequence, (ii) the complemented sequence, or (iii) the reversed and complemented sequence. The following R functions will be useful for the implementation:
Generate a short test DNA sequence
x <- c("ATGCATTGGACGTTAG")
x
## [1] "ATGCATTGGACGTTAG"
Vectorize sequence
x <- substring(x, 1:nchar(x), 1:nchar(x))
x
## [1] "A" "T" "G" "C" "A" "T" "T" "G" "G" "A" "C" "G" "T" "T" "A" "G"
Reverse sequence
x <- rev(x)
x
## [1] "G" "A" "T" "T" "G" "C" "A" "G" "G" "T" "T" "A" "C" "G" "T" "A"
Collapse sequence back to character string
x <- paste(x, collapse="")
x
## [1] "GATTGCAGGTTACGTA"
Form complement of sequence
chartr("ATGC", "TACG", x)
## [1] "CTAACGTCCAATGCAT"
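The steps above can be combined into a single function. The following is a minimal sketch, not the reference solution: the argument name `type` and its values `"RC"`, `"R"` and `"C"` are assumptions chosen for illustration, and the function handles one sequence string at a time.

```r
# Sketch of a RevComp function built from substring(), rev(), paste() and chartr().
# The 'type' argument and its values are hypothetical names, not prescribed by the task.
RevComp <- function(x, type = c("RC", "R", "C")) {
  type <- match.arg(type)  # defaults to "RC" (reverse and complement)
  if (type %in% c("R", "RC")) {
    # Split into single characters, reverse, and collapse back to one string
    chars <- substring(x, seq_len(nchar(x)), seq_len(nchar(x)))
    x <- paste(rev(chars), collapse = "")
  }
  if (type %in% c("C", "RC")) {
    # Swap each base for its complement
    x <- chartr("ATGC", "TACG", x)
  }
  x
}

RevComp("ATGCATTGGACGTTAG", type = "R")   # "GATTGCAGGTTACGTA"
RevComp("ATGCATTGGACGTTAG", type = "RC")  # "CTAACGTCCAATGCAT"
```

Using `match.arg()` restricts `type` to the three listed values and raises an error for anything else, which avoids silently returning an unmodified sequence.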
Task 2: Write a function that applies the RevComp function to many sequences stored in a vector. In addition, write an export function that saves the sequences generated under Tasks 1 and 2 to a file in FASTA format.
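One possible shape for Task 2 is sketched below. All function names (`rev_comp_one`, `RevCompMany`, `writeFasta`) are hypothetical, and `rev_comp_one` is only a compact stand-in for the Task 1 function so that the example runs on its own. The FASTA writer assumes one header line starting with `>` followed by the sequence on a single line, and it falls back to generated IDs such as `seq1` when the vector has no names.

```r
# Compact stand-in for the Task 1 function: reverse complement of one string
rev_comp_one <- function(s) {
  chars <- substring(s, seq_len(nchar(s)), seq_len(nchar(s)))
  chartr("ATGC", "TACG", paste(rev(chars), collapse = ""))
}

# Apply the single-sequence function to every element of a character vector
RevCompMany <- function(seqs) {
  res <- vapply(seqs, rev_comp_one, character(1), USE.NAMES = FALSE)
  names(res) <- names(seqs)  # keep sequence names, if any
  res
}

# Write sequences to a FASTA file: ">id" line followed by the sequence line
writeFasta <- function(seqs, file) {
  ids <- names(seqs)
  if (is.null(ids)) ids <- paste0("seq", seq_along(seqs))
  # rbind() interleaves headers and sequences column by column
  lines <- as.vector(rbind(paste0(">", ids), seqs))
  writeLines(lines, file)
}

seqs <- c(s1 = "ATGC", s2 = "GGATCC")
rc <- RevCompMany(seqs)
out <- tempfile(fileext = ".fasta")
writeFasta(rc, out)
readLines(out)
```

`vapply()` is used instead of `sapply()` because it guarantees a character vector with one element per input sequence.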
B. Translate DNA into Protein
Task 3: Write a function that will translate one or many DNA sequences in all three reading frames into proteins. The following commands will simplify this task:
Import lookup table of genetic code
AAdf <- read.table(file="http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/AA.txt", header=TRUE, sep="\t")
AAdf[1:4,]
## Codon AA_1 AA_3 AA_Full AntiCodon
## 1 TCA S Ser Serine TGA
## 2 TCG S Ser Serine CGA
## 3 TCC S Ser Serine GGA
## 4 TCT S Ser Serine AGA
Generate named vector of relevant components
AAv <- as.character(AAdf[,2])
names(AAv) <- AAdf[,1]
AAv
## TCA TCG TCC TCT TTT TTC TTA TTG TAT TAC TAA TAG TGT TGC TGA TGG CTA CTG CTC CTT CCA CCG CCC CCT CAT
## "S" "S" "S" "S" "F" "F" "L" "L" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L" "P" "P" "P" "P" "H"
## CAC CAA CAG CGA CGG CGC CGT ATT ATC ATA ATG ACA ACG ACC ACT AAT AAC AAA AAG AGT AGC AGA AGG GTA GTG
## "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" "V" "V"
## GTC GTT GCA GCG GCC GCT GAT GAC GAA GAG GGA GGG GGC GGT
## "V" "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G"
Split sequence into triplets and translate by name-based subsetting of AAv
y <- gsub("(...)", "\\1_", x)
y <- unlist(strsplit(y, "_"))
y <- y[grep("^...$", y)]
AAv[y]
## GAT TGC AGG TTA CGT
## "D" "C" "R" "L" "R"
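The triplet lookup shown above can be extended to all three reading frames. The sketch below makes several assumptions: the function names `translate_frame` and `translateDNA` are hypothetical, the codon vector is rebuilt in place from the standard genetic code rather than imported from AA.txt so that the example runs offline, and codons containing characters other than A, T, G and C are reported as `X`. It uses `substring()` with computed start positions instead of the `gsub()`/`strsplit()` approach, which is a swapped-in technique rather than the one shown in the commands above.

```r
# Standard genetic code as a named vector (assumed to match the AA.txt table)
bases <- c("T", "C", "A", "G")
g <- expand.grid(third = bases, second = bases, first = bases,
                 stringsAsFactors = FALSE)
codons <- paste0(g$first, g$second, g$third)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
AAv <- setNames(aa, codons)

# Translate one sequence string in one reading frame (1, 2 or 3)
translate_frame <- function(s, frame, code = AAv) {
  n <- nchar(s)
  if (n - frame + 1 < 3) return("")    # no complete codon in this frame
  starts <- seq(frame, n - 2, by = 3)  # start position of each complete codon
  prot <- code[substring(s, starts, starts + 2)]
  prot[is.na(prot)] <- "X"             # unknown codons, e.g. containing N
  paste(prot, collapse = "")
}

# Translate one or many sequences; returns a matrix with one row per sequence
translateDNA <- function(seqs, code = AAv) {
  res <- vapply(toupper(seqs), function(s) {
    vapply(1:3, function(f) translate_frame(s, f, code), character(1))
  }, character(3))
  res <- t(res)
  colnames(res) <- paste0("frame", 1:3)
  res
}

translateDNA(c("GATTGCAGGTTACGTA", "ATGTAA"))
```

For the test sequence `"GATTGCAGGTTACGTA"`, frame 1 gives `"DCRLR"`, which agrees with the `AAv[y]` output above.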
Homework submission
Submit the 3 functions in one well-structured and annotated R script to your private GitHub repository under Homework/HW5/HW5.R. The script should include instructions on how to use the functions.
Due date
This homework is due on Fri, April 26th at 6:00 PM.
Homework Solutions
See here.