x <- "ATGCATTGGACGTTAG"
## Step 1: vectorize the string into individual characters
x <- substring(x, 1:nchar(x), 1:nchar(x))
x
## [1] "A" "T" "G" "C" "A" "T" "T" "G" "G" "A" "C" "G" "T" "T" "A" "G"
## Step 2: reverse the character vector
x <- rev(x)
x
## [1] "G" "A" "T" "T" "G" "C" "A" "G" "G" "T" "T" "A" "C" "G" "T" "A"
## Step 3: collapse back into a single string
x <- paste(x, collapse = "")
x
## [1] "GATTGCAGGTTACGTA"
## Step 4: complement using base substitution (A↔T, G↔C)
chartr("ATGC", "TACG", x)
## [1] "CTAACGTCCAATGCAT"
HW04: Programming in R
Overview
This homework involves writing four R functions for transforming and translating DNA sequences. All functions should be implemented in a single well-structured and annotated R script submitted to your private GitHub repository. Read the instructions for each task carefully. The expected function names, arguments, and output formats are detailed below and will be used for grading.
A. Reverse and Complement of DNA
Task 1 — Write the RevComp function
Write a function named exactly RevComp that accepts a single DNA sequence string and returns a transformed version of it. The function must include an argument named type that controls which transformation is applied:
| type value | Transformation |
|---|---|
| "rev" | Return only the reversed sequence |
| "comp" | Return only the complemented sequence |
| "revcomp" | Return the reverse complement (reversed, then complemented) |
The default value of type should be "revcomp".
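One way to restrict an argument such as type to a fixed set of values is base R's match.arg() combined with switch(). The sketch below uses a hypothetical helper, check_type, that only reports which transformation would be selected. The helper name and the returned labels are illustrative assumptions and are not part of the assignment.

```r
## Hypothetical helper (illustration only): validate a 'type'-style argument
## and dispatch on it. The default "revcomp" is used when 'type' is omitted.
check_type <- function(type = "revcomp") {
  ## match.arg() stops with an error if 'type' is not one of the choices
  type <- match.arg(type, choices = c("rev", "comp", "revcomp"))
  switch(type,
         rev     = "reverse only",
         comp    = "complement only",
         revcomp = "reverse, then complement")
}

check_type()         # uses the default
check_type("comp")   # explicit choice
## check_type("reverse") would stop with an error (not an allowed value)
```

Validating the argument up front means a misspelled type fails loudly instead of silently returning an untransformed sequence.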
Required function signature:
RevComp(x, type = "revcomp")
Expected results for test sequence "ATGCATTGGACGTTAG":
x <- "ATGCATTGGACGTTAG"
RevComp(x, type = "rev") # → "GATTGCAGGTTACGTA"
RevComp(x, type = "comp") # → "TACGTAACCTGCAATC"
RevComp(x, type = "revcomp") # → "CTAACGTCCAATGCAT"
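RevComp(x) # → "CTAACGTCCAATGCAT" (default)
Useful R building blocks for your implementation: the step-by-step example at the top of this page reverses and complements the test sequence with substring(), rev(), paste() and chartr().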
Task 2 — Vectorize RevComp and export to FASTA
Write two additional functions:
(a) A function named RevCompVector that applies RevComp to a vector of DNA sequences (multiple sequences at once). It should accept the same type argument and pass it through to RevComp.
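The general pattern of applying a single-sequence function to every element of a vector, while passing an extra argument through, can be sketched with vapply(). The functions first_bases and first_bases_vector below are hypothetical stand-ins chosen only to show the pattern; they are not the functions required by this task.

```r
## Hypothetical single-sequence function, standing in for any function that
## works on one string and takes an extra option.
## (substring() is already vectorized; it is wrapped here only to show the pattern.)
first_bases <- function(s, n = 3) substring(s, 1, n)

## Vectorized wrapper: applies first_bases() to each element and passes 'n' through
first_bases_vector <- function(x, n = 3) {
  out <- vapply(x, first_bases, FUN.VALUE = character(1), n = n,
                USE.NAMES = FALSE)
  names(out) <- names(x)   # keep the input names (NULL if there were none)
  out
}

seqs <- c("ATGCATTGGACGTTAG", "TTGGCAATCGA", "GCTAGCTA")
first_bases_vector(seqs, n = 4)
## [1] "ATGC" "TTGG" "GCTA"
```

USE.NAMES = FALSE matters for character input: with the default, sapply() and vapply() label each result with the input string itself, which would not match the unnamed output shown in the expected behavior.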
Required function signature:
RevCompVector(x, type = "revcomp")
Expected behavior:
seqs <- c("ATGCATTGGACGTTAG", "TTGGCAATCGA", "GCTAGCTA")
RevCompVector(seqs, type = "rev")
## [1] "GATTGCAGGTTACGTA" "AGCTAACGGTT" "ATCGATCG"
(b) A function named WriteFasta that saves a named vector of DNA sequences to a file in standard FASTA format. Each sequence should be preceded by a header line starting with > followed by the sequence name.
Required function signature:
WriteFasta(seqs, file)
Expected output format in the saved file:
>seq1
ATGCATTGGACGTTAG
>seq2
TTGGCAATCGA
>seq3
GCTAGCTA
Example usage:
seqs <- c(seq1 = "ATGCATTGGACGTTAG",
seq2 = "TTGGCAATCGA",
seq3 = "GCTAGCTA")
WriteFasta(seqs, file = "myseqs.fasta")
If the input vector has no names, the function should assign default names seq1, seq2, … automatically.
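The FASTA layout described above, including the default names, can be sketched by interleaving header and sequence lines and writing them with writeLines(). The function name write_fasta_sketch is a hypothetical placeholder, and the sketch assumes the names are either all present or all missing; treat it as one possible approach rather than the expected solution.

```r
## Hypothetical sketch: write sequences as alternating ">name" and sequence lines
write_fasta_sketch <- function(seqs, file) {
  ## Assumption: names are either all present or entirely absent
  if (is.null(names(seqs))) {
    names(seqs) <- paste0("seq", seq_along(seqs))
  }
  headers <- paste0(">", names(seqs))
  ## rbind() stacks headers over sequences; as.vector() reads the matrix
  ## column by column, giving header1, seq1, header2, seq2, ...
  lines <- as.vector(rbind(headers, seqs))
  writeLines(lines, con = file)
  invisible(file)
}

tmp <- tempfile(fileext = ".fasta")
write_fasta_sketch(c("ATGCATTGGACGTTAG", "TTGGCAATCGA"), file = tmp)
readLines(tmp)
## [1] ">seq1"            "ATGCATTGGACGTTAG" ">seq2"            "TTGGCAATCGA"
```

Writing to a tempfile() and reading the result back with readLines() is a quick way to check the output format before saving to a real path.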
B. Translate DNA into Protein
Task 3 — Write a DNA translation function
Write a function named TranslateDNA that translates one or more DNA sequences into protein sequences using the standard genetic code. The function should translate in all three reading frames (frame 1, 2, and 3) and return all translations.
Required function signature:
TranslateDNA(x)
Where x can be a single sequence string or a vector of sequences.
Return value: a named list where each element contains the three reading frame translations for one input sequence, labeled frame1, frame2, frame3. Stop codons should be represented as *.
Expected output for a single sequence:
TranslateDNA("ATGCATTGGACGTTAG")
## $frame1
## [1] "MHWTL"
## $frame2
## [1] "CIGR*"
## $frame3
## [1] "ALDV"
Useful R building blocks for your implementation:
## Import the genetic code lookup table
AAdf <- read.table(
file = "http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/AA.txt",
header = TRUE, sep = "\t"
)
AAdf[1:4, ]
## Codon AA_1 AA_3 AA_Full AntiCodon
## 1 TCA S Ser Serine TGA
## 2 TCG S Ser Serine CGA
## 3 TCC S Ser Serine GGA
## 4 TCT S Ser Serine AGA
## Create named vector: codon → single-letter amino acid
AAv <- as.character(AAdf[, 2])
names(AAv) <- AAdf[, 1]
## Tripletize and translate (x is a single sequence string, here "GATTGCAGGTTACGTA")
y <- gsub("(...)", "\\1_", x) # insert _ after every 3 chars
y <- unlist(strsplit(y, "_")) # split on _
y <- y[grep("^...$", y)] # keep only complete triplets
AAv[y] # look up amino acids by codon name
## GAT TGC AGG TTA CGT
## "D" "C" "R" "L" "R"
To translate in frame 2, skip the first character of the sequence before tripletizing. To translate in frame 3, skip the first two characters. Use substring(seq, start) to shift the reading frame.
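The frame shift described above can be sketched as a small helper that drops the leading bases and then cuts the rest into complete codons. The name split_codons is a hypothetical choice for this illustration, and the helper uses position-based substring() calls instead of the gsub()/strsplit() approach shown earlier. It only splits the sequence; the lookup in the codon table is left out.

```r
## Hypothetical helper: return the complete codons of sequence 's' in a given frame
split_codons <- function(s, frame = 1) {
  s <- substring(s, frame)        # frame 2 drops 1 base, frame 3 drops 2 bases
  n <- nchar(s) %/% 3             # number of complete codons
  if (n == 0) return(character(0))
  starts <- seq(1, by = 3, length.out = n)
  substring(s, starts, starts + 2)
}

x <- "ATGCATTGGACGTTAG"
split_codons(x, frame = 1)
## [1] "ATG" "CAT" "TGG" "ACG" "TTA"
split_codons(x, frame = 2)
## [1] "TGC" "ATT" "GGA" "CGT" "TAG"
split_codons(x, frame = 3)
## [1] "GCA" "TTG" "GAC" "GTT"
```

Incomplete trailing bases (the final G in frame 1, the final AG in frame 3) are dropped, which matches the "keep only complete triplets" step in the building blocks.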
Homework Submission
Submit one R script named HW4.R to your private GitHub homework repository at the following exact path:
Homework/HW4/HW4.R
Optionally, students who wish to demonstrate their functions in a rendered document can additionally submit a HW4.qmd file in the same directory. The .qmd file should load the script with source("HW4.R") and then call the functions defined in it.
Note that the .qmd file is an optional add-on and will not be used for grading. Grading is performed exclusively on HW4.R. The HW4.R file must still be fully self-contained and structured as specified above regardless of whether a .qmd file is also submitted.
Requirements for full credit
Your submitted script must:
- Define all four functions with the exact names specified:
  - RevComp(x, type = "revcomp")
  - RevCompVector(x, type = "revcomp")
  - WriteFasta(seqs, file)
  - TranslateDNA(x)
- Include a usage example for each function as commented-out code in a ## Usage section at the end of each task.
- Include brief comments explaining what each function does and what its arguments mean.
- Be runnable without errors: source the script with source("HW4.R") before submitting to check this.
- Handle the case where the input sequence vector has no names (auto-assign seq1, seq2, … in WriteFasta).
Recommended script structure
## ─────────────────────────────────────────────────────────────
## HW4: DNA Sequence Manipulation and Translation
## GEN242, Spring 2026
## Author: <your name>
## Date: <submission date>
## ─────────────────────────────────────────────────────────────
## Task 1: RevComp function
RevComp <- function(x, type = "revcomp") {
## ... your implementation ...
}
## Task 2a: RevCompVector function
RevCompVector <- function(x, type = "revcomp") {
## ... your implementation ...
}
## Task 2b: WriteFasta function
WriteFasta <- function(seqs, file) {
## ... your implementation ...
}
## Task 3: TranslateDNA function
TranslateDNA <- function(x) {
## ... your implementation ...
}
## ── Usage examples ────────────────────────────────────────────
## Task 1
# RevComp("ATGCATTGGACGTTAG", type = "rev")
# RevComp("ATGCATTGGACGTTAG", type = "comp")
# RevComp("ATGCATTGGACGTTAG", type = "revcomp")
## Task 2
# seqs <- c(seq1 = "ATGCATTGGACGTTAG", seq2 = "TTGGCAATCGA")
# RevCompVector(seqs, type = "revcomp")
# WriteFasta(seqs, file = "myseqs.fasta")
## Task 3
# TranslateDNA("ATGCATTGGACGTTAG")
# TranslateDNA(c("ATGCATTGGACGTTAG", "TTGGCAATCGA"))
Due Date
This homework is due Tuesday, April 28th at 6:00 PM.
Homework Solutions
To be posted after the due date.