Introduction to R

GEN242: Data Analysis in Genome Biology

Thomas Girke

2026-04-10

Overview

Topics covered in this tutorial:

  • What is R and why use it?
  • R working environments (RStudio, Nvim-R-Tmux)
  • Installation of R, RStudio and packages
  • Navigating directories and basic syntax
  • Data types and data objects
  • Subsetting, utilities, and calculations
  • Reading and writing external data
  • Graphics in R (base graphics)
  • Analysis routine: data import, merging, filtering, plotting

Note

Homework: HW02 tasks are linked throughout these slides at the relevant sections.
All tasks are assembled into a single R script HW2.R submitted via GitHub.

What is R?

R is a powerful statistical environment and programming language for data analysis and visualization, widely used in bioinformatics and data science.

Why use R?

  • Complete statistical environment and programming language
  • Efficient functions and data structures for data analysis
  • Powerful, publication-quality graphics
  • Access to a fast-growing number of analysis packages
  • One of the most widely used languages in bioinformatics
  • Standard for data mining and biostatistical analysis
  • Free, open-source, available for all operating systems

Key package repositories

Repository     Packages   Focus
CRAN           >14,000    General data analysis
Bioconductor   >2,000     Bioscience data analysis
Omegahat       >90        Programming interfaces

R Working Environments

Several IDEs support syntax highlighting and sending code to the R console:

RStudio / Posit

Key shortcuts in RStudio:

Shortcut Action
Ctrl+Enter Send code to R console
Ctrl+Shift+C Comment / uncomment
Ctrl+1 / Ctrl+2 Switch between editor and console

Nvim-R-Tmux

Terminal-based environment combining Neovim + R + Tmux. Ideal for working on the HPCC cluster.

Other editors

Emacs (ESS), VS Code, gedit, Notepad++, Eclipse — all support R to varying degrees.

Installation of R and Packages

Install R and RStudio

  1. Install R from CRAN
  2. Install RStudio from posit.co

Install CRAN packages

install.packages(c("pkg1", "pkg2"))
install.packages("pkg.zip", repos=NULL)   # install from local file

Install Bioconductor packages

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")        # install BiocManager if not available
BiocManager::version()                     # check Bioconductor version
BiocManager::install(c("pkg1", "pkg2"))   # install Bioc packages
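
To avoid reinstalling packages on every run, a script can first check which packages are missing. The following is a minimal sketch; the helper name missing_pkgs is hypothetical (not part of R or BiocManager), and the two package names are placeholders that ship with every R installation.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

to_install <- missing_pkgs(c("stats", "utils"))   # both ship with R
if (length(to_install) > 0) install.packages(to_install)
to_install                                        # character(0): nothing to install
```

For Bioconductor packages the same check could be combined with BiocManager::install() instead of install.packages().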

Load packages

library("my_library")                                          # single package
lapply(c("lib1", "lib2"), require, character.only=TRUE)       # multiple packages

Explore a package

library(help="my_library")    # list functions
vignette("my_library")        # open manual (PDF or HTML)

Tip

For detailed Bioconductor install instructions see the Bioc Install page and the BiocManager vignette.

Working Routine for Tutorials

When working in R, a good practice is to write all commands directly into an R script, instead of the R console, and then send the commands for execution to the R console with the Ctrl+Enter shortcut in RStudio/Posit, or similar shortcuts in other R coding environments, such as Nvim-R. This way all work is preserved and can be reused in the future.

This section gives a short overview of the standard routine for loading R-based tutorials into an R IDE.

Step 1. Download *.qmd, *.Rmd or *.R file. These so-called source files are linked in the top right corner of each tutorial or slide show. The download can be done from within R with download.file (see below), from the command line with wget, or with the save function of a web browser. The following command downloads the qmd file of this tutorial via download.file from the R console.

download.file("https://raw.githubusercontent.com/tgirke/GEN242/main/slides/rbasics/rbasics_slides.qmd", "rbasics.qmd") 

Step 2. Load *.qmd, *.Rmd or *.R file in Nvim-R or RStudio.

Step 3. Send code from the code editor to the R console by pressing Ctrl+Enter in RStudio or Enter in Nvim-R. In *.Rmd and *.qmd files the code lines are organized in so-called code chunks, and only those can be sent to the console. To start a connected R session in Neovim, press the \rf key combination. For details see here.

Getting Around

Starting and closing R

q()                    # quit R
# Save workspace image? [y/n/c]:

Warning

Answer n when asked to save the workspace. Saving .RData creates large files. Better practice: save your analysis as an R script and re-run it to restore your session.

ls()                              # list objects in current R session
dir()                             # list files in current working directory
getwd()                           # print path of current working directory
setwd("/home/user")               # change working directory

File information

list.files(path="./", pattern="\\.txt$", full.names=TRUE)   # list files by pattern (regular expression, not a shell glob)
file.exists(c("file1", "file2"))                            # check if files exist
file.info(list.files(path="./", pattern="\\.txt$", full.names=TRUE))  # file details
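
The pattern argument of list.files is a regular expression. The sketch below illustrates the difference between an anchored and an unanchored pattern in a scratch directory; the directory and file names are made up for this example.

```r
tmp <- file.path(tempdir(), "listing_demo")        # hypothetical scratch directory
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, c("a.txt", "b.txt", "notes_txt.csv")))

list.files(tmp, pattern = "\\.txt$")   # "a.txt" "b.txt": names ending in ".txt"
list.files(tmp, pattern = "txt")       # unanchored: also matches "notes_txt.csv"
```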

Basic Syntax

Assignment and general syntax

object <- ...                          # assignment operator (preferred over =)
object <- function_name(arguments)     # call a function
object <- object[arguments]            # subset an object
assign("x", function_name(arguments))  # alternative: assign()

Pipes

The %>% pipe from dplyr/magrittr chains operations left-to-right. The newer native R pipe |> (available since R 4.1.0) works similarly.

x %>% f(y)    # equivalent to f(x, y)

Makes code readable by avoiding deeply nested calls. Details in the dplyr tutorial.
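
As a small illustration of the point about nesting, the sketch below computes the same value once as a nested call and once with the native pipe. It assumes R 4.1.0 or newer; the vector is a toy example.

```r
x <- c(1, 4, 9)
nested <- sum(sqrt(x))           # nested call, read inside-out
piped  <- x |> sqrt() |> sum()   # same computation, read left-to-right
piped                            # 6
```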

Getting help

?function_name       # open help page for a function

Run scripts

Preferred version

Rscript my_script.R        # execute from command-line (preferred)

Older alternatives

source("my_script.R")      # execute R script from within R
R CMD BATCH my_script.R    # alternative

Data Types

Numeric

x <- c(1, 2, 3)
x
[1] 1 2 3
is.numeric(x)
[1] TRUE
as.character(x)    # convert to character
[1] "1" "2" "3"

Character

x <- c("1", "2", "3")
x
[1] "1" "2" "3"
is.character(x)
[1] TRUE
as.numeric(x)      # convert to numeric
[1] 1 2 3

Mixed types (coerced to character)

c(1, "b", 3)       # numeric values coerced to character
[1] "1" "b" "3"
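
Coercion follows a fixed order (logical, then integer, then numeric, then character), and a vector always takes the most general type present. A short sketch with toy values:

```r
class(c(TRUE, 2L))     # "integer":   logical is promoted to integer
class(c(2L, 3.5))      # "numeric":   integer is promoted to double
class(c(3.5, "a"))     # "character": numbers are converted to strings

# Converting back fails for non-numeric strings and yields NA (with a warning)
parsed <- suppressWarnings(as.numeric(c("1", "b", "3")))
parsed                 # 1 NA 3
```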

Logical

x <- 1:10 < 5
x                  # TRUE/FALSE vector
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
!x                 # negate
 [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
which(x)           # indices of TRUE values
[1] 1 2 3 4

Data Objects — Overview

Common object types

Type         Dimensions   Data types        Example
vector       1D           uniform           c(1, 2, 3)
factor       1D           grouping labels   factor(c("a","b","a"))
matrix       2D           uniform           matrix(1:9, 3, 3)
data.frame   2D           mixed             data.frame(x=1:3, y=c("a","b","c"))
tibble       2D           mixed             modern data.frame
list         any          any               list(name="Fred", age=30)
function     -            code              function(x) x^2

Naming rules

  • Object names should not start with a number
  • Avoid spaces and special characters like # in names
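
Base R's make.names shows how problematic names are repaired into syntactically valid ones, which is also what happens to column names on import by default. The input strings below are made-up examples.

```r
make.names(c("1sample", "my var", "ok_name"))
# "X1sample" "my.var" "ok_name"
```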

Vectors and Factors

Vectors (1D, uniform type)

myVec <- setNames(1:10, letters[1:10])   # named numeric vector
myVec[1:5]                                # subset by position
a b c d e 
1 2 3 4 5 
myVec[c(2,4,6,8)]                        # subset by multiple positions
b d f h 
2 4 6 8 
myVec[c("b", "d", "f")]                  # subset by name
b d f 
2 4 6 

Factors (1D, grouping information)

factor(c("dog", "cat", "mouse", "dog", "dog", "cat"))
[1] dog   cat   mouse dog   dog   cat  
Levels: cat dog mouse

Factors encode categorical variables with defined levels — essential for statistical modeling.
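
Internally a factor stores integer codes that point into its levels. The sketch below inspects both and shows how to set a non-alphabetical level order; the sizes vector is a toy example.

```r
animals <- factor(c("dog", "cat", "mouse", "dog", "dog", "cat"))
levels(animals)        # "cat" "dog" "mouse" (alphabetical by default)
as.integer(animals)    # 2 1 3 2 2 1: positions in levels()
table(animals)         # counts per level: cat 2, dog 3, mouse 1

# Explicit level order, e.g. to control the reference group in a model
sizes <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(sizes)          # "low" "high"
```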

Matrices and Data Frames

Matrices (2D, uniform type)

myMA <- matrix(1:30, 3, 10, byrow=TRUE)
class(myMA)
[1] "matrix" "array" 
myMA[1:2, ]                  # first two rows
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]   11   12   13   14   15   16   17   18   19    20
myMA[1, , drop=FALSE]        # first row, keep matrix structure
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
class(as.data.frame(myMA))   # convert to data.frame
[1] "data.frame"

Data Frames (2D, mixed types)

myDF <- data.frame(Col1=1:10, Col2=10:1)
myDF[1:2, ]
  Col1 Col2
1    1   10
2    2    9
class(as.matrix(myDF))       # convert to matrix
[1] "matrix" "array" 

Tibbles — modern data frames

library(tidyverse)
as_tibble(iris)              # nicer printing, same structure as data.frame
# A tibble: 150 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ℹ 140 more rows

Tip

The iris dataset is built into R — no import needed. It is used throughout these examples.

Lists and Functions

Lists (containers for any object type)

myL <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
myL
$name
[1] "Fred"

$wife
[1] "Mary"

$no.children
[1] 3

$child.ages
[1] 4 7 9
myL[[4]][1:2]     # access fourth element, first two values
[1] 4 7

Lists are the most flexible R object — they can hold vectors, data frames, other lists, and functions all at once.
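
The three access operators behave differently on lists. A brief sketch using a toy list modeled on myL (the object name fam and the added component are made up):

```r
fam <- list(name = "Fred", child.ages = c(4, 7, 9))
class(fam["child.ages"])      # "list":    single brackets return a sub-list
class(fam[["child.ages"]])    # "numeric": double brackets return the component itself
fam$child.ages[2]             # 7: $ is shorthand for [[ ]] with a name
fam$pet <- "dog"              # assigning to a new name appends a component
names(fam)                    # "name" "child.ages" "pet"
```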

Functions (reusable pieces of code)

myfct <- function(arg1, arg2, ...) {
    function_body
}
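
A concrete instance of the skeleton above might look as follows. The function cv (coefficient of variation) is a made-up example, not part of the course material.

```r
# Hypothetical example: coefficient of variation (standard deviation / mean)
cv <- function(x, na.rm = FALSE) {
    sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)   # last expression is the return value
}

cv(c(2, 4, 6))                      # 0.5
cv(c(2, 4, 6, NA), na.rm = TRUE)    # 0.5: named arguments override defaults
```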

Subsetting Data Objects

1. By position

myVec <- 1:26; names(myVec) <- LETTERS
myVec[1:4]          # first four elements
A B C D 
1 2 3 4 
myVec[-(1:4)]       # everything except first four
 E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z 
 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 

2. By logical vector

myLog <- myVec > 10
myVec[myLog]        # elements where condition is TRUE
 K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z 
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 

3. By name

myVec[c("B", "K", "M")]
 B  K  M 
 2 11 13 

4. By $ sign (single column or list component)

iris$Species[1:8]
[1] setosa setosa setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica

Subsetting 2D objects

iris[1:4, ]                          # first 4 rows, all columns
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
iris[1:4, 1:2]                       # first 4 rows, first 2 columns
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
iris[iris$Species=="setosa", ]       # rows matching a condition
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
14          4.3         3.0          1.1         0.1  setosa
15          5.8         4.0          1.2         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
17          5.4         3.9          1.3         0.4  setosa
18          5.1         3.5          1.4         0.3  setosa
19          5.7         3.8          1.7         0.3  setosa
20          5.1         3.8          1.5         0.3  setosa
21          5.4         3.4          1.7         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
23          4.6         3.6          1.0         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
25          4.8         3.4          1.9         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
28          5.2         3.5          1.5         0.2  setosa
29          5.2         3.4          1.4         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
31          4.8         3.1          1.6         0.2  setosa
32          5.4         3.4          1.5         0.4  setosa
33          5.2         4.1          1.5         0.1  setosa
34          5.5         4.2          1.4         0.2  setosa
35          4.9         3.1          1.5         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
38          4.9         3.6          1.4         0.1  setosa
39          4.4         3.0          1.3         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
41          5.0         3.5          1.3         0.3  setosa
42          4.5         2.3          1.3         0.3  setosa
43          4.4         3.2          1.3         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
45          5.1         3.8          1.9         0.4  setosa
46          4.8         3.0          1.4         0.3  setosa
47          5.1         3.8          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa
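
Printing a full subset quickly becomes long. One option, sketched here, is to store the subset in an object (the names are arbitrary) and inspect it with nrow and head; conditions can also be combined with &.

```r
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)        # 50
head(setosa, 3)     # print only the first three rows

# Combine two conditions: setosa flowers with sepal length above 5.5
long_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5, ]
nrow(long_setosa)   # 3
```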

Important Utilities

Combining objects

c(1, 2, 3)
[1] 1 2 3
x <- 1:3; y <- 101:103
c(x, y)                   # concatenate vectors
[1]   1   2   3 101 102 103
ma <- cbind(x, y)         # bind as columns
rbind(ma, ma)             # bind as rows
     x   y
[1,] 1 101
[2,] 2 102
[3,] 3 103
[4,] 1 101
[5,] 2 102
[6,] 3 103

Dimensions and names

length(iris$Species)      # number of elements
[1] 150
dim(iris)                 # rows x columns
[1] 150   5
rownames(iris)[1:8]
[1] "1" "2" "3" "4" "5" "6" "7" "8"
colnames(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
names(myL)                # names of list components
[1] "name"        "wife"        "no.children" "child.ages" 

Sorting

sort(10:1)
sortindex <- order(iris[,1], decreasing=FALSE)
iris[sortindex, ][1:2, ]
iris[order(iris$Sepal.Length, iris$Sepal.Width), ][1:2, ]  # sort by multiple columns
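
The difference between sort and order is easiest to see on a small named vector (toy values): sort returns the sorted values, while order returns the index positions that would sort them, which is what makes it usable for reordering rows of a data frame.

```r
v <- c(b = 30, a = 10, c = 20)
sort(v)                           # a = 10, c = 20, b = 30
order(v)                          # 2 3 1: positions of the smallest to largest value
v[order(v, decreasing = TRUE)]    # b = 30, c = 20, a = 10
```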

Checking identity

myma <- iris[1:2,]
all(myma == iris[1:2,])       # all values equal?
[1] TRUE
identical(myma, iris[1:2,])   # strict identity?
[1] TRUE
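
For floating-point numbers, exact comparison with == or identical can fail because of rounding. A common alternative, sketched below, is all.equal, which compares within a small tolerance.

```r
a <- 0.1 + 0.2
a == 0.3                     # FALSE: binary rounding error
isTRUE(all.equal(a, 0.3))    # TRUE: equal within numeric tolerance
```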

Operators and Calculations

Comparison operators

1 == 1    # equal
[1] TRUE
1 != 2    # not equal
[1] TRUE
# also: <, >, <=, >=

Logical operators

x <- 1:10; y <- 10:1
x > y & x > 5    # AND
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x > y | x > 5    # OR
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
!x                # NOT (non-zero numbers count as TRUE, so every element becomes FALSE)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Basic calculations

x + y
 [1] 11 11 11 11 11 11 11 11 11 11
sum(x)
[1] 55
mean(x)
[1] 5.5
apply(iris[1:6, 1:3], 1, mean)    # row means (margin=1)
       1        2        3        4        5        6 
3.333333 3.100000 3.066667 3.066667 3.333333 3.666667 
apply(iris[1:6, 1:3], 2, mean)    # column means (margin=2)
Sepal.Length  Sepal.Width Petal.Length 
    4.950000     3.383333     1.450000 
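
For plain means, the dedicated functions rowMeans and colMeans give the same result as apply and are usually faster on large matrices. A quick check on the same iris subset (the object name m is arbitrary):

```r
m <- as.matrix(iris[1:6, 1:3])
all.equal(apply(m, 1, mean), rowMeans(m))   # TRUE
all.equal(apply(m, 2, mean), colMeans(m))   # TRUE
```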

Reading and Writing Data

Import tabular data

Widely used read.table and read.delim import functions

myDF <- read.delim("myData.tsv", sep="\t")           # tab-delimited file

The readr package offers an alternative with better default arguments and performance. For details see here.

myTibble <- readr::read_tsv("myData.tsv")

Import from Google Sheets or Excel directly

library(googlesheets4)
gs4_deauth()                                           # skip authentication (public sheets only)
mysheet <- read_sheet("1U-32UcwZP1k3saKeaH1mbvEAOfZRdNHNkWK2GI1rpPM", skip=4)
myDF <- as.data.frame(mysheet)
library(readxl)
mysheet <- read_excel(targets_path, sheet="Sheet1")   # Excel files; targets_path holds the path to the Excel file

Export tabular data

write.table(myDF, file="myfile.xls", sep="\t", quote=FALSE, col.names=NA)
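
A quick way to verify an export is to read the file back and compare it with the original. The sketch below uses a made-up toy data frame and a temporary file; row.names=FALSE is used here instead of col.names=NA so that no row-name column is written.

```r
df <- data.frame(gene = c("g1", "g2", "g3"), count = c(10L, 0L, 25L),
                 stringsAsFactors = FALSE)             # toy data
out <- tempfile(fileext = ".tsv")                      # temporary output path
write.table(df, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

df2 <- read.delim(out, stringsAsFactors = FALSE)       # re-import
all.equal(df, df2)                                     # TRUE: round trip preserved the data
```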

Line-wise import/export

writeLines(month.name, "myData.txt")      # export line by line
myLines <- readLines("myData.txt")        # import line by line (returns a character vector)

Save and load R objects

mylist <- list(C1=iris[,1], C2=iris[,2])
saveRDS(mylist, "mylist.rds")             # save
mylist <- readRDS("mylist.rds")           # load

Note

HW02 — Task A: Sort iris by first column, subset first 12 rows, export to file, modify column names in a spreadsheet program, re-import with read.table.
→ HW02 instructions

Useful R Functions

Unique entries

length(iris$Sepal.Length)          # 150 total entries
[1] 150
length(unique(iris$Sepal.Length))  # number of unique values
[1] 35

Count occurrences

table(iris$Species)    # frequency table per group

    setosa versicolor  virginica 
        50         50         50 

Aggregate statistics

aggregate(iris[,1:4], by=list(iris$Species), FUN=mean, na.rm=TRUE)
     Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

Set operations

month.name %in% c("May", "July")    # logical: which elements are in set
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
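
Besides %in%, base R provides the classic set functions. A short sketch with two toy vectors:

```r
a <- c("May", "June", "July")
b <- c("July", "August")
intersect(a, b)    # "July"
union(a, b)        # "May" "June" "July" "August"
setdiff(a, b)      # "May" "June": in a but not in b
```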

Merge data frames

frame1 <- iris[sample(1:nrow(iris), 30), ]
my_result <- merge(frame1, iris, by.x=0, by.y=0, all=TRUE)   # by.x=0, by.y=0: join on row names
# all=TRUE: outer join (keep all rows)
# all=FALSE: inner join (keep only common rows)
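
The effect of the all, all.x and all.y arguments is easiest to see on two tiny tables. The data frames below are made up for illustration.

```r
left  <- data.frame(ID = c("a", "b", "c"), x = 1:3)
right <- data.frame(ID = c("b", "c", "d"), y = c(20, 30, 40))

inner <- merge(left, right, by = "ID")                 # IDs b, c
outer <- merge(left, right, by = "ID", all = TRUE)     # IDs a, b, c, d (NA where missing)
leftj <- merge(left, right, by = "ID", all.x = TRUE)   # IDs a, b, c (all rows of left)

c(inner = nrow(inner), outer = nrow(outer), left = nrow(leftj))   # 2 4 3
```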

Graphics in R — Overview

Why R graphics?

  • Powerful environment for scientific visualization
  • Integrated with statistics infrastructure
  • Publication-quality, fully reproducible output
  • Supports LaTeX and Markdown via knitr

Four main graphics systems

System            Level        Package
Base R graphics   Low + high   built-in
grid              Low-level    built-in
lattice           High-level   lattice
ggplot2           High-level   ggplot2

Key base graphics functions

plot, barplot, boxplot, hist, pie, pairs, image, heatmap

Tip

For new code, ggplot2 is generally recommended. Base R graphics remain useful for quick exploration and highly customized plots.

Scatter Plots

Sample dataset

set.seed(1410)
y <- matrix(runif(30), ncol=3, dimnames=list(letters[1:10], LETTERS[1:3]))

Basic scatter plot

plot(y[,1], y[,2])

All pairs

pairs(y)

With color and labels

plot(y[,1], y[,2], pch=20, col="red", main="Symbols and Labels")
text(y[,1]+0.03, y[,2], rownames(y))

Add regression line

plot(y[,1], y[,2])
myline <- lm(y[,2] ~ y[,1])
abline(myline, lwd=2)

summary(myline)

Call:
lm(formula = y[, 2] ~ y[, 1])

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40357 -0.17912 -0.04299  0.22147  0.46623 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   0.5764     0.2110   2.732   0.0258 *
y[, 1]       -0.3647     0.3959  -0.921   0.3839  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3095 on 8 degrees of freedom
Multiple R-squared:  0.09589,   Adjusted R-squared:  -0.01712 
F-statistic: 0.8485 on 1 and 8 DF,  p-value: 0.3839
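
Individual values from a fitted model can be extracted programmatically rather than read from the printed summary. A minimal sketch with made-up toy data (not the random matrix used above):

```r
xv <- 1:5
yv <- c(2.9, 5.1, 7.0, 9.2, 10.8)    # roughly yv = 2 * xv + 1
fit <- lm(yv ~ xv)

coef(fit)                            # intercept 1.03, slope 1.99
r2 <- summary(fit)$r.squared         # proportion of variance explained
pval <- summary(fit)$coefficients["xv", "Pr(>|t|)"]   # p-value of the slope
```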

Important plot parameters

Argument Description
col color of symbols
pch symbol type (example(points) to see options)
lwd line/symbol width
cex.* font size controls
mar margin sizes c(bottom, left, top, right)
log="xy" log scale on both axes

Note

HW02 — Task B: Generate a scatter plot of iris columns 1 and 2, colored by Species. Use xlim/ylim to restrict data to the bottom-left quadrant.
→ HW02 instructions

Bar Plots, Histograms and More

Bar plot with legend

barplot(y[1:4,], ylim=c(0, max(y[1:4,])+0.3), beside=TRUE, legend=letters[1:4])

Tip

When input is a matrix, barplot uses column names as group labels and row names as within-group labels. Convert data.frame input with as.matrix() first.

Bar plot with error bars

bar <- barplot(m <- rowMeans(y) * 10, ylim=c(0, 10))
stdev <- apply(y, 1, sd)      # row-wise SD; sd() on a matrix returns a single value in current R
arrows(bar, m, bar, m + stdev, length=0.15, angle=90)

Histogram and density plot

hist(y, freq=TRUE, breaks=10)

plot(density(y), col="red")

Save graphics to file

pdf("test.pdf")
plot(1:10, 1:10)
dev.off()         # always close the device!

Works the same for jpeg(), png(), svg(), tiff().
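
The device functions also accept size arguments, and a PDF device collects every plot drawn before dev.off() as a separate page. A small sketch writing to a temporary file (path and plot content are arbitrary):

```r
out <- tempfile(fileext = ".pdf")            # temporary output path
pdf(out, width = 7, height = 4)              # size in inches
plot(1:10, (1:10)^2, pch = 20, col = "red", main = "Page 1")
hist(c(1, 2, 2, 3, 3, 3), main = "Page 2")   # each new plot starts a new page
dev.off()                                    # close the device to finish the file
file.exists(out)                             # TRUE
```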

Note

HW02 — Task C: Calculate mean values per Species for first four iris columns. Organize as a matrix. Generate stacked and horizontally arranged bar plots.
→ HW02 instructions

Analysis Routine — Data Import

A step-by-step workflow using two sample biological datasets. This analysis routine is used by HW02 Tasks D to H.

Step 1 — Download sample data

Open in Excel, save as tab-delimited text, then import:

my_mw <- read.delim(file="MolecularWeight_tair7.xls", header=TRUE, sep="\t")
my_mw[1:2,]
my_target <- read.delim(file="TargetP_analysis_tair7.xls", header=TRUE, sep="\t")
my_target[1:2,]

Or import directly from the web:

my_mw <- read.delim("https://faculty.ucr.edu/~tgirke/Documents/R_BioCond/Samples/MolecularWeight_tair7.xls",
                     header=TRUE, sep="\t")
my_target <- read.delim("https://faculty.ucr.edu/~tgirke/Documents/R_BioCond/Samples/TargetP_analysis_tair7.xls",
                          header=TRUE, sep="\t")

Analysis Routine — Merging Data Frames

Step 2 — Assign uniform ID column names

colnames(my_target)[1] <- "ID"
colnames(my_mw)[1] <- "ID"

Step 3 — Merge on common ID field (left join: all.x=TRUE keeps all rows of my_mw)

my_mw_target <- merge(my_mw, my_target, by.x="ID", by.y="ID", all.x=TRUE)

Step 4 — Merge shortened table, then remove non-matching rows

my_mw_target2a <- merge(my_mw, my_target[1:40,], by.x="ID", by.y="ID", all.x=TRUE)
my_mw_target2 <- na.omit(my_mw_target2a)    # remove rows with NAs
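
What na.omit does can be checked on a tiny table: it drops every row that contains at least one NA, which is the same set of rows that complete.cases flags as FALSE. The data frame below is a made-up example.

```r
scores <- data.frame(ID = c("a", "b", "c"), score = c(1.5, NA, 3.0))
complete.cases(scores)     # TRUE FALSE TRUE
cleaned <- na.omit(scores)
cleaned$ID                 # "a" "c": row b was removed
```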

Note

HW02 — Task D: Execute merge to return only common rows directly (without na.omit). Prove both methods return identical results.
HW02 — Task E: Replace all NA values in my_mw_target2a with zeros.

Analysis Routine — Filtering and String Operations

Step 5 — Filter rows by conditions

# Proteins with MW > 100,000 AND targeted to chloroplast (Loc == "C")
query <- my_mw_target[my_mw_target[,2] > 100000 & my_mw_target[,4] == "C", ]
query[1:4, ]
dim(query)

Note

HW02 — Task F: How many proteins have MW > 4,000 and < 5,000? Subset and sort by MW to verify.

Step 6 — Remove gene model extensions with regex

# AT1G01010.1 → AT1G01010  (remove everything from . onward)
my_mw_target3 <- data.frame(
    loci = gsub("\\..*", "", as.character(my_mw_target[,1]), perl=TRUE),
    my_mw_target
)
my_mw_target3[1:3, 1:8]
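
The regular expression can be tested in isolation on a few IDs before applying it to the full table. The IDs below are illustrative strings in the same format.

```r
ids <- c("AT1G01010.1", "AT1G01010.2", "AT1G01020.1")
gsub("\\..*", "", ids)     # "AT1G01010" "AT1G01010" "AT1G01020": drop "." and everything after it
sub("^.*\\.", "", ids)     # "1" "2" "1": keep only the gene model number
```

The dot is escaped as \\. because an unescaped . matches any character in a regular expression.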

Note

HW02 — Task G: Retrieve rows where second column contains specific IDs using %in%. Also use the second column as a row index and repeat. Explain the difference between the two approaches.

Analysis Routine — Calculations and Export

Step 7 — Count duplicates

mycounts <- table(my_mw_target3[,1])[my_mw_target3[,1]]
my_mw_target4 <- cbind(my_mw_target3, Freq=mycounts[as.character(my_mw_target3[,1])])
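
The counting step relies on indexing a frequency table by name, which returns one count per input element. A simplified sketch on a toy vector (it uses as.vector to obtain a plain integer column, which differs slightly from the code above):

```r
loci <- c("AT1G01010", "AT1G01010", "AT1G01020")
counts <- table(loci)                  # frequency per unique locus
per_row <- as.vector(counts[loci])     # look up each element's count by name
data.frame(loci, Freq = per_row)       # Freq is 2, 2, 1
```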

Step 8 — Vectorized calculation (average AA weight)

data.frame(my_mw_target4, avg_AA_WT=(my_mw_target4[,3] / my_mw_target4[,4]))[1:2,]

Step 9 — Row-wise mean and standard deviation

mymean  <- apply(my_mw_target4[,6:9], 1, mean, na.rm=TRUE)
mystdev <- apply(my_mw_target4[,6:9], 1, sd, na.rm=TRUE)
data.frame(my_mw_target4, mean=mymean, stdev=mystdev)[1:2, 5:12]

Step 10 — Scatter plot

plot(my_mw_target4[1:500, 3:4], col="red")

Step 11 — Export results

write.table(my_mw_target4, file="my_file.xls", quote=FALSE, sep="\t", col.names=NA)

Note

HW02 — Task H: Assemble all commands from this exercise into HW2.R and run it:

source("HW2.R")    # from within R
Rscript HW2.R      # from command-line

HW02 Summary

Assemble all solutions into a single R script HW2.R and submit via GitHub.

Task   Topic                                                 Key functions
A      Sort iris, export, modify columns, re-import          order, write.table, read.table
B      Scatter plot iris col 1-2, colored by Species         plot, xlim, ylim
C      Mean matrix by Species, stacked & horizontal bars     aggregate, barplot
D      Merge returning only common rows; prove equivalence   merge(all=FALSE), all()
E      Replace NAs with zeros                                is.na, indexing
F      Filter proteins by MW range 4,000–5,000               boolean indexing
G      Subset rows by ID using %in% and row index            %in%, rownames
H      Assemble all code into HW2.R, run with source()       source, Rscript

Submission path

Homework/HW2/HW2.R

Due: Thu, April 16th at 6:00 PM

Note

The preassembled workflow script for Task H is available here — it does not include solutions for Tasks A–C.