One of the main attractions of R is how easy it is to write custom functions and programs — even for users with no prior programming experience. Once the basic control structures are understood, R becomes a powerful environment for complex custom analyses of almost any type of data.
.Rmd and .qmd files extend R scripts with narrative text, results, and formatted output. They render to HTML, PDF, and other formats. Details in the R Markdown tutorial.
Control Structures — Operators
Comparison operators
| Operator | Meaning |
|----------|---------|
| == | equal |
| != | not equal |
| > / >= | greater than / or equal |
| < / <= | less than / or equal |
Logical operators
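| Operator | Meaning | Scope |
|----------|---------|-------|
| & | AND | element-wise on vectors |
| && | AND | single values only (length-one operands); use in if statements |
| \| | OR | element-wise on vectors |
| \|\| | OR | single values only (length-one operands); use in if statements |
| ! | NOT | element-wise on vectors |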
Tip
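Use && and || in if statements: they expect single logical values and short-circuit (since R 4.3.0, an operand longer than one element is an error rather than being truncated to its first element). Use & and | for element-wise operations on vectors.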
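A minimal sketch of the difference between the two operator families; the vectors a and b and the variable names are made up for illustration:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Element-wise AND / OR return one result per position
elementwise_and <- a & b
elementwise_or  <- a | b

# && short-circuits: 'x > 0' is never evaluated here because
# the left-hand side is already FALSE
x <- NULL
safe <- !is.null(x) && x > 0

# Reduce vectors to a single value with any() / all() before if ()
msg <- "condition not met"
if (any(a) && all(a | b)) {
    msg <- "condition met"
}
```

Wrapping vector conditions in any() or all() is one common way to obtain the single logical value that if () expects.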
Conditional Execution — if and ifelse
if statement — operates on a single logical value
if (TRUE) {
    statements_1
} else {
    statements_2
}
Warning
Keep } else { on the same line. At top level, a newline before else ends the if statement, and R then stops with an "unexpected 'else'" error.
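The parsing behavior can be checked without breaking a script by handing the code to parse() as text. This is only a sketch; the strings bad and good are illustrative:

```r
# Top level: the newline completes the if statement, so 'else' is unexpected
bad <- "if (FALSE) { 1 }\nelse { 2 }"
res_bad <- try(parse(text = bad), silent = TRUE)
parse_failed <- inherits(res_bad, "try-error")

# Inside braces (for example in a function body) the same layout parses,
# because the enclosing block is still open when the newline is read
good <- "{\nif (FALSE) { 1 }\nelse { 2 }\n}"
res_good <- eval(parse(text = good))
```

Here parse_failed records that the first string does not parse, while res_good holds the value of the else branch.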
Examples
# Basic if / else
if (1==0) {
    print(1)
} else {
    print(2)  # runs this branch
}
# if / else if / else chain
if (1==0) {
    print(1)
} else if (1==2) {
    print(2)
} else {
    print(3)  # runs this branch
}
ifelse — vectorized conditional, operates on entire vectors
ifelse(test, true_value, false_value) # syntax
x <- 1:10
ifelse(x < 5, sqrt(x), 0)  # sqrt for values < 5, else 0
ifelse is much more efficient than a for loop with an if inside when operating on vectors.
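For comparison, here is a sketch of the loop that a single ifelse() call replaces, using the same x as in the example above (res_loop and res_vec are illustrative names):

```r
x <- 1:10

# Loop version: test each element with if and fill a pre-allocated vector
res_loop <- numeric(length(x))
for (i in seq_along(x)) {
    if (x[i] < 5) {
        res_loop[i] <- sqrt(x[i])
    } else {
        res_loop[i] <- 0
    }
}

# Vectorized version: one call handles the whole vector
res_vec <- ifelse(x < 5, sqrt(x), 0)

same_result <- isTRUE(all.equal(res_loop, res_vec))
```

Note that ifelse() evaluates sqrt(x) for the whole vector and then picks values by position, so the expression in each branch must be safe to evaluate for every element.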
for Loops
Iterate over elements of a sequence:
for (variable in sequence) {
    statements
}
Example — compute row means (append approach)
mydf <- iris
myve <- NULL
for (i in seq(along=mydf[,1])) {
    myve <- c(myve, mean(as.numeric(mydf[i, 1:3])))  # appends result each iteration
}
myve[1:8]
Warning
The append approach (c()) is slow for large objects — each iteration creates a new copy of the entire vector. Use the inject approach instead.
Inject approach — pre-allocate the result vector (much faster)
myve <- numeric(length(mydf[,1]))  # pre-allocate vector of correct length
for (i in seq(along=myve)) {
    myve[i] <- mean(as.numeric(mydf[i, 1:3]))  # assign result by index
}
myve[1:8]
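The cost difference between the two approaches can be measured with system.time(). The sketch below uses two hypothetical helper functions, grow() and prealloc(), and an arbitrary size n; actual timings depend on the machine and R version:

```r
n <- 10000

# Append approach: the vector is copied and extended on every iteration
grow <- function(n) {
    v <- NULL
    for (i in seq_len(n)) v <- c(v, i^2)
    v
}

# Inject approach: allocate once, then assign by index
prealloc <- function(n) {
    v <- numeric(n)
    for (i in seq_len(n)) v[i] <- i^2
    v
}

t_grow     <- system.time(res_grow <- grow(n))
t_prealloc <- system.time(res_prealloc <- prealloc(n))

# Both approaches produce the same values; only the run time differs
same_values <- identical(res_grow, res_prealloc)
c(append = t_grow[["elapsed"]], inject = t_prealloc[["elapsed"]])
```

Increasing n makes the gap more visible, since the append version copies a longer vector on each pass.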
Conditional stop inside a loop
Use stop() to break out of a loop with an error message when a condition is met:
x <- 1:10
z <- NULL
for (i in seq(along=x)) {
    if (x[i] < 5) {
        z <- c(z, x[i]-1)
        print(z)
    } else {
        stop("values need to be < 5")  # breaks loop and prints error
    }
}
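Because stop() signals an error, it also aborts the surrounding script unless the error is caught. As a sketch under the same setup, the error can be captured with tryCatch(), or the loop can be left without an error by using break (msg, z and z2 are illustrative names):

```r
x <- 1:10

# stop() raises an error; tryCatch() lets the script continue afterwards
z <- NULL
msg <- tryCatch({
    for (i in seq_along(x)) {
        if (x[i] >= 5) stop("values need to be < 5")
        z <- c(z, x[i] - 1)
    }
    "finished without error"
}, error = function(e) conditionMessage(e))

# break leaves the loop quietly, without signalling an error
z2 <- NULL
for (i in seq_along(x)) {
    if (x[i] >= 5) break
    z2 <- c(z2, x[i] - 1)
}
```

Which of the two fits depends on whether hitting the condition should count as a failure (stop) or as a normal end of the iteration (break).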
while Loop
Iterates as long as a condition remains TRUE:
while (condition) {
    statements
}
Example
z <- 0
while (z < 5) {
    z <- z + 2  # increment z each iteration
    print(z)    # prints: 2, 4, 6 (stops when z >= 5)
}
Tip
Use while when the number of iterations is not known in advance. Use for when iterating over a known sequence or vector.
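A small sketch of a case where the iteration count is not known up front: a value is halved until it falls below a tolerance. The starting value and tolerance are arbitrary choices for illustration:

```r
# The number of iterations depends on the starting value and tolerance
value <- 100
tol   <- 1
steps <- 0
while (value >= tol) {
    value <- value / 2
    steps <- steps + 1
}
steps   # number of halvings needed
```

With a for loop the same task would need the number of halvings to be computed beforehand.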
The apply Function Family
Apply functions avoid explicit loops and are often more readable and faster.
apply — apply a function over rows or columns of a matrix/data.frame
apply(X, MARGIN, FUN, ...)
# X: matrix, array, or data.frame
# MARGIN: 1 = rows, 2 = columns
# FUN: function to apply
apply(iris[1:8, 1:3], 1, mean)  # row-wise mean for first 8 rows, cols 1-3
apply(iris[, 1:4], 2, mean)     # column-wise mean for all numeric columns
tapply — apply a function to groups defined by a factor
tapply(vector, factor, FUN)
tapply(iris$Sepal.Length, iris$Species, mean) # mean Sepal.Length per species
lapply and sapply — apply a function to each element of a list or vector
l <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
lapply(l, mean)  # returns a list
sapply(l, mean)  # returns a vector or matrix when possible
vapply(l, mean, FUN.VALUE=numeric(1))  # safer: enforces output type
Often used with an inline anonymous function:
sapply(names(l), function(x) mean(l[[x]])) # same result, explicit element access
Choosing between lapply, sapply, vapply
| Function | Returns | Best for |
|----------|---------|----------|
| lapply | always a list | when output types may vary |
| sapply | vector/matrix if possible, else list | interactive use |
| vapply | vector/array of specified type | scripts: safer, faster |
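The practical difference shows up when results do not all have the same shape. The sketch below reuses the list l from the example above; the filter x[x > 0.5] is an arbitrary function chosen so that the results differ in length:

```r
l <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))

# Same-shaped results: all three functions agree on the values
m_lapply <- lapply(l, mean)                          # list of 3
m_sapply <- sapply(l, mean)                          # named numeric vector
m_vapply <- vapply(l, mean, FUN.VALUE = numeric(1))  # named numeric vector

# Results of different lengths: sapply quietly returns a list instead
uneven <- sapply(l, function(x) x[x > 0.5])
is.list(uneven)

# vapply stops with an error because the results are not length 1
vapply_failed <- inherits(
    try(vapply(l, function(x) x[x > 0.5], FUN.VALUE = numeric(1)),
        silent = TRUE),
    "try-error")
```

In a script, the error from vapply() surfaces the problem at the call site, whereas the list returned by sapply() may only cause trouble in a later step.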
Loop Speed Performance
Looping over large data sets can be slow. The key principle: avoid growing objects inside loops and prefer vectorized operations over loops entirely.
# queryResult is assumed to be a logical matrix (genes in rows, contrasts in columns),
# TRUE where a gene passes the filter
# Per-column: genes passing filter in each contrast
matchingIDlist <- sapply(colnames(queryResult),
    function(x) names(queryResult[queryResult[,x], x]),
    simplify=FALSE)
# Across columns: genes passing filter in > 2 contrasts
matchingID <- rowSums(queryResult) > 2
names(matchingID[matchingID])  # gene names meeting the threshold
Tip
Storing LFC and p-value as two parallel matrices in a list enables flexible, fast, zero-loop combinatorial filtering — a pattern worth reusing in any multi-contrast analysis.
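A toy version of this pattern is sketched below. All values, the object names deList and queryDemo, and the cutoffs (absolute LFC of at least 1, p-value of at most 0.05, at least 2 contrasts) are invented for illustration and are not taken from the original analysis:

```r
genes     <- paste0("g", 1:4)
contrasts <- c("c1", "c2", "c3")

# Two parallel matrices with identical dimnames (made-up values)
deList <- list(
    lfc = matrix(c( 2.0, -0.5,  1.5, -3.0,
                    0.2,  2.5, -1.2, -2.0,
                    1.1,  0.1,  0.3,  2.2),
                 nrow = 4, dimnames = list(genes, contrasts)),
    pval = matrix(c(0.01, 0.20, 0.03, 0.001,
                    0.50, 0.02, 0.04, 0.01,
                    0.04, 0.90, 0.60, 0.03),
                  nrow = 4, dimnames = list(genes, contrasts))
)

# One vectorized expression filters every gene in every contrast
queryDemo <- abs(deList$lfc) >= 1 & deList$pval <= 0.05

# Genes passing the filter in at least 2 contrasts
passing <- names(which(rowSums(queryDemo) >= 2))
passing
```

Because both matrices share the same row and column layout, any combination of cutoffs reduces to a single logical expression, and rowSums() or colSums() then summarize it without a loop.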
Functions — Overview and Syntax
Functions are the primary way to organize and reuse code in R. Almost everything in R is a function.
Define a function
myfct <- function(arg1, arg2, ...) {
    # function body: operations on the arguments
    result <- arg1 + arg2
    return(result)  # return value explicitly, or just: result
}
Call a function
myfct(arg1=3, arg2=4)  # with argument names (recommended)
myfct(3, 4)            # positional: order must match definition
Key rules
| Concept | Rule |
|---------|------|
| Naming | Avoid names of existing functions (e.g. don’t name a function mean) |
| Default args | Provide defaults with arg=value; the caller can then omit them |
| Empty args | function() { ... } is valid for functions that need no input |
| ... | Pass unknown arguments through to another function |
| Return value | Last evaluated expression, or explicit return() |
| Scope | Variables inside a function are local and invisible outside |
| Global assign | <<- assigns to an existing variable in an enclosing environment, or creates it in the global environment if none is found |
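The calls that follow use a function with arguments x1 and x2 (default 5) whose definition does not appear in this section. One definition consistent with the stated return value c(1, 25) is sketched here; the body is an assumption and may differ from the original. The helper mean_wrap() is also hypothetical and only illustrates passing arguments through `...`:

```r
# Hypothetical definition, reconstructed from the documented calls
myfct <- function(x1, x2 = 5) {
    z1 <- x1 / x1      # 1 for any non-zero x1
    z2 <- x2 * x2      # square of x2
    c(z1, z2)          # last evaluated expression is the return value
}

# '...' hands extra arguments on to another function
# (mean_wrap is an illustrative helper)
mean_wrap <- function(x, ...) mean(x, ...)
wrapped <- mean_wrap(c(1, NA, 3), na.rm = TRUE)
```

The default x2 = 5 is what allows the call with x1 alone, and na.rm = TRUE reaches mean() only because mean_wrap() forwards `...`.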
myfct(x1=2, x2=5)  # explicit: returns c(1, 25)
myfct(2, 5)        # positional: same result
myfct(x1=2)        # uses default x2=5: same result
myfct              # without () prints the function definition
Scope — variables inside functions are local
x <- 10  # global x
myfct2 <- function() {
    x <- 99  # local x: does not affect global x
    cat("inside:", x, "\n")
}
myfct2()                  # prints: inside: 99
cat("outside:", x, "\n")  # prints: outside: 10
# Force global assignment with <<-
myfct3 <- function() {
    x <<- 99  # modifies global x
}
myfct3()
cat("outside:", x, "\n")  # prints: outside: 99
Tip
Avoid <<- in general — it makes code harder to reason about. Prefer returning values explicitly with return() and assigning outside the function.
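A sketch of the return-and-assign style, using a hypothetical increment() helper in place of a function that would modify a global counter with <<-:

```r
# The function computes a new value and returns it;
# it never touches variables outside its own scope
increment <- function(count, by = 1) {
    count + by
}

count <- 0
count <- increment(count)           # caller decides where the result goes
count <- increment(count, by = 2)
count
```

Every change to count is visible at the call site, which makes the data flow easier to follow than a hidden global assignment.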
Useful Utilities — Debugging
R provides several tools for finding and fixing errors in code:
| Function | Purpose |
|----------|---------|
| traceback() | Shows the call stack after an error |
| browser() | Inserts a breakpoint: pauses execution and opens an interactive prompt |
| debug(myfct) | Step through myfct line by line |
| undebug(myfct) | Remove debug mode from a function |
| options(error=recover) | On error, open interactive debugger at the call stack |
| options(error=NULL) | Reset to default error handling |
# Example: use browser() as a breakpoint inside a function
myfct <- function(x) {
    browser()  # execution pauses here: inspect variables interactively
    result <- x^2
    return(result)
}
myfct(5)
R’s regex utilities work similarly to other languages. Main reference: ?regexp
Pattern matching with grep
month.name[grep("^A", month.name)]   # months starting with A
grep("^J", month.name, value=TRUE)   # value=TRUE returns the matching strings (here: months starting with J)
grepl("^A", month.name)              # returns logical vector
String substitution with gsub and sub
# gsub: replace ALL matches
gsub("(i.*a)", "xxx_\\1", "virginica", perl=TRUE)  # back reference with \\1
# sub: replace FIRST match only
sub("a", "X", "banana")  # returns "bXnana"
String operations
# Insert a character with back reference, then split on it
x <- gsub("(a)", "\\1_", month.name[1], perl=TRUE)  # "Ja_nua_ry"
strsplit(x, "_")  # split on "_"
# Reverse a string
paste(rev(unlist(strsplit("hello", NULL))), collapse="")  # "olleh"
Import lines matching a pattern from a file
cat(month.name, file="months.txt", sep="\n")  # write months to file
x <- readLines("months.txt")                  # read all lines
x[grep("^J", x, perl=TRUE)]                   # keep lines starting with J
myfct <- function(x) x^2
mylist <- ls()
n <- which(mylist %in% "myfct")
get(mylist[n])               # retrieves the object named by the string
get(mylist[n])(2)            # calls it as a function with argument 2
eval(parse(text=mylist[n]))  # alternative: parse string as expression
Timing and system calls
system.time(ls())  # measure time for an expression
date()             # current system date and time
Sys.sleep(1)       # pause R for 1 second
Task 1.1 — for loop with append (slow but instructive):
myve_for <- NULL
for (i in seq(along=myMA[,1])) {
    myve_for <- c(myve_for, mean(as.numeric(myMA[i,])))
}
Task 1.2 — while loop:
z <- 1
myve_while <- NULL
while (z <= nrow(myMA)) {
    myve_while <- c(myve_while, mean(as.numeric(myMA[z,])))
    z <- z + 1
}
Task 1.3 — confirm both methods give identical results:
all(myve_for == myve_while) # should return TRUE
Task 1.4 — apply loop:
myve_apply <- apply(myMA, 1, mean)
Task 1.5 — built-in rowMeans (fastest):
mymean <- rowMeans(myMA)
# Compare all approaches side by side:
myResult <- cbind(myMA, mean_for=myve_for, mean_while=myve_while,
                  mean_apply=myve_apply, mean_rowMeans=mymean)
# Select the mean columns by name so the view does not depend on ncol(myMA)
myResult[1:4, c("mean_for", "mean_while", "mean_apply", "mean_rowMeans")]
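Exact comparison with == works in Task 1.3 because both loops perform the same arithmetic in the same order. rowMeans() may accumulate its sums differently from mean(), so a tolerance-based check with all.equal() is the more cautious way to compare all approaches. The sketch below uses a small stand-in matrix, demoMA, a made-up name chosen so the exercise objects are left untouched:

```r
set.seed(42)
demoMA <- matrix(rnorm(50), 10, 5)   # stand-in for myMA

# Loop with pre-allocation
m_for <- numeric(nrow(demoMA))
for (i in seq_len(nrow(demoMA))) m_for[i] <- mean(demoMA[i, ])

m_apply    <- apply(demoMA, 1, mean)
m_rowMeans <- rowMeans(demoMA)

# all.equal() compares with a small numeric tolerance
ok_apply    <- isTRUE(all.equal(m_for, m_apply))
ok_rowMeans <- isTRUE(all.equal(m_for, m_rowMeans))
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description rather than FALSE when the values differ.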
Programming Exercises (cont.)
Exercise 2 — Custom function for grouped column means
Task 2.1 — implement a function that computes means for user-specified column groups in any matrix or data frame:
myMA <- matrix(rnorm(100000), 10000, 10,
               dimnames=list(1:10000, paste("C", 1:10, sep="")))
# Group columns: cols 1-3 → group 1, cols 4-6 → group 2, etc.
myList <- tapply(colnames(myMA), c(1,1,1,2,2,2,3,3,4,4), list)
names(myList) <- sapply(myList, paste, collapse="_")
# Apply mean to each column group
myMAmean <- sapply(myList, function(x) apply(myMA[,x], 1, mean))
myMAmean[1:4,]
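The steps above can also be wrapped into a reusable function, as the task asks. The following is one possible sketch, not the reference solution; the name colGroupMeans, its arguments, and the test matrix small are all hypothetical:

```r
# Hypothetical helper: row-wise means for groups of columns
#   x      : numeric matrix or data frame
#   groups : one group label per column of x
colGroupMeans <- function(x, groups) {
    x <- as.matrix(x)
    stopifnot(length(groups) == ncol(x))
    idx <- split(seq_len(ncol(x)), groups)   # column indices per group
    # drop = FALSE keeps single-column groups as matrices for rowMeans()
    sapply(idx, function(j) rowMeans(x[, j, drop = FALSE]))
}

# Small test matrix: 2 rows, 6 columns
small <- matrix(1:12, nrow = 2,
                dimnames = list(c("r1", "r2"), paste0("C", 1:6)))
res <- colGroupMeans(small, c(1, 1, 1, 2, 2, 3))
res
```

This version uses rowMeans() instead of apply(..., 1, mean) and labels the result columns by group rather than by the joined column names; both are design choices of the sketch. A call such as colGroupMeans(myMA, c(1,1,1,2,2,2,3,3,4,4)) would apply the same grouping as in the task.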