1 Overview

1.1 What is R?

R is a powerful statistical environment and programming language for the analysis and visualization of data. The associated Bioconductor and CRAN package repositories provide many additional R packages for statistical data analysis for a wide array of research areas. The R software is free and runs on all common operating systems.

1.2 Why Use R?

1.3 Books and Documentation

1.4 R Working Environments

1.5 Working environments (IDEs) for R

R Projects and Interfaces

Some R working environments with support for syntax highlighting and utilities to send code to the R console:

1.5.1 Example: RStudio

Integrated development environment (IDE) for R. Highly functional for both beginners and advanced users.

RStudio IDE

Some useful shortcuts: Ctrl+Enter (send code), Ctrl+Shift+C (comment/uncomment), Ctrl+1/2 (switch window focus)

1.5.2 Example: Vim-R-Tmux

Terminal-based Working Environment for R: Vim-R-Tmux

Vim-R-Tmux IDE for R

2 R Package Repositories

3 Installation of R and Add-on Packages

  1. Install R for your operating system from CRAN.

  2. Install RStudio from RStudio.

  3. Install CRAN Packages from R console like this:

install.packages(c("pkg1", "pkg2")) 
install.packages("pkg.zip", repos=NULL)
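  4. Install Bioconductor packages as follows:

source("http://www.bioconductor.org/biocLite.R")
library(BiocInstaller)
biocVersion() # Reports the Bioconductor version in use
biocLite() # Installs a default set of Bioconductor packages
biocLite(c("pkg1", "pkg2"))
# Note: on R >= 3.5 (Bioconductor >= 3.8), biocLite() has been replaced by BiocManager::install()

  5. For more details consult the Bioc Install page and BiocInstaller package.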
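
To avoid reinstalling packages that are already present, the installation step can be preceded by a check. The following is a minimal sketch; the helper name `missing_pkgs` and the package name `notARealPackageXYZ` are hypothetical and used only for illustration.

```r
# Return the subset of 'pkgs' that cannot be loaded in this R installation
missing_pkgs <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

to_install <- missing_pkgs(c("stats", "notARealPackageXYZ"))
to_install
# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```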

4 Getting Around

4.1 Startup and Closing Behavior

5 Basic Syntax

General R command syntax

object <- function_name(arguments) 
object <- object[arguments] 

Finding help

?function_name
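
When the exact function name is not known, a pattern search can help narrow it down first. A small sketch using base R utilities, with `mean` and `sd` chosen here only as examples:

```r
# List objects on the search path whose names start with "mean"
hits <- apropos("^mean")
hits

# Show the arguments of a function without opening its help page
args(sd)

# Full-text search of installed help pages (interactive use):
# help.search("standard deviation")
```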

Load a library/package

library("my_library") 

List functions defined by a library

library(help="my_library")

Load library manual (PDF or HTML file)

vignette("my_library") 

Execute an R script from within R

source("my_script.R")

Execute an R script from the command line (the first of the three options is preferred)

$ Rscript my_script.R
$ R CMD BATCH my_script.R 
$ R --slave < my_script.R 
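
Scripts started with Rscript can also read arguments passed after the script name. The sketch below shows what the contents of a hypothetical my_script.R might look like if it were called as `Rscript my_script.R 10`; the argument handling and the default value are assumptions made for illustration.

```r
# Collect only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)

# Fall back to a default when no argument is supplied
n <- if (length(args) >= 1) as.integer(args[1]) else 5L

cat("Sum of 1 to", n, "is", sum(seq_len(n)), "\n")
```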

6 Data Types

Numeric data: 1, 2, 3, ...

x <- c(1, 2, 3)
x
## [1] 1 2 3
is.numeric(x)
## [1] TRUE
as.character(x)
## [1] "1" "2" "3"

Character data: "a", "b", "c", ...

x <- c("1", "2", "3")
x
## [1] "1" "2" "3"
is.character(x)
## [1] TRUE
as.numeric(x)
## [1] 1 2 3
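
Conversion from character to numeric only succeeds for strings that look like numbers. As a brief illustration with a made-up input vector, values that cannot be parsed become NA (R also issues a warning, suppressed here):

```r
x <- c("1", "2", "three")
y <- suppressWarnings(as.numeric(x)) # "three" cannot be parsed as a number
y
## [1]  1  2 NA
is.na(y)
## [1] FALSE FALSE  TRUE
```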

Mixed data: combining numeric and character values in one vector coerces all elements to character

c(1, "b", 3)
## [1] "1" "b" "3"
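
The result above is a character vector because an atomic vector can hold only one data type, so R converts all elements to the most general type present. A short sketch of this coercion order (logical, then integer, then numeric, then character):

```r
class(c(TRUE, 2L))   # logical + integer
## [1] "integer"
class(c(2L, 3.5))    # integer + double
## [1] "numeric"
class(c(3.5, "b"))   # double + character
## [1] "character"
```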

Logical data: TRUE or FALSE

x <- 1:10 < 5
x  
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
!x
##  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
which(x) # Returns index for the 'TRUE' values in logical vector
## [1] 1 2 3 4
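
Because TRUE counts as 1 and FALSE as 0 in arithmetic, logical vectors can be summarized directly. A small sketch reusing the same comparison:

```r
x <- 1:10 < 5
sum(x)  # Number of TRUE values
## [1] 4
mean(x) # Proportion of TRUE values
## [1] 0.4
any(x)  # Is at least one value TRUE?
## [1] TRUE
all(x)  # Are all values TRUE?
## [1] FALSE
```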

7 Data objects

7.1 Object types

Vectors (1D): numeric or character

myVec <- 1:10; names(myVec) <- letters[1:10]  
myVec[1:5]
## a b c d e 
## 1 2 3 4 5
myVec[c(2,4,6,8)]
## b d f h 
## 2 4 6 8
myVec[c("b", "d", "f")]
## b d f 
## 2 4 6

Factors (1D): vectors with grouping information

factor(c("dog", "cat", "mouse", "dog", "dog", "cat"))
## [1] dog   cat   mouse dog   dog   cat  
## Levels: cat dog mouse
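
The grouping information of a factor is stored as integer codes that point to its levels. As a short illustration on the same data (the variable name `f` is chosen here for convenience):

```r
f <- factor(c("dog", "cat", "mouse", "dog", "dog", "cat"))
levels(f)     # Unique groups, sorted alphabetically by default
## [1] "cat"   "dog"   "mouse"
as.integer(f) # Integer codes referring to the levels
## [1] 2 1 3 2 2 1
table(f)      # Number of elements per group
## f
##   cat   dog mouse 
##     2     3     1
```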

Matrices (2D): two-dimensional structures holding data of a single type

myMA <- matrix(1:30, 3, 10, byrow = TRUE) 
class(myMA)
## [1] "matrix"
myMA[1:2,]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]   11   12   13   14   15   16   17   18   19    20
myMA[1, , drop=FALSE]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
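
A few common operations on the same matrix, given as a minimal sketch: querying its dimensions, accessing a single cell, transposing, and summing by row.

```r
myMA <- matrix(1:30, 3, 10, byrow = TRUE)
dim(myMA)      # Number of rows and columns
## [1]  3 10
myMA[2, 3]     # Single cell: row 2, column 3
## [1] 13
dim(t(myMA))   # Transposing swaps rows and columns
## [1] 10  3
rowSums(myMA)  # Sum of each row
## [1]  55 155 255
```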

Data Frames (2D): two-dimensional objects whose columns may hold different data types

myDF <- data.frame(Col1=1:10, Col2=10:1) 
myDF[1:2, ]
##   Col1 Col2
## 1    1   10
## 2    2    9
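
To illustrate that columns of a data frame can differ in type, the sketch below adds a character column (the name `Col3` is chosen here for illustration) and filters rows by a condition.

```r
myDF <- data.frame(Col1 = 1:10, Col2 = 10:1)
myDF$Col3 <- letters[1:10]     # Add a character column
sapply(myDF, class)            # Type of each column
##        Col1        Col2        Col3 
##   "integer"   "integer" "character"
myDF[myDF$Col1 > 8, ]          # Rows where Col1 exceeds 8
##    Col1 Col2 Col3
## 9     9    2    i
## 10   10    1    j
```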

Arrays: data structure with one, two or more dimensions
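
No example is given for arrays, so the following is a small sketch with a hypothetical object `myArr`. Arrays are filled column by column, and elements are addressed with one index per dimension.

```r
myArr <- array(1:24, dim = c(2, 3, 4)) # 2 rows, 3 columns, 4 layers
dim(myArr)
## [1] 2 3 4
myArr[1, 2, 3]     # Row 1, column 2, layer 3
## [1] 15
dim(myArr[, , 1])  # A single layer is a 2 x 3 matrix
## [1] 2 3
```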

Lists: containers for any object type

myL <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9)) 
myL
## $name
## [1] "Fred"
## 
## $wife
## [1] "Mary"
## 
## $no.children
## [1] 3
## 
## $child.ages
## [1] 4 7 9
myL[[4]][1:2] 
## [1] 4 7
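
List components can be reached in several ways, and the choice of bracket matters: single brackets return a sub-list, whereas double brackets and `$` return the component itself. A brief sketch using the same list:

```r
myL <- list(name = "Fred", wife = "Mary", no.children = 3, child.ages = c(4, 7, 9))
myL$name            # Component by name
## [1] "Fred"
myL[["child.ages"]] # Same idea with double brackets
## [1] 4 7 9
class(myL[4])       # Single brackets keep the list wrapper
## [1] "list"
class(myL[[4]])     # Double brackets return the content
## [1] "numeric"
```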

Functions: reusable pieces of code

myfct <- function(arg1, arg2, ...) { 
    function_body 
}
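
The template above can be filled in as follows. This is only an illustrative sketch; the function name `add_scaled` and its arguments are hypothetical.

```r
# Add two values and multiply the result by an optional scaling factor
add_scaled <- function(x, y, scale = 1) {
    (x + y) * scale # The value of the last expression is returned
}

add_scaled(1, 2)
## [1] 3
add_scaled(1, 2, scale = 10)
## [1] 30
```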

7.2 Subsetting of data objects

Subsetting by positive or negative index/position numbers

myVec <- 1:26; names(myVec) <- LETTERS 
myVec[1:4]
## A B C D 
## 1 2 3 4
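
The example above uses positive positions. Negative positions work the other way around and exclude the given elements, as this short sketch shows:

```r
myVec <- 1:26; names(myVec) <- LETTERS
myVec[-(1:22)]    # Drop the first 22 elements
##  W  X  Y  Z 
## 23 24 25 26
length(myVec[-1]) # Drop only the first element
## [1] 25
```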

Subsetting by same length logical vectors

myLog <- myVec > 10
myVec[myLog] 
##  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z 
## 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Subsetting by field names

myVec[c("B", "K", "M")]
##  B  K  M 
##  2 11 13

Subset with $ sign: references a single column or list component by its name

iris$Species[1:8]
## [1] setosa setosa setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
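
The `$` operator has equivalents that accept the name as a character string, which is convenient when the column name is stored in a variable. A minimal sketch, where `col_name` is a hypothetical variable:

```r
col_name <- "Species"
identical(iris$Species, iris[[col_name]])  # Double brackets with a string
## [1] TRUE
identical(iris$Species, iris[, col_name])  # Matrix-style column selection
## [1] TRUE
```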

8 Graphics example

Plotting example

barplot(1:10, col="green")
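
To keep a plot, it can be written to a file by opening a graphics device before plotting and closing it afterwards. The following sketch writes to a temporary PDF; the file name, labels and dimensions are assumptions chosen for illustration.

```r
out_file <- file.path(tempdir(), "barplot_example.pdf")

pdf(out_file, width = 6, height = 4) # Open a PDF graphics device
barplot(1:10, col = "green", names.arg = 1:10,
        main = "Values 1 to 10", xlab = "Index", ylab = "Value")
invisible(dev.off())                 # Close the device to finish writing the file

file.exists(out_file)
```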


9 Session Info

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  utils     datasets  grDevices methods   base     
## 
## other attached packages:
## [1] ggplot2_2.0.0   limma_3.26.3    BiocStyle_1.8.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3      codetools_0.2-14 digest_0.6.9     plyr_1.8.3       grid_3.2.3      
##  [6] gtable_0.1.2     formatR_1.2.1    magrittr_1.5     evaluate_0.8     scales_0.3.0    
## [11] stringi_1.0-1    rmarkdown_0.9.2  tools_3.2.3      stringr_1.0.0    munsell_0.4.2   
## [16] yaml_2.1.13      colorspace_1.2-6 htmltools_0.3    knitr_1.12