Graphics and Data Visualization in R
Overview
Graphics in R
- Powerful environment for visualizing scientific data
- Integrated graphics and statistics infrastructure
- Publication quality graphics
- Fully programmable
- Highly reproducible
- Full LaTeX, Sweave, knitr and R Markdown support.
- Vast number of R packages with graphics utilities
Documentation on Graphics in R
- General
- Interactive graphics
Graphics Environments
- Viewing and saving graphics in R
- On-screen graphics
- postscript, pdf, svg
- jpeg/png/wmf/tiff/…
- Four major graphics environments
Base Graphics
Overview
- Important high-level plotting functions
  - plot: generic x-y plotting
  - barplot: bar plots
  - boxplot: box-and-whisker plot
  - hist: histograms
  - pie: pie charts
  - dotchart: Cleveland dot plots
  - image, heatmap, contour, persp: functions to generate image-like plots
  - qqnorm, qqline, qqplot: distribution comparison plots
  - pairs, coplot: display of multivariate data
- Help on these functions
?myfct
?plot
?par
Preferred Input Data Objects
- Matrices and data frames
- Vectors
- Named vectors
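As a small illustration of why named vectors and matrices with dimnames are convenient inputs, the following sketch shows how base plotting functions pick up labels from names. The objects vec, mat and dat are made up for this example and are not part of the tutorial's data.

```r
# Hypothetical example objects (assumed for illustration only)
vec <- c(a = 3, b = 5, c = 2)                       # named vector
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("r1", "r2"), c("A", "B", "C")))

barplot(vec)                     # bar labels are taken from names(vec)
barplot(mat, beside = TRUE,      # column names label the bar groups
        legend.text = TRUE)      # row names are used for the legend

dat <- as.data.frame(mat)        # a data frame keeps the row and column labels
plot(dat$A, dat$B)               # columns can be accessed by name
```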
Scatter Plots
Basic scatter plots
Sample data set for subsequent plots
set.seed(1410)
y <- matrix(runif(30), ncol=3, dimnames=list(letters[1:10], LETTERS[1:3]))
y
## A B C
## a 0.26904539 0.47439030 0.4427788756
## b 0.53178658 0.31128960 0.3233293493
## c 0.93379571 0.04576263 0.0004628517
## d 0.14314802 0.12066723 0.4104402000
## e 0.57627063 0.83251909 0.9884746270
## f 0.49001235 0.38298651 0.8235850153
## g 0.66562596 0.70857731 0.7490944304
## h 0.50089252 0.24772695 0.2117313873
## i 0.57033245 0.06044799 0.8776291364
## j 0.04087422 0.85814118 0.1061618729
plot(y[,1], y[,2])

All pairs
pairs(y)

Plot labels
plot(y[,1], y[,2], pch=20, col="red", main="Symbols and Labels")
text(y[,1]+0.03, y[,2], rownames(y))

More examples
Print the row names instead of symbols
plot(y[,1], y[,2], type="n", main="Plot of Labels")
text(y[,1], y[,2], rownames(y))
Usage of important plotting parameters
op <- par(mar=c(8,8,8,8), bg="lightblue")
plot(y[,1], y[,2], type="p", col="red", cex.lab=1.2, cex.axis=1.2,
cex.main=1.2, cex.sub=1, lwd=4, pch=20, xlab="x label",
ylab="y label", main="My Main", sub="My Sub")
grid(5, 5, lwd = 2) # Adds grid lines to the plot just created
par(op)
Important arguments
- mar: specifies the margin sizes around the plotting area in order: c(bottom, left, top, right)
- col: color of symbols
- pch: type of symbols, samples: example(points)
- lwd: line width; for symbols, the width of their outline (symbol size itself is set by cex)
- cex.*: control font sizes
- For details see ?par
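A minimal sketch of the save-and-restore pattern for graphics parameters: par() returns the previous values of the parameters it changes, so they can be restored after plotting. The variable name old and the parameter values are arbitrary choices for this example.

```r
old <- par(mar = c(5, 5, 2, 1), cex.axis = 0.8)   # change settings, keep the old values
plot(1:10, pch = 20, col = "red", main = "Custom margins")
par(old)                                           # restore the previous settings
par("mar")                                         # query the current margin sizes
```

Restoring the settings keeps a change made for one plot from leaking into later plots on the same device.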
Add a regression line to a plot
plot(y[,1], y[,2])
myline <- lm(y[,2]~y[,1]); abline(myline, lwd=2)

summary(myline)
##
## Call:
## lm(formula = y[, 2] ~ y[, 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40357 -0.17912 -0.04299 0.22147 0.46623
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5764 0.2110 2.732 0.0258 *
## y[, 1] -0.3647 0.3959 -0.921 0.3839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3095 on 8 degrees of freedom
## Multiple R-squared: 0.09589, Adjusted R-squared: -0.01712
## F-statistic: 0.8485 on 1 and 8 DF, p-value: 0.3839
Same plot as above, but on log scale
plot(y[,1], y[,2], log="xy")

Add a mathematical expression to a plot
plot(y[,1], y[,2]); text(y[1,1], y[1,2],
expression(sum(frac(1,sqrt(x^2*pi)))), cex=1.3)

Exercise 1
- Task 1: Generate a scatter plot for the first two columns in the iris data frame and color the dots by its Species column.
- Task 2: Use the xlim/ylim arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.
Structure of iris data set:
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Line Plots
Single Data Set
plot(y[,1], type="l", lwd=2, col="blue")

Additional lines can be added to an existing plot with the lines() function.
plot(y[,1], type="l", lwd=2, col="blue")
lines(y[,2], lwd=2, lty=1, col="red")
legend(8.3, 0.95, legend=c("Line 1", "Line 2"), col=c("blue", "red"), lty=1)

Many Data Sets
Alternatively, one can plot a line graph for all columns in data frame y with the following approach. The split.screen function is used in this example in a for loop to overlay several line graphs in the same plot.
split.screen(c(1,1))
## [1] 1
plot(y[,1], ylim=c(0,1), xlab="Measurement", ylab="Intensity", type="l", lwd=2, col=1)
for(i in 2:length(y[1,])) {
screen(1, new=FALSE)
plot(y[,i], ylim=c(0,1), type="l", lwd=2, col=i, xaxt="n", yaxt="n", ylab="",
xlab="", main="", bty="n")
}

close.screen(all=TRUE)
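An alternative that is not used in the original: matplot() from base graphics draws one line per matrix column in a single call, which avoids the overlay loop. The sketch below uses a freshly generated matrix ym as a stand-in for y so that it runs on its own.

```r
set.seed(1410)
ym <- matrix(runif(30), ncol = 3, dimnames = list(letters[1:10], LETTERS[1:3]))

# One line per column; the colors are assigned column by column
matplot(ym, type = "l", lty = 1, lwd = 2, col = 1:ncol(ym),
        ylim = c(0, 1), xlab = "Measurement", ylab = "Intensity")
legend("topright", legend = colnames(ym), col = 1:ncol(ym), lty = 1, lwd = 2)
```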
Bar Plots
Basics
barplot(y[1:4,], ylim=c(0, max(y[1:4,])+0.3), beside=TRUE,
legend=letters[1:4])
text(labels=round(as.vector(as.matrix(y[1:4,])),2), x=seq(1.5, 13, by=1)
+sort(rep(c(0,1,2), 4)), y=as.vector(as.matrix(y[1:4,]))+0.04)

Error bars
bar <- barplot(m <- rowMeans(y) * 10, ylim=c(0, 10))
stdev <- apply(y, 1, sd) # Per-row standard deviation; sd() on a matrix returns a single value
arrows(bar, m, bar, m + stdev, length=0.15, angle = 90)

Mirrored bar plot
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
plot(c(0,12), range(df$y), type = "n")
barplot(height = df$y[df$group == "Above"], add = TRUE, axes = FALSE)
barplot(height = df$y[df$group == "Below"], add = TRUE, axes = FALSE)

Bar plot of loan payments and amortization tables
The following imports a mortgage payment function (from here) that calculates monthly and annual mortgage/loan payments, generates amortization tables and plots the results as a bar plot. A Shiny App using this function has been created by Antoine Soetewey here.
source("https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R")
## The monthly mortgage payments and amortization rates can be calculted with the mortgage() function like this:
##
## m <- mortgage(P=500000, I=6, L=30, plotData=TRUE)
## P = principal (loan amount)
## I = annual interest rate
## L = length of the loan in years
m <- mortgage(P=250000, I=6, L=15, plotData=TRUE)
##
## The payments for this loan are:
##
## Monthly payment: $2109.642 (stored in m$monthPay)
##
## Total cost: $379735.6
##
## The amortization data for each of the 180 months are stored in "m$aDFmonth".
##
## The amortization data for each of the 15 years are stored in "m$aDFyear".

Histograms
hist(y, freq=TRUE, breaks=10)

Density Plots
plot(density(y), col="red")

Pie Charts
pie(y[,1], col=rainbow(length(y[,1]), start=0.1, end=0.8), clockwise=TRUE)
legend("topright", legend=row.names(y), cex=1.3, bty="n", pch=15, pt.cex=1.8,
col=rainbow(length(y[,1]), start=0.1, end=0.8), ncol=1)

Color Selection Utilities
Default color palette and how to change it
palette()
## [1] "black" "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710" "gray62"
palette(rainbow(5, start=0.1, end=0.2))
palette()
## [1] "#FF9900" "#FFBF00" "#FFE600" "#F2FF00" "#CCFF00"
palette("default")
The gray function returns shades of gray for values between 0 (black) and 1 (white)
gray(seq(0.1, 1, by= 0.2))
## [1] "#1A1A1A" "#4D4D4D" "#808080" "#B3B3B3" "#E6E6E6"
Color gradients with the colorpanel function from the gplots library
library(gplots)
colorpanel(5, "darkblue", "yellow", "white")
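If gplots is not installed, a comparable gradient can be built with colorRampPalette() from the grDevices package that ships with R. This is a substitute technique rather than the colorpanel implementation, so the interpolated colors may differ slightly. The names pal and cols are chosen for this sketch.

```r
# colorRampPalette() returns a function that interpolates between the given colors
pal  <- colorRampPalette(c("darkblue", "yellow", "white"))
cols <- pal(5)                      # five colors along the gradient
cols
barplot(rep(1, 5), col = cols, border = NA, axes = FALSE)   # quick visual check
```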
For much more on colors in R, see Earl Glynn’s color chart
Arranging Several Plots on Single Page
With par(mfrow=c(nrow, ncol)) one can define how several plots are arranged next to each other.
par(mfrow=c(2,3))
for(i in 1:6) plot(1:10)

Arranging Plots with Variable Width
The layout function divides the plotting device into a variable number of rows and columns, with the column widths and row heights specified in the respective arguments.
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow=TRUE), c(3,7), c(5,5),
respect=TRUE)
# layout.show(nf)
for(i in 1:3) barplot(1:10)
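To make the layout matrix easier to read: each number names the figure that occupies that cell, so repeating a number lets one figure span several cells. The following sketch spells this out; the widths, heights and the name lay are arbitrary choices for illustration.

```r
lay <- matrix(c(1, 2,
                3, 3), nrow = 2, byrow = TRUE)   # figure 3 spans the whole bottom row
nf <- layout(lay, widths = c(3, 7), heights = c(5, 5))
nf                         # number of figures defined by the layout (here 3)
layout.show(nf)            # draws the outlines of the figure regions
for (i in 1:3) barplot(1:10, main = paste("Figure", i))
layout(1)                  # reset to a single plot per page
```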

Saving Graphics to Files
After the pdf() command, all graphs are redirected to the file test.pdf until the device is closed with dev.off(). This works similarly for all common formats: jpeg, png, ps, tiff, …
pdf("test.pdf"); plot(1:10, 1:10); dev.off()
The svg() device generates Scalable Vector Graphics (SVG) files that can be edited in vector graphics programs, such as InkScape.
svg("test.svg"); plot(1:10, 1:10); dev.off()
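File devices also accept size arguments. The sketch below writes a PDF with an explicit page size to a temporary file and checks that the file was created. The file location and dimensions are arbitrary choices for this example.

```r
out <- tempfile(fileext = ".pdf")    # temporary file, chosen for this example
pdf(out, width = 7, height = 5)      # page size in inches
plot(1:10, 1:10, main = "Saved to file")
dev.off()                            # closes the device and finalizes the file
file.exists(out)
```

Bitmap devices such as png() take their width and height in pixels by default, so the values are not interchangeable between device types.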
Exercise 2
Bar plots
- Task 1: Calculate the mean values for the Species components of the first four columns in the iris data set. Organize the results in a matrix where the row names are the unique values from the iris Species column and the column names are the same as in the first four iris columns.
- Task 2: Generate two bar plots: one with stacked bars and one with horizontally arranged bars.
Structure of iris data set:
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Grid Graphics
- What is grid?
  - Low-level graphics system
  - Highly flexible and controllable system
  - Does not provide high-level functions
  - Intended as development environment for custom plotting functions
  - Pre-installed on new R distributions
- Documentation and Help
lattice Graphics
- What is lattice?
  - High-level graphics system
  - Developed by Deepayan Sarkar
  - Implements Trellis graphics system from S-Plus
  - Simplifies high-level plotting tasks: arranging complex graphical features
  - Syntax similar to R’s base graphics
- Documentation and Help
Open a list of all functions available in the lattice package
library(lattice)
library(help=lattice)
Accessing and changing global parameters:
?lattice.options
?trellis.device
Scatter Plot Sample
library(lattice)
p1 <- xyplot(1:8 ~ 1:8 | rep(LETTERS[1:4], each=2), as.table=TRUE)
plot(p1)

Line Plot Sample
library(lattice)
p2 <- parallelplot(~iris[1:4] | Species, iris, horizontal.axis = FALSE,
layout = c(1, 3, 1))
plot(p2)

ggplot2 Graphics
- What is ggplot2?
  - High-level graphics system developed by Hadley Wickham
  - Implements grammar of graphics from Leland Wilkinson
  - Streamlines many graphics workflows for complex plots
  - Syntax centered around main ggplot function
  - Simpler qplot function provides many shortcuts
- Documentation and Help
Design Concept of ggplot2
Plotting in ggplot2 is formalized and implemented according to the grammar of graphics by Leland Wilkinson and Hadley Wickham (Wickham 2010, 2009; Wilkinson 2012). The plotting process in ggplot2 is divided into layers including:
- Data: the actual data to be plotted
- Aesthetics: visual properties of the objects in a plot (e.g. size, shape or color)
- Geometries: shapes used to represent data (e.g. bar or scatter plot)
- Facets: row and column layout of sub-plots
- Statistics: data models and summaries
- Coordinates: the plotting space
- Theme: styles to be used, such as fonts, backgrounds, etc.
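As a rough sketch of how these layers map onto code, the following call annotates each component. It assumes the ggplot2 package is installed and uses the built-in iris data; the particular geometry, statistic, axis limits and theme are arbitrary choices for illustration, and the name p_layers is made up.

```r
library(ggplot2)

p_layers <- ggplot(iris,                                      # Data
                   aes(Sepal.Length, Sepal.Width,             # Aesthetics
                       color = Species)) +
  geom_point() +                                              # Geometry
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x) +   # Statistics
  facet_wrap(~Species) +                                      # Facets
  coord_cartesian(xlim = c(4, 8)) +                           # Coordinates
  theme_bw()                                                  # Theme
print(p_layers)
```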

ggplot2 Usage
- ggplot function accepts two main arguments
  - Data set to be plotted
  - Aesthetic mappings provided by aes function
- Additional parameters such as geometric objects (e.g. points, lines, bars) are passed on by appending them with + as separator.
- List of available geom_* functions see here
- Settings of plotting theme can be accessed with the command theme_get() and its settings can be changed with theme().
- Preferred input data object
  - qplot: data.frame or tibble (support for vector, matrix, ...)
  - ggplot: data.frame or tibble
- Packages with convenience utilities to create expected inputs
  - dplyr (plyr)
  - tidyr and reshape2
qplot Function
The syntax of qplot is similar to that of R’s basic plot function. Note that qplot is deprecated as of ggplot2 3.4.0; it still works but emits a warning.
- Arguments
  - x: x-coordinates (e.g. col1)
  - y: y-coordinates (e.g. col2)
  - data: data.frame or tibble with corresponding column names
  - xlim, ylim: e.g. xlim=c(0,10)
  - log: e.g. log="x" or log="xy"
  - main: main title; see ?plotmath for mathematical formula
  - xlab, ylab: labels for the x- and y-axes
  - color, shape, size
  - ...: many arguments accepted by plot function
qplot: scatter plot basics
Create sample data, here 3 vectors: x, y and cat
library(ggplot2)
x <- sample(1:10, 10); y <- sample(1:10, 10); cat <- rep(c("A", "B"), 5)
Simple scatter plot
qplot(x, y, geom="point")

Prints dots with different sizes and colors
qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values")

Drops legend
qplot(x, y, geom="point", size=x, color=cat) +
theme(legend.position = "none")

Plot different shapes
qplot(x, y, geom="point", size=I(5), shape=cat) # I() sets a fixed size instead of mapping it to a legend

Colored groups
p <- qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values") +
theme(legend.position = "none")
print(p)

Regression line
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
p <- qplot(carat, price, data = dsmall) +
geom_smooth(method="lm")
print(p)

Local regression curve (loess)
p <- qplot(carat, price, data=dsmall, geom=c("point", "smooth"))
print(p) # Setting se=FALSE removes error shade

ggplot Function
- More important than qplot to access full functionality of ggplot2
- Main arguments
  - data set, usually a data.frame or tibble
  - aesthetic mappings provided by aes function
- General ggplot syntax
  - ggplot(data, aes(...)) + geom() + ... + stat() + ...
- Layer specifications
  - geom(mapping, data, ..., geom, position)
  - stat(mapping, data, ..., stat, position)
- Additional components
  - scales
  - coordinates
  - facet
- aes() mappings can be passed on to all components (ggplot, geom, etc.). Effects are global when passed on to ggplot() and local for other components.
  - x, y
  - color: grouping vector (factor)
  - group: grouping vector (factor)
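The difference between global and local aes() mappings can be sketched as follows, assuming ggplot2 is installed and using the built-in iris data. The object names p_global and p_local are made up for this example.

```r
library(ggplot2)

# Global mapping: color is set in ggplot() and inherited by every layer,
# so geom_smooth() fits one regression line per Species
p_global <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# Local mapping: color applies to the point layer only,
# so geom_smooth() fits a single line through all data points
p_local <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

print(p_global)
print(p_local)
```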
Changing Plotting Themes in ggplot
- Theme settings can be accessed with
theme_get()
- Their settings can be changed with
theme()
Example of how to change the background color to white
... + theme(panel.background=element_rect(fill = "white", colour = "black"))
Storing ggplot Specifications
Plots and layers can be stored in variables
p <- ggplot(dsmall, aes(carat, price)) + geom_point()
p # or print(p)
Returns information about data and aesthetic mappings followed by each layer
summary(p)
Store a layer in a variable and add it to the plot
bestfit <- geom_smooth(method = "lm", se = FALSE, color = alpha("steelblue", 0.5), size = 2)
p + bestfit # Plot with custom regression line
Syntax to pass on other data sets
p %+% diamonds[sample(nrow(diamonds), 100),]
Saves plot stored in variable p to file
ggsave("myplot.pdf", plot=p)
Standard R export functions for graphics work as well (see here).
ggplot: scatter plots
Basic example
set.seed(1410)
dsmall <- as.data.frame(diamonds[sample(nrow(diamonds), 1000), ])
p <- ggplot(dsmall, aes(carat, price, color=color)) +
geom_point(size=4)
print(p)

An interactive version of the above plot can be generated with the ggplotly function from the plotly package.
library(plotly)
ggplotly(p)
Regression line
p <- ggplot(dsmall, aes(carat, price)) + geom_point() +
geom_smooth(method="lm", se=FALSE) +
theme(panel.background=element_rect(fill = "white", colour = "black"))
print(p)

Several regression lines
p <- ggplot(dsmall, aes(carat, price, group=color)) +
geom_point(aes(color=color), size=2) +
geom_smooth(aes(color=color), method = "lm", se=FALSE)
print(p)

Local regression curve (loess)
p <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth()
print(p) # Setting se=FALSE removes error shade

ggplot: line plot
p <- ggplot(iris, aes(Petal.Length, Petal.Width, group=Species,
color=Species)) + geom_line()
print(p)

Faceting
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_line(aes(color=Species), size=1) +
facet_wrap(~Species, ncol=1)
print(p)

Exercise 3
Scatter plots with ggplot2
- Task 1: Generate a scatter plot for the first two columns in the iris data frame and color the dots by its Species column.
- Task 2: Use the xlim and ylim arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.
- Task 3: Generate the corresponding line plot with faceting, presenting the individual data sets in separate plots.
Structure of iris data set
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Bar Plots
Sample Set: the following transforms the iris data set into a ggplot2-friendly format.
Calculate mean values for aggregates given by Species column in iris data set
iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)
Calculate standard deviations for aggregates given by Species column in iris data set
iris_sd <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=sd)
Reformat iris_mean with melt from wide to long form as expected by ggplot2. Newer alternatives for restructuring data.frames and tibbles from wide into long form are the gather and pivot_longer functions defined by the tidyr package. Their usage is shown below as well. The functions pivot_longer and pivot_wider are expected to provide the most flexible long-term solution, but they require tidyr 1.0.0 or later and may therefore not be available in older R installations.
library(reshape2) # Defines melt function
df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
df_mean2 <- tidyr::gather(iris_mean, !Species, key = "Samples", value = "Values")
df_mean3 <- tidyr::pivot_longer(iris_mean, !Species, names_to="Samples", values_to="Values")
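To show what the wide-to-long conversion does, here is a sketch that builds the same kind of long table with base R only. It is not how melt or pivot_longer are implemented, and the name long is chosen for this example.

```r
iris_mean <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)

# Stack the four measurement columns on top of each other (column by column)
long <- data.frame(
  Species = rep(iris_mean$Species, times = 4),                   # species repeated per variable
  Samples = rep(names(iris_mean)[2:5], each = nrow(iris_mean)),  # former column names
  Values  = unlist(iris_mean[, 2:5], use.names = FALSE)          # the stacked values
)
dim(long)   # 3 species x 4 variables = 12 rows, 3 columns
head(long, 4)
```

Each row of the long table holds a single value together with the labels that identify it, which is the shape ggplot2 expects when mapping columns to aesthetics.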
Reformat iris_sd with melt
df_sd <- melt(iris_sd, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
Define standard deviation limits
limits <- aes(ymax = df_mean[,"Values"] + df_sd[,"Values"], ymin=df_mean[,"Values"] - df_sd[,"Values"])
Vertical orientation
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity")
print(p)

To enforce that the bars are plotted in the order specified in the input data, one can turn the corresponding column (here Species) into an ordered factor as follows.
df_mean$Species <- factor(df_mean$Species, levels=unique(df_mean$Species), ordered=TRUE)
In the above example this is not necessary since ggplot uses this order already.
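The reason the factor levels matter is that ggplot arranges discrete values by level order, and factor() sorts levels alphabetically unless told otherwise. A small base R sketch with a made-up vector grp:

```r
grp <- c("virginica", "setosa", "versicolor", "setosa")   # made-up input order

levels(factor(grp))                          # default: levels sorted alphabetically
# → "setosa" "versicolor" "virginica"
levels(factor(grp, levels = unique(grp)))    # levels in order of first appearance
# → "virginica" "setosa" "versicolor"
```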
Horizontal orientation
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") + coord_flip() +
theme(axis.text.y=element_text(angle=0, hjust=1))
print(p)

Faceting
p <- ggplot(df_mean, aes(Samples, Values)) + geom_bar(aes(fill = Species), stat="identity") +
facet_wrap(~Species, ncol=1)
print(p)
Error bars
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") +
geom_errorbar(limits, position="dodge")
print(p)
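A variation on the error bar example, offered as a sketch rather than the tutorial's method: when means and standard deviations are kept in the same data frame, ymin and ymax can be mapped from its columns directly. It assumes ggplot2 is installed; the names stats and p_err are made up for this example.

```r
library(ggplot2)

# Mean and standard deviation per Species and variable, in long form (base R only)
vars <- names(iris)[1:4]
stats <- do.call(rbind, lapply(vars, function(v) {
  data.frame(Species = levels(iris$Species),
             Samples = v,
             Mean = as.vector(tapply(iris[[v]], iris$Species, mean)),
             SD   = as.vector(tapply(iris[[v]], iris$Species, sd)))
}))

# The same dodge width for bars and error bars keeps them aligned
p_err <- ggplot(stats, aes(Samples, Mean, fill = Species)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD),
                position = position_dodge(width = 0.9), width = 0.3)
print(p_err)
```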

Mirrored
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
p <- ggplot(df, aes(x=x, y=y, fill=group)) +
geom_col()
print(p)

Changing Color Settings
library(RColorBrewer)
# display.brewer.all()
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_brewer(palette="Blues") + scale_color_brewer(palette = "Greys")
print(p)

Using standard R color theme
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_manual(values=c("red", "green3", "blue")) +
scale_color_manual(values=c("red", "green3", "blue"))
print(p)

Exercise 4
Bar plots
- Task 1: Calculate the mean values for the Species components of the first four columns in the iris data set. Use the melt function from the reshape2 package to bring the data into the expected format for ggplot.
- Task 2: Generate two bar plots: one with stacked bars and one with horizontally arranged bars.
Structure of iris data set
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Data reformatting example
Here for line plot
y <- matrix(rnorm(500), 100, 5, dimnames=list(paste("g", 1:100, sep=""), paste("Sample", 1:5, sep="")))
y <- data.frame(Position=1:length(y[,1]), y)
y[1:4, ] # First rows of input format expected by melt()
## Position Sample1 Sample2 Sample3 Sample4 Sample5
## g1 1 1.5336975 -1.0365027 -2.0276195 -0.4580396 -0.06460952
## g2 2 -2.0960304 2.1878704 0.7260334 0.8274617 0.24192162
## g3 3 -0.8233125 0.4250477 0.6526331 -0.4509962 -1.06778801
## g4 4 1.0961555 0.8101104 -0.3403762 -0.7222191 -0.72737741
df <- melt(y, id.vars=c("Position"), variable.name = "Samples", value.name="Values")
p <- ggplot(df, aes(Position, Values)) + geom_line(aes(color=Samples)) + facet_wrap(~Samples, ncol=1)
print(p)

Same data can be represented in box plot as follows
ggplot(df, aes(Samples, Values, fill=Samples)) + geom_boxplot() + geom_jitter(color="darkgrey")
Jitter Plots
p <- ggplot(dsmall, aes(color, price/carat)) +
geom_jitter(alpha = I(1 / 2), aes(color=color))
print(p)

Box plots
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot()
print(p)

Violin plots
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin()
print(p)

Same violin plot as an interactive plot generated with ggplotly, where the actual data points are shown as well by including geom_jitter().
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin() + geom_jitter(aes(color=color))
ggplotly(p)