plot
: generic x-y plottingbarplot
: bar plotsboxplot
: box-and-whisker plothist
: histogramspie
: pie chartsdotchart
: cleveland dot plotsimage, heatmap, contour, persp
: functions to generate image-like plotsqqnorm, qqline, qqplot
: distribution comparison plotspairs, coplot
: display of multivariant data?myfct
?plot
?par
Sample data set for subsequent plots
set.seed(1410)
y <- matrix(runif(30), ncol=3, dimnames=list(letters[1:10], LETTERS[1:3]))
y
## A B C
## a 0.26904539 0.47439030 0.4427788756
## b 0.53178658 0.31128960 0.3233293493
## c 0.93379571 0.04576263 0.0004628517
## d 0.14314802 0.12066723 0.4104402000
## e 0.57627063 0.83251909 0.9884746270
## f 0.49001235 0.38298651 0.8235850153
## g 0.66562596 0.70857731 0.7490944304
## h 0.50089252 0.24772695 0.2117313873
## i 0.57033245 0.06044799 0.8776291364
## j 0.04087422 0.85814118 0.1061618729
plot(y[,1], y[,2])
pairs(y)
plot(y[,1], y[,2], pch=20, col="red", main="Symbols and Labels")
text(y[,1]+0.03, y[,2], rownames(y))
Print instead of symbols the row names
plot(y[,1], y[,2], type="n", main="Plot of Labels")
text(y[,1], y[,2], rownames(y))
Usage of important plotting parameters
grid(5, 5, lwd = 2)
op <- par(mar=c(8,8,8,8), bg="lightblue")
plot(y[,1], y[,2], type="p", col="red", cex.lab=1.2, cex.axis=1.2,
cex.main=1.2, cex.sub=1, lwd=4, pch=20, xlab="x label",
ylab="y label", main="My Main", sub="My Sub")
par(op)
Important arguments} - mar
: specifies the margin sizes around the plotting area in order: c(bottom, left, top, right)
- col
: color of symbols - pch
: type of symbols, samples: example(points)
- lwd
: size of symbols - cex.*
: control font sizes - For details see ?par
Add a regression line to a plot
plot(y[,1], y[,2])
myline <- lm(y[,2]~y[,1]); abline(myline, lwd=2)
summary(myline)
##
## Call:
## lm(formula = y[, 2] ~ y[, 1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40357 -0.17912 -0.04299 0.22147 0.46623
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5764 0.2110 2.732 0.0258 *
## y[, 1] -0.3647 0.3959 -0.921 0.3839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3095 on 8 degrees of freedom
## Multiple R-squared: 0.09589, Adjusted R-squared: -0.01712
## F-statistic: 0.8485 on 1 and 8 DF, p-value: 0.3839
Same plot as above, but on log scale
plot(y[,1], y[,2], log="xy")
Add a mathematical expression to a plot
plot(y[,1], y[,2]); text(y[1,1], y[1,2],
expression(sum(frac(1,sqrt(x^2*pi)))), cex=1.3)
iris
data frame and color dots by its Species
column.xlim/ylim
arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.Structure of iris data set:
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
plot(y[,1], type="l", lwd=2, col="blue")
Additional lines can be added to an existing plot with the lines()
function.
plot(y[,1], type="l", lwd=2, col="blue")
lines(y[,2], lwd=2, lty=1, col="red")
legend(8.3, 0.95, legend=c("Line 1", "Line 2"), col=c("blue", "red"), lty=1)
Alterntively, one can plot a line graph for all columns in data frame y
with the following approach. The split.screen
function is used in this example in a for
loop to overlay several line graphs in the same plot.
split.screen(c(1,1))
## [1] 1
plot(y[,1], ylim=c(0,1), xlab="Measurement", ylab="Intensity", type="l", lwd=2, col=1)
for(i in 2:length(y[1,])) {
screen(1, new=FALSE)
plot(y[,i], ylim=c(0,1), type="l", lwd=2, col=i, xaxt="n", yaxt="n", ylab="",
xlab="", main="", bty="n")
}
close.screen(all=TRUE)
barplot(y[1:4,], ylim=c(0, max(y[1:4,])+0.3), beside=TRUE,
legend=letters[1:4])
text(labels=round(as.vector(as.matrix(y[1:4,])),2), x=seq(1.5, 13, by=1)
+sort(rep(c(0,1,2), 4)), y=as.vector(as.matrix(y[1:4,]))+0.04)
bar <- barplot(m <- rowMeans(y) * 10, ylim=c(0, 10))
stdev <- sd(t(y))
arrows(bar, m, bar, m + stdev, length=0.15, angle = 90)
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
plot(c(0,12), range(df$y), type = "n")
barplot(height = df$y[df$group == "Above"], add = TRUE, axes = FALSE)
barplot(height = df$y[df$group == "Below"], add = TRUE, axes = FALSE)
The following imports a mortgage
payment function (from here) that calculates monthly and annual mortgage/loan payments, generates amortization tables and plots the results in form of a bar plot. A Shiny App using this function has been created by Antoine Soetewey here.
source("https://raw.githubusercontent.com/tgirke/GEN242/main/content/en/tutorials/rgraphics/scripts/mortgage.R")
## The monthly mortgage payments and amortization rates can be calculted with the mortgage() function like this:
##
## m <- mortgage(P=500000, I=6, L=30, plotData=TRUE)
## P = principal (loan amount)
## I = annual interest rate
## L = length of the loan in years
m <- mortgage(P=250000, I=6, L=15, plotData=TRUE)
##
## The payments for this loan are:
##
## Monthly payment: $2109.642 (stored in m$monthPay)
##
## Total cost: $379735.6
##
## The amortization data for each of the 180 months are stored in "m$aDFmonth".
##
## The amortization data for each of the 15 years are stored in "m$aDFyear".
hist(y, freq=TRUE, breaks=10)
plot(density(y), col="red")
pie(y[,1], col=rainbow(length(y[,1]), start=0.1, end=0.8), clockwise=TRUE)
legend("topright", legend=row.names(y), cex=1.3, bty="n", pch=15, pt.cex=1.8,
col=rainbow(length(y[,1]), start=0.1, end=0.8), ncol=1)
Default color palette and how to change it
palette()
## [1] "black" "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710" "gray62"
palette(rainbow(5, start=0.1, end=0.2))
palette()
## [1] "#FF9900" "#FFBF00" "#FFE600" "#F2FF00" "#CCFF00"
palette("default")
The gray
function allows to select any type of gray shades by providing values from 0 to 1
gray(seq(0.1, 1, by= 0.2))
## [1] "#1A1A1A" "#4D4D4D" "#808080" "#B3B3B3" "#E6E6E6"
Color gradients with colorpanel
function from gplots
library
library(gplots)
colorpanel(5, "darkblue", "yellow", "white")
Much more on colors in R see Earl Glynn’s color chart
With par(mfrow=c(nrow, ncol))
one can define how several plots are arranged next to each other.
par(mfrow=c(2,3))
for(i in 1:6) plot(1:10)
The layout
function allows to divide the plotting device into variable numbers of rows and columns with the column-widths and the row-heights specified in the respective arguments.
nf <- layout(matrix(c(1,2,3,3), 2, 2, byrow=TRUE), c(3,7), c(5,5),
respect=TRUE)
# layout.show(nf)
for(i in 1:3) barplot(1:10)
After the pdf()
command all graphs are redirected to file test.pdf
. Works for all common formats similarly: jpeg, png, ps, tiff, …
pdf("test.pdf"); plot(1:10, 1:10); dev.off()
Generates Scalable Vector Graphics (SVG) files that can be edited in vector graphics programs, such as InkScape.
svg("test.svg"); plot(1:10, 1:10); dev.off()
Bar plots
Species
components of the first four columns in the iris
data set. Organize the results in a matrix where the row names are the unique values from the iris Species
column and the column names are the same as in the first four iris
columns.Structure of iris data set:
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
lattice
?
Open a list of all functions available in the lattice package
library(lattice)
library(help=lattice)
Accessing and changing global parameters:
?lattice.options
?trellis.device
library(lattice)
p1 <- xyplot(1:8 ~ 1:8 | rep(LETTERS[1:4], each=2), as.table=TRUE)
plot(p1)
library(lattice)
p2 <- parallelplot(~iris[1:4] | Species, iris, horizontal.axis = FALSE,
layout = c(1, 3, 1))
plot(p2)
ggplot2
?
ggplot
functionqplot
function provides many shortcutsggplot2
Plotting formalized and implemented by the grammar of graphics by Leland Wilkinson and Hadley Wickham (Wickham 2010, 2009; Wilkinson 2012). The plotting process in ggplot2
is devided into layers including:
ggplot2
Usageggplot
function accepts two main arguments
aes
function+
as separator.geom_*
functions see heretheme_get()
and its settings can be changed with theme()
.qplot
: data.frame
or tibble
(support for vector
, matrix
, ...
)ggplot
: data.frame
or tibble
dplyr
(plyr
)tidyr
and reshape2
qplot
FunctionThe syntax of qplot
is similar as R’s basic plot
function
x
: x-coordinates (e.g. col1
)y
: y-coordinates (e.g. col2
)data
: data.frame
or tibble
with corresponding column namesxlim, ylim
: e.g. xlim=c(0,10)
log
: e.g. log="x"
or log="xy"
main
: main title; see ?plotmath
for mathematical formulaxlab, ylab
: labels for the x- and y-axescolor
, shape
, size
...
: many arguments accepted by plot
functionqplot
: scatter plot basicsCreate sample data, here 3 vectors: x
, y
and cat
library(ggplot2)
x <- sample(1:10, 10); y <- sample(1:10, 10); cat <- rep(c("A", "B"), 5)
Simple scatter plot
qplot(x, y, geom="point")
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Prints dots with different sizes and colors
qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values")
Drops legend
qplot(x, y, geom="point", size=x, color=cat) +
theme(legend.position = "none")
Plot different shapes
qplot(x, y, geom="point", size=5, shape=cat)
p <- qplot(x, y, geom="point", size=x, color=cat,
main="Dot Size and Color Relative to Some Values") +
theme(legend.position = "none")
print(p)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
p <- qplot(carat, price, data = dsmall) +
geom_smooth(method="lm")
print(p)
p <- qplot(carat, price, data=dsmall, geom=c("point", "smooth"))
print(p) # Setting se=FALSE removes error shade
ggplot
Functionqplot
to access full functionality of ggplot2
data.frame
or tibble
aes
functionggplot
syntax
ggplot(data, aes(...)) + geom() + ... + stat() + ...
geom(mapping, data, ..., geom, position)
stat(mapping, data, ..., stat, position)
scales
coordinates
facet
aes()
mappings can be passed on to all components (ggplot, geom
, etc.). Effects are global when passed on to ggplot()
and local for other components.
x, y
color
: grouping vector (factor)group
: grouping vector (factor)ggplot
theme_get()
theme()
Example how to change background color to white
... + theme(panel.background=element_rect(fill = "white", colour = "black"))
ggplot
SpecificationsPlots and layers can be stored in variables
p <- ggplot(dsmall, aes(carat, price)) + geom_point()
p # or print(p)
Returns information about data and aesthetic mappings followed by each layer
summary(p)
Print dots with different sizes and colors
bestfit <- geom_smooth(method = "lm", se = F, color = alpha("steelblue", 0.5), size = 2)
p + bestfit # Plot with custom regression line
Syntax to pass on other data sets
p %+% diamonds[sample(nrow(diamonds), 100),]
Saves plot stored in variable p
to file
ggsave(p, file="myplot.pdf")
Standard R export functons for graphics work as well (see here).
ggplot
: scatter plotsset.seed(1410)
dsmall <- as.data.frame(diamonds[sample(nrow(diamonds), 1000), ])
p <- ggplot(dsmall, aes(carat, price, color=color)) +
geom_point(size=4)
print(p)
Interactive version of above plot can be generated with the ggplotly
function from the plotly
package.
library(plotly)
ggplotly(p)
p <- ggplot(dsmall, aes(carat, price)) + geom_point() +
geom_smooth(method="lm", se=FALSE) +
theme(panel.background=element_rect(fill = "white", colour = "black"))
print(p)
p <- ggplot(dsmall, aes(carat, price, group=color)) +
geom_point(aes(color=color), size=2) +
geom_smooth(aes(color=color), method = "lm", se=FALSE)
print(p)
p <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth()
print(p) # Setting se=FALSE removes error shade
ggplot
: line plotp <- ggplot(iris, aes(Petal.Length, Petal.Width, group=Species,
color=Species)) + geom_line()
print(p)
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_line(aes(color=Species), size=1) +
facet_wrap(~Species, ncol=1)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
print(p)
Scatter plots with ggplot2
iris
data frame and color dots by its Species
column.xlim
and ylim
arguments to set limits on the x- and y-axes so that all data points are restricted to the left bottom quadrant of the plot.Structure of iris
data set
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Sample Set: the following transforms the iris
data set into a ggplot2-friendly format.
Calculate mean values for aggregates given by Species
column in iris
data set
iris_mean <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=mean)
Calculate standard deviations for aggregates given by Species
column in iris
data set
iris_sd <- aggregate(iris[,1:4], by=list(Species=iris$Species), FUN=sd)
Reformat iris_mean
with melt
from wide to long form as expected by ggplot2
. Newer alternatives for restructuring data.frames
and tibbles
from wide into long form use the gather
and pivot_longer
functions defined by the tidyr
package. Their usage is shown below as well. The functions pivot_longer
and pivot_wider
are expected to provide the most flexible long-term solution, but may not work in older R versions.
library(reshape2) # Defines melt function
df_mean <- melt(iris_mean, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
df_mean2 <- tidyr::gather(iris_mean, !Species, key = "Samples", value = "Values")
df_mean3 <- tidyr::pivot_longer(iris_mean, !Species, names_to="Samples", values_to="Values")
Reformat iris_sd
with melt
df_sd <- melt(iris_sd, id.vars=c("Species"), variable.name = "Samples", value.name="Values")
Define standard deviation limits
limits <- aes(ymax = df_mean[,"Values"] + df_sd[,"Values"], ymin=df_mean[,"Values"] - df_sd[,"Values"])
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity")
print(p)
To enforce that the bars are plotted in the order specified in the input data, one can instruct ggplot
to do so by turning the corresponding column (here Species
) into an ordered factor as follows.
df_mean$Species <- factor(df_mean$Species, levels=unique(df_mean$Species), ordered=TRUE)
In the above example this is not necessary since ggplot
uses this order already.
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") + coord_flip() +
theme(axis.text.y=element_text(angle=0, hjust=1))
print(p)
p <- ggplot(df_mean, aes(Samples, Values)) + geom_bar(aes(fill = Species), stat="identity") +
facet_wrap(~Species, ncol=1)
print(p)
p <- ggplot(df_mean, aes(Samples, Values, fill = Species)) +
geom_bar(position="dodge", stat="identity") +
geom_errorbar(limits, position="dodge")
print(p)
df <- data.frame(group = rep(c("Above", "Below"), each=10), x = rep(1:10, 2), y = c(runif(10, 0, 1), runif(10, -1, 0)))
p <- ggplot(df, aes(x=x, y=y, fill=group)) +
geom_col()
print(p)
library(RColorBrewer)
# display.brewer.all()
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_brewer(palette="Blues") + scale_color_brewer(palette = "Greys")
print(p)
p <- ggplot(df_mean, aes(Samples, Values, fill=Species, color=Species)) +
geom_bar(position="dodge", stat="identity") + geom_errorbar(limits, position="dodge") +
scale_fill_manual(values=c("red", "green3", "blue")) +
scale_color_manual(values=c("red", "green3", "blue"))
print(p)
Bar plots
Species
components of the first four columns in the iris
data set. Use the melt
function from the reshape2
package to bring the data into the expected format for ggplot
.Structure of iris data set
class(iris)
## [1] "data.frame"
iris[1:4,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
Here for line plot
y <- matrix(rnorm(500), 100, 5, dimnames=list(paste("g", 1:100, sep=""), paste("Sample", 1:5, sep="")))
y <- data.frame(Position=1:length(y[,1]), y)
y[1:4, ] # First rows of input format expected by melt()
## Position Sample1 Sample2 Sample3 Sample4 Sample5
## g1 1 1.5336975 -1.0365027 -2.0276195 -0.4580396 -0.06460952
## g2 2 -2.0960304 2.1878704 0.7260334 0.8274617 0.24192162
## g3 3 -0.8233125 0.4250477 0.6526331 -0.4509962 -1.06778801
## g4 4 1.0961555 0.8101104 -0.3403762 -0.7222191 -0.72737741
df <- melt(y, id.vars=c("Position"), variable.name = "Samples", value.name="Values")
p <- ggplot(df, aes(Position, Values)) + geom_line(aes(color=Samples)) + facet_wrap(~Samples, ncol=1)
print(p)
Same data can be represented in box plot as follows
ggplot(df, aes(Samples, Values, fill=Samples)) + geom_boxplot() + geom_jitter(color="darkgrey")
p <- ggplot(dsmall, aes(color, price/carat)) +
geom_jitter(alpha = I(1 / 2), aes(color=color))
print(p)
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot()
print(p)
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin()
print(p)
Same violin plot as interactive plot generated with ggplotly
, where the actual data points are shown as well by including geom_jitter()
.
p <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_violin() + geom_jitter(aes(color=color))
ggplotly(p)
p <- ggplot(dsmall, aes(carat)) + geom_density(aes(color = color))
print(p)
p <- ggplot(dsmall, aes(carat)) + geom_density(aes(fill = color))
print(p)
p <- ggplot(iris, aes(x=Sepal.Width)) +
geom_histogram(aes(y = ..density.., fill = ..count..), binwidth=0.2) +
geom_density()
print(p)
## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
df <- data.frame(variable=rep(c("cat", "mouse", "dog", "bird", "fly")),
value=c(1,3,3,4,2))
p <- ggplot(df, aes(x = "", y = value, fill = variable)) +
geom_bar(width = 1, stat="identity") +
coord_polar("y", start=pi / 3) + ggtitle("Pie Chart")
print(p)
p <- ggplot(df, aes(x = variable, y = value, fill = variable)) +
geom_bar(width = 1, stat="identity") +
coord_polar("y", start=pi / 3) +
ggtitle("Pie Chart")
print(p)
Using grid
package
library(grid)
a <- ggplot(dsmall, aes(color, price/carat)) + geom_jitter(size=4, alpha = I(1 / 1.5), aes(color=color))
b <- ggplot(dsmall, aes(color, price/carat, color=color)) + geom_boxplot()
c <- ggplot(dsmall, aes(color, price/carat, fill=color)) + geom_boxplot() + theme(legend.position = "none")
grid.newpage() # Open a new page on grid device
pushViewport(viewport(layout = grid.layout(2, 2))) # Assign to device viewport with 2 by 2 grid layout
print(a, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:2))
print(b, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(c, vp = viewport(layout.pos.row = 2, layout.pos.col = 2, width=0.3, height=0.3, x=0.8, y=0.8))
Using gridExtra
package
library(gridExtra)
grid.arrange(a, b, c, nrow = 2, ncol=2)
Also see patchwork
in ggplot2 book here.
library(grid)
print(a)
print(b, vp=viewport(width=0.3, height=0.3, x=0.8, y=0.8))
Spatial expression data can be visualized with the spatialHeatmap
package.
library(systemPipeR)
setlist5 <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20), D=sample(letters, 22), E=sample(letters, 18))
OLlist5 <- overLapper(setlist=setlist5, sep="_", type="vennsets")
vennPlot(OLlist5, mymain="", mysub="default", colmode=2, ccol=c("blue", "red"))
library(ComplexHeatmap)
setlist <- list(A=sample(letters, 18), B=sample(letters, 16), C=sample(letters, 20))
setma <- make_comb_mat(setlist)
UpSet(setma)
Plots depictions of small molecules with ChemmineR
package
library(ChemmineR)
data(sdfsample)
plot(sdfsample[1], print=FALSE)
There are many packages for plotting heatmaps in R. The following uses pheatmap
.
library(pheatmap); library("RColorBrewer")
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
pheatmap(y, color=brewer.pal(9,"Blues"))
A variety of libraries are available for plotting receiver operating characteristic (ROC) curves in R:
Most commonly, in an ROC we plot the true positive rate (y-axis) against the false positive rate (x-axis) at decreasing thresholds. An illustrative example is provided in the ROCR
package where one wants to inspect the content of the ROCR.simple
object defining the structure of the input data in two vectors.
# install.packages("ROCR") # Install if necessary
library(ROCR)
data(ROCR.simple)
ROCR.simple
## $predictions
## [1] 0.612547843 0.364270971 0.432136142 0.140291078 0.384895941 0.244415489 0.970641299
## [8] 0.890172812 0.781781371 0.868751832 0.716680598 0.360168796 0.547983407 0.385240464
## [15] 0.423739359 0.101699993 0.628095575 0.744769966 0.657732644 0.490119891 0.072369921
## [22] 0.172741714 0.105722115 0.890078186 0.945548941 0.984667270 0.360180429 0.448687336
## [29] 0.014823599 0.543533783 0.292368449 0.701561487 0.715459280 0.714985914 0.120604738
## [36] 0.319672178 0.911723615 0.757325590 0.090988280 0.529402244 0.257402979 0.589909284
## [43] 0.708412104 0.326672910 0.086546283 0.879459891 0.362693564 0.230157183 0.779771989
## [50] 0.876086217 0.353281048 0.212014560 0.703293499 0.689075677 0.627012496 0.240911145
## [57] 0.402801992 0.134794140 0.120473353 0.665444679 0.536339509 0.623494622 0.885179651
## [64] 0.353777439 0.408939895 0.265686095 0.932159806 0.248500489 0.858876675 0.491735594
## [71] 0.151350957 0.694457482 0.496513160 0.123504905 0.499788081 0.310718619 0.907651100
## [78] 0.340078180 0.195097957 0.371936985 0.517308606 0.419560072 0.865639036 0.018527600
## [85] 0.539086009 0.005422562 0.772728821 0.703885141 0.348213542 0.277656869 0.458674210
## [92] 0.059045866 0.133257805 0.083685883 0.531958184 0.429650397 0.717845453 0.537091350
## [99] 0.212404891 0.930846938 0.083048377 0.468610247 0.393378108 0.663367560 0.349540913
## [106] 0.194398425 0.844415442 0.959417835 0.211378771 0.943432189 0.598162949 0.834803976
## [113] 0.576836208 0.380396459 0.161874325 0.912325837 0.642933593 0.392173971 0.122284044
## [120] 0.586857799 0.180631658 0.085993218 0.700501359 0.060413627 0.531464015 0.084254795
## [127] 0.448484671 0.938583020 0.531006532 0.785213140 0.905121019 0.748438143 0.605235403
## [134] 0.842974300 0.835981859 0.364288579 0.492596896 0.488179708 0.259278968 0.991096434
## [141] 0.757364019 0.288258273 0.773336236 0.040906997 0.110241034 0.760726142 0.984599159
## [148] 0.253271061 0.697235328 0.620501132 0.814586047 0.300973098 0.378092079 0.016694412
## [155] 0.698826511 0.658692553 0.470206008 0.501489336 0.239143340 0.050999138 0.088450984
## [162] 0.107031842 0.746588080 0.480100183 0.336592126 0.579511087 0.118555284 0.233160827
## [169] 0.461150807 0.370549294 0.770178504 0.537336015 0.463227453 0.790240205 0.883431431
## [176] 0.745110673 0.007746305 0.012653524 0.868331219 0.439399995 0.540221346 0.567043171
## [183] 0.035815400 0.806543942 0.248707470 0.696702150 0.081439129 0.336315317 0.126480399
## [190] 0.636728451 0.030235062 0.268138293 0.983494405 0.728536415 0.739554341 0.522384507
## [197] 0.858970526 0.383807972 0.606960209 0.138387070
##
## $labels
## [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 1 0 1 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0
## [48] 1 0 1 1 0 1 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 1 0 0 1 0
## [95] 1 0 1 1 0 1 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 1 1 1 1 0 1 1 0 0 0 0 1 1
## [142] 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 1 1 0 1 1 0 1 1 0 1 0 0 0 1
## [189] 0 0 0 1 0 1 1 0 1 0 1 0
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance( pred, "tpr", "fpr" )
plot(perf)
Obtain area under the curve (AUC)
auc <- performance( pred, "tpr", "fpr", measure = "auc")
auc@y.values[[1]]
## [1] 0.8341875
The plotROC
package plots ROCs with ggplot2
. The following generates a sample data set for several performance results, here three. For convenience the data are first arranged in a data.frame
in wide format. Next, the melt_roc
is used to convert the data.frame
into the long format as required by ggplot
.
library(plotROC)
set.seed(2529)
D.ex <- rbinom(200, size = 1, prob = .5); M1 <- rnorm(200, mean = D.ex, sd = .65); M2 <- rnorm(200, mean = D.ex, sd = 1.1); M3 <- rnorm(200, mean = D.ex, sd = 1.5)
perfDF <- data.frame(D = D.ex, D.str = c("TRUE", "Ill")[D.ex + 1], M1 = M1, M2 = M2, M3=M3, stringsAsFactors = FALSE)
perfDF[1:4,] # wide format
## D D.str M1 M2 M3
## 1 1 Ill 1.4811716 -1.57133510 0.6001155
## 2 1 Ill 0.6199448 1.34364757 -1.0240867
## 3 0 TRUE 0.5761334 0.05523887 -0.0199723
## 4 1 Ill 0.8543320 2.04131649 2.5675524
long_perfDF <- melt_roc(perfDF, "D", c("M1", "M2", "M3")) # transformed into long format for ggplot
long_perfDF[1:4,] # long format
## D M name
## M11 1 1.4811716 M1
## M12 1 0.6199448 M1
## M13 0 0.5761334 M1
## M14 1 0.8543320 M1
After converting the sample data into the long format the results can be plotted with geom_roc
, where several ROCs are combined in a single plot and the corresponding AUC values are shown in the legend.
multi_roc <- ggplot(long_perfDF, aes(d = D, m = M, color = name)) + geom_roc(n.cuts=0)
auc_df <- calc_auc(multi_roc) # calculate AUC values
auc_str <- paste0(auc_df$name, ": ", round(auc_df$AUC, 2))
multi_roc + scale_color_manual(name="AUC:", labels=auc_str, values=seq_along(auc_str))
The ape
package provides many useful utilities for phylogenetic analysis and tree plotting. Another useful package for plotting trees is ggtree
. The following example plots two trees face to face with links to identical leaf labels.
library(ape)
tree1 <- rtree(40)
tree2 <- rtree(20)
association <- cbind(tree2$tip.label, tree2$tip.label)
cophyloplot(tree1, tree2, assoc = association,
length.line = 4, space = 28, gap = 3)
ggbio
ggbio
?
ggbio
package (Yin, Cook, and Lawrence 2012) facilitates plotting of complex genome data objects, such as read alignments (SAM/BAM), genomic context/annotation information (gff/txdb), variant calls (VCF/BCF), and more. To easily compare these data sets, it extends the faceting facility of ggplot2
to genome browser-like tracks.GRanges
, GAlignments
, VCF
, etc. For more details, see Table 1.1 of the ggbio
vignette here.ggbio
’s convenience plotting function is autoplot
. For more customizable plots, one can use the generic ggplot
function.ggplot2
plotting components, ggbio
defines serval new components useful for genomic data visualization. A detailed list is given in Table 1.2 of the vignette here.library(ggbio)
df1 <- data.frame(time = 1:100, score = sin((1:100)/20)*10)
p1 <- qplot(data = df1, x = time, y = score, geom = "line")
df2 <- data.frame(time = 30:120, score = sin((30:120)/20)*10, value = rnorm(120-30 +1))
p2 <- ggplot(data = df2, aes(x = time, y = score)) + geom_line() + geom_point(size = 2, aes(color = value))
tracks(time1 = p1, time2 = p2) + xlim(1, 40) + theme_tracks_sunset()
GRanges
objects are essential for storing alignment or annotation ranges in R/Bioconductor. The following creates a sample GRanges
object and plots its content.
library(GenomicRanges)
set.seed(1); N <- 100; gr <- GRanges(seqnames = sample(c("chr1", "chr2", "chr3"), size = N, replace = TRUE), IRanges(start = sample(1:300, size = N, replace = TRUE), width = sample(70:75, size = N,replace = TRUE)), strand = sample(c("+", "-"), size = N, replace = TRUE), value = rnorm(N, 10, 3), score = rnorm(N, 100, 30), sample = sample(c("Normal", "Tumor"), size = N, replace = TRUE), pair = sample(letters, size = N, replace = TRUE))
autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames)
## Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
## ℹ Please use the `rows` argument instead.
## ℹ The deprecated feature was likely used in the ggbio package.
## Please report the issue at <https://github.com/lawremi/ggbio/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
autoplot(gr, aes(color = strand, fill = strand), facets = strand ~ seqnames, stat = "coverage")
pos <- sapply(coverage(gr[strand(gr)=="+"]), as.numeric)
pos <- data.frame(Chr=rep(names(pos), sapply(pos, length)), Strand=rep("+", length(unlist(pos))), Position=unlist(sapply(pos, function(x) 1:length(x))), Coverage=as.numeric(unlist(pos)))
neg <- sapply(coverage(gr[strand(gr)=="-"]), as.numeric)
neg <- data.frame(Chr=rep(names(neg), sapply(neg, length)), Strand=rep("-", length(unlist(neg))), Position=unlist(sapply(neg, function(x) 1:length(x))), Coverage=-as.numeric(unlist(neg)))
covdf <- rbind(pos, neg)
p <- ggplot(covdf, aes(Position, Coverage, fill=Strand)) +
geom_col() +
facet_wrap(~Chr)
p
ggplot(gr) +
layout_circle(aes(fill = seqnames), geom = "rect")
More complex circular example
seqlengths(gr) <- c(400, 500, 700)
values(gr)$to.gr <- gr[sample(1:length(gr), size = length(gr))]
idx <- sample(1:length(gr), size = 50)
gr <- gr[idx]
ggplot() +
layout_circle(gr, geom = "ideo", fill = "gray70", radius = 7, trackWidth = 3) +
layout_circle(gr, geom = "bar", radius = 10, trackWidth = 4, aes(fill = score, y = score)) +
layout_circle(gr, geom = "point", color = "red", radius = 14, trackWidth = 3, grid = TRUE, aes(y = score)) +
layout_circle(gr, geom = "link", linked.to = "to.gr", radius = 6, trackWidth = 1)
To make the following example work, please download and unpack this data archive containing GFF, BAM and VCF sample files.
library(rtracklayer); library(GenomicFeatures); library(Rsamtools); library(GenomicAlignments); library(VariantAnnotation)
ga <- readGAlignments("./data/SRR064167.fastq.bam", use.names=TRUE, param=ScanBamParam(which=GRanges("Chr5", IRanges(4000, 8000))))
p1 <- autoplot(ga, geom = "rect")
p2 <- autoplot(ga, geom = "line", stat = "coverage")
vcf <- readVcf(file="data/varianttools_gnsap.vcf", genome="ATH1")
p3 <- autoplot(vcf[seqnames(vcf)=="Chr5"], type = "fixed") + xlim(4000, 8000) + theme(legend.position = "none", axis.text.y = element_blank(), axis.ticks.y=element_blank())
txdb <- makeTxDbFromGFF(file="./data/TAIR10_GFF3_trunc.gff", format="gff3")
p4 <- autoplot(txdb, which=GRanges("Chr5", IRanges(4000, 8000)), names.expr = "gene_id")
tracks(Reads=p1, Coverage=p2, Variant=p3, Transcripts=p4, heights = c(0.3, 0.2, 0.1, 0.35)) + ylab("")
See autoplot
demo here
View genome data in IGV
File -> Load from URL...
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064154.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064155.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064166.fastq.bam
https://cluster.hpcc.ucr.edu/~tgirke/Teaching/GEN242/bam_samples/SRR064167.fastq.bam
Chr1:49,457-51,457
in position menu on top.For viewing BAM files in IGV as part of systemPipeR
workflows.
systemPipeR
: utilities for building NGS analysis pipelines.library("systemPipeR")
symLink2bam(sysargs=args, htmldir=c("~/.html/", "somedir/"),
urlbase="http://myserver.edu/~username/",
urlfile="IGVurl.txt")
Open IGV before running the following routine. Alternatively, open IGV from within R with startIGV("lm")
. Note this may not work on all systems.
library(SRAdb)
myurls <- readLines("http://cluster.hpcc.ucr.edu/~tgirke/Documents/R_BioCond/Samples/bam_urls.txt")
#startIGV("lm") # opens IGV
sock <- IGVsocket()
session <- IGVsession(files=myurls,
sessionFile="session.xml",
genome="A. thaliana (TAIR10)")
IGVload(sock, session)
IGVgoto(sock, 'Chr1:45296-47019')
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 11 (bullseye)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Los_Angeles
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 grid stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] VariantAnnotation_1.46.0 GenomicFeatures_1.52.0 AnnotationDbi_1.62.1
## [4] rtracklayer_1.60.0 ggbio_1.48.0 ape_5.7-1
## [7] plotROC_2.3.0 ROCR_1.0-11 pheatmap_1.0.12
## [10] ChemmineR_3.52.0 ComplexHeatmap_2.16.0 systemPipeR_2.6.0
## [13] ShortRead_1.58.0 GenomicAlignments_1.36.0 SummarizedExperiment_1.30.1
## [16] Biobase_2.60.0 MatrixGenerics_1.12.0 matrixStats_0.63.0
## [19] BiocParallel_1.34.1 Rsamtools_2.16.0 Biostrings_2.68.0
## [22] XVector_0.40.0 GenomicRanges_1.52.0 GenomeInfoDb_1.36.0
## [25] IRanges_2.34.0 S4Vectors_0.38.1 BiocGenerics_0.46.0
## [28] gridExtra_2.3 RColorBrewer_1.1-3 reshape2_1.4.4
## [31] plotly_4.10.1 ggplot2_3.4.2 lattice_0.21-8
## [34] BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.3.0 BiocIO_1.10.0 bitops_1.0-7
## [4] filelock_1.0.2 tibble_3.2.1 graph_1.78.0
## [7] XML_3.99-0.14 rpart_4.1.19 lifecycle_1.0.3
## [10] doParallel_1.0.17 OrganismDbi_1.42.0 ensembldb_2.24.0
## [13] crosstalk_1.2.0 backports_1.4.1 magrittr_2.0.3
## [16] Hmisc_5.1-0 sass_0.4.6 rmarkdown_2.21
## [19] jquerylib_0.1.4 yaml_2.3.7 DBI_1.1.3
## [22] zlibbioc_1.46.0 purrr_1.0.1 AnnotationFilter_1.24.0
## [25] biovizBase_1.48.0 RCurl_1.98-1.12 nnet_7.3-19
## [28] rappdirs_0.3.3 circlize_0.4.15 GenomeInfoDbData_1.2.10
## [31] codetools_0.2-19 DelayedArray_0.26.2 xml2_1.3.4
## [34] DT_0.27 tidyselect_1.2.0 shape_1.4.6
## [37] farver_2.1.1 BiocFileCache_2.8.0 base64enc_0.1-3
## [40] jsonlite_1.8.4 GetoptLong_1.0.5 ellipsis_0.3.2
## [43] Formula_1.2-5 iterators_1.0.14 foreach_1.5.2
## [46] tools_4.3.0 progress_1.2.2 Rcpp_1.0.10
## [49] glue_1.6.2 xfun_0.39 mgcv_1.8-42
## [52] dplyr_1.1.2 withr_2.5.0 BiocManager_1.30.20
## [55] fastmap_1.1.1 GGally_2.1.2 latticeExtra_0.6-30
## [58] fansi_1.0.4 digest_0.6.31 R6_2.5.1
## [61] colorspace_2.1-0 rsvg_2.4.0 Cairo_1.6-0
## [64] jpeg_0.1-10 dichromat_2.0-0.1 biomaRt_2.56.0
## [67] RSQLite_2.3.1 utf8_1.2.3 tidyr_1.3.0
## [70] generics_0.1.3 data.table_1.14.8 prettyunits_1.1.1
## [73] httr_1.4.6 htmlwidgets_1.6.2 S4Arrays_1.0.1
## [76] pkgconfig_2.0.3 gtable_0.3.3 blob_1.2.4
## [79] hwriter_1.3.2.1 htmltools_0.5.5 RBGL_1.76.0
## [82] ProtGenerics_1.32.0 clue_0.3-64 scales_1.2.1
## [85] png_0.1-8 knitr_1.42 rstudioapi_0.14
## [88] rjson_0.2.21 checkmate_2.2.0 nlme_3.1-162
## [91] curl_5.0.0 cachem_1.0.8 GlobalOptions_0.1.2
## [94] stringr_1.5.0 parallel_4.3.0 foreign_0.8-84
## [97] restfulr_0.0.15 pillar_1.9.0 reshape_0.8.9
## [100] vctrs_0.6.2 dbplyr_2.3.2 cluster_2.1.4
## [103] htmlTable_2.4.1 evaluate_0.21 magick_2.7.4
## [106] cli_3.6.1 compiler_4.3.0 rlang_1.1.1
## [109] crayon_1.5.2 labeling_0.4.2 interp_1.1-4
## [112] plyr_1.8.8 stringi_1.7.12 viridisLite_0.4.2
## [115] deldir_1.0-6 munsell_0.5.0 lazyeval_0.2.2
## [118] Matrix_1.5-4 BSgenome_1.68.0 hms_1.1.3
## [121] bit64_4.0.5 KEGGREST_1.40.0 highr_0.10
## [124] memoise_2.0.1 bslib_0.4.2 bit_4.0.5
Wickham, Hadley. 2009. “ggplot2: elegant graphics for data analysis.” http://had.co.nz/ggplot2/book.
———. 2010. “A Layered Grammar of Graphics.” J. Comput. Graph. Stat. 19 (1): 3–28. https://doi.org/10.1198/jcgs.2009.07098.
Wilkinson, Leland. 2012. “The Grammar of Graphics.” In Handbook of Computational Statistics: Concepts and Methods, edited by James E Gentle, Wolfgang Karl Härdle, and Yuichi Mori, 375–414. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21551-3\_13.
Yin, T, D Cook, and M Lawrence. 2012. “Ggbio: An R Package for Extending the Grammar of Graphics for Genomic Data.” Genome Biol. 13 (8). https://doi.org/10.1186/gb-2012-13-8-r77.
Zhang, H, P Meltzer, and S Davis. 2013. “RCircos: An R Package for Circos 2D Track Plots.” BMC Bioinformatics 14: 244–44. https://doi.org/10.1186/1471-2105-14-244.