What is ggplot2?
High-level graphics system
Implements grammar of graphics from Leland Wilkinson
Streamlines many graphics workflows for complex plots
Syntax centered around the main ggplot function
Simpler qplot function provides many shortcuts
Documentation and Help
ggplot2
Usage
ggplot function accepts two arguments
Data set to be plotted
Aesthetic mappings provided by aes function
Additional parameters such as geometric objects (e.g. points, lines, bars) are passed on by appending them with + as separator.
A list of the available geom_* functions is given in the ggplot2 reference documentation.
Settings of the plotting theme can be accessed with the command theme_get() and changed with theme().
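As a minimal sketch of the usage pattern described above, the following passes a data set and an aes() mapping to ggplot and then appends geometric objects with +. The built-in iris data set and the chosen columns are assumptions made only for illustration.

```r
library(ggplot2)

# Data set plus aesthetic mappings; this alone draws no data yet
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Geometric objects are appended with '+' as separator
p <- p + geom_point() + geom_line()

print(p)
```

Each appended geom_* call adds one layer, so the plot above contains a point layer and a line layer drawn from the same mappings.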
Preferred input data object
qplot: data.frame (support for vector, matrix, ...)
ggplot: data.frame
Packages with convenience utilities to create expected inputs
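Independently of such packages, base R is often enough for simple cases. The sketch below converts a matrix into the data.frame that ggplot expects; the matrix contents and the column names col1 and col2 are hypothetical example data.

```r
library(ggplot2)

# A numeric matrix with named columns (hypothetical example data)
set.seed(1)
m <- matrix(rnorm(20), nrow = 10, ncol = 2,
            dimnames = list(NULL, c("col1", "col2")))

# ggplot() needs a data.frame, so convert the matrix first
df <- as.data.frame(m)

p <- ggplot(df, aes(col1, col2)) + geom_point()
print(p)
```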
qplot Function
The syntax of qplot is similar to that of R's basic plot function
Arguments
x: x-coordinates (e.g. col1)
y: y-coordinates (e.g. col2)
data: data frame with corresponding column names
xlim, ylim: e.g. xlim=c(0,10)
log: e.g. log="x" or log="xy"
main: main title; see ?plotmath for mathematical formulas
xlab, ylab: labels for the x- and y-axes
color, shape, size
...: many arguments accepted by plot function
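The arguments listed above can be combined in a single call. The following is a sketch that uses the diamonds data set shipped with ggplot2; the title and axis labels are made up for illustration. Note that recent ggplot2 releases mark qplot() as deprecated and emit a warning, although the call still runs.

```r
library(ggplot2)

p <- qplot(carat, price, data = diamonds,
           log = "xy",                        # log-transform both axes
           main = "Diamond price vs. carat",  # main title
           xlab = "Carat", ylab = "Price",    # axis labels
           color = cut)                       # color points by the cut column
print(p)
```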
qplot: scatter plot basics
Create sample data
library ( ggplot2 )
x <- sample ( 1 : 10 , 10 ); y <- sample ( 1 : 10 , 10 ); cat <- rep ( c ( "A" , "B" ), 5 )
Simple scatter plot
qplot ( x , y , geom = "point" )
Prints dots with different sizes and colors
qplot ( x , y , geom = "point" , size = x , color = cat ,
main = "Dot Size and Color Relative to Some Values" )
Drops legend
qplot ( x , y , geom = "point" , size = x , color = cat ) +
theme ( legend.position = "none" )
Plot different shapes
qplot ( x , y , geom = "point" , size = I ( 5 ), shape = cat ) # I() sets a constant size instead of mapping it to a legend
Colored groups
p <- qplot ( x , y , geom = "point" , size = x , color = cat ,
main = "Dot Size and Color Relative to Some Values" ) +
theme ( legend.position = "none" )
print ( p )
Regression line
set.seed ( 1410 )
dsmall <- diamonds [ sample ( nrow ( diamonds ), 1000 ), ]
p <- qplot ( carat , price , data = dsmall ) +
geom_smooth ( method = "lm" )
print ( p )
Local regression curve (loess)
p <- qplot ( carat , price , data = dsmall , geom = c ( "point" , "smooth" ))
print ( p ) # Setting se = FALSE removes error shade
ggplot Function
More important than qplot to access the full functionality of ggplot2
Main arguments
data set, usually a data.frame
aesthetic mappings provided by aes function
General ggplot syntax
ggplot(data, aes(...)) + geom() + ... + stat() + ...
Layer specifications
geom(mapping, data, ..., geom, position)
stat(mapping, data, ..., stat, position)
Additional components
aes() mappings can be passed on to all components (ggplot, geom, etc.). Effects are global when passed on to ggplot() and local for other components.
x, y
color: grouping vector (factor)
group: grouping vector (factor)
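The difference between global and local aes() mappings can be seen in a small sketch. The iris data set and the variable names p_global and p_local are assumptions chosen for illustration.

```r
library(ggplot2)

# Global mapping: color is inherited by every layer, so geom_smooth()
# fits one regression line per Species
p_global <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Local mapping: color applies only to the points, so geom_smooth()
# fits a single line through all observations
p_local <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", se = FALSE)

print(p_global)
print(p_local)
```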
Changing Plotting Themes in ggplot
Theme settings can be accessed with theme_get()
Their settings can be changed with theme()
Example of how to change the background color to white
... + theme ( panel.background = element_rect ( fill = "white" , colour = "black" ))
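A complete version of this pattern might look as follows. This is a sketch: the iris scatter plot is an assumed example, and theme_set(), which is not mentioned in the text above, is shown only as an optional way to change the default for all subsequent plots.

```r
library(ggplot2)

# theme_get() returns the currently active theme
th <- theme_get()
th$panel.background  # element describing the default panel background

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()

# theme() overrides individual settings for this plot only
p_white <- p + theme(panel.background = element_rect(fill = "white", colour = "black"))
print(p_white)

# Optional: theme_set() changes the session default and returns the old theme
old <- theme_set(theme_bw())
theme_set(old)  # restore the previous default
```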
Storing ggplot Specifications
Plots and layers can be stored in variables
p <- ggplot ( dsmall , aes ( carat , price )) + geom_point ()
p # or print ( p )
Returns information about data and aesthetic mappings followed by each layer
summary ( p )
Stores a custom regression line layer in a variable and adds it to the plot
bestfit <- geom_smooth ( method = "lm" , se = FALSE , color = alpha ( "steelblue" , 0.5 ), size = 2 )
p + bestfit # Plot with custom regression line
Syntax to pass on other data sets
p %+% diamonds [ sample ( nrow ( diamonds ), 100 ),]
Saves plot stored in variable p to file
ggsave ( "myplot.pdf" , plot = p )
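A slightly fuller sketch of saving a stored plot is given below. The temporary output path and the chosen dimensions are assumptions; in practice any file name with a supported extension can be used.

```r
library(ggplot2)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
p <- ggplot(dsmall, aes(carat, price)) + geom_point()

# Hypothetical output location; tempfile() avoids cluttering the working directory
out_file <- tempfile(fileext = ".pdf")

# The output format is inferred from the file extension; sizes are in inches by default
ggsave(out_file, plot = p, width = 6, height = 4)
file.exists(out_file)
```

Passing the plot explicitly via the plot argument avoids relying on the last plot displayed, which is what ggsave uses when no plot is given.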
ggplot: scatter plots
Basic example
p <- ggplot ( dsmall , aes ( carat , price , color = color )) +
geom_point ( size = 4 )
print ( p )
Regression line
p <- ggplot ( dsmall , aes ( carat , price )) + geom_point () +
geom_smooth ( method = "lm" , se = FALSE ) +
theme ( panel.background = element_rect ( fill = "white" , colour = "black" ))
print ( p )
Several regression lines
p <- ggplot ( dsmall , aes ( carat , price , group = color )) +
geom_point ( aes ( color = color ), size = 2 ) +
geom_smooth ( aes ( color = color ), method = "lm" , se = FALSE )
print ( p )
Local regression curve (loess)
p <- ggplot ( dsmall , aes ( carat , price )) + geom_point () + geom_smooth ()
print ( p ) # Setting se = FALSE removes error shade
ggplot: line plot
p <- ggplot ( iris , aes ( Petal.Length , Petal.Width , group = Species ,
color = Species )) + geom_line ()
print ( p )
Faceting
p <- ggplot ( iris , aes ( Sepal.Length , Sepal.Width )) +
geom_line ( aes ( color = Species ), size = 1 ) +
facet_wrap ( ~ Species , ncol = 1 )
print ( p )
Exercise 3
Scatter plots with ggplot2
Task 1 : Generate a scatter plot for the first two columns of the iris data frame and color the dots by its Species column.
Task 2 : Use xlim and ylim to set limits on the x- and y-axes so that all data points are restricted to the bottom left quadrant of the plot.
Task 3 : Generate the corresponding line plot with faceting to show the individual data sets in separate plots.
Structure of iris data set
class ( iris )
## [1] "data.frame"
iris [ 1 : 4 ,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
table ( iris $ Species )
##
## setosa versicolor virginica
## 50 50 50
Bar plots
Sample Set: the following transforms the iris data set into a ggplot2-friendly format.
Calculate mean values for aggregates given by Species column in iris data set
iris_mean <- aggregate ( iris [, 1 : 4 ], by = list ( Species = iris $ Species ), FUN = mean )
Calculate standard deviations for aggregates given by Species column in iris data set
iris_sd <- aggregate ( iris [, 1 : 4 ], by = list ( Species = iris $ Species ), FUN = sd )
Reformat iris_mean with melt
library ( reshape2 ) # Defines melt function
df_mean <- melt ( iris_mean , id.vars = c ( "Species" ), variable.name = "Samples" , value.name = "Values" )
Reformat iris_sd with melt
df_sd <- melt ( iris_sd , id.vars = c ( "Species" ), variable.name = "Samples" , value.name = "Values" )
Define standard deviation limits
limits <- aes ( ymax = df_mean [, "Values" ] + df_sd [, "Values" ], ymin = df_mean [, "Values" ] - df_sd [, "Values" ])
Vertical orientation
p <- ggplot ( df_mean , aes ( Samples , Values , fill = Species )) +
geom_bar ( position = "dodge" , stat = "identity" )
print ( p )
Horizontal orientation
p <- ggplot ( df_mean , aes ( Samples , Values , fill = Species )) +
geom_bar ( position = "dodge" , stat = "identity" ) + coord_flip () +
theme ( axis.text.y = element_text ( angle = 0 , hjust = 1 ))
print ( p )
Faceting
p <- ggplot ( df_mean , aes ( Samples , Values )) + geom_bar ( aes ( fill = Species ), stat = "identity" ) +
facet_wrap ( ~ Species , ncol = 1 )
print ( p )
Error bars
p <- ggplot ( df_mean , aes ( Samples , Values , fill = Species )) +
geom_bar ( position = "dodge" , stat = "identity" ) + geom_errorbar ( limits , position = "dodge" )
print ( p )
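The limits object used above pulls its values from two separate data frames, which only works while their rows stay in the same order. As an alternative sketch, the means and standard deviations can be merged into one data frame so that the error bar limits are mapped by column name. The names df_stats, Mean and SD are chosen here for illustration and do not appear in the original code.

```r
library(ggplot2)
library(reshape2)

iris_mean <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
iris_sd   <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = sd)

df_mean <- melt(iris_mean, id.vars = "Species", variable.name = "Samples", value.name = "Mean")
df_sd   <- melt(iris_sd, id.vars = "Species", variable.name = "Samples", value.name = "SD")

# One row per Species/Samples combination, holding both mean and standard deviation
df_stats <- merge(df_mean, df_sd, by = c("Species", "Samples"))

p <- ggplot(df_stats, aes(Samples, Mean, fill = Species)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD),
                position = position_dodge(width = 0.9), width = 0.25)
print(p)
```

Using position_dodge(width = 0.9) for the error bars matches the default dodge width of the bars, so each error bar is centered on its bar.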
Mirrored
df <- data.frame ( group = rep ( c ( "Above" , "Below" ), each = 10 ), x = rep ( 1 : 10 , 2 ), y = c ( runif ( 10 , 0 , 1 ), runif ( 10 , -1 , 0 )))
p <- ggplot ( df , aes ( x = x , y = y , fill = group )) +
geom_bar ( stat = "identity" , position = "identity" )
print ( p )
Changing Color Settings
library ( RColorBrewer )
# display.brewer.all()
p <- ggplot ( df_mean , aes ( Samples , Values , fill = Species , color = Species )) +
geom_bar ( position = "dodge" , stat = "identity" ) + geom_errorbar ( limits , position = "dodge" ) +
scale_fill_brewer ( palette = "Blues" ) + scale_color_brewer ( palette = "Greys" )
print ( p )
Using standard colors
p <- ggplot ( df_mean , aes ( Samples , Values , fill = Species , color = Species )) +
geom_bar ( position = "dodge" , stat = "identity" ) + geom_errorbar ( limits , position = "dodge" ) +
scale_fill_manual ( values = c ( "red" , "green3" , "blue" )) +
scale_color_manual ( values = c ( "red" , "green3" , "blue" ))
print ( p )
Exercise 4
Bar plots
Task 1 : Calculate the mean values for the Species components of the first four columns in the iris data set. Use the melt function from the reshape2 package to bring the data into the expected format for ggplot.
Task 2 : Generate two bar plots: one with stacked bars and one with horizontally arranged bars.
Reformatting data for a line plot
y <- matrix ( rnorm ( 500 ), 100 , 5 , dimnames = list ( paste ( "g" , 1 : 100 , sep = "" ), paste ( "Sample" , 1 : 5 , sep = "" )))
y <- data.frame ( Position = 1 : length ( y [, 1 ]), y )
y [ 1 : 4 , ] # First rows of input format expected by melt ()
## Position Sample1 Sample2 Sample3 Sample4 Sample5
## g1 1 1.32942477 -1.2084007 -0.1958190 -0.4236177 1.7139697
## g2 2 0.92190035 -0.3471160 3.3238031 -1.2340292 -0.3985408
## g3 3 0.01878173 0.8007423 -0.1884464 -0.7419688 -0.5565102
## g4 4 1.95620993 1.7876584 -0.4402745 0.3671016 0.3966960
df <- melt ( y , id.vars = c ( "Position" ), variable.name = "Samples" , value.name = "Values" )
p <- ggplot ( df , aes ( Position , Values )) + geom_line ( aes ( color = Samples )) + facet_wrap ( ~ Samples , ncol = 1 )
print ( p )
The same data can be represented as a box plot as follows
ggplot ( df , aes ( Samples , Values , fill = Samples )) + geom_boxplot ()
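To make the effect of melt easier to follow, here is a sketch on a very small wide-format table. The data frame wide and its values are hypothetical example data.

```r
library(reshape2)

# Small wide-format table: one row per position, one column per sample
wide <- data.frame(Position = 1:3,
                   Sample1 = c(0.1, 0.2, 0.3),
                   Sample2 = c(1.1, 1.2, 1.3))

# melt() stacks the sample columns into a single value column and
# records the originating column name in the Samples column
long <- melt(wide, id.vars = "Position", variable.name = "Samples", value.name = "Values")
long
##   Position Samples Values
## 1        1 Sample1    0.1
## 2        2 Sample1    0.2
## 3        3 Sample1    0.3
## 4        1 Sample2    1.1
## 5        2 Sample2    1.2
## 6        3 Sample2    1.3
```

The long format has one row per observation, which is what lets ggplot map Samples to color, fill or facets.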
Jitter Plots
p <- ggplot ( dsmall , aes ( color , price / carat )) +
geom_jitter ( alpha = I ( 1 / 2 ), aes ( color = color ))
print ( p )
Box plots
p <- ggplot ( dsmall , aes ( color , price / carat , fill = color )) + geom_boxplot ()
print ( p )
Density plots
Line coloring
p <- ggplot ( dsmall , aes ( carat )) + geom_density ( aes ( color = color ))
print ( p )
Area coloring
p <- ggplot ( dsmall , aes ( carat )) + geom_density ( aes ( fill = color ))
print ( p )
Histograms
p <- ggplot ( iris , aes ( x = Sepal.Width )) + geom_histogram ( aes ( y = ..density.. ,
fill = ..count.. ), binwidth = 0.2 ) + geom_density ()
print ( p )
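Newer ggplot2 releases document after_stat() as the replacement for the ..name.. notation used above. Assuming ggplot2 version 3.3.0 or later is installed, the same histogram can be sketched as:

```r
library(ggplot2)

# after_stat() refers to variables computed by the stat (here: density and count per bin)
p <- ggplot(iris, aes(x = Sepal.Width)) +
  geom_histogram(aes(y = after_stat(density), fill = after_stat(count)), binwidth = 0.2) +
  geom_density()
print(p)
```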
Pie Chart
df <- data.frame ( variable = rep ( c ( "cat" , "mouse" , "dog" , "bird" , "fly" )),
value = c ( 1 , 3 , 3 , 4 , 2 ))
p <- ggplot ( df , aes ( x = "" , y = value , fill = variable )) +
geom_bar ( width = 1 , stat = "identity" ) +
coord_polar ( "y" , start = pi / 3 ) + ggtitle ( "Pie Chart" )
print ( p )
Wind Rose Pie Chart
p <- ggplot ( df , aes ( x = variable , y = value , fill = variable )) +
geom_bar ( width = 1 , stat = "identity" ) + coord_polar ( "y" , start = pi / 3 ) +
ggtitle ( "Wind Rose Pie Chart" )
print ( p )
Arranging Graphics on Page
library ( grid )
a <- ggplot ( dsmall , aes ( color , price / carat )) + geom_jitter ( size = 4 , alpha = I ( 1 / 1.5 ), aes ( color = color ))
b <- ggplot ( dsmall , aes ( color , price / carat , color = color )) + geom_boxplot ()
c <- ggplot ( dsmall , aes ( color , price / carat , fill = color )) + geom_boxplot () + theme ( legend.position = "none" )
grid.newpage () # Open a new page on grid device
pushViewport ( viewport ( layout = grid.layout ( 2 , 2 ))) # Assign to device viewport with 2 by 2 grid layout
print ( a , vp = viewport ( layout.pos.row = 1 , layout.pos.col = 1 : 2 ))
print ( b , vp = viewport ( layout.pos.row = 2 , layout.pos.col = 1 ))
print ( c , vp = viewport ( layout.pos.row = 2 , layout.pos.col = 2 , width = 0.3 , height = 0.3 , x = 0.8 , y = 0.8 ))
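The repeated viewport calls above can be shortened with a small helper function. The following is a sketch using a simpler 2 by 1 layout; the helper name vplayout is hypothetical and not part of grid or ggplot2.

```r
library(ggplot2)
library(grid)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
a <- ggplot(dsmall, aes(color, price / carat)) + geom_jitter(aes(color = color))
b <- ggplot(dsmall, aes(color, price / carat, fill = color)) + geom_boxplot()

# Hypothetical helper: returns the viewport for a given cell (or cell range) of the layout
vplayout <- function(row, col) viewport(layout.pos.row = row, layout.pos.col = col)

grid.newpage()                                      # Open a new page on the grid device
pushViewport(viewport(layout = grid.layout(2, 1)))  # Layout with 2 rows and 1 column
print(a, vp = vplayout(1, 1))                       # First plot in the top row
print(b, vp = vplayout(2, 1))                       # Second plot in the bottom row
popViewport()                                       # Leave the layout viewport again
```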
Inserting Graphics into Plots
library ( grid )
print ( a )
print ( b , vp = viewport ( width = 0.3 , height = 0.3 , x = 0.8 , y = 0.8 ))