Introduction to ggplot2

Learning objectives

After today’s lecture, you’ll be able to:

  • Understand the basic syntax of ggplot.
  • Create basic plots: bar, points, lines, boxplots, error bars, etc.
  • Create color palettes and use colors effectively: qualitative, sequential, and diverging palettes.
  • Customize the theme of a plot.

Grammar of Graphics

  • ggplot2 is an R package for creating graphics.

  • It was created by Hadley Wickham and is part of the tidyverse.

  • Compose graphs by combining independent components: versatile!

  • If you learn the grammar then you will end up creating better graphics in less time.

Create graphs for publications

  • Tallian, A., Mattisson, J., Samelius, G., Odden, J., Mishra, C., Linnell, J.D., Lkhagvajav, P. and Johansson, O., 2023. Wild versus domestic prey: Variation in the kill-site behavior of two large felids. Global Ecology and Conservation, e02650.

  • Semper-Pascual, A., Sheil, D., Beaudrot, L. et al. Occurrence dynamics of mammals in protected tropical forests respond to human presence and activities. Nat Ecol Evol 7, 1092-1103 (2023). https://doi.org/10.1038/s41559-023-02060-6

Data structure

  • Wide format
species  2007  2008  2009
Adelie   3750  NA    NA
Adelie   3800  NA    NA
Adelie   3250  NA    NA
Adelie   NA    NA    NA
Adelie   3450  NA    NA
Adelie   3650  NA    NA
Adelie   3625  NA    NA
Adelie   4675  NA    NA
Adelie   3475  NA    NA
Adelie   4250  NA    NA
  • Long format
species  year  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex
Adelie   2007  Torgersen  39.1            18.7           181                3750         male
Adelie   2007  Torgersen  39.5            17.4           186                3800         female
Adelie   2007  Torgersen  40.3            18.0           195                3250         female
Adelie   2007  Torgersen  NA              NA             NA                 NA           NA
Adelie   2007  Torgersen  36.7            19.3           193                3450         female
Adelie   2007  Torgersen  39.3            20.6           190                3650         male
Adelie   2007  Torgersen  38.9            17.8           181                3625         female
Adelie   2007  Torgersen  39.2            19.6           195                4675         male
Adelie   2007  Torgersen  34.1            18.1           193                3475         NA
Adelie   2007  Torgersen  42.0            20.2           190                4250         NA
  • ggplot2 expects long format data: each row is an observation and each column is a variable.

  • Wrangle your data BEFORE you graph, for example with tidyr::pivot_longer().
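
As a minimal sketch of that wrangling step, the wide table above can be reshaped with tidyr::pivot_longer(). Only its first three rows are reproduced here, and the object names wide and long, as well as the new column names year and body_mass_g, are choices made for this illustration.

```r
library(tidyr)

# First three rows of the wide table shown above (one column per year).
wide <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  `2007`  = c(3750, 3800, 3250),
  `2008`  = c(NA_real_, NA_real_, NA_real_),
  `2009`  = c(NA_real_, NA_real_, NA_real_),
  check.names = FALSE  # keep the year column names exactly as written
)

# Gather the year columns into two columns: one for the year, one for the value.
long <- pivot_longer(
  wide,
  cols = -species,            # every column except species
  names_to = "year",          # old column names go here
  values_to = "body_mass_g",  # old cell values go here
  values_drop_na = TRUE       # drop year/row combinations with no measurement
)

long
```

After reshaping, each row holds one measurement, which is the shape that data and aes() expect.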

ggplot2

Mapping components

  • 6 main building blocks, each with their own arguments.
ggplot(data = data, mapping = aes(x = x, y = y)) +
  geom_*( ) +
  coord_*( ) +
  facet_*( ) +
  scale_*( ) +
  theme_*( )

ggplot()

ggplot(data = penguins,
       mapping = aes(x = body_mass_g, 
                     y = flipper_length_mm)) 

  • ggplot(): creates the graphing space.

  • data

    • data frame or tibble in long format.
    • reference object for all subsequent arguments and functions.
  • aes()

    • maps column names to the axes and other aesthetics.
    • usually x = x and y = y; some plots only need x = x.
    • x and y set here are inherited by every layer of the plot.
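
A short sketch of how the data and aes() given to ggplot() carry over to later layers. It assumes penguins comes from the palmerpenguins package; the object names p, p_points, and p_both are placeholders.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

# ggplot() alone sets up the graphing space: axes, but no data drawn yet.
p <- ggplot(data = penguins,
            mapping = aes(x = body_mass_g,
                          y = flipper_length_mm))

# Layers added with + reuse the data and the x/y mapping defined above.
p_points <- p + geom_point()

# Both layers share the same x and y without repeating aes().
p_both <- p +
  geom_point() +
  geom_smooth(method = "lm")
```

Printing p shows an empty panel with labeled axes, while p_points and p_both draw the same columns in one and two layers.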

geom_* :: different types of plots

ggplot(data = data) +
  geom_*(aes(x = x, 
             y = y, 
             color = color, # points, lines, error bars
             shape = shape, # see pch numbers 
             fill = fill), # bars, columns, boxplots, violins
         alpha=0.3, # transparency 
         shape = pch, # change the point shape; this is a number or vector of numbers
         position = position_dodge() # bar plots are not stacked
  ) 

  • Most geoms need x and y, but some only need x.
  • x and y included here are used only for this specific geom.
  • Other aesthetic arguments inside aes(): color, fill, shape (pch). They take column names.
  • Static arguments outside aes(): color, fill, shape, alpha (transparency, 0-1), position, size, or linewidth. They take fixed values.
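
The difference between mapped and static arguments can be sketched as follows, again assuming penguins comes from palmerpenguins. The color "steelblue" and the object names are arbitrary choices for the example.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

# Mapped: color follows a column, so it goes inside aes() and produces a legend.
p_mapped <- ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = flipper_length_mm,
                 color = species))

# Static: one fixed value for every point, so it goes outside aes().
p_static <- ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = flipper_length_mm),
             color = "steelblue", alpha = 0.3, size = 3)

# Bars: fill is mapped to a column, position is a static argument.
p_bars <- ggplot(penguins) +
  geom_bar(aes(x = island, fill = species),
           position = position_dodge())  # side by side instead of stacked
```

A common slip is writing color = "steelblue" inside aes(); ggplot2 then treats the string as a one-level variable and adds a legend for it.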

Different types of geoms

Pause to code some plots

scale_*()

  • Lets you change the visual values of a group aesthetic: colors, fills, shapes (scale_*_manual()).
  • Discrete and continuous scales.
  • Predetermined color palettes: ggthemes::scale_color_colorblind()
  • Use xlab('x-axis title') or ylab('y-axis title') or ggtitle('title')
  • labs(title, subtitle, caption, alt)
  • Change x- or y- limits by using xlim(0, 1) or ylim(0, 1)
  • Find more info here
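
A small sketch of setting titles and axis limits, assuming penguins comes from palmerpenguins. The label text and the limit values are made up for the example; the limits were picked to be wide enough to keep all points.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

p <- ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = flipper_length_mm,
                 color = island)) +
  labs(title = "Penguin size",       # plot title
       x = "Body mass (g)",          # same effect as xlab()
       y = "Flipper length (mm)",    # same effect as ylab()
       color = "Island") +           # legend title
  xlim(2500, 6500) +                 # x-axis limits
  ylim(170, 235)                     # y-axis limits
```

Note that xlim() and ylim() drop observations that fall outside the limits, so choose them with the data range in mind.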

scale_*()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color=island), size=3)+
  ggthemes::scale_color_colorblind()

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 shape=island), size = 3.5)+
  scale_shape_discrete() #up to 6 discernible shapes

scale_*()

scale_*() functions can modify:

  • Position via scale_x_*() or scale_y_*()

  • Colors via scale_color_*() and scale_fill_*()

  • Transparency via scale_alpha_*()

  • Sizes via scale_size_*()

  • Shapes via scale_shape_*()

  • * can take the following forms:

    • Axes: continuous, discrete, reverse, log10, sqrt, date, time.
    • Colors & fill: continuous, discrete, manual, gradient, hue, brewer.
    • Transparency: continuous, discrete, manual, ordinal, identity, date.
    • Sizes: continuous, discrete, manual, ordinal, identity, area, date.
    • Shapes and line types: continuous, discrete, manual, ordinal, identity.
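
To illustrate how the pieces of a scale name combine, here is a sketch that uses one position scale and two manual scales. It assumes penguins comes from palmerpenguins; the specific colors and shape numbers are arbitrary picks.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

p <- ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = flipper_length_mm,
                 color = island,
                 shape = species), size = 3) +
  scale_x_log10() +                                   # position: log10 x-axis
  scale_color_manual(values = c(Biscoe = "darkorange",
                                Dream = "purple",
                                Torgersen = "cyan4")) +  # colors: manual
  scale_shape_manual(values = c(Adelie = 16,
                                Chinstrap = 17,
                                Gentoo = 15))            # shapes: manual pch numbers
```

Naming the values by group level, as done here, keeps the assignment stable even if the order of the levels changes.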

Color palettes

Color palette types:

  • Generally, there are 3 types of palettes:

    • Sequential: data that goes from low to high.
    • Diverging: put equal emphasis on mid-range values and extremes.
    • Qualitative: best for categorical data. Visual differences are given by hues.
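
One possible way to apply each palette type in ggplot2, assuming penguins comes from palmerpenguins. The palette names "Dark2" and "Blues" are ColorBrewer palettes; which variable gets which palette is only an illustration.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm))

# Qualitative: distinct hues for a categorical variable.
p_qual <- base +
  geom_point(aes(color = species)) +
  scale_color_brewer(palette = "Dark2")

# Sequential: light to dark for a continuous variable that goes from low to high.
p_seq <- base +
  geom_point(aes(color = body_mass_g)) +
  scale_color_distiller(palette = "Blues", direction = 1)

# Diverging: two hues that meet at a neutral midpoint (here, the mean bill depth).
mid_depth <- mean(penguins$bill_depth_mm, na.rm = TRUE)
p_div <- base +
  geom_point(aes(color = bill_depth_mm)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                        midpoint = mid_depth)
```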

facet_*()

  • We have two options: facet_wrap() and facet_grid().
  • Facets divide a plot into subplots based on a variable in the dataset.
  • Allows for comparison across groups.
ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color=island))+
  facet_wrap(~island)

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color=island))+
  facet_grid(species~island)

theme_*(): pre-established themes.

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color=island))+
  theme_classic()

  • Modifies the overall visual defaults of a plot.

    • titles
    • labels
    • fonts
    • background
    • gridlines
    • legends
  • theme() and theme_*().

    • theme_*() functions are predefined themes.
    • theme() helps you customize and personalize the overall look of your plot.
    • You can start with a predefined theme and then customize it with theme().
  • theme() uses element_*() functions to modify different areas.

  • Predefined ggplot2 themes: theme_classic(), theme_gray(), theme_bw(), theme_linedraw(), theme_light(), theme_dark(), theme_minimal(), theme_void()
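
A minimal sketch of starting from a predefined theme and then adjusting it with theme(), assuming penguins comes from palmerpenguins. The particular tweaks are arbitrary examples.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: this package provides the penguins data frame

p <- ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = flipper_length_mm,
                 color = island)) +
  theme_bw() +                                   # predefined theme first
  theme(legend.position = "bottom",              # then targeted changes
        axis.title = element_text(size = 14, face = "bold"),
        panel.grid.minor = element_blank())      # remove minor gridlines
```

Order matters: a theme_*() call added after theme() replaces the whole theme, so the custom settings would be lost.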

theme_*(): modifying elements in the theme.

ggplot(penguins) +
  geom_point(aes(x = body_mass_g, 
                 y = flipper_length_mm, 
                 color=island))+
  theme(plot.background = element_rect(colour = 'green', fill = 'gray80'), 
        panel.background = element_rect(colour = 'orange', linewidth = 3, fill = 'pink'),
        panel.grid.major = element_line(color = 'blue', linewidth = 2), 
        legend.position = 'bottom', 
        axis.title = element_text(size = 20))

Let’s improve our initial plots

Auxiliary packages

  • Here is a list with the current ggplot2 geoms
  • Other packages with additional geoms, here is a list:

Free online books