Amanda R. Hart
  • Home
  • About
  • Research
  • Publications
  • Blog
  • Contact

How To: Use case_when

R
Author

Amanda R. Hart

Published

May 21, 2020

How To: use case_when to fill a new data column based on data from another column


I used the tidyverse package and its case_when() function to create a new column filled with data based on the information in an existing data column.

To demonstrate, load a TidyTuesday dataset from 3/31/20:

library(tidyverse)
brewer_size <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewer_size.csv")

For plotting purposes, I wanted to ensure that the brewer_size information would be listed in plots from smallest production (zero barrels of beer) to largest production (total production). To do this I created a new column that would default to the desired order when plotted and filled it using the case_when() function.

updated_brewer_size <- brewer_size %>% mutate(brewer_size_ordered = case_when(
        brewer_size == "Zero Barrels" ~ "A",
        brewer_size == "Under 1 Barrel" ~ "B",
        brewer_size == "1 to 1,000 Barrels" ~ "C",
        brewer_size == "1,001 to 7,500 Barrels" ~ "D",
        brewer_size == "7,501 to 15,000 Barrels" ~ "E",
        brewer_size == "5,001 to 30,000 Barrels" ~ "F",
        brewer_size == "30,001 to 60,000 Barrels" ~ "G",
        brewer_size == "60,001 to 100,000 Barrels" ~ "H",
        brewer_size == "100,001 to 500,000 Barrels" ~ "I",
        brewer_size == "500,001 to 1,000,000 Barrels" ~ "J",
        brewer_size == "1,000,001 to 6,000,000 Barrels" ~ "K",
        brewer_size == "6,000,001 Barrels and Over" ~ "L",
        brewer_size == "1,000,000 to 6,000,000 Barrels" ~ "M",
        brewer_size == "2,000,000 to 6,000,000 Barrels" ~ "N",
        brewer_size == "1,00,001 to 1,999,999 Barrels" ~ "O",
        brewer_size == "Total" ~ "P"))

Note: the information provided to the case_when() function must be 2-sided where on the left you have a statement that could be passed to which() and on the right of the ~ you have the value to replace it with in the new column.

head(updated_brewer_size)
# A tibble: 6 × 7
   year brewer_size                    n_of_br…¹ total…² taxab…³ total…⁴ brewe…⁵
  <dbl> <chr>                              <dbl>   <dbl>   <dbl>   <dbl> <chr>  
1  2009 6,000,001 Barrels and Over            18  1.71e8  1.60e8 3639970 L      
2  2009 1,000,001 to 6,000,000 Barrels         4  9.97e6  9.59e6   14548 K      
3  2009 500,001 to 1,000,000 Barrels           7  4.83e6  4.54e6   21563 J      
4  2009 100,001 to 500,000 Barrels            27  5.42e6  4.47e6  128000 I      
5  2009 60,001 to 100,000 Barrels             19  1.50e6  1.22e6   95732 H      
6  2009 30,001 to 60,000 Barrels              32  1.41e6  1.23e6   14369 G      
# … with abbreviated variable names ¹​n_of_brewers, ²​total_barrels,
#   ³​taxable_removals, ⁴​total_shipped, ⁵​brewer_size_ordered