- fundamentals

nest_by() shares some similarities with group_by(), as both group multiple rows of a data frame by the unique values of the columns specified as arguments.

df %>%
  group_by(Country)
A tibble: 525461 x 8
Groups: Country [40]

The difference is in the output: nest_by() creates a list-column called data, in which every element is a data frame containing the rows that belong to one unique value of the nesting column

df %>%
  nest_by(Country)
A tibble: 40 x 2
Rowwise: Country

or, if we specify more than one column, to one existing combination of their values.

df %>%
  nest_by(Country, `Customer ID`)
A tibble: 4401 x 3
Rowwise: Country, Customer ID

Notice how the rows are ordered by the nesting columns.
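
The structure described above can be checked on a small example. This is only a sketch: the tibble toy is hypothetical (it mimics some column names of df, with invented values), and dplyr 1.0.0 or later is assumed.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

nested <- toy %>%
  nest_by(Country)

# One row per unique Country, sorted by the nesting column
nested$Country

# The list-column holds one data frame per row; count the rows of each
rows_per_country <- vapply(nested$data, nrow, integer(1))
rows_per_country

# By default the nesting column is not repeated inside the nested data frames
names(nested$data[[1]])

# With two nesting columns we get one row per existing combination
nested_two <- toy %>%
  nest_by(Country, `Customer ID`)
nrow(nested_two)
```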

It is therefore similar to group_nest(),

df %>%
  group_nest(Country)
A tibble: 40 x 2

except that nest_by() returns a rowwise data frame, so subsequent manipulations are applied to each nested data frame independently.

df %>%
  nest_by(Country) %>%
  reframe(Values = dim(data)) %>%
  bind_cols(tibble(Dimension = rep(c("Rows", "Columns"), 40)))
A tibble: 80 x 3
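
As a further illustration of the rowwise behaviour, the sketch below computes one summary per nested data frame with mutate(). The tibble toy and the column names n_rows and mean_price are hypothetical and only serve the example.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

per_country <- toy %>%
  nest_by(Country) %>%
  # In a rowwise data frame, `data` is the single data frame of the current row
  mutate(
    n_rows = nrow(data),
    mean_price = mean(data$Price)
  )

per_country
```

Because each row is processed on its own, there is no need to map over the list-column explicitly.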

If we want to go back to the original data, we can use unnest() from tidyr.

df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data))
A tibble: 525461 x 8
Groups: Country [40]

The result differs from the original data in three ways, though: the nesting column has been moved to the front, the rows are sorted by the nesting column, and we obtained a grouped data frame.
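
These differences can be seen on a small example. The sketch below uses a hypothetical tibble toy with invented values, and the last step is just one possible way to drop the grouping and restore the column order (the original row order cannot be recovered without a row identifier).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

roundtrip <- toy %>%
  nest_by(Country) %>%
  unnest(cols = c(data))

names(toy)             # Country is the last column
names(roundtrip)       # Country is now the first column
roundtrip$Country      # rows are sorted by Country
group_vars(roundtrip)  # the result is grouped by Country

# Drop the grouping and put the columns back in their original order
restored <- roundtrip %>%
  ungroup() %>%
  select(all_of(names(toy)))
```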

In case we want to change the nesting column, we have to apply unnest(), ungroup() and then nest_by() again.

df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data)) %>%
  ungroup() %>%
  nest_by(`Customer ID`)
A tibble: 4384 x 2
Rowwise: Customer ID

This is necessary because a nesting cannot be overridden by a second nest_by() call, unlike a grouping, which a second group_by() call replaces.

df %>%
  group_by(Country) %>%
  group_by(`Customer ID`)
A tibble: 525461 x 8
Groups: Customer ID [4384]
df %>%
  nest_by(Country) %>%
  nest_by(`Customer ID`)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `Customer ID` is not found.

Nesting again by the same columns does work, but it is redundant.

df %>%
  nest_by(Country) %>%
  nest_by(Country)
A tibble: 40 x 2
Rowwise: Country
df %>%
  nest_by(Country, `Customer ID`) %>%
  nest_by(Country, `Customer ID`)
A tibble: 4401 x 3
Rowwise: Country, Customer ID

To unnest we can also use reframe(), which ungroups the output as well.

df %>%
  nest_by(Country) %>%
  reframe(data)
A tibble: 525461 x 8
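
Since reframe() returns an ungrouped result, it can also shorten the re-nesting step. The sketch below assumes dplyr 1.1.0 or later (where reframe() is available) and a hypothetical tibble toy with invented values.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

flat <- toy %>%
  nest_by(Country) %>%
  reframe(data)

# The output is a plain tibble, neither grouped nor rowwise
is_grouped_df(flat)
inherits(flat, "rowwise_df")

# No ungroup() is needed before nesting by another column
renested <- toy %>%
  nest_by(Country) %>%
  reframe(data) %>%
  nest_by(`Customer ID`)
```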

We can access any individual data frame like this,

df %>%
  nest_by(Country) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
A tibble: 654 x 7

also without selecting data if we want to show the nesting column as well.

df %>%
  nest_by(Country) %>%
  ungroup() %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
A tibble: 654 x 8

Without ungrouping we would not get the desired output: on a rowwise data frame slice(1) is applied to each row separately, so every row is kept and we unnest the whole data frame.

df %>%
  nest_by(Country) %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
Adding missing grouping variables: `Country`
A tibble: 525461 x 8
Groups: Country [40]
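
The effect of slice() with and without ungroup() can be isolated as follows. This is a sketch on a hypothetical tibble toy with invented values; the direct extraction with [[ at the end is an alternative not used elsewhere in this text.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

nested <- toy %>%
  nest_by(Country)

# Rowwise: each row is its own group, so slice(1) keeps all of them
kept_rowwise <- nested %>%
  slice(1)
nrow(kept_rowwise)

# Ungrouped: slice(1) keeps only the first row of the whole data frame
kept_plain <- nested %>%
  ungroup() %>%
  slice(1)
nrow(kept_plain)

# Alternative: extract one nested data frame directly from the list-column
first_df <- nested$data[[1]]
```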

As with group_by(), we can pass expressions to nest_by(), even though the documentation states that “Computations are not allowed in nest_by()”.

df %>%
  nest_by(CustomerID = as.character(`Customer ID`))
A tibble: 4384 x 2
Rowwise: CustomerID
df %>%
  nest_by(Price_Rank = dense_rank(Price))
A tibble: 1606 x 2
Rowwise: Price_Rank

These calls are equivalent to performing a mutate() call beforehand.

df %>%
  mutate(CustomerID = as.character(`Customer ID`)) %>%
  nest_by(CustomerID)
A tibble: 4384 x 2
Rowwise: CustomerID
df %>%
  mutate(Price_Rank = dense_rank(Price)) %>%
  nest_by(Price_Rank)
A tibble: 1606 x 2
Rowwise: Price_Rank
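
The equivalence can be verified on a small example. The sketch below relies on the behaviour shown above (expressions being accepted by nest_by() despite what the documentation says), so it may not hold in every dplyr version; the tibble toy is hypothetical, with invented values.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

# Expression passed directly to nest_by()
inline <- toy %>%
  nest_by(Price_Rank = dense_rank(Price))

# Same expression computed with mutate() first
two_step <- toy %>%
  mutate(Price_Rank = dense_rank(Price)) %>%
  nest_by(Price_Rank)

# Both approaches give the same keys and the same number of rows per key
inline_sizes <- vapply(inline$data, nrow, integer(1))
two_step_sizes <- vapply(two_step$data, nrow, integer(1))
identical(inline$Price_Rank, two_step$Price_Rank)
identical(inline_sizes, two_step_sizes)
```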

- .key

We can change the name of the list-column with the .key argument.

df %>%
  nest_by(Country, .key = "list of dfs")
A tibble: 40 x 2
Rowwise: Country
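
A name containing spaces, like the one above, has to be wrapped in backticks when it is used later on. The sketch below shows this on a hypothetical tibble toy with invented values; the column n_rows is also just for illustration.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

renamed <- toy %>%
  nest_by(Country, .key = "list of dfs")

# The list-column now carries the custom name
names(renamed)

# Non-syntactic names must be quoted with backticks in later calls
sizes <- renamed %>%
  mutate(n_rows = nrow(`list of dfs`))
```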

- .keep

The .keep argument controls whether the nesting column is kept inside the nested data frames.

df %>%
  nest_by(Country, .keep = TRUE) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
A tibble: 654 x 8
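
The two settings can be compared side by side by looking at the column names of one nested data frame. This is a minimal sketch on a hypothetical tibble toy with invented values.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

dropped <- toy %>%
  nest_by(Country)                 # default: .keep = FALSE
kept <- toy %>%
  nest_by(Country, .keep = TRUE)

# Country is missing from the nested data frames by default...
names(dropped$data[[1]])
# ...and present, in its original position, with .keep = TRUE
names(kept$data[[1]])
```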

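Ultimately, nest_by() is roughly equivalent to these lines of code. The only difference is that rowwise() called without arguments does not keep Country as a grouping variable (hence the empty Rowwise: line below); rowwise(Country) would preserve it.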

df %>%
  group_by(Country) %>%
  summarise(data = list(pick(everything()))) %>%
  rowwise()
A tibble: 40 x 2
Rowwise:
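
To compare the manual construction with nest_by() more closely, the sketch below passes Country to rowwise() so that the grouping variable is preserved as well. It assumes dplyr 1.1.0 or later (for pick()) and a hypothetical tibble toy with invented values.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

manual <- toy %>%
  group_by(Country) %>%
  summarise(data = list(pick(everything()))) %>%
  rowwise(Country)   # keep Country as the rowwise grouping variable

direct <- toy %>%
  nest_by(Country)

# Same keys, same grouping variable, same nested content
identical(manual$Country, direct$Country)
identical(group_vars(manual), group_vars(direct))
identical(manual$data[[3]]$Price, direct$data[[3]]$Price)
```

The list-columns may still differ in their internal class, so the comparison is made on their contents rather than on the whole objects.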

- with group_by()

On a grouped data frame, nest_by() only works without arguments, as it inherits the grouping columns from group_by().

df %>%
  group_by(Country) %>%
  nest_by()
A tibble: 40 x 2
Rowwise: Country
df %>%
  group_by(Country) %>%
  nest_by(`Customer ID`)
Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()

The same error occurs even if we specify the same column.

df %>%
  group_by(Country) %>%
  nest_by(Country)
Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()
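
The three cases can be put together in a short sketch, which captures the error with tryCatch() instead of stopping and then follows the hint in the message by ungrouping first. The tibble toy is hypothetical, with invented values.

```r
library(dplyr)

# Hypothetical toy data: column names mimic df, values are invented
toy <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4, 5),
  Price = c(2.5, 1.0, 4.0, 3.5, 1.0, 2.5),
  Country = c("Spain", "France", "Spain", "Italy", "France", "Spain")
)

grouped <- toy %>%
  group_by(Country)

# Works: the nesting column is inherited from group_by()
inherited <- grouped %>%
  nest_by()

# Fails: capture the error object instead of aborting the script
err <- tryCatch(
  grouped %>% nest_by(`Customer ID`),
  error = function(e) e
)
inherits(err, "error")

# Works: ungroup() first, then nest by the new column
fixed <- grouped %>%
  ungroup() %>%
  nest_by(`Customer ID`)
```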