nest_by() shares some similarities with
group_by(), as both group multiple rows of a data frame by
the unique values of the columns specified as arguments.

df %>%
  group_by(Country)

The difference is in the output: nest_by() creates a list-column called data, in which every element is a data frame containing the rows that pertain to one unique value of the nesting column,

df %>%
  nest_by(Country)

or to every existing combination of values if we specify more than one column.

df %>%
  nest_by(Country, `Customer ID`)

Notice how the rows are ordered by the nesting columns.
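
The ordering is easier to see on a small example. The sketch below assumes a made-up tibble called toy (a stand-in for df, not the data used in this section) whose rows are deliberately unsorted and where not every country/customer combination exists.

```r
library(dplyr)

# Hypothetical toy data standing in for df; rows are deliberately unsorted
toy <- tibble(
  Country = c("Spain", "France", "Spain", "France", "Italy"),
  `Customer ID` = c(2, 1, 2, 3, 1),
  Price = c(5, 1, 7, 2, 4)
)

nested <- toy %>%
  nest_by(Country, `Customer ID`)

# One row per existing combination, sorted by Country and then by Customer ID
nested %>%
  select(-data)

# The two rows of customer 2 from Spain share a single nested data frame
nested$data[[4]]
```

Only the four combinations that occur in toy get a row; combinations such as Italy with customer 2 are not created.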
nest_by() is therefore similar to group_nest(),

df %>%
  group_nest(Country)

except that nest_by() returns a rowwise data frame, so we can subsequently apply the same manipulation to each nested data frame independently.

df %>%
  nest_by(Country) %>%
  reframe(Values = dim(data)) %>%
  bind_cols(tibble(Dimension = rep(c("Rows", "Columns"), 40)))

If we want to go back to the original data, we can use unnest() from tidyr.

df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data))

The order of the columns has changed, though: the nesting column has been moved to the front. In addition, the rows are now ordered by the nesting column, and we obtain a grouped data frame.
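
A minimal sketch of this round trip follows. It assumes a made-up tibble toy (not the df of this section) in which Country is not the first column, so the change in column order is visible.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in which Country is NOT the first column
toy <- tibble(
  Price = c(5, 1, 7, 2),
  Country = c("Spain", "France", "Spain", "France")
)

round_trip <- toy %>%
  nest_by(Country) %>%
  unnest(cols = c(data))

names(toy)         # "Price" "Country"
names(round_trip)  # "Country" "Price"

# Inspect the grouping that survives the round trip
group_vars(round_trip)

# Drop the grouping and restore the original column order
round_trip %>%
  ungroup() %>%
  select(all_of(names(toy)))
```

The original row order cannot be recovered this way unless the data carry an explicit row index to sort by.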
In case we want to change the nesting column, we have to apply
unnest(), ungroup() and then
nest_by() again.

df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data)) %>%
  ungroup() %>%
  nest_by(`Customer ID`)

These steps are needed because a second nest_by() cannot override the first one, as a second group_by() does.

df %>%
  group_by(Country) %>%
  group_by(`Customer ID`)

df %>%
  nest_by(Country) %>%
  nest_by(`Customer ID`)

Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `Customer ID` is not found.

The error arises because, after nesting, `Customer ID` exists only inside the nested data frames. Nesting again by the same columns does work, but it is redundant.

df %>%
  nest_by(Country) %>%
  nest_by(Country)

df %>%
  nest_by(Country, `Customer ID`) %>%
  nest_by(Country, `Customer ID`)

To unnest we can also use reframe(), which ungroups the output as well.

df %>%
  nest_by(Country) %>%
  reframe(data)

We can access an individual data frame like this,

df %>%
  nest_by(Country) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))

or without selecting data, if we want to show the nesting column as well.

df %>%
  nest_by(Country) %>%
  ungroup() %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))

Without ungrouping first we would not get the desired output; we would unnest the whole data frame instead.

df %>%
  nest_by(Country) %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))

Adding missing grouping variables: `Country`
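
The effect of ungroup() here can be checked by counting rows. The following sketch uses a made-up tibble toy (an assumption for illustration, not the df of this section) and compares both pipelines.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for df
toy <- tibble(
  Country = c("Spain", "France", "Spain", "France", "Italy"),
  Price = c(5, 1, 7, 2, 4)
)

nested <- toy %>%
  nest_by(Country)

# Ungrouped: slice(1) keeps only the first row, i.e. the first country
first_only <- nested %>%
  ungroup() %>%
  slice(1) %>%
  unnest(cols = c(data))

# Still rowwise: every row is its own group, so slice(1) keeps all of them
all_rows <- nested %>%
  slice(1) %>%
  unnest(cols = c(data))

nrow(first_only)  # rows of the first country only
nrow(all_rows)    # same number of rows as toy
```

On a rowwise data frame slice(1) acts within each row, which is why it has no filtering effect until the rowwise grouping is removed.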
Like group_by(), we can use nest_by() with
expressions (the documentation states that “Computations are not allowed
in nest_by()” though).

df %>%
  nest_by(CustomerID = as.character(`Customer ID`))

df %>%
  nest_by(Price_Rank = dense_rank(Price))

These calls are equivalent to performing a mutate() call beforehand.

df %>%
  mutate(CustomerID = as.character(`Customer ID`)) %>%
  nest_by(CustomerID)

df %>%
  mutate(Price_Rank = dense_rank(Price)) %>%
  nest_by(Price_Rank)

We can change the name of the list-column with the .key argument.

df %>%
  nest_by(Country, .key = "list of dfs")

With the .keep argument, instead, we control whether or not to keep the nesting column in the nested data frames.

df %>%
  nest_by(Country, .keep = TRUE) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))

Ultimately, nest_by() is equivalent to these lines of code.

df %>%
  group_by(Country) %>%
  summarise(data = list(pick(everything()))) %>%
  rowwise(Country)

On a grouped data frame, nest_by() only works without arguments, as it inherits the grouping columns from group_by().

df %>%
  group_by(Country) %>%
  nest_by()

df %>%
  group_by(Country) %>%
  nest_by(`Customer ID`)

Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()

The same error occurs if we specify the column the data frame is already grouped by.

df %>%
  group_by(Country) %>%
  nest_by(Country)

Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()
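
The behaviour on grouped data frames, together with the fix suggested by the error message, can be sketched as follows. The tibble toy is made up for illustration, and tryCatch() is used only so that the failing call does not stop the script.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(
  Country = c("Spain", "France", "Spain"),
  `Customer ID` = c(2, 1, 2),
  Price = c(5, 1, 7)
)

grouped <- toy %>%
  group_by(Country)

# Without arguments, nest_by() reuses the existing grouping
inherited <- grouped %>%
  nest_by()

# With arguments it fails; tryCatch() captures the message instead of stopping
result <- tryCatch(
  grouped %>% nest_by(`Customer ID`),
  error = function(e) conditionMessage(e)
)

# Following the hint in the error message: ungroup() first, then nest
regrouped <- grouped %>%
  ungroup() %>%
  nest_by(`Customer ID`)
```

Here inherited is nested by Country, result holds the error message as a character string, and regrouped is nested by `Customer ID`.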