nest_by() shares some similarities with group_by(), as both group multiple rows of a data frame by the unique values of the columns specified as arguments.
df %>%
  group_by(Country)
The difference lies in the output: nest_by() creates a list-column called data, where each element is a data frame containing the rows pertaining to a unique value of the nesting column
df %>%
  nest_by(Country)
or to each existing combination of values if we specify more than one nesting column.
df %>%
  nest_by(Country, `Customer ID`)
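A minimal sketch of the "one row per existing combination" behaviour, assuming a small hypothetical toy tibble in place of the real `df` (only the column names mirror the text; the values are made up). The names `by_pair` and `n_pairs` are illustrative.

```r
library(dplyr)

# Hypothetical toy stand-in for df (values invented for illustration)
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

# One row per existing (Country, Customer ID) combination
by_pair <- df %>% nest_by(Country, `Customer ID`)

# Number of distinct combinations actually present in the data
n_pairs <- nrow(distinct(df, Country, `Customer ID`))

print(by_pair)
```

France/1 occurs twice in the toy data, so its nested data frame holds two rows, while combinations that never occur (e.g. France/2) produce no row at all.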
Notice how the rows are ordered by the nesting columns. In this respect nest_by() is similar to group_nest(),
df %>%
  group_nest(Country)
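except that nest_by() returns a rowwise data frame, so we can then apply the same manipulation to each nested data frame independently.
df %>%
  nest_by(Country) %>%
  # dim() returns two values (rows, columns) per country, so reframe() yields two rows each
  reframe(Values = dim(data)) %>%
  # 40 is the number of countries in df: 2 x 40 = 80 labels, one per row
  bind_cols(tibble(Dimension = rep(c("Rows", "Columns"), 40)))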
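To make the rowwise difference concrete, here is a small sketch on a hypothetical toy tibble (not the article's data). It checks that nest_by() returns a `rowwise_df` while group_nest() does not, and uses mutate() to compute one value per nested data frame; `n_rows` and `mean_price` are illustrative column names.

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

nested <- df %>% nest_by(Country)

# nest_by() gives a rowwise data frame, group_nest() a plain tibble
is_rowwise_nest_by <- inherits(nested, "rowwise_df")
is_rowwise_group_nest <- inherits(group_nest(df, Country), "rowwise_df")

# Thanks to the rowwise structure, each expression is evaluated once per nested data frame
per_country <- nested %>%
  mutate(n_rows = nrow(data), mean_price = mean(data$Price))

print(per_country)
```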
If we want to go back to the original data, we can use unnest() from tidyr.
df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data))
Note, however, that the column order has changed (the nesting column has been moved to the front), the rows are now sorted by the nesting column, and we obtain a grouped data frame.
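A sketch of the round trip on a hypothetical toy tibble in which Country is deliberately not the first column, so the reordering is visible. `back` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for df; Country is the second column here
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

back <- df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data))

# Country now comes first and the rows are sorted by it
print(names(back))
print(back$Country)
```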
To change the nesting column, we have to apply unnest() and ungroup(), and then nest_by() again.
df %>%
  nest_by(Country) %>%
  tidyr::unnest(cols = c(data)) %>%
  ungroup() %>%
  nest_by(`Customer ID`)
This is needed because, unlike with group_by(), we cannot override the nesting by calling nest_by() a second time.
df %>%
  group_by(Country) %>%
  group_by(`Customer ID`)
df %>%
  nest_by(Country) %>%
  nest_by(`Customer ID`)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `Customer ID` is not found.
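The error makes sense once we look at which columns survive nesting. The sketch below, on a hypothetical toy tibble, shows that only Country and data remain at the top level, captures the error with tryCatch() instead of stopping, and then re-nests successfully after unnest() and ungroup(). Names such as `nested` and `by_customer` are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

nested <- df %>% nest_by(Country)

# Only the key and the list-column remain at the top level
top_cols <- names(nested)
# `Customer ID` now lives inside each nested data frame
inner_cols <- names(nested$data[[1]])

# Re-nesting by it therefore fails; capture the error instead of stopping
err <- tryCatch(nest_by(nested, `Customer ID`), error = function(e) e)

# Unnesting and ungrouping first brings the column back to the top level
by_customer <- nested %>%
  tidyr::unnest(cols = c(data)) %>%
  ungroup() %>%
  nest_by(`Customer ID`)
```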
The only exception is nesting again by the same columns, which works but is redundant.
df %>%
  nest_by(Country) %>%
  nest_by(Country)
df %>%
  nest_by(Country, `Customer ID`) %>%
  nest_by(Country, `Customer ID`)
To unnest, we can also use reframe(), which ungroups the output as well.
df %>%
  nest_by(Country) %>%
  reframe(data)
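A quick check on a hypothetical toy tibble, assuming dplyr 1.1.0 or later (where reframe() was introduced): the unnamed data frame returned for each row is spliced into columns, all original rows come back, and the result is neither grouped nor rowwise. `via_reframe` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

# The unnamed tibble in `data` is spliced into columns, one block per country
via_reframe <- df %>%
  nest_by(Country) %>%
  reframe(data)

print(via_reframe)
```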
We can extract a single nested data frame, here the first one, like this,
df %>%
  nest_by(Country) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
or skip select(data) if we want to keep the nesting column as well.
df %>%
  nest_by(Country) %>%
  ungroup() %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
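Without ungroup() we would not get the desired output: slice(1) would act on every row of the rowwise data frame, so we would unnest the whole data frame (and select() would add back the grouping column, as the message below shows).
df %>%
  nest_by(Country) %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))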
Adding missing grouping variables: `Country`
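The effect of ungroup() on slice() can be checked directly on a hypothetical toy tibble with three countries. The last line shows an alternative that the article does not use, indexing the list-column after pull(); `first_df` and the other names are illustrative.

```r
library(dplyr)

# Hypothetical toy stand-in for df (three countries)
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

nested <- df %>% nest_by(Country)

# Still rowwise: every row is its own group, so slice(1) keeps all rows
rows_rowwise <- nrow(slice(nested, 1))
# After ungroup(): slice(1) keeps only the first row (France)
rows_ungrouped <- nrow(slice(ungroup(nested), 1))

# Alternative (not from the article): pull the list-column and index it
first_df <- pull(nested, data)[[1]]
```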
Like group_by(), nest_by() accepts expressions, even though the documentation states that “Computations are not allowed in nest_by()”.
df %>%
  nest_by(CustomerID = as.character(`Customer ID`))
df %>%
  nest_by(Price_Rank = dense_rank(Price))
These are equivalent to performing a mutate() call beforehand.
df %>%
  mutate(CustomerID = as.character(`Customer ID`)) %>%
  nest_by(CustomerID)
df %>%
  mutate(Price_Rank = dense_rank(Price)) %>%
  nest_by(Price_Rank)
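A sketch comparing the two forms on a hypothetical toy tibble that contains a tied price. The same ranking function must be used on both sides: dense_rank() and min_rank() number ties differently, so mixing them would give different nesting keys. `with_expr` and `with_mutate` are illustrative names.

```r
library(dplyr)

# Hypothetical toy stand-in for df; Price contains a tie (1.0 twice)
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

# Expression inside nest_by()
with_expr <- df %>% nest_by(Price_Rank = dense_rank(Price))

# Same column created first with mutate()
with_mutate <- df %>%
  mutate(Price_Rank = dense_rank(Price)) %>%
  nest_by(Price_Rank)

# Rows per nested data frame, for comparison
sizes_expr <- vapply(with_expr$data, nrow, integer(1))
sizes_mutate <- vapply(with_mutate$data, nrow, integer(1))
```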
We can change the name of the list-column with the .key argument.
df %>%
  nest_by(Country, .key = "list of dfs")
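On a hypothetical toy tibble, the effect of .key is visible in the column names. Because the new name contains spaces, it has to be accessed with backticks afterwards; `renamed` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

renamed <- df %>% nest_by(Country, .key = "list of dfs")

print(names(renamed))
# A non-syntactic name needs backticks
first_nested <- renamed$`list of dfs`[[1]]
```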
The .keep argument, instead, controls whether or not the nesting column is kept in the nested data frames.
df %>%
  nest_by(Country, .keep = TRUE) %>%
  ungroup() %>%
  select(data) %>%
  slice(1) %>%
  tidyr::unnest(cols = c(data))
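A side-by-side sketch on a hypothetical toy tibble, comparing the columns of the first nested data frame with and without .keep = TRUE (`kept` and `dropped` are illustrative names).

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

kept <- df %>% nest_by(Country, .keep = TRUE)
dropped <- df %>% nest_by(Country)  # default .keep = FALSE

# Country appears inside the nested data frames only with .keep = TRUE
print(names(kept$data[[1]]))
print(names(dropped$data[[1]]))
```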
Ultimately, nest_by() is essentially equivalent to these lines of code.
df %>%
  group_by(Country) %>%
  summarise(data = list(pick(everything()))) %>%
  rowwise()
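A sketch comparing the two pipelines on a hypothetical toy tibble, assuming dplyr 1.1.0 or later (for pick()). The keys and the nested contents match; one difference worth knowing is the rowwise grouping variable: nest_by() records Country (which is why select(data) earlier reported "Adding missing grouping variables"), whereas a bare rowwise() records none.

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

via_nest_by <- df %>% nest_by(Country)

via_summarise <- df %>%
  group_by(Country) %>%
  summarise(data = list(pick(everything()))) %>%  # pick() excludes the grouping column
  rowwise()

# Same keys and same number of rows in each nested data frame
same_keys <- identical(via_nest_by$Country, via_summarise$Country)
same_sizes <- identical(
  vapply(via_nest_by$data, nrow, integer(1)),
  vapply(via_summarise$data, nrow, integer(1))
)

# The rowwise grouping variables differ
print(group_vars(via_nest_by))
print(group_vars(via_summarise))
```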
On a grouped data frame, nest_by() only works without arguments, as it inherits the grouping column from group_by().
df %>%
  group_by(Country) %>%
  nest_by()
df %>%
  group_by(Country) %>%
  nest_by(`Customer ID`)
Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()`
The same happens if we specify the same column.
df %>%
  group_by(Country) %>%
  nest_by(Country)
Error in `nest_by()`:
! Can't re-group while nesting
ℹ Either `ungroup()` first or don't supply arguments to `nest_by()`
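To sum up the behaviour on grouped input, here is a sketch on a hypothetical toy tibble. It shows that nest_by() without arguments reuses the existing grouping, that any argument (even the same column) raises an error, captured here with tryCatch(), and that ungroup() first allows a new nesting column. All object names are illustrative.

```r
library(dplyr)

# Hypothetical toy stand-in for df
df <- tibble(
  `Customer ID` = c(1, 2, 1, 3, 4),
  Country = c("France", "Spain", "France", "Spain", "Italy"),
  Price = c(2.5, 1.0, 3.0, 1.0, 4.2)
)

grouped <- df %>% group_by(Country)

# Without arguments, nest_by() reuses the existing grouping
from_grouped <- nest_by(grouped)

# Any argument, even the same column, raises "Can't re-group while nesting"
err_other <- tryCatch(nest_by(grouped, `Customer ID`), error = function(e) e)
err_same <- tryCatch(nest_by(grouped, Country), error = function(e) e)

# Ungrouping first makes a new nesting column possible
by_customer <- grouped %>%
  ungroup() %>%
  nest_by(`Customer ID`)
```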