When a verb is applied to a grouped data frame, the row order of the
output can differ from that of the input, most often because the rows
are rearranged by the grouping columns. The grouping columns, with one
row per group, can be retrieved at any time with group_keys(), a utility
function for grouped data frames.
df %>%
  group_by(Country) %>%
  group_keys()
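The examples on this page never define df, so the following is a minimal sketch that assumes a small made-up data frame with the same column names (Country and Price). The values are hypothetical and only serve to make the group keys visible.

```r
library(dplyr)

# Hypothetical sample data: the column names match the examples on this page,
# but the values are invented for illustration.
df <- tibble(
  Country = c("Italy", "France", "Italy", "Germany", "France", "Germany"),
  Price   = c(10, 20, 30, 40, 50, 60)
)

# group_keys() returns one row per group, sorted by the grouping column
keys <- df %>%
  group_by(Country) %>%
  group_keys()

keys$Country
# "France" "Germany" "Italy"
```

The keys come back sorted even though Italy appears first in the data, which is the reordering that several verbs below apply to their output.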
This page is meant as a brief memory aid to consult when in doubt about
a specific verb’s behavior. More information can be found on each verb’s
own page or on the group_by() / .by one.
Among the main verbs, select(), rename() and relocate() will not be
discussed here, as the manipulations they perform don’t affect the row
order.
filter() doesn’t arrange the output by the grouping columns.
df %>%
  group_by(Country) %>%
  filter(row_number() <= 5)
arrange() ignores the grouping, so the new row order is dictated only
by the verb itself.
df %>%
  group_by(Country) %>%
  arrange(desc(row_number()))
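For contrast, arrange() has a .by_group argument (FALSE by default) that makes it sort by the grouping columns first. The sketch below assumes a small made-up df with hypothetical values and compares the two behaviors.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- tibble(
  Country = c("Italy", "France", "Italy", "France"),
  Price   = c(10, 20, 30, 40)
)

# Default: the grouping is ignored, rows are sorted by Price only
by_price <- df %>%
  group_by(Country) %>%
  arrange(desc(Price))

# .by_group = TRUE: rows are sorted by Country first, then by Price
by_group <- df %>%
  group_by(Country) %>%
  arrange(desc(Price), .by_group = TRUE)

by_price$Price    # 40 30 20 10
by_group$Country  # "France" "France" "Italy" "Italy"
by_group$Price    # 40 20 30 10
```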
slice() and its helpers rearrange the output by the grouping columns,
df %>%
  group_by(Country) %>%
  slice(1:5)
except when we use the .by argument, as in this case the original row
order will be preserved.
df %>%
  slice(1:5, .by = Country)
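The difference is easiest to see on data whose groups are not already sorted. The following sketch assumes a made-up df with hypothetical values and dplyr 1.1.0 or later (the version that introduced .by), and keeps only the first row of each group.

```r
library(dplyr)

# Hypothetical sample data: Italy appears before France
df <- tibble(
  Country = c("Italy", "France", "Italy", "France"),
  Price   = c(10, 20, 30, 40)
)

# group_by(): the output is arranged by the sorted grouping column
with_group_by <- df %>%
  group_by(Country) %>%
  slice(1)

# .by: the output follows the order found in the data
with_by <- df %>%
  slice(1, .by = Country)

with_group_by$Country  # "France" "Italy"
with_by$Country        # "Italy" "France"
```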
mutate() doesn’t modify the row order.
df %>%
  group_by(Country) %>%
  mutate(Avg_Price = mean(Price))
summarise() arranges the rows by the grouping columns when using
group_by(),
df %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))
and keeps the groups in their order of first appearance with .by.
df %>%
  summarise(Avg_Price = mean(Price), .by = Country)
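As a quick check of the two orderings, here is a sketch that assumes a made-up df with hypothetical values in which the groups are deliberately unsorted. It requires dplyr 1.1.0 or later for .by.

```r
library(dplyr)

# Hypothetical sample data: Italy appears before France
df <- tibble(
  Country = c("Italy", "France", "Italy", "France"),
  Price   = c(10, 20, 30, 40)
)

# group_by(): one row per group, sorted by Country
sorted_keys <- df %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))

# .by: one row per group, in order of first appearance
first_seen <- df %>%
  summarise(Avg_Price = mean(Price), .by = Country)

sorted_keys$Country  # "France" "Italy"
first_seen$Country   # "Italy" "France"
```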
reframe() behaves like summarise(), arranging the rows by the grouping
columns when using group_by(),
df %>%
  group_by(Country) %>%
  reframe(
    Price_Quantile_Value = quantile(Price, c(0.25, 0.75)),
    prob = c(0.25, 0.75)
  )
and preserving the original order with .by.
df %>%
  reframe(
    Price_Quantile_Value = quantile(Price, c(0.25, 0.75)),
    prob = c(0.25, 0.75),
    .by = Country
  )
count() outputs a data frame ordered by the grouping columns and,
within each group, by the values of the columns passed to the verb.
df %>%
  group_by(Country) %>%
  count(Price)
tally() reorders the output by the grouping columns.
df %>%
  group_by(Country) %>%
  tally()
distinct() doesn’t rearrange the rows by the grouping columns.
df %>%
  group_by(Country) %>%
  distinct(Price)
So, to summarise, the verbs that don’t reorder the rows by the grouping
columns are filter(), arrange(), mutate() and distinct().
The ones that do are slice(), summarise(), reframe(), count() and
tally(). For the first three of these, the row order is not modified if
we use .by.
Besides the row order, verbs also differ in whether the output data frame remains grouped.
As can be gathered from the previous examples, the verbs that keep the
data frame grouped are filter(), arrange(), slice(), mutate(), count()
and distinct(). select(), rename() and relocate() behave the same way.
summarise() instead drops the last grouping column from the grouping
(the column itself stays in the output), so if there is only one the
data frame is returned ungrouped. reframe() drops the grouping
entirely, and tally(), being a wrapper for summarise(n = n()), behaves
like summarise().
If we use .by for grouping, the output is always ungrouped, since the
purpose of that argument is to group only for the operation it is used
in.
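These rules can be checked with group_vars(), which returns the names of the current grouping columns (an empty character vector when the data is ungrouped). The sketch below assumes a made-up df with two grouping columns. Region is a hypothetical column that does not appear in the examples above, added so that dropping only the last grouping column is visible. It requires dplyr 1.1.0 or later for reframe() and .by.

```r
library(dplyr)

# Hypothetical sample data with two grouping columns
df <- tibble(
  Country = c("Italy", "Italy", "France", "France"),
  Region  = c("North", "South", "North", "South"),
  Price   = c(10, 20, 30, 40)
)

grouped <- df %>% group_by(Country, Region)

# mutate() keeps every grouping column
kept <- grouped %>%
  mutate(Avg_Price = mean(Price)) %>%
  group_vars()

# summarise() drops the last one; "drop_last" is the default,
# stating it explicitly only silences the informational message
dropped_last <- grouped %>%
  summarise(Avg_Price = mean(Price), .groups = "drop_last") %>%
  group_vars()

# tally() follows summarise()
tallied <- grouped %>% tally() %>% group_vars()

# reframe() and .by both return ungrouped data
reframed <- grouped %>%
  reframe(Avg_Price = mean(Price)) %>%
  group_vars()
by_only <- df %>%
  summarise(Avg_Price = mean(Price), .by = c(Country, Region)) %>%
  group_vars()

kept          # "Country" "Region"
dropped_last  # "Country"
tallied       # "Country"
reframed      # character(0)
by_only       # character(0)
```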