When a verb is applied to a grouped data frame, the row order of the output can differ from that of the input, most often because the rows are rearranged by the grouping columns. The distinct values of the grouping columns (the group keys) can be retrieved at any time with group_keys(), a utility function for grouped data frames.

df %>%
  group_by(Country) %>%
  group_keys()
A tibble: 40 x 1
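
As a minimal sketch, the same call can be tried on a small made-up tibble. The name toy and its values are assumptions for illustration, not the df used on this page. The keys come back sorted rather than in order of appearance.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

keys <- toy %>%
  group_by(Country) %>%
  group_keys()

keys$Country  # France, Italy, Spain: sorted, not in order of appearance
```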

This page is meant as a brief memory aid to consult when in doubt about the behavior of a specific verb. More information can be found on each verb’s own page or on the group_by() / .by one. Among the main verbs, select(), rename() and relocate() are not discussed here, as the manipulations they perform don’t affect the row order.

- filter()

filter() doesn’t arrange the output by the grouping columns: the rows that are kept stay in their original order.

df %>%
  group_by(Country) %>%
  filter(row_number() <= 5)
A tibble: 200 x 8
Groups: Country [40]
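
A small sketch of this on made-up data (toy is a hypothetical tibble, not the df above). Keeping the first row of each group returns the countries in the order they appear in the input, not alphabetically.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

first_rows <- toy %>%
  group_by(Country) %>%
  filter(row_number() <= 1)

first_rows$Country  # Spain, France, Italy: input order is kept
```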

- arrange()

arrange() ignores the grouping by default, so the new row order is dictated only by the expressions passed to the verb. Setting .by_group = TRUE makes it sort by the grouping columns first.

df %>%
  group_by(Country) %>%
  arrange(desc(row_number()))
A tibble: 525461 x 8
Groups: Country [40]
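
The sketch below contrasts the default with arrange()’s .by_group argument on a made-up tibble (toy is hypothetical, not the df above). By default the sort runs across the whole data frame; with .by_group = TRUE the grouping column is used as the first sort key.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

# Default: grouping is ignored, rows are sorted by Price only
ignoring_groups <- toy %>%
  group_by(Country) %>%
  arrange(Price)

# .by_group = TRUE: sort by Country first, then by Price
within_groups <- toy %>%
  group_by(Country) %>%
  arrange(Price, .by_group = TRUE)

ignoring_groups$Price  # 1 2 3 4 5
within_groups$Price    # 1 4 5 2 3
```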

- slice()

slice() and its helpers rearrange the output by the grouping columns,

df %>%
  group_by(Country) %>%
  slice(1:5)
A tibble: 200 x 8
Groups: Country [40]

except when we use the .by argument: in this case the groups are not sorted but kept in their order of first appearance in the data.

df %>%
  slice(1:5, .by = Country)
A tibble: 200 x 8
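
A minimal comparison of the two forms on made-up data (toy is a hypothetical tibble; the .by argument assumes dplyr 1.1.0 or later). With group_by() the countries come back sorted, while with .by they come back in the order they first appear in toy.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

with_group_by <- toy %>%
  group_by(Country) %>%
  slice(1)

with_by <- toy %>%
  slice(1, .by = Country)

with_group_by$Country  # France, Italy, Spain
with_by$Country        # Spain, France, Italy
```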

- mutate()

mutate() doesn’t modify the rows’ order.

df %>%
  group_by(Country) %>%
  mutate(Avg_Price = mean(Price))
A tibble: 525461 x 9
Groups: Country [40]
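
A quick check on made-up data (toy is hypothetical, not the df above): the group mean is added as a new column and every row stays where it was.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

with_avg <- toy %>%
  group_by(Country) %>%
  mutate(Avg_Price = mean(Price))

identical(with_avg$Country, toy$Country)  # TRUE: row order is unchanged
with_avg$Avg_Price                        # 2.5 2.5 2.5 5.0 2.5
```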

- summarise()

summarise() arranges the rows by the grouping columns when using group_by(),

df %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))
A tibble: 40 x 2

and keeps the groups in their order of first appearance with .by.

df %>%
  summarise(Avg_Price = mean(Price), .by = Country)
A tibble: 40 x 2
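
The two forms side by side on made-up data (toy is a hypothetical tibble; .by assumes dplyr 1.1.0 or later). The summary values are the same, only the order of the rows differs.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

sorted_keys <- toy %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))

appearance_order <- toy %>%
  summarise(Avg_Price = mean(Price), .by = Country)

sorted_keys$Country       # France, Italy, Spain
appearance_order$Country  # Spain, France, Italy
```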

- reframe()

reframe() behaves like summarise(), arranging the rows by the grouping columns when using group_by(),

df %>%
  group_by(Country) %>%
  reframe(Price_Quantile_Value = quantile(Price, c(0.25, 0.75)), prob = c(0.25, 0.75))
A tibble: 80 x 3

and keeping the groups in their order of first appearance with .by.

df %>%
  reframe(Price_Quantile_Value = quantile(Price, c(0.25, 0.75)), prob = c(0.25, 0.75), .by = Country)
A tibble: 80 x 3
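
A shorter sketch of the same contrast, using range() instead of quantile() so that each group returns two rows (toy is a hypothetical tibble; reframe() and .by assume dplyr 1.1.0 or later).

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

sorted_keys <- toy %>%
  group_by(Country) %>%
  reframe(Price_Range = range(Price))

appearance_order <- toy %>%
  reframe(Price_Range = range(Price), .by = Country)

sorted_keys$Country       # France, France, Italy, Italy, Spain, Spain
appearance_order$Country  # Spain, Spain, France, France, Italy, Italy
```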

- count()

count() outputs a data frame ordered by the grouping columns; within each group, the rows are ordered by the values of the columns passed to the verb.

df %>%
  group_by(Country) %>%
  count(Price)
A tibble: 3059 x 3
Groups: Country [40]
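
On a made-up tibble (toy is hypothetical, not the df above) the two-level ordering looks like this: first by Country, then by Price within each country.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

counted <- toy %>%
  group_by(Country) %>%
  count(Price)

counted$Country  # France, France, Italy, Spain, Spain
counted$Price    # 1 4 5 2 3
```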

- tally()

tally() reorders the output by the grouping columns.

df %>%
  group_by(Country) %>%
  tally()
A tibble: 40 x 2
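
A minimal sketch on made-up data (toy is a hypothetical tibble, not the df above): one row per country, sorted by the grouping column.

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

tallied <- toy %>%
  group_by(Country) %>%
  tally()

tallied$Country  # France, Italy, Spain
tallied$n        # 2 1 2
```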

- distinct()

distinct() doesn’t rearrange the rows by the grouping columns.

df %>%
  group_by(Country) %>%
  distinct(Price)
A tibble: 3059 x 2
Groups: Country [40]
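
To see this on made-up data, the sketch below uses a hypothetical tibble with one repeated Country/Price pair. The duplicate is dropped and the remaining rows keep their input order.

```r
library(dplyr)

# Hypothetical toy data with a duplicated (Spain, 3) row
toy_dup <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 3, 5, 4)
)

unique_rows <- toy_dup %>%
  group_by(Country) %>%
  distinct(Price)

unique_rows$Country  # Spain, France, Italy, France
unique_rows$Price    # 3 1 5 4
```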


So, to summarise, the verbs that don’t rearrange the rows by the grouping columns are filter(), arrange(), mutate() and distinct().
The ones that do are slice(), summarise(), reframe(), count() and tally(). For the first three, using .by instead of group_by() keeps the groups in their order of first appearance rather than sorting them.


- grouped or ungrouped output?

Besides the row order, verbs also differ in whether the data frame they return is still grouped.

As could be gathered from previous examples, the verbs that keep the data frame grouped are filter(), arrange(), slice(), mutate(), count() and distinct(). select(), rename() and relocate() have this behavior as well.

summarise() instead removes the last level of grouping, so if there is only one grouping column the data frame is returned ungrouped. reframe() always returns an ungrouped data frame, and tally(), being a wrapper for summarise(n = n()), behaves like summarise().
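
The difference is easier to see with two grouping columns. The sketch below uses a made-up tibble (sales and its Year column are assumptions for illustration) and inspects the result with group_vars(). Passing .groups = "drop_last" only spells out summarise()’s default and silences its informational message.

```r
library(dplyr)

# Hypothetical data with two grouping columns
sales <- tibble(
  Country = c("Spain", "Spain", "France", "France"),
  Year    = c(2010, 2011, 2010, 2010),
  Price   = c(3, 2, 1, 4)
)

grouped <- sales %>%
  group_by(Country, Year)

after_summarise <- grouped %>%
  summarise(Avg_Price = mean(Price), .groups = "drop_last")
after_reframe <- grouped %>%
  reframe(Price_Range = range(Price))
after_tally <- grouped %>%
  tally()

group_vars(after_summarise)  # "Country": only the last level (Year) is dropped
group_vars(after_reframe)    # character(0): no grouping left
group_vars(after_tally)      # "Country": same as summarise()
```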

If we use .by for grouping, the output is always ungrouped, since the purpose of that argument is to group only for the operation it is used in.
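
A short check with is_grouped_df() on made-up data (toy is a hypothetical tibble; .by assumes dplyr 1.1.0 or later):

```r
library(dplyr)

# Hypothetical toy data, not the df used on this page
toy <- tibble(
  Country = c("Spain", "France", "Spain", "Italy", "France"),
  Price   = c(3, 1, 2, 5, 4)
)

by_mutate <- toy %>%
  mutate(Avg_Price = mean(Price), .by = Country)
by_filter <- toy %>%
  filter(Price == max(Price), .by = Country)

is_grouped_df(by_mutate)  # FALSE
is_grouped_df(by_filter)  # FALSE
```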