data-masking

- fundamentals

arrange() reorders the rows of a data frame by the values of one or several columns.

For numeric columns, the default is ascending order, so the smallest values come first.

df %>%
  arrange(Quantity)
A tibble: 525461 x 8

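To see the largest values on top, we can either negate the column with a minus (-) or wrap it in desc(). The minus only works on columns that can be negated, such as numeric ones, while desc() works on any column type.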

df %>%
  arrange(-Quantity)
A tibble: 525461 x 8
df %>%
  arrange(desc(Quantity))
A tibble: 525461 x 8
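As a minimal sketch on made-up data (the toy tibble below is hypothetical and only borrows the column names of df), we can check that the two spellings agree on a numeric column and that desc() also handles a character one.

```r
library(dplyr)

# Hypothetical toy data, not the real df.
toy <- tibble(
  Description = c("b", "a", "c"),
  Quantity = c(10, 30, 20)
)

by_minus <- toy %>% arrange(-Quantity)
by_desc <- toy %>% arrange(desc(Quantity))

# Both calls put the largest Quantity first.
identical(by_minus$Quantity, by_desc$Quantity)

# desc() also works on character columns, where a unary minus would error.
by_desc_chr <- toy %>% arrange(desc(Description))
by_desc_chr$Description
```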

The documentation doesn't specify what happens in case of ties, but we can assume that tied rows keep their original relative order, so rows with a smaller row index come first.
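A quick way to probe this assumption is to sort a small table that contains ties and look at where the original row positions end up. The toy tibble and its id column are hypothetical, added here only for illustration.

```r
library(dplyr)

# Hypothetical toy data: `id` records the original row position.
toy <- tibble(
  id = 1:5,
  Quantity = c(2, 1, 2, 1, 2)
)

ties_sorted <- toy %>%
  arrange(Quantity)

# Within each Quantity value, the ids can be compared
# against the original row order.
ties_sorted
```

If the ids are increasing within each Quantity value, the tied rows have kept their original order.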

For character columns, the order is the one defined by the default "C" locale, which compares strings by code point, so uppercase letters sort before lowercase ones.

df %>%
  arrange(Description)
A tibble: 525461 x 8
df %>%
  arrange(desc(Description))
A tibble: 525461 x 8

- .locale

We can change it to the locale of our choice with the .locale argument (locales other than "C" require the stringi package).

df %>%
  arrange(Description, .locale = "en")
A tibble: 525461 x 8
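The difference between the two locales is easier to see on a handful of strings with mixed case. This is only a sketch on hypothetical data. It assumes a dplyr version that has the .locale argument (1.1.0 or later), and the "en" part runs only if stringi is installed.

```r
library(dplyr)

# Hypothetical toy data with mixed-case strings.
toy <- tibble(Description = c("banana", "Cherry", "apple", "Banana"))

# "C" locale: uppercase letters sort before lowercase ones.
c_sorted <- toy %>% arrange(Description, .locale = "C")
c_sorted$Description

# "en" locale: needs the stringi package, so we guard the call.
if (requireNamespace("stringi", quietly = TRUE)) {
  en_sorted <- toy %>% arrange(Description, .locale = "en")
  print(en_sorted$Description)
}
```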

To arrange a character column by a custom order, we can convert it into an ordered factor whose levels are listed in that order (here the levels are shuffled at random, just to demonstrate it).

df %>%
  mutate(Description_Factor = factor(Description,
                                     levels = sample(unique(Description)),
                                     ordered = TRUE)) %>%
  arrange(Description_Factor)
A tibble: 525461 x 9
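With a meaningful custom order the effect is easier to read. In this sketch the Size column and its three levels are hypothetical. A plain factor is used, on the assumption that arrange() follows the level order whether or not the factor is declared as ordered.

```r
library(dplyr)

# Hypothetical toy data: sizes that should not be sorted alphabetically.
toy <- tibble(
  Size = c("medium", "large", "small", "medium"),
  Quantity = c(3, 1, 4, 2)
)

custom_sorted <- toy %>%
  # The levels define the sort order: small < medium < large.
  mutate(Size_Factor = factor(Size, levels = c("small", "medium", "large"))) %>%
  arrange(Size_Factor)

custom_sorted
```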

arrange() is a data-masking function, so it accepts expressions as input.

df %>%
  arrange(Quantity ^ 2)
A tibble: 525461 x 8
df %>%
  arrange(min_rank(Quantity))
A tibble: 525461 x 8
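To see that the expression only drives the ordering and is not stored anywhere, here is a small sketch on hypothetical data with negative quantities.

```r
library(dplyr)

# Hypothetical toy data with negative values.
toy <- tibble(Quantity = c(-5, 2, -1, 4))

# Rows are ordered by the squared values (1, 4, 16, 25),
# but no new column is added to the result.
by_square <- toy %>%
  arrange(Quantity ^ 2)

by_square
```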

NAs are always placed at the bottom, even inside desc(). To have them on top, we can arrange by !is.na() of the column: this exploits the data-masking nature of arrange(), as the expression returns FALSE for the NAs and FALSE, being equal to 0, sorts before TRUE.

df %>%
  arrange(!is.na(Description))
A tibble: 525461 x 8
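The following sketch, on hypothetical data, compares the default placement of NAs with the !is.na() trick. Adding the column itself as a second argument keeps the non-missing values sorted as well.

```r
library(dplyr)

# Hypothetical toy data with missing descriptions.
toy <- tibble(
  Description = c("b", NA, "a", NA),
  Quantity = c(1, 2, 3, 4)
)

# Default: NAs go to the bottom.
na_last <- toy %>% arrange(Description)

# Still at the bottom when sorting in descending order.
na_last_desc <- toy %>% arrange(desc(Description))

# NAs first (FALSE sorts before TRUE), then the remaining values in order.
na_first <- toy %>% arrange(!is.na(Description), Description)

na_first
```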

If we pass more than one column, each subsequent one is used to break ties in the preceding ones (notice how rows 2 and 3 swap places in the second call).

df %>%
  arrange(desc(Quantity))
A tibble: 525461 x 8
df %>%
  arrange(desc(Quantity), StockCode)
A tibble: 525461 x 8
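The same swap can be reproduced on a tiny hypothetical table, where two rows share the largest Quantity and only the second column decides their order.

```r
library(dplyr)

# Hypothetical toy data: the first two rows tie on Quantity.
toy <- tibble(
  StockCode = c("B", "A", "C"),
  Quantity = c(9, 9, 5)
)

# One column: the tied rows are left as they were.
one_col <- toy %>% arrange(desc(Quantity))

# Two columns: StockCode breaks the tie, so "A" moves above "B".
two_cols <- toy %>% arrange(desc(Quantity), StockCode)

two_cols
```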

- with group_by()

arrange() ignores grouping by default, so rows from different groups can end up interleaved at the top.

df %>%
  group_by(Description) %>%
  arrange(Quantity)
A tibble: 525461 x 8
Groups: Description [4644]

If we want to order by the grouping column as well, we need to add it explicitly, before or after the other columns as the situation requires.

df %>%
  group_by(Description) %>%
  arrange(Quantity, Description)
A tibble: 525461 x 8
Groups: Description [4644]
df %>%
  group_by(Description) %>%
  arrange(Description, Quantity)
A tibble: 525461 x 8
Groups: Description [4644]

- .by_group

Instead of listing the grouping column, we can set the .by_group argument to TRUE. The grouping column is then used as the first sorting column, as in the last call of the previous example.

df %>%
  group_by(Description) %>% 
  arrange(Quantity, .by_group = TRUE)
A tibble: 525461 x 8
Groups: Description [4644]
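As a sketch on hypothetical data, we can compare the three variants side by side: grouping ignored, grouping column listed explicitly, and .by_group = TRUE.

```r
library(dplyr)

# Hypothetical toy data with two groups.
toy <- tibble(
  Description = c("b", "a", "b", "a"),
  Quantity = c(1, 4, 3, 2)
) %>%
  group_by(Description)

# Grouping ignored: rows are ordered by Quantity only.
ignored <- toy %>% arrange(Quantity)

# Grouping column listed explicitly as the first sorting column.
explicit <- toy %>% arrange(Description, Quantity)

# Same ordering, requested through .by_group.
by_group <- toy %>% arrange(Quantity, .by_group = TRUE)

identical(explicit$Quantity, by_group$Quantity)
```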

- using expressions

If we use an expression within arrange(), it is evaluated on the whole data frame and not per group.
Therefore in the following example the verb doesn't modify the row order, as sum(Quantity) returns the same value for every row.

df %>%
  group_by(Description) %>%
  arrange(desc(sum(Quantity)))
A tibble: 525461 x 8
Groups: Description [4644]

So we need to be explicit: we compute the per-group value with a mutate() call, which does respect grouping, and then arrange by the resulting additional column.

df %>%
  group_by(Description) %>%
  mutate(Total_Quantity = sum(Quantity)) %>%
  arrange(desc(Total_Quantity))
A tibble: 525461 x 9
Groups: Description [4644]
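On a small hypothetical table the per-group totals are easy to verify by hand, which makes the effect of the mutate() step visible.

```r
library(dplyr)

# Hypothetical toy data: totals are a = 3, b = 30, c = 5.
toy <- tibble(
  Description = c("a", "b", "a", "b", "c"),
  Quantity = c(1, 10, 2, 20, 5)
)

by_total <- toy %>%
  group_by(Description) %>%
  # mutate() respects the grouping, so each row gets its group's total.
  mutate(Total_Quantity = sum(Quantity)) %>%
  arrange(desc(Total_Quantity))

by_total
```

Rows of the group with the largest total come first, and rows within the same group keep their original relative order.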