data-masking
arrange() reorders the rows of a data frame by the values of one or more columns.
For numeric columns, the default puts the smallest values first.
df %>%
  arrange(Quantity)
To see the largest values at the top, we need to either prefix the column with a minus sign (-) or wrap it in desc().
df %>%
  arrange(-Quantity)
df %>%
  arrange(desc(Quantity))
The documentation doesn’t specify what happens in case of ties, but we can assume that tied rows keep their original relative order, so rows with a smaller row index come first.
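The behaviour described so far can be checked on a small self-contained example. The `toy` tibble and its values are hypothetical and only serve as an illustration; rows "B" and "C" deliberately tie on Quantity, so printing the results shows how ties are handled in the installed dplyr version.

```r
library(dplyr)

# Hypothetical toy data: rows "B" and "C" tie on Quantity
toy <- tibble(
  StockCode = c("A", "B", "C", "D"),
  Quantity  = c(6, 2, 2, 8)
)

ascending  <- toy %>% arrange(Quantity)        # smallest values first
descending <- toy %>% arrange(desc(Quantity))  # largest values first

# Inspect where the tied rows "B" and "C" end up in each result
ascending
descending
```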
For character columns, the order is alphabetical as defined by the default “C” locale, in which uppercase letters sort before lowercase ones.
df %>%
  arrange(Description)
df %>%
  arrange(desc(Description))
We can change it to the locale of our choice with the .locale argument.
df %>%
  arrange(Description, .locale = "en")
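The difference between the two locales is easiest to see with mixed-case strings. The sketch below uses a hypothetical `words` tibble and assumes a dplyr version that supports the .locale argument (1.1.0 or later) with the stringi package installed, which non-"C" locales rely on.

```r
library(dplyr)

# Hypothetical data mixing upper and lower case
words <- tibble(word = c("apple", "Banana", "cherry"))

# Default "C" locale: uppercase letters come before lowercase ones
c_order <- words %>% arrange(word)

# English locale: case is not the primary sorting criterion (needs stringi)
en_order <- words %>% arrange(word, .locale = "en")

c_order$word   # "Banana" "apple"  "cherry"
en_order$word  # "apple"  "Banana" "cherry"
```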
To arrange character columns in a custom order, we can transform them into ordered factors whose levels follow that order.
df %>%
  # the levels are shuffled at random here, just to obtain a non-alphabetical order
  mutate(Description_Factor = factor(Description,
                                     levels = sample(unique(df$Description),
                                                     length(unique(df$Description))),
                                     ordered = TRUE)) %>%
  arrange(Description_Factor)
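In practice the custom order is usually written out by hand rather than sampled. A minimal sketch, assuming a hypothetical `sizes` tibble; a plain factor is used here because arrange() sorts factors by the position of their levels, whether or not they are declared as ordered.

```r
library(dplyr)

# Hypothetical data with a natural, non-alphabetical order
sizes <- tibble(size = c("large", "small", "medium", "small"))

sorted_sizes <- sizes %>%
  # the order of `levels` defines the sort order
  mutate(size = factor(size, levels = c("small", "medium", "large"))) %>%
  arrange(size)

as.character(sorted_sizes$size)  # "small" "small" "medium" "large"
```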
arrange() is a data-masking function, so it accepts expressions as input.
df %>%
  arrange(Quantity ^ 2)
df %>%
  arrange(min_rank(Quantity))
NAs are placed at the bottom, but we can have them on top by using the following syntax. It exploits the data-masking nature of arrange(): !is.na() returns FALSE for the missing values and, as FALSE is equal to 0, those rows are put first.
df %>%
  arrange(!is.na(Description))
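Sorting by !is.na() alone only separates missing from non-missing values. As a sketch on a hypothetical `toy` tibble, adding the column itself as a second argument also sorts the non-missing rows.

```r
library(dplyr)

# Hypothetical data with one missing Description
toy <- tibble(
  Description = c("mug", NA, "bag"),
  Quantity    = c(3, 1, 2)
)

na_last  <- toy %>% arrange(Description)                       # default: NA at the bottom
na_first <- toy %>% arrange(!is.na(Description))               # NA moved to the top
na_first_sorted <- toy %>%
  arrange(!is.na(Description), Description)                    # NA on top, rest alphabetical

na_last$Description          # "bag" "mug" NA
na_first_sorted$Description  # NA    "bag" "mug"
```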
If we use several columns, the subsequent ones are used to resolve ties (notice how rows 2 and 3 swap places in the output of the second call).
df %>%
  arrange(desc(Quantity))
df %>%
  arrange(desc(Quantity), StockCode)
arrange() ignores grouping, so rows from different groups can end up mixed together at the top.
df %>%
  group_by(Description) %>%
  arrange(Quantity)
If we want to order by the grouping column as well, we need to add it explicitly, before or after the other columns as the situation requires.
df %>%
  group_by(Description) %>%
  arrange(Quantity, Description)
df %>%
  group_by(Description) %>%
  arrange(Description, Quantity)
Instead of specifying the grouping column, we can set the .by_group argument to TRUE. This sorts by the grouping column first, as in the second call of the previous example.
df %>%
  group_by(Description) %>%
  arrange(Quantity, .by_group = TRUE)
If we use an expression within arrange(), it is evaluated on the whole data frame and not per group.
Therefore, in the following example the verb doesn’t modify the order of the rows, as sum(Quantity) returns the same value for all of them.
df %>%
  group_by(Description) %>%
  arrange(desc(sum(Quantity)))
So we need to be explicit with a mutate() call, which does respect the grouping, and use the resulting additional column to arrange by.
df %>%
  group_by(Description) %>%
  mutate(Total_Quantity = sum(Quantity)) %>%
  arrange(desc(Total_Quantity))
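The contrast between the two approaches can be reproduced on a few rows. This is only a sketch on a hypothetical `toy` tibble: the overall sum is 12 for every row, while the per-group totals are 3 for "bag" and 9 for "mug".

```r
library(dplyr)

# Hypothetical data with two groups
toy <- tibble(
  Description = c("bag", "mug", "bag", "mug"),
  Quantity    = c(1, 5, 2, 4)
)

grouped <- toy %>% group_by(Description)

# sum(Quantity) is computed over all rows, so every row gets the same sort key
unchanged <- grouped %>% arrange(desc(sum(Quantity)))

# mutate() works per group, so each row carries its own group total
by_total <- grouped %>%
  mutate(Total_Quantity = sum(Quantity)) %>%
  arrange(desc(Total_Quantity))

by_total$Total_Quantity  # 9 9 3 3
```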