With group_by() we can group rows by the unique values of a column. Any manipulation we apply afterwards then returns results specific to each group.
df %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))
rowwise(), instead, creates groups made of one single row each.
df %>%
  rowwise()
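Counting the groups is a quick way to see the difference between the two kinds of grouping. The minimal sketch below uses a small made-up tibble, toy, as a hypothetical stand-in for df, and inspects the result with dplyr's n_groups().

```r
library(dplyr)

# Hypothetical toy data standing in for `df` (values made up for illustration)
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

by_country <- toy %>% group_by(Country)  # one group per unique Country
by_row     <- toy %>% rowwise()          # one group per row

n_groups(by_country)  # 2
n_groups(by_row)      # 3, the same as nrow(toy)
```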
Working with single-row groups lets us apply aggregate functions, inside a mutate() or summarise() call, to values placed in different columns of the same row.
df %>%
  rowwise() %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")
Without rowwise(), R would use all the values of the Quantity and Price columns at once to calculate the mean, returning the same output for every row instead of row-specific averages like above.
df %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")
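On a tiny hypothetical tibble the two results can be checked by hand. The rowwise mean of row 1 is (2 + 6) / 2 = 4, while the plain mean pools all six values (31 / 6). This is a sketch on made-up data, not on the article's df.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# With rowwise(): mean() only sees the two values of the current row
with_rw <- toy %>%
  rowwise() %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")
with_rw$Quantity_Price_Avg     # 4.0 6.0 5.5

# Without rowwise(): mean() sees all six values at once
without_rw <- toy %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")
without_rw$Quantity_Price_Avg  # 31 / 6 (about 5.17) on every row
```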
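rowwise() is not needed, though, when we use arithmetic operators, as they are vectorized: they already combine the values of each row element by element, so the rowwise and plain versions below return the same values.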
df %>%
  rowwise() %>%
  mutate(Total_Expense = Quantity * Price, .keep = "used")

df %>%
  mutate(Total_Expense = Quantity * Price, .keep = "used")

df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = Quantity + Price, .keep = "used")

df %>%
  mutate(Quantity_plus_Price = Quantity + Price, .keep = "used")
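Here is a check of this on made-up data. The two versions agree value by value. The only practical difference is that the rowwise one still carries its rowwise grouping afterwards. toy is a hypothetical stand-in for df.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

# `*` is vectorized, so it already pairs the values of each row
rw    <- toy %>% rowwise() %>% mutate(Total_Expense = Quantity * Price)
plain <- toy %>% mutate(Total_Expense = Quantity * Price)

rw$Total_Expense     # 12 32 10
plain$Total_Expense  # 12 32 10

inherits(rw, "rowwise_df")     # TRUE: the rowwise grouping is still there
inherits(plain, "rowwise_df")  # FALSE
```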
If we need to select several columns, we can use c_across(), which we can think of as a version of c() made for rowwise() that supports tidy-select syntax.
df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))), .keep = "used")
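For a single row, c_across(where(is.numeric)) behaves like writing c() with the matching columns by hand. The sketch below uses a hypothetical toy tibble with one character column and two numeric ones, and checks that both spellings give the same sums.

```r
library(dplyr)

# Hypothetical toy data: Country is character, so where(is.numeric) skips it
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# Tidy-select version: the numeric columns are found for us
via_c_across <- toy %>%
  rowwise() %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))))

# Hand-written version: the same columns listed with c()
via_c <- toy %>%
  rowwise() %>%
  mutate(Sums_Numeric = sum(c(Quantity, Price)))

via_c_across$Sums_Numeric  # 8 12 11
```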
For more complex selections, beware that we still need to combine them with c().
df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(c(where(is.numeric), -`Customer ID`))), .keep = "used")
Otherwise we will get an error.
df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(where(is.numeric), -`Customer ID`)))
Error in `mutate()`:
ℹ In argument: `Quantity_plus_Price = sum(c_across(where(is.numeric),
-`Customer ID`))`.
ℹ In row 1.
Caused by error in `c_across()`:
! unused argument (-`Customer ID`)
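Here is the same contrast as a self-contained sketch. Id is a hypothetical numeric identifier standing in for Customer ID. Wrapping both selections in c() excludes it. Passing the exclusion as a second argument fails, because c_across() takes a single selection argument.

```r
library(dplyr)

# Hypothetical toy data; Id plays the role of `Customer ID`
toy <- tibble(
  Id       = c(103, 101, 102),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# Both selections combined with c(): numeric columns, minus Id
ok <- toy %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(c(where(is.numeric), -Id))))

ok$Quantity_plus_Price  # 8 12 11

# Exclusion passed as a second argument: the error is caught instead of stopping
failed <- tryCatch(
  toy %>%
    rowwise() %>%
    mutate(Quantity_plus_Price = sum(c_across(where(is.numeric), -Id))),
  error = function(e) e
)
inherits(failed, "error")  # TRUE
```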
We need c() in the same way with across(), which you might associate with c_across(), even though the way you use them is very different, as is their output (across() applies the same manipulation to all the values of several columns).
df %>%
  mutate(across(c(where(is.numeric), -`Customer ID`), sum), .keep = "used")

df %>%
  mutate(across(where(is.numeric), -`Customer ID`), sum)
Error in `mutate()`:
ℹ In argument: `across(where(is.numeric), -`Customer ID`)`.
Caused by error:
! object 'Customer ID' not found
Another key difference is that c_across() returns a vector, while across() returns a data frame.
df %>%
  mutate(Quantity_plus_Price = across(c(where(is.numeric), -`Customer ID`), sum), .keep = "used")
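The difference in output type can be checked directly with is.data.frame(). The sketch below uses a hypothetical toy tibble and records, inside each row, what across() and c_across() returned.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

types <- toy %>%
  rowwise() %>%
  mutate(
    across_is_df   = is.data.frame(across(c(Quantity, Price), sum)),
    c_across_is_df = is.data.frame(c_across(c(Quantity, Price))),
    c_across_len   = length(c_across(c(Quantity, Price)))
  )

types$across_is_df    # TRUE TRUE TRUE
types$c_across_is_df  # FALSE FALSE FALSE
types$c_across_len    # 2 2 2: a plain vector with the row's two values
```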
In some of the examples I had to select only the first 1000 rows with slice(). That is because rowwise() is not particularly fast, so, when possible, it's better to use existing row-based R functions such as rowSums() and rowMeans().
rowSums() and rowMeans() take a data frame as input, so inside mutate() we need to build one with across() or, even better, pick().
df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

df %>%
  mutate(Quantity_plus_Price = rowSums(across(c(Quantity, Price))), .keep = "used")

df %>%
  mutate(Quantity_plus_Price = rowSums(pick(Quantity, Price)), .keep = "used")
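As a sanity check on made-up data, the sketch below compares the vectorized rowSums() and rowMeans() results with their rowwise() equivalents. It assumes dplyr 1.1.0 or later, where pick() was introduced. toy is a hypothetical stand-in for df.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

# Vectorized: each function is called once on all rows of the picked columns
fast <- toy %>%
  mutate(
    Quantity_plus_Price = rowSums(pick(Quantity, Price)),
    Quantity_Price_Avg  = rowMeans(pick(Quantity, Price))
  )

# Rowwise: sum() and mean() are called once per row
slow <- toy %>%
  rowwise() %>%
  mutate(
    Quantity_plus_Price = sum(c(Quantity, Price)),
    Quantity_Price_Avg  = mean(c(Quantity, Price))
  ) %>%
  ungroup()

fast$Quantity_plus_Price  # 8 12 11
fast$Quantity_Price_Avg   # 4.0 6.0 5.5
```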
Likewise, if we want to find the minimum value across several columns, we can opt for pmin() (the parallel minimum). This function doesn't need across() or pick(), as it takes vectors as inputs.
df %>%
  rowwise() %>%
  mutate(Lowest_Value = min(c(Quantity, Price)), .keep = "used")

df %>%
  mutate(Lowest_Value = pmin(Quantity, Price), .keep = "used")
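pmin() has a counterpart, pmax(), for the largest value. The sketch below applies both to a hypothetical toy tibble. Each call compares the vectors element by element, so no rowwise() is needed.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

out <- toy %>%
  mutate(
    Lowest_Value  = pmin(Quantity, Price),  # 2 4 1
    Highest_Value = pmax(Quantity, Price)   # 6 8 10
  )

# Same minimum as the slower rowwise() version
check <- toy %>%
  rowwise() %>%
  mutate(Lowest_Value = min(c(Quantity, Price))) %>%
  ungroup()
```

Both functions also accept na.rm = TRUE, which ignores missing values when comparing the elements at each position.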
If we apply summarise() to a rowwise data frame, the output will not be grouped.
df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
This is similar to what happens with group_by() when only one grouping column is used, since by default summarise() drops the last grouping level.
df %>%
  group_by(Country) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

df %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
Unlike with group_by(), though, the rowwise output stays ungrouped even with .groups = "keep".
df %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")

df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")
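group_vars() shows which grouping survives each summarise() call. The sketch below runs the three cases on a hypothetical toy tibble. An empty character vector means the result is not grouped.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

rw_keep <- toy %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")

gb_default <- toy %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

gb_keep <- toy %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")

group_vars(rw_keep)     # character(0): nothing to keep
group_vars(gb_default)  # character(0): the only level was dropped
group_vars(gb_keep)     # "Country"
```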
Since other verbs such as mutate() do not drop the rowwise grouping, the general good practice of piping an additional ungroup() at the end is especially recommended when we are not using summarise().
df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price))) %>%
  ungroup()
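The reason this matters is that a leftover rowwise grouping silently changes later summaries. In the sketch below, which uses a hypothetical toy tibble, the same sum() gives per-row values before ungroup() and a grand total after it.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

still_rowwise <- toy %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

ungrouped <- still_rowwise %>% ungroup()

# Identical follow-up step, different results
rw_total    <- still_rowwise %>% mutate(Grand_Total = sum(Quantity_plus_Price)) %>% pull(Grand_Total)
plain_total <- ungrouped %>% mutate(Grand_Total = sum(Quantity_plus_Price)) %>% pull(Grand_Total)

rw_total     # 8 12 11: still computed row by row
plain_total  # 31 31 31: the intended grand total
```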
rowwise() can also take one or several columns as arguments, using tidy-select syntax.
df %>%
  rowwise(`Customer ID`)

df %>%
  rowwise(7, Invoice)

df %>%
  rowwise(where(is.numeric))
When columns are specified, they are kept after a summarise() call and used as grouping columns, as if set with group_by() (unlike with group_by(), though, the output is not ordered by them).
df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

df %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.
df %>%
  rowwise(`Customer ID`, Invoice) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
`summarise()` has grouped output by 'Customer ID', 'Invoice'. You can override
using the `.groups` argument.
This can be useful for piping additional manipulations that require a grouped data frame (notice how here the output is ordered by the grouping column instead).
df %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price))) %>%
  summarise(Avg_Quantity_plus_Price_per_Customer = mean(Quantity_plus_Price))
`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.
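Here is the same two-step pipeline on a hypothetical toy tibble that can be checked by hand. Id stands in for Customer ID, and customer 101 has two rows with per-row totals of 8 and 11, so its average should be 9.5.

```r
library(dplyr)

# Hypothetical toy data; Id stands in for `Customer ID`
toy <- tibble(
  Id       = c(101, 102, 101),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# Step 1: one value per row, output grouped by Id
per_row <- toy %>%
  rowwise(Id) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
group_vars(per_row)  # "Id"

# Step 2: the kept grouping makes this a per-customer average
per_customer <- per_row %>%
  summarise(Avg_Quantity_plus_Price_per_Customer = mean(Quantity_plus_Price))
per_customer  # Id 101: 9.5, Id 102: 12
```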
With mutate(), instead, the data frame keeps its rowwise() grouping, including the columns passed to rowwise().
df %>%
  rowwise(`Customer ID`) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))
If we use c_across(), the columns passed as arguments to rowwise() will not be selected, even if they match the condition.
df %>%
  slice(1:1000) %>%
  rowwise(`Customer ID`) %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))), .keep = "used")

df %>%
  slice(1:1000) %>%
  rowwise(`Customer ID`) %>%
  summarise(Sums_Numeric = sum(c_across(where(is.numeric))))
`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.
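The sketch below shows this on a hypothetical toy tibble. Id is numeric, so where(is.numeric) would normally pick it up. Because it is the rowwise grouping column, only Quantity and Price end up in the sum, and mutate() leaves the rowwise grouping by Id in place.

```r
library(dplyr)

# Hypothetical toy data; Id is numeric but used as the rowwise grouping column
toy <- tibble(
  Id       = c(103, 101, 102),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

out <- toy %>%
  rowwise(Id) %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))))

out$Sums_Numeric  # 8 12 11: Id is not part of the sum
group_vars(out)   # "Id": mutate() keeps the rowwise grouping
```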
rowwise(), though, doesn't let us group by an expression the way group_by() does, since it is not a data-masking function: it only selects existing columns.
df %>%
  rowwise(as.character(`Customer ID`)) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))
Error in `rowwise()`:
! Problem while evaluating `` as.character(`Customer ID`) ``.
Caused by error:
! object 'Customer ID' not found
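A simple workaround is to create the column with mutate() first and then pass its name to rowwise(). group_by(), being data-masking, can take the expression directly. The sketch below uses a hypothetical toy tibble where Id stands in for Customer ID and Id_chr is a made-up column name.

```r
library(dplyr)

# Hypothetical toy data; Id stands in for `Customer ID`
toy <- tibble(
  Id       = c(103, 101, 102),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# group_by() evaluates the expression and creates the column on the fly
by_expr <- toy %>% group_by(Id_chr = as.character(Id))

# rowwise() only selects existing columns, so create the column first
rw <- toy %>%
  mutate(Id_chr = as.character(Id)) %>%
  rowwise(Id_chr) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

group_vars(rw)  # "Id_chr"
```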
We can also use functions that return more than one value if we wrap them in a list. The values will then be returned inside a list-column.
df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantiles = list(quantile(c(Quantity, Price), probs = c(0.25, 0.75))), .keep = "used")
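The sketch below checks both sides on a hypothetical toy tibble. With list(), each cell holds a named vector of two quantiles. Without it, mutate() on rowwise data stops, because each row's result must have size 1. For row 1 (values 2 and 6), the default quantile method gives 3 and 5.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(Quantity = c(2, 4, 10), Price = c(6, 8, 1))

out <- toy %>%
  rowwise() %>%
  mutate(Quantiles = list(quantile(c(Quantity, Price), probs = c(0.25, 0.75))))

out$Quantiles[[1]]  # 25%: 3, 75%: 5

# Without list(): two values per row is an error, caught here
failed <- tryCatch(
  toy %>%
    rowwise() %>%
    mutate(Quantiles = quantile(c(Quantity, Price), probs = c(0.25, 0.75))),
  error = function(e) e
)
```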
rowwise() can also come in handy with a list-column, when we want to access the values inside each of its elements (as each of them sits in a single row).
df %>%
  group_nest(Country) %>%
  rowwise() %>%
  mutate(N_Rows = nrow(data))

df %>%
  group_nest(Country) %>%
  rowwise() %>%
  mutate(Avg_Quantity = mean(data$Quantity))
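On a hypothetical toy tibble the nested results can be verified by hand. France has two rows with Quantity 2 and 4, and Spain has one row with Quantity 10. Inside rowwise(), data refers to the tibble stored in the current cell, not to the whole list-column.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

nested <- toy %>%
  group_nest(Country) %>%  # one row per Country; `data` is a list-column
  rowwise() %>%
  mutate(
    N_Rows       = nrow(data),          # rows in this Country's tibble
    Avg_Quantity = mean(data$Quantity)  # mean within this Country only
  )

nested  # France: 2 rows, Avg 3; Spain: 1 row, Avg 10
```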
If we apply rowwise() to a grouped data frame and then call mutate(), the rowwise grouping overrides the original one, inheriting its grouping column.
df %>%
  group_by(Country) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")
With a summarise() call, it ungroups the data, performs the rowwise calculations and then regroups the output by the original column.
df %>%
  group_by(Country) %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
`summarise()` has grouped output by 'Country'. You can override using the
`.groups` argument.
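The sketch below checks both behaviours on a hypothetical toy tibble. After rowwise(), the data is rowwise with Country as its grouping column, so sums are per row rather than per country. After summarise(), the output is grouped by Country again.

```r
library(dplyr)

# Hypothetical toy data standing in for `df`
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

grouped_rw <- toy %>% group_by(Country) %>% rowwise()
group_vars(grouped_rw)  # "Country", inherited from group_by()

mutated <- grouped_rw %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))
mutated$Quantity_plus_Price  # 8 12 11: per row, not per country

summarised <- grouped_rw %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
group_vars(summarised)  # "Country"
```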
When applied to a grouped data frame, though, rowwise() doesn't accept arguments and returns an error.
df %>%
  group_by(Country) %>%
  rowwise(`Customer ID`) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))
Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.
df %>%
  group_by(Country) %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.
This happens even when the argument is the same as the grouping column.
df %>%
  group_by(Country) %>%
  rowwise(Country) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))
Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.
df %>%
  group_by(Country) %>%
  rowwise(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))
Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.
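The error message itself points to the fix: ungroup() first, then call rowwise() with the columns we want. The sketch below uses a hypothetical toy tibble where Id stands in for Customer ID. It catches the error and then applies that fix.

```r
library(dplyr)

# Hypothetical toy data; Id stands in for `Customer ID`
toy <- tibble(
  Country  = c("France", "France", "Spain"),
  Id       = c(103, 101, 102),
  Quantity = c(2, 4, 10),
  Price    = c(6, 8, 1)
)

# Reproduce the error without stopping the script
msg <- tryCatch(
  toy %>% group_by(Country) %>% rowwise(Id),
  error = function(e) conditionMessage(e)
)

# Fix: drop the existing grouping, then set the rowwise columns
rw <- toy %>%
  group_by(Country) %>%
  ungroup() %>%
  rowwise(Country, Id)

group_vars(rw)  # "Country" "Id"
```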