tidy-select

- fundamentals

With group_by() we can group rows that share the same value of a column. Any manipulation we then apply returns results that are specific to each group.

df %>%
  group_by(Country) %>%
  summarise(Avg_Price = mean(Price))

A tibble: 40 x 2

rowwise() instead creates groups composed of one single row.

df %>%
  rowwise()

A tibble: 525461 x 8

Rowwise:

This lets us apply aggregate functions, inside a mutate() or summarise() call, to values that sit in different columns but on the same row.

df %>%
  rowwise() %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3

Rowwise:

Without rowwise(), R would use all the values in the Quantity and Price columns at once to calculate the mean, returning the same value on every row rather than the row-specific averages seen above.

df %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3
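To see the difference at a glance, here is a minimal sketch on a made-up three-row tibble (toy and its values are hypothetical, not the df used above).

```r
library(dplyr)

# Hypothetical toy data, not the df used in the examples above
toy <- tibble(Quantity = c(1, 2, 3), Price = c(10, 20, 30))

# With rowwise(): one mean per row
by_row <- toy %>%
  rowwise() %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price))) %>%
  ungroup()

# Without rowwise(): one overall mean, repeated on every row
overall <- toy %>%
  mutate(Quantity_Price_Avg = mean(c(Quantity, Price)))

by_row$Quantity_Price_Avg   # 5.5 11.0 16.5
overall$Quantity_Price_Avg  # 11 11 11
```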

rowwise() is not needed, though, when we use arithmetic operators, as they are already vectorized and work element by element.

df %>%
  rowwise() %>%
  mutate(Total_Expense = Quantity * Price, .keep = "used")

A tibble: 525461 x 3

Rowwise:

df %>%
  mutate(Total_Expense = Quantity * Price, .keep = "used")

A tibble: 525461 x 3

df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = Quantity + Price, .keep = "used")

A tibble: 525461 x 3

Rowwise:

df %>%
  mutate(Quantity_plus_Price = Quantity + Price, .keep = "used")

A tibble: 525461 x 3
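The equivalence can be checked on a small scale. This is only a sketch on hypothetical toy data (toy is not the df used above).

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(Quantity = c(1, 2, 3), Price = c(10, 20, 30))

with_rowwise <- toy %>%
  rowwise() %>%
  mutate(Total_Expense = Quantity * Price) %>%
  ungroup()

without_rowwise <- toy %>%
  mutate(Total_Expense = Quantity * Price)

# Both approaches return the same column: 10 40 90
identical(with_rowwise$Total_Expense, without_rowwise$Total_Expense)
```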

- c_across()

tidy-select

When we need to select several columns, we can use c_across(), which can be thought of as a c() specific to rowwise() that accepts tidy-select syntax.

df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))), .keep = "used")

A tibble: 1000 x 4

Rowwise:

For more complicated selections, beware that we still need to combine the pieces with c(), as c_across() takes one single selection as its argument.

df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(c(where(is.numeric), -`Customer ID`))), .keep = "used")

A tibble: 1000 x 3

Rowwise:

Otherwise we will get an error.

df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(where(is.numeric), -`Customer ID`)))

Error in `mutate()`:
ℹ In argument: `Quantity_plus_Price = sum(c_across(where(is.numeric),
  -`Customer ID`))`.
ℹ In row 1.
Caused by error in `c_across()`:
! unused argument (-`Customer ID`)
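A minimal sketch of both cases on hypothetical toy data (toy and its values are made up for illustration): the selection wrapped in c() works, while passing the pieces as separate arguments fails.

```r
library(dplyr)

# Hypothetical toy data with the same column names as above
toy <- tibble(
  `Customer ID` = c(101, 102),
  Quantity = c(1, 2),
  Price = c(10, 20)
)

# The whole selection goes inside c()
ok <- toy %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c_across(c(where(is.numeric), -`Customer ID`)))) %>%
  ungroup()

ok$Quantity_plus_Price  # 11 22

# Two separate arguments: c_across() throws an error, caught here
failed <- tryCatch(
  {
    toy %>%
      rowwise() %>%
      mutate(Quantity_plus_Price = sum(c_across(where(is.numeric), -`Customer ID`)))
    FALSE
  },
  error = function(e) TRUE
)

failed  # TRUE
```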

The same applies to across(): its selection must be wrapped in c() as well. You might associate c_across() with across(), but the way we use them is very different, and so is their output (across() applies the same manipulation to each of the selected columns).

df %>%
  mutate(across(c(where(is.numeric), -`Customer ID`), sum), .keep = "used")

A tibble: 525461 x 2

df %>%
  mutate(across(where(is.numeric), -`Customer ID`), sum)

Error in `mutate()`:
ℹ In argument: `across(where(is.numeric), -`Customer ID`)`.
Caused by error:
! object 'Customer ID' not found

Another key difference is that c_across() returns a vector while across() returns a data frame.

df %>%
  mutate(Quantity_plus_Price = across(c(where(is.numeric), -`Customer ID`), sum), .keep = "used")

A tibble: 525461 x 3
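The two return types can be inspected directly. This is a sketch on hypothetical toy data; the Sums and Is_Vector column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(Quantity = c(1, 2), Price = c(10, 20))

# across() assigned to a name yields a data frame stored in one column
packed <- toy %>%
  mutate(Sums = across(c(Quantity, Price), sum))

is.data.frame(packed$Sums)  # TRUE
names(packed$Sums)          # "Quantity" "Price"

# c_across() yields a plain vector for each row
vec <- toy %>%
  rowwise() %>%
  mutate(Is_Vector = is.atomic(c_across(c(Quantity, Price)))) %>%
  ungroup()

vec$Is_Vector  # TRUE TRUE
```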

- using dedicated functions instead

In some of the examples I had to select the first 1000 rows with slice(). That is because rowwise() is not particularly fast, so, where applicable, it’s better to use existing row-based R functions like rowSums() and rowMeans().

rowSums() and rowMeans() take a data frame as the input, so we need to use across(), or even better pick(), as well.

df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3

Rowwise:

df %>%
  mutate(Quantity_plus_Price = rowSums(across(c(Quantity, Price))), .keep = "used")

A tibble: 525461 x 3

df %>%
  mutate(Quantity_plus_Price = rowSums(pick(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3

Likewise if we want to find the minimum value of several columns we can opt for pmin(). This function doesn’t need across() or pick() as it takes vectors as inputs.

df %>%
  rowwise() %>%
  mutate(Lowest_Value = min(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3

Rowwise:

df %>%
  mutate(Lowest_Value = pmin(Quantity, Price), .keep = "used")

A tibble: 525461 x 3
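As a quick check that the dedicated functions match the rowwise() results, here is a sketch on hypothetical toy data. It assumes dplyr 1.1.0 or later, where pick() is available.

```r
library(dplyr)  # pick() assumes dplyr >= 1.1.0

# Hypothetical toy data
toy <- tibble(Quantity = c(1, 25, 3), Price = c(10, 20, 30))

# Vectorized alternatives, no rowwise() needed
fast <- toy %>%
  mutate(
    Quantity_plus_Price = rowSums(pick(Quantity, Price)),
    Lowest_Value = pmin(Quantity, Price)
  )

# The rowwise() versions, for comparison
slow <- toy %>%
  rowwise() %>%
  mutate(
    Quantity_plus_Price = sum(c(Quantity, Price)),
    Lowest_Value = min(c(Quantity, Price))
  ) %>%
  ungroup()

unname(fast$Quantity_plus_Price)  # 11 45 33
fast$Lowest_Value                 # 1 20 3
```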

- ungrouped or grouped output

If we apply summarise() to a rowwise data frame, the output will not be grouped.

df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 3

Rowwise:

df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

A tibble: 525461 x 1

This is similar to what happens with group_by() when we group by only one column.

df %>%
  group_by(Country) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 4

Groups: Country [40]

df %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

A tibble: 40 x 2

Unlike with group_by(), though, the output stays ungrouped even with .groups = "keep".

df %>%
  group_by(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")

A tibble: 40 x 2

Groups: Country [40]

df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)), .groups = "keep")

A tibble: 525461 x 1

So, when we are not using summarise(), the general good practice of piping an additional ungroup() is recommended.

df %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price))) %>%
  ungroup()

A tibble: 525461 x 9
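One way to verify whether the rowwise grouping is still there is to look at the class of the output. The sketch below uses hypothetical toy data and checks for the "rowwise_df" class that dplyr assigns to rowwise data frames.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(Quantity = c(1, 2), Price = c(10, 20))

kept <- toy %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

dropped <- kept %>%
  ungroup()

summarised <- toy %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

inherits(kept, "rowwise_df")        # TRUE: mutate() keeps the rowwise grouping
inherits(dropped, "rowwise_df")     # FALSE: ungroup() removes it
inherits(summarised, "rowwise_df")  # FALSE: summarise() removes it
```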

- columns as arguments

rowwise() can take one or several columns as arguments, with a tidy-select syntax.

df %>%
  rowwise(`Customer ID`)

A tibble: 525461 x 8

Rowwise: Customer ID

df %>%
  rowwise(7, Invoice)

A tibble: 525461 x 8

Rowwise: Customer ID, Invoice

df %>%
  rowwise(where(is.numeric))

A tibble: 525461 x 8

Rowwise: Quantity, Price, Customer ID

When columns are specified, they are kept after a summarise() call and used as grouping columns, as if set by group_by() (unlike with group_by(), though, the output is not ordered by them).

df %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

A tibble: 525461 x 1

df %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.

A tibble: 525461 x 2

Groups: Customer ID [4384]

df %>%
  rowwise(`Customer ID`, Invoice) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

`summarise()` has grouped output by 'Customer ID', 'Invoice'. You can override
using the `.groups` argument.

A tibble: 525461 x 3

Groups: Customer ID, Invoice [28816]

This can be useful for piping additional manipulations that require a grouped data frame (notice how here the output is ordered by the grouping column instead).

df %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price))) %>%
  summarise(Avg_Quantity_plus_Price_per_Customer = mean(Quantity_plus_Price))

`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.

A tibble: 4384 x 2
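The two-step behaviour can be followed on a small scale. This is a sketch on hypothetical toy data where customer 2 appears twice; the Avg column name is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, customer 2 has two rows
toy <- tibble(
  `Customer ID` = c(2, 1, 2),
  Quantity = c(1, 2, 3),
  Price = c(10, 20, 50)
)

# First summarise(): one row per input row, grouped by Customer ID
per_row <- toy %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

group_vars(per_row)          # "Customer ID"
per_row$Quantity_plus_Price  # 11 22 53, in the original row order

# Second summarise(): one row per customer, ordered by Customer ID
per_customer <- per_row %>%
  summarise(Avg = mean(Quantity_plus_Price))

per_customer$`Customer ID`  # 1 2
per_customer$Avg            # 22 32
```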

With mutate() instead it keeps the rowwise() grouping.

df %>%
  rowwise(`Customer ID`) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

A tibble: 525461 x 9

Rowwise: Customer ID

If we use c_across(), the columns set as arguments of rowwise() will not be selected, even if they match the condition.

df %>%
  slice(1:1000) %>%
  rowwise(`Customer ID`) %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric))), .keep = "used")

A tibble: 1000 x 4

Rowwise: Customer ID

df %>%
  slice(1:1000) %>%
  rowwise(`Customer ID`) %>%
  summarise(Sums_Numeric = sum(c_across(where(is.numeric))))

`summarise()` has grouped output by 'Customer ID'. You can override using the
`.groups` argument.

A tibble: 1000 x 2

Groups: Customer ID [50]
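The exclusion is easier to see by comparing the sums with and without the column passed to rowwise(). A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data where Customer ID is numeric as well
toy <- tibble(
  `Customer ID` = c(101, 102),
  Quantity = c(1, 2),
  Price = c(10, 20)
)

# Customer ID is numeric, so it is part of the sum
no_id <- toy %>%
  rowwise() %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric)))) %>%
  ungroup()

# Customer ID is an argument of rowwise(), so c_across() leaves it out
with_id <- toy %>%
  rowwise(`Customer ID`) %>%
  mutate(Sums_Numeric = sum(c_across(where(is.numeric)))) %>%
  ungroup()

no_id$Sums_Numeric    # 112 124
with_id$Sums_Numeric  # 11 22
```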

Unlike group_by(), though, rowwise() doesn’t let us group by an expression (it is not a data-masking function).

df %>%
  rowwise(as.character(`Customer ID`)) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

Error in `rowwise()`:
! Problem while evaluating `` as.character(`Customer ID`) ``.
Caused by error:
! object 'Customer ID' not found
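A possible workaround, sketched here on hypothetical toy data, is to create the column with mutate() first and then pass it to rowwise(). The Customer_Chr name is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  `Customer ID` = c(101, 102),
  Quantity = c(1, 2),
  Price = c(10, 20)
)

res <- toy %>%
  # Build the column first, as rowwise() only selects existing columns
  mutate(Customer_Chr = as.character(`Customer ID`)) %>%
  rowwise(Customer_Chr) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price))) %>%
  ungroup()

res$Customer_Chr         # "101" "102"
res$Quantity_plus_Price  # 11 22
```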

- with multiple outputs functions

We can use functions that return more than one value if we wrap them in a list. The values will be returned inside a list-column.

df %>%
  slice(1:1000) %>%
  rowwise() %>%
  mutate(Quantiles = list(quantile(c(Quantity, Price), probs = c(0.25, 0.75))), .keep = "used")

A tibble: 1000 x 3

Rowwise:
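To look inside the resulting list-column, we can extract one of its elements. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(Quantity = c(1, 2), Price = c(10, 20))

res <- toy %>%
  rowwise() %>%
  mutate(Quantiles = list(quantile(c(Quantity, Price), probs = c(0.25, 0.75)))) %>%
  ungroup()

is.list(res$Quantiles)  # TRUE: one element per row

# Each element is a named numeric vector holding the two quantiles
res$Quantiles[[1]]
names(res$Quantiles[[1]])  # "25%" "75%"
```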

- with list-columns

rowwise() can come in handy when we have a list-column and we want to access the values inside each of its elements (as each of them sits on a single row).

df %>%
  group_nest(Country) %>%
  rowwise() %>%
  mutate(N_Rows = nrow(data))

A tibble: 40 x 3

Rowwise:

df %>%
  group_nest(Country) %>%
  rowwise() %>%
  mutate(Avg_Quantity = mean(data$Quantity))

A tibble: 40 x 3

Rowwise:
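Both calculations can be reproduced on a small scale. This is a sketch on hypothetical toy data; group_nest() stores the nested tibbles in a list-column named data.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  Country = c("France", "Italy", "France"),
  Quantity = c(1, 5, 3)
)

res <- toy %>%
  group_nest(Country) %>%
  rowwise() %>%
  # Inside rowwise(), data refers to the single nested tibble of that row
  mutate(
    N_Rows = nrow(data),
    Avg_Quantity = mean(data$Quantity)
  ) %>%
  ungroup()

res$Country       # "France" "Italy"
res$N_Rows        # 2 1
res$Avg_Quantity  # 2 5
```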

- with group_by()

If we apply rowwise() to a grouped data frame, a mutate() call returns a rowwise data frame: rowwise() overrides the previous grouping but inherits its grouping column.

df %>%
  group_by(Country) %>%
  rowwise() %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)), .keep = "used")

A tibble: 525461 x 4

Rowwise: Country

With a summarise() call it ungroups, performs the rowwise() calculations and then regroups the output by the original column.

df %>%
  group_by(Country) %>%
  rowwise() %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

`summarise()` has grouped output by 'Country'. You can override using the
`.groups` argument.

A tibble: 525461 x 2

Groups: Country [40]

rowwise() doesn’t accept arguments in these situations though, returning an error.

df %>%
  group_by(Country) %>%
  rowwise(`Customer ID`) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.

df %>%
  group_by(Country) %>%
  rowwise(`Customer ID`) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.

Even when the argument is the same as the grouping column.

df %>%
  group_by(Country) %>%
  rowwise(Country) %>%
  mutate(Quantity_plus_Price = sum(c(Quantity, Price)))

Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.

df %>%
  group_by(Country) %>%
  rowwise(Country) %>%
  summarise(Quantity_plus_Price = sum(c(Quantity, Price)))

Error in `rowwise()`:
! Can't re-group when creating rowwise data.
ℹ Either first `ungroup()` or call `rowwise()` without arguments.
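As the error message suggests, one way out is to ungroup() before calling rowwise() with arguments. A minimal sketch on hypothetical toy data, using group_vars() to inspect the columns attached to each result:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  Country = c("France", "Italy"),
  `Customer ID` = c(101, 102),
  Quantity = c(1, 2),
  Price = c(10, 20)
)

# Without arguments, rowwise() inherits the grouping column
inherited <- toy %>%
  group_by(Country) %>%
  rowwise()

group_vars(inherited)  # "Country"

# With arguments on a grouped data frame it throws an error, caught here
failed <- tryCatch(
  {
    toy %>%
      group_by(Country) %>%
      rowwise(`Customer ID`)
    FALSE
  },
  error = function(e) TRUE
)

failed  # TRUE

# Ungrouping first makes the arguments acceptable again
regrouped <- toy %>%
  group_by(Country) %>%
  ungroup() %>%
  rowwise(`Customer ID`)

group_vars(regrouped)  # "Customer ID"
```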