- fundamentals

tally() is a wrapper for summarise(n = n()) and as such it returns the number of rows of a data frame.

df %>%
  tally()
A tibble: 1 x 1
df %>% 
  summarise(n = n())
A tibble: 1 x 1
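
Since the original df is not reproduced here, the following minimal sketch uses a small hypothetical tibble, toy, as a stand-in (its values are made up). It assumes dplyr is installed and checks that both calls return the same count, equal to nrow().

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

via_tally     <- toy %>% tally()
via_summarise <- toy %>% summarise(n = n())

via_tally$n      # 4, the number of rows
via_summarise$n  # 4 as well
```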

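tally(), like n(), doesn’t need an argument. However, if it is given an unnamed one, it uses it as the value for wt (its first argument after the data), so attention is required: the two calls below both sum Quantity instead of counting rows.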

df %>%
  tally(Quantity)
A tibble: 1 x 1
df %>%
  tally(wt = Quantity)
A tibble: 1 x 1
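
To make the pitfall concrete, here is a small sketch on a hypothetical four-row tibble, toy, standing in for df: passing Quantity without a name silently switches tally() from counting rows to summing the column.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

rows     <- toy %>% tally()               # n = 4, the number of rows
unnamed  <- toy %>% tally(Quantity)       # n = 22, Quantity is taken as wt
named_wt <- toy %>% tally(wt = Quantity)  # n = 22, same as the previous call
```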

- wt

wt uses data masking, so columns of the data frame can be referred to by their bare names.

Speaking of wt, it changes the function inside summarise() from n() to sum(wt, na.rm = TRUE), so missing values are skipped,

df %>%
  tally(wt = Quantity)
A tibble: 1 x 1
df %>% 
  summarise(n = sum(Quantity, na.rm = TRUE))
A tibble: 1 x 1

The output is therefore not the number of rows but the sum of the values in the specified column.
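
The na.rm = TRUE part (visible in the error messages below) only matters when the weighting column has missing values. A minimal sketch, assuming a hypothetical tibble with one NA, shows that tally(wt = ...) then matches sum(..., na.rm = TRUE) rather than a plain sum().

```r
library(dplyr)

# Hypothetical toy data with one missing Quantity
toy <- tibble(Quantity = c(6, 2, NA, 10))

with_wt   <- toy %>% tally(wt = Quantity)                        # n = 18, NA skipped
plain_sum <- toy %>% summarise(n = sum(Quantity))                # n = NA
na_rm_sum <- toy %>% summarise(n = sum(Quantity, na.rm = TRUE))  # n = 18
```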

Obviously, we can’t use wt with columns whose values can’t be summed, such as character columns.

df %>%
  tally(wt = Description)
Error in `tally()`:
ℹ In argument: `n = sum(Description, na.rm = TRUE)`.
Caused by error in `sum()`:
! invalid 'type' (character) of argument

It can, however, be used with more than one column, combined with c().

df %>%
  tally(wt = c(Quantity, Price))
A tibble: 1 x 1

In these instances it is equivalent to

df %>% 
  summarise(n = sum(Quantity, Price, na.rm = TRUE))
A tibble: 1 x 1
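
As a sanity check on a hypothetical three-row tibble (the values are invented, and adding quantities to prices only serves to show the mechanics), c() concatenates both columns into a single vector, so the result is the sum of the two column totals.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(
  Quantity = c(6, 2, 4),
  Price    = c(1.5, 2, 0.5)
)

# Both columns are concatenated into one vector before summing
both     <- toy %>% tally(wt = c(Quantity, Price))  # n = 16
separate <- sum(toy$Quantity) + sum(toy$Price)      # 12 + 4 = 16
```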

wt accepts expressions as well,

df %>%
  tally(wt = Quantity / 2)
A tibble: 1 x 1

as long as their output can be summed, of course.

df %>%
  tally(wt = as.character(Quantity / 2))
Error in `tally()`:
ℹ In argument: `n = sum(as.character(Quantity/2), na.rm =
  TRUE)`.
Caused by error in `sum()`:
! invalid 'type' (character) of argument
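
Both cases can be reproduced on a hypothetical tibble, toy, standing in for df. In this sketch, tryCatch() is used only so that the failing call yields an inspectable value instead of stopping the script.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

# An expression evaluated inside the data frame works as wt
halved <- toy %>% tally(wt = Quantity / 2)  # n = 11

# A character vector cannot be summed, so tally() signals an error
outcome <- tryCatch(
  toy %>% tally(wt = as.character(Quantity / 2)),
  error = function(e) "error"
)
outcome  # "error"
```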

There are some functions for which the summation is inconsequential,

df %>%
  tally(wt = n_distinct(`Customer ID`))
A tibble: 1 x 1

as they return just one addend.

df %>%
  summarise(n = sum(n_distinct(`Customer ID`)))
A tibble: 1 x 1
df %>%
  summarise(n = n_distinct(`Customer ID`))
A tibble: 1 x 1

so wt lets us use them in a tally() call, should we wish to.
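
On a hypothetical five-row tibble with three distinct customers, this sketch checks that wrapping n_distinct() in tally(wt = ...) returns the same single value as calling it directly in summarise().

```r
library(dplyr)

# Hypothetical toy data: three distinct customers over five rows
toy <- tibble(`Customer ID` = c(1, 1, 2, 3, 3))

via_tally     <- toy %>% tally(wt = n_distinct(`Customer ID`))       # n = 3
via_summarise <- toy %>% summarise(n = n_distinct(`Customer ID`))    # n = 3
```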

If we have an already aggregated data frame,

df %>%
  count(Country)
A tibble: 40 x 2

we can use wt to retrieve the original total number of rows.

df %>%
  count(Country) %>%
  tally(wt = n)
A tibble: 1 x 1
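
The round trip can be sketched on a hypothetical six-row tibble, toy: count() collapses it to one row per country, and summing the resulting n column gives back the original number of rows.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Country = c("UK", "UK", "UK", "France", "France", "Spain"))

per_country <- toy %>% count(Country)    # 3 rows: France 2, Spain 1, UK 3
total <- per_country %>% tally(wt = n)   # n = 6, the original row count
```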

- name

Another optional argument is name, which replaces the default column name n with a custom one.

df %>%
  tally(name = "Total_Number_of_Rows")
A tibble: 1 x 1
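
A quick check on a hypothetical tibble, toy, confirms that only the column name changes while the value is still the row count.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

renamed <- toy %>% tally(name = "Total_Number_of_Rows")
names(renamed)  # "Total_Number_of_Rows"
```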

- with group_by()

When used on a grouped data frame, tally() returns the number of rows in each group, which makes it equivalent to a count() call on the same columns,

df %>%
  group_by(Country) %>%
  tally()
A tibble: 40 x 2
df %>%
  count(Country)
A tibble: 40 x 2

except for the fact that tally(), like summarise(), drops the last grouping level when the data frame is grouped by more than one column,

df %>%
  group_by(Country, `Customer ID`) %>%
  tally() %>%
  group_vars()
## [1] "Country"
df %>%
  group_by(Country, `Customer ID`) %>%
  summarise(n = n()) %>%
  group_vars()
## `summarise()` has grouped output by 'Country'. You can override using
## the `.groups` argument.
## [1] "Country"

while count() groups only transiently: its output has the same groups as its input, which here is ungrouped.

df %>%
  count(Country, `Customer ID`) %>%
  group_vars()
## character(0)
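
The comparison can be reproduced on a hypothetical tibble, toy, assuming dplyr 1.0 or later. The last line goes slightly beyond the example above: it relies on dplyr's documented behaviour that count() groups transiently, so a data frame already grouped by two columns keeps both groups after count(), whereas tally() drops the last one.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(
  Country       = c("UK", "UK", "UK", "France", "France", "Spain"),
  `Customer ID` = c(1, 1, 2, 3, 3, 4)
)

# Same counts per group
by_tally <- toy %>% group_by(Country) %>% tally()
by_count <- toy %>% count(Country)

# tally() drops only the last grouping level, like summarise()
g_tally <- toy %>% group_by(Country, `Customer ID`) %>% tally() %>% group_vars()

# count() keeps whatever grouping its input had
g_count <- toy %>% count(Country, `Customer ID`) %>% group_vars()
g_count_grouped <- toy %>%
  group_by(Country, `Customer ID`) %>%
  count() %>%
  group_vars()
```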

- sort

With a grouped data frame, we can use the sort argument to arrange the rows by n, in descending order.

df %>%
  group_by(Country) %>%
  tally(sort = TRUE)
A tibble: 40 x 2

Using sort = TRUE we can also see that missing values are counted as one value: group_by() puts all the rows with an NA Customer ID into a single group, whose rows n() counts like any other.

df %>%
  group_by(`Customer ID`) %>%
  tally(sort = TRUE)
A tibble: 4384 x 2
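
On a hypothetical tibble where three of six rows lack a Customer ID, this sketch shows the NA rows forming one group and sort = TRUE placing the largest group first.

```r
library(dplyr)

# Hypothetical toy data with three rows lacking a Customer ID
toy <- tibble(`Customer ID` = c(1, NA, NA, 2, NA, 1))

by_customer <- toy %>%
  group_by(`Customer ID`) %>%
  tally(sort = TRUE)
# Rows in order: NA (n = 3), 1 (n = 2), 2 (n = 1)
```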

- add_tally()

tally()’s variant, add_tally(), uses mutate() instead of summarise(): rather than collapsing the data, it keeps every row and adds a column named n holding the same value, the total number of rows, in each of them.

df %>%
  add_tally()
A tibble: 525461 x 9
df %>%
  mutate(n = n())
A tibble: 525461 x 9
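
A minimal sketch on a hypothetical four-row tibble, toy, checks that add_tally() keeps all the rows and fills the new n column with the same value that mutate(n = n()) produces.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

added   <- toy %>% add_tally()       # 4 rows, new column n = 4 in each
mutated <- toy %>% mutate(n = n())   # same values
```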

add_tally() accepts the same arguments as tally().

df %>%
  add_tally(wt = Quantity, name = "Total_Quantity")
A tibble: 525461 x 9
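
With wt and name combined, add_tally() repeats the column total on every row under the chosen name, as this sketch on a hypothetical tibble, toy, illustrates.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Quantity = c(6, 2, 4, 10))

with_total <- toy %>% add_tally(wt = Quantity, name = "Total_Quantity")
with_total$Total_Quantity  # 22 22 22 22
```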

- with group_by()

On a grouped data frame, n holds the size of each group, so all the rows belonging to the same group share the same value.

df %>%
  group_by(Country) %>%
  add_tally()
A tibble: 525461 x 9
Groups: Country [40]
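
On a hypothetical six-row tibble, toy, each row receives the size of its own country group, and the original row order is kept.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Country = c("UK", "UK", "UK", "France", "France", "Spain"))

sized <- toy %>% group_by(Country) %>% add_tally()
sized$n  # 3 3 3 2 2 1: each row gets the size of its own group
```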

Unlike tally(), add_tally() does not drop any grouping level, even when the data frame is grouped by several columns,

df %>%
  group_by(Country, `Customer ID`) %>%
  add_tally()
A tibble: 525461 x 9
Groups: Country, Customer ID [4401]

as mutate(), unlike summarise(), leaves the grouping untouched.

df %>%
  group_by(Country, `Customer ID`) %>%
  mutate(n = n())
A tibble: 525461 x 9
Groups: Country, Customer ID [4401]
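
The three behaviours can be compared side by side on a hypothetical tibble, toy, grouped by two columns: add_tally() and mutate() keep both grouping columns, while tally() drops the last one.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(
  Country       = c("UK", "UK", "France"),
  `Customer ID` = c(1, 2, 3)
)

grouped <- toy %>% group_by(Country, `Customer ID`)

g_add_tally <- grouped %>% add_tally() %>% group_vars()    # both columns
g_mutate    <- grouped %>% mutate(n = n()) %>% group_vars() # both columns
g_tally     <- grouped %>% tally() %>% group_vars()         # "Country" only
```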

- sort

With a grouped data frame we can use the sort argument, which arranges the rows in descending order of n.

df %>%
  group_by(Country) %>%
  add_tally(sort = TRUE)
A tibble: 525461 x 9
Groups: Country [40]
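
A short sketch on a hypothetical tibble, toy, shows the effect: the rows of the largest group move to the top. The sorting is applied across the whole data frame rather than within each group, since arrange() ignores grouping unless .by_group = TRUE.

```r
library(dplyr)

# Hypothetical toy data standing in for df
toy <- tibble(Country = c("Spain", "UK", "France", "UK", "France", "UK"))

sorted <- toy %>%
  group_by(Country) %>%
  add_tally(sort = TRUE)
sorted$Country  # "UK" "UK" "UK" "France" "France" "Spain"
sorted$n        # 3 3 3 2 2 1
```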