tally() is a wrapper for summarise(n = n()) and, as such, it returns the number of rows of a data frame.
df %>%
  tally()

df %>%
  summarise(n = n())
tally(), like n(), doesn’t need an argument, but if it is given an unnamed one it uses it as the value of wt, so some care is required.
df %>%
  tally(Quantity)

df %>%
  tally(wt = Quantity)
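As a minimal sketch of this behaviour, the following uses a small invented tibble called toy (a hypothetical stand-in for df, not the data used here) to show that an unnamed argument is read as wt.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(Quantity = c(2, 3, 5))

# No argument: n is the number of rows (3).
rows_counted <- toy %>%
  tally()

# Unnamed argument: Quantity is matched to wt, so n is the sum of Quantity (10).
unnamed_weight <- toy %>%
  tally(Quantity)

# Naming the argument makes the intent explicit and gives the same result.
named_weight <- toy %>%
  tally(wt = Quantity)
```

Naming wt explicitly is probably the safer habit, as it makes clear that a sum and not a row count is wanted.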
wt is a data-masking argument. It changes the function inside summarise() from n() to sum(wt, na.rm = TRUE),
df %>%
  tally(wt = Quantity)

df %>%
  summarise(n = sum(Quantity, na.rm = TRUE))
so the output is not the number of rows but the sum of the values of the column passed to wt.
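The sum that wt triggers drops missing values, as the na.rm = TRUE in the error message further down suggests. A small sketch on an invented tibble (toy is hypothetical, not the data used here) illustrates the difference from a plain sum().

```r
library(dplyr)

# Hypothetical stand-in for df, with one missing Quantity.
toy <- tibble(Quantity = c(2, NA, 5))

# tally() ignores the NA when summing the weights: n is 7.
weighted <- toy %>%
  tally(wt = Quantity)

# A plain sum() propagates the NA: n is NA.
plain_sum <- toy %>%
  summarise(n = sum(Quantity))

# Adding na.rm = TRUE reproduces what tally() does: n is 7.
na_removed <- toy %>%
  summarise(n = sum(Quantity, na.rm = TRUE))
```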
Obviously, we can’t use wt with columns whose values can’t be summed.
df %>%
  tally(wt = Description)
Error in `tally()`:
ℹ In argument: `n = sum(Description, na.rm = TRUE)`.
Caused by error in `sum()`:
! invalid 'type' (character) of argument
It can, however, be used with more than one column.
df %>%
  tally(wt = c(Quantity, Price))
In these instances it is equivalent to
df %>%
  summarise(n = sum(Quantity, Price, na.rm = TRUE))
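To make the equivalence concrete, here is a short sketch on an invented tibble (toy and its values are hypothetical). Both calls add up every value of the two columns into a single total.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(
  Quantity = c(1, 2, 3),
  Price = c(10, 20, 30)
)

# c(Quantity, Price) concatenates the two columns before summing: n is 66.
both_columns <- toy %>%
  tally(wt = c(Quantity, Price))

# The same total written out with summarise().
written_out <- toy %>%
  summarise(n = sum(Quantity, Price, na.rm = TRUE))
```

Note that this is a grand total over both columns, not a row-wise product or a sum per column.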
wt accepts expressions as well,
df %>%
  tally(wt = Quantity / 2)
as long as their output can be summed, of course.
df %>%
  tally(wt = as.character(Quantity / 2))
Error in `tally()`:
ℹ In argument: `n = sum(as.character(Quantity/2), na.rm =
TRUE)`.
Caused by error in `sum()`:
! invalid 'type' (character) of argument
There are some functions where the summation is inconsequential,
df %>%
  tally(wt = n_distinct(`Customer ID`))
as they return just one addend.
df %>%
  summarise(n = sum(n_distinct(`Customer ID`)))

df %>%
  summarise(n = n_distinct(`Customer ID`))
So using wt lets us employ them in a tally() call, if we wish to do so.
If we have an already aggregated data frame,
df %>%
  count(Country)
we can use wt to retrieve the original total number of rows.
df %>%
  count(Country) %>%
  tally(wt = n)
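A minimal sketch of this round trip, on an invented tibble (toy is a hypothetical stand-in for df): summing the per-country counts gives back the number of rows we started with.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(Country = c("UK", "UK", "France", "France", "France"))

# Aggregated data frame: one row per country, with its row count in n.
per_country <- toy %>%
  count(Country)

# Summing the n column recovers the original number of rows (5).
total_rows <- per_country %>%
  tally(wt = n)
```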
Another optional argument is name, which changes the default column name n to a custom one.
df %>%
  tally(name = "Total_Number_of_Rows")
When used on a grouped data frame, tally() returns the number of rows for each group, thus becoming equivalent to a count() call,
df %>%
  group_by(Country) %>%
  tally()

df %>%
  count(Country)
except for the fact that tally(), like summarise(), removes only the most recent grouping when we have a data frame grouped by more than one column,
df %>%
  group_by(Country, `Customer ID`) %>%
  tally() %>%
  group_vars()
## [1] "Country"
df %>%
  group_by(Country, `Customer ID`) %>%
  summarise(n = n()) %>%
  group_vars()
## `summarise()` has grouped output by 'Country'. You can override using
## the `.groups` argument.
## [1] "Country"
while count(), which restores the grouping of its input, leaves no groups here because df is ungrouped.
df %>%
  count(Country, `Customer ID`) %>%
  group_vars()
## character(0)
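The grouping left behind by each function can be checked side by side. The sketch below uses an invented tibble (toy is hypothetical) and adds one case not shown above, count() on an already grouped data frame, to illustrate that count() gives back the grouping of its input.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(
  Country = c("UK", "UK", "France"),
  `Customer ID` = c(1, 2, 3)
)

# tally() peels off the last grouping column only: "Country" remains.
after_tally <- toy %>%
  group_by(Country, `Customer ID`) %>%
  tally() %>%
  group_vars()

# count() on an ungrouped data frame returns an ungrouped result.
after_count <- toy %>%
  count(Country, `Customer ID`) %>%
  group_vars()

# count() on a grouped data frame keeps the input grouping: "Country".
after_grouped_count <- toy %>%
  group_by(Country) %>%
  count(`Customer ID`) %>%
  group_vars()
```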
With a grouped data frame, we can use the sort argument to arrange the rows by n, in descending order.
df %>%
  group_by(Country) %>%
  tally(sort = TRUE)
Using sort = TRUE we can also see that NAs are treated as a single group and counted together, as n() would do.
df %>%
  group_by(`Customer ID`) %>%
  tally(sort = TRUE)
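A small sketch of how missing values are grouped, using an invented tibble (toy and its IDs are hypothetical): the three rows with a missing `Customer ID` end up in one group, which sort = TRUE brings to the top.

```r
library(dplyr)

# Hypothetical stand-in for df: three rows have no customer ID.
toy <- tibble(`Customer ID` = c(1, NA, NA, NA, 2, 2))

# One row per distinct ID, NA included, ordered by descending n.
by_customer <- toy %>%
  group_by(`Customer ID`) %>%
  tally(sort = TRUE)

# The first row is the NA group, with n equal to 3.
first_group <- by_customer[1, ]
```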
tally()’s variant, add_tally(), uses mutate() instead of summarise(), therefore adding a column named n with the same value for all of the rows.
df %>%
  add_tally()

df %>%
  mutate(n = n())
add_tally() accepts the same arguments as tally().
df %>%
  add_tally(wt = Quantity, name = "Total_Quantity")
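As an illustration, the sketch below applies wt and name to add_tally() on an invented tibble (toy is a hypothetical stand-in for df) and compares the result with a hand-written mutate().

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(Quantity = c(2, 3, 5))

# Every row keeps its Quantity and gains the overall total (10).
with_total <- toy %>%
  add_tally(wt = Quantity, name = "Total_Quantity")

# The same column built by hand with mutate().
by_hand <- toy %>%
  mutate(Total_Quantity = sum(Quantity, na.rm = TRUE))
```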
On a grouped data frame all the rows pertaining to the same group will share the same value.
df %>%
  group_by(Country) %>%
  add_tally()
Unlike tally(), add_tally() doesn’t remove any grouping from the output, even with several grouping columns,
df %>%
  group_by(Country, `Customer ID`) %>%
  add_tally()
as that is not a property of mutate().
df %>%
  group_by(Country, `Customer ID`) %>%
  mutate(n = n())
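The difference in grouping can be verified with group_vars(). This is a sketch on an invented tibble (toy is hypothetical), comparing what each function leaves behind.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(
  Country = c("UK", "UK", "France"),
  `Customer ID` = c(1, 1, 3)
)

# add_tally() keeps both grouping columns.
groups_after_add_tally <- toy %>%
  group_by(Country, `Customer ID`) %>%
  add_tally() %>%
  group_vars()

# tally() drops the last one, leaving only "Country".
groups_after_tally <- toy %>%
  group_by(Country, `Customer ID`) %>%
  tally() %>%
  group_vars()
```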
With a grouped data frame we can use the sort argument, which arranges the rows in descending order of n.
df %>%
  group_by(Country) %>%
  add_tally(sort = TRUE)
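A brief sketch of the sorted output on an invented tibble (toy is a hypothetical stand-in for df): all rows are kept, and those belonging to the largest group come first.

```r
library(dplyr)

# Hypothetical stand-in for df; the values are invented for illustration.
toy <- tibble(
  Country = c("France", "UK", "UK", "UK", "Spain", "Spain")
)

# Each row gets its group size in n, then rows are ordered by descending n.
sorted_rows <- toy %>%
  group_by(Country) %>%
  add_tally(sort = TRUE)
```

In this sketch the three UK rows come first, followed by the two Spain rows and the single France row, and the result is still grouped by Country.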