coalesce() takes more than one vector and,
(x <- c("x", "x", NA, "x"))## [1] "x" "x" NA "x"
(y <- c("y", "y", "y", "y"))## [1] "y" "y" "y" "y"
when the first one specified has NAs elements, they are replaced with elements from the following one that share the same positions.
coalesce(x, y)## [1] "x" "x" "y" "x"
The replacement doesn’t work backwards so when there are no NAs to
replace in the first vector specified, coalesce() just
returns it.
coalesce(y, x)## [1] "y" "y" "y" "y"
In case two elements from different vectors are eligible to
substitute, coalesce() uses the the one from the vector
specified first.
(z <- c("z", "z", "z", "z"))## [1] "z" "z" "z" "z"
coalesce(x, y, z)## [1] "x" "x" "y" "x"
coalesce(x, z, y)## [1] "x" "x" "z" "x"
The second vector can also be a scalar, in this case it will be recycled along the length of the first vector.
coalesce(x, "w")## [1] "x" "x" "w" "x"
In a data frame, we can use this last property to replace the NAs in a column with another value of choice.
df %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")Instead of a single value, we can use several if the length of the vector containing the new values is equal to the size of the column containing the values to be replaced.
df %>%
filter(Invoice == "C489881") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")Recycling when length > 1 is not allowed in dplyr,
even if we use lowest common denominators of the values to be replaced
(2 and 8 in the following example),
df %>%
filter(StockCode == "85166B") %>%
select(`Customer ID`)df %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(`Customer ID`,
c("first missing customer", "second missing customer"))`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 2).
or of the size of the column (3 and 39 in this one).
df %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer", "third missing costumer")), .keep = "used")Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(...)`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 3).
We can by all means substitute the missing values of one column with
the values of another, as per our introduction to
coalesce().
df %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
For_missing_CustID = "absent",
New_CustomerID = coalesce(`Customer ID`, For_missing_CustID), .keep = "used")coalesce() accepts functions as the values to be
replaced.
pos <- c("first ", "second ")
cust <- "missing customer"df %>%
filter(Invoice == "C489881") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, paste0(pos, cust)), .keep = "used")With the .ptype argument we can change the type of the
output.
coalesce(c(1, NA, 1, 1), 0, .ptype = logical())## [1] TRUE FALSE TRUE TRUE
And with the .size one we can override its length, for
example to recycle unitary vectors to a size of choice.
coalesce(NA, 1)## [1] 1
coalesce(NA, 1, .size = 6)## [1] 1 1 1 1 1 1
A grouped data frame doesn’t condition coalesce().
df %>%
group_by(Invoice) %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")