- fundamentals

coalesce() takes more than one vector and,

(x <- c("x", "x", NA, "x"))
## [1] "x" "x" NA  "x"
(y <- c("y", "y", "y", "y"))
## [1] "y" "y" "y" "y"

when the first one specified has NAs elements, they are replaced with elements from the following one that share the same positions.

coalesce(x, y)
## [1] "x" "x" "y" "x"

The replacement doesn’t work backwards so when there are no NAs to replace in the first vector specified, coalesce() just returns it.

coalesce(y, x)
## [1] "y" "y" "y" "y"

In case two elements from different vectors are eligible to substitute, coalesce() uses the the one from the vector specified first.

(z <- c("z", "z", "z", "z"))
## [1] "z" "z" "z" "z"
coalesce(x, y, z)
## [1] "x" "x" "y" "x"
coalesce(x, z, y)
## [1] "x" "x" "z" "x"

The second vector can also be a scalar, in this case it will be recycled along the length of the first vector.

coalesce(x, "w")
## [1] "x" "x" "w" "x"

- usage with a data frame

In a data frame, we can use this last property to replace the NAs in a column with another value of choice.

df %>%
  filter(StockCode == "85166B") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")
A tibble: 39 x 2

Instead of a single value, we can use several if the length of the vector containing the new values is equal to the size of the column containing the values to be replaced.

df %>%
  filter(Invoice == "C489881") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")
A tibble: 2 x 2

Recycling when length > 1 is not allowed in dplyr, even if we use lowest common denominators of the values to be replaced (2 and 8 in the following example),

df %>%
  filter(StockCode == "85166B") %>%
  select(`Customer ID`)
A tibble: 39 x 1
df %>%
  filter(StockCode == "85166B") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")
Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(`Customer ID`,
  c("first missing customer", "second missing customer"))`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 2).

or of the size of the column (3 and 39 in this one).

df %>%
  filter(StockCode == "85166B") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer", "third missing costumer")), .keep = "used")
Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(...)`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 3).

We can by all means substitute the missing values of one column with the values of another, as per our introduction to coalesce().

df %>%
  filter(StockCode == "85166B") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         For_missing_CustID = "absent",
         New_CustomerID = coalesce(`Customer ID`, For_missing_CustID), .keep = "used")
A tibble: 39 x 3

- with functions

coalesce() accepts functions as the values to be replaced.

pos <- c("first ", "second ")
cust <- "missing customer"
df %>%
  filter(Invoice == "C489881") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, paste0(pos, cust)), .keep = "used")
A tibble: 2 x 2

- .ptype

With the .ptype argument we can change the type of the output.

coalesce(c(1, NA, 1, 1), 0, .ptype = logical())
## [1]  TRUE FALSE  TRUE  TRUE

- .size

And with the .size one we can override its length, for example to recycle unitary vectors to a size of choice.

coalesce(NA, 1)
## [1] 1
coalesce(NA, 1, .size = 6)
## [1] 1 1 1 1 1 1

- with group_by()

A grouped data frame doesn’t condition coalesce().

df %>%
  group_by(Invoice) %>%
  filter(StockCode == "85166B") %>%
  mutate(`Customer ID` = as.character(`Customer ID`),
         New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")
A tibble: 39 x 3
Groups: Invoice [39]