coalesce()
takes more than one vector and,
<- c("x", "x", NA, "x")) (x
## [1] "x" "x" NA "x"
<- c("y", "y", "y", "y")) (y
## [1] "y" "y" "y" "y"
when the first one specified has NAs elements, they are replaced with elements from the following one that share the same positions.
coalesce(x, y)
## [1] "x" "x" "y" "x"
The replacement doesn’t work backwards so when there are no NAs to
replace in the first vector specified, coalesce()
just
returns it.
coalesce(y, x)
## [1] "y" "y" "y" "y"
In case two elements from different vectors are eligible to
substitute, coalesce()
uses the the one from the vector
specified first.
<- c("z", "z", "z", "z")) (z
## [1] "z" "z" "z" "z"
coalesce(x, y, z)
## [1] "x" "x" "y" "x"
coalesce(x, z, y)
## [1] "x" "x" "z" "x"
The second vector can also be a scalar, in this case it will be recycled along the length of the first vector.
coalesce(x, "w")
## [1] "x" "x" "w" "x"
In a data frame, we can use this last property to replace the NAs in a column with another value of choice.
%>%
df filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")
Instead of a single value, we can use several if the length of the vector containing the new values is equal to the size of the column containing the values to be replaced.
%>%
df filter(Invoice == "C489881") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")
Recycling when length > 1 is not allowed in dplyr
,
even if we use lowest common denominators of the values to be replaced
(2 and 8 in the following example),
%>%
df filter(StockCode == "85166B") %>%
select(`Customer ID`)
%>%
df filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer")), .keep = "used")
Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(`Customer ID`,
c("first missing customer", "second missing customer"))`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 2).
or of the size of the column (3 and 39 in this one).
%>%
df filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, c("first missing customer", "second missing customer", "third missing costumer")), .keep = "used")
Error in `mutate()`:
ℹ In argument: `New_CustomerID = coalesce(...)`.
Caused by error in `coalesce()`:
! Can't recycle `..1` (size 39) to match `..2` (size 3).
We can by all means substitute the missing values of one column with
the values of another, as per our introduction to
coalesce()
.
%>%
df filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
For_missing_CustID = "absent",
New_CustomerID = coalesce(`Customer ID`, For_missing_CustID), .keep = "used")
coalesce()
accepts functions as the values to be
replaced.
<- c("first ", "second ")
pos <- "missing customer" cust
%>%
df filter(Invoice == "C489881") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, paste0(pos, cust)), .keep = "used")
With the .ptype
argument we can change the type of the
output.
coalesce(c(1, NA, 1, 1), 0, .ptype = logical())
## [1] TRUE FALSE TRUE TRUE
And with the .size
one we can override its length, for
example to recycle unitary vectors to a size of choice.
coalesce(NA, 1)
## [1] 1
coalesce(NA, 1, .size = 6)
## [1] 1 1 1 1 1 1
A grouped data frame doesn’t condition coalesce()
.
%>%
df group_by(Invoice) %>%
filter(StockCode == "85166B") %>%
mutate(`Customer ID` = as.character(`Customer ID`),
New_CustomerID = coalesce(`Customer ID`, "missing customer"), .keep = "used")