- the magrittr pipe operator %>%

In this handbook we have used the pipe operator %>% extensively. It lets us avoid creating intermediate objects or nesting function calls inside one another, as in the following example.

slice_head(filter(df, Country == "United Kingdom"), n = 5)

A tibble: 5 x 8

With the pipe, the same operation becomes easier to read and to extend.

df %>%
  filter(Country == "United Kingdom") %>%
  slice_head(n = 5)

A tibble: 5 x 8

The pipe operator works by passing the object on its left as the first argument of the function on its right. Most dplyr functions take the data frame (.data) as their first argument, which explains why the pipe is so useful and widely adopted: it lets us “carry” the data frame through several transformations in a single chain of code.
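As a minimal sketch of this rule, assuming the magrittr package is installed and using R's built-in mtcars data set in place of the handbook's df, the piped call and the nested call below should give the same result:

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes the first argument
piped  <- mtcars %>% head(3)
nested <- head(mtcars, 3)

identical(piped, nested)  # TRUE
```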

The pipe can also be used inside a function call, usually to compute the value of an argument.

df %>%
  filter(Country == "United Kingdom") %>%
  slice_head(n = 5.4 %>% floor)

A tibble: 5 x 8

Notice that we wrote floor instead of floor(). This is another feature of the magrittr pipe: when the function needs no argument other than the piped object, the empty parentheses can be omitted.
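A quick check of this property, again a sketch assuming magrittr is installed: the three forms below should all compute floor(5.4).

```r
library(magrittr)

x1 <- 5.4 %>% floor    # bare function name, no parentheses
x2 <- 5.4 %>% floor()  # empty parentheses
x3 <- floor(5.4)       # ordinary call, no pipe

c(x1, x2, x3)  # 5 5 5
```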

- the dot as a placeholder

When instead we want to feed the object on the left to an argument that is not the first, we can use a dot (.) as a placeholder.

df %>% 
  pull(Price) %>% 
  slice_min(df, order_by = ., n = 10, with_ties = FALSE)

A tibble: 10 x 8
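A smaller, self-contained sketch of the placeholder, assuming only magrittr and base R: since the dot is passed to the sep argument of paste(), the left-hand side is not also inserted as the first argument.

```r
library(magrittr)

# The dot fills `sep`, so "-" is NOT additionally passed as paste()'s first argument
joined <- "-" %>% paste("a", "b", sep = .)
joined  # "a-b"
```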

In other cases we might need the object on the left more than once: typically as .data and again to compute one of its properties, such as its number of rows. One option is to refer to it by name:

df %>%
  filter(row_number() < nrow(df) / 2)

A tibble: 262730 x 8

Alternatively, we can use the dot as many times as we need.

df %>%
  filter(., row_number() < nrow(.) / 2)

A tibble: 262730 x 8

The dot is also necessary when earlier steps have transformed the object, because df still refers to the original, unmodified data frame.

df %>%
  slice(1:5) %>%
  filter(., row_number() < nrow(df) / 2)

A tibble: 5 x 8

df %>%
  slice(1:5) %>%
  filter(., row_number() < nrow(.) / 2)

A tibble: 2 x 8
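The same difference can be reproduced on the built-in mtcars data (a sketch assuming magrittr is installed), using base R's head() instead of dplyr verbs: after head(10), the dot stands for the 10-row intermediate result, while the name mtcars still refers to all 32 rows.

```r
library(magrittr)

# `.` is the intermediate result of the chain (10 rows here),
# while `mtcars` is the full, unmodified data frame (32 rows)
with_dot  <- mtcars %>% head(10) %>% head(., nrow(.) %/% 2)       # head(x, 5)
with_name <- mtcars %>% head(10) %>% head(., nrow(mtcars) %/% 2)  # head(x, 16)

nrow(with_dot)   # 5: half of the 10 rows reaching this step
nrow(with_name)  # 10: asking for 16 rows of a 10-row data frame keeps them all
```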

In the last examples we also passed the dot as the first argument, but this is not necessary: when the placeholder appears only inside a nested function call (nrow() in our case), the pipe still uses the object on the left as the first argument of the outer function. So we usually omit it.

df %>%
  slice(1:5) %>%  
  filter(row_number() < nrow(.) / 2)

A tibble: 2 x 8
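As a sketch of this rule with base R functions (assuming magrittr is installed and using mtcars instead of df), the dot below appears only inside nrow(), so the pipe still passes mtcars as the first argument of head():

```r
library(magrittr)

# The dot appears only inside nrow(), so this is evaluated as
# head(mtcars, nrow(mtcars) %/% 2), i.e. head(mtcars, 16)
half <- mtcars %>% head(nrow(.) %/% 2)

nrow(half)  # 16
```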

This default can be a problem with functions that do not take the data as their first argument. We can override it by wrapping the right-hand side in curly braces ({}), so that the object on the left is used only where we place the dot.

df %>% 
  slice(1:5) %>%  
  pull(Price) %>%
  {c(mean(.), sd(.))}

## [1] 4.760000 2.833373

Without the curly braces, the pipe inserts the object on the left as the first argument of c(), so in this case the original values are concatenated with the desired output.

df %>% 
  slice(1:5) %>%
  pull(Price) %>%
  c(mean(.), sd(.))

## [1] 6.950000 6.750000 6.750000 2.100000 1.250000 4.760000 2.833373
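The effect of the braces can be checked on a small hypothetical vector (a sketch assuming magrittr is installed):

```r
library(magrittr)

x <- c(2, 4, 6)

# With braces, the left-hand side is used only where the dot appears
with_braces    <- x %>% { c(mean(.), max(.)) }  # c(4, 6)

# Without braces, x is also inserted as the first argument of c()
without_braces <- x %>% c(mean(.), max(.))      # c(2, 4, 6, 4, 6)
```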

We can also pipe into functions such as lm() and use the dot notation in their formulas: there the dot means “all other columns” and is not mistaken for the placeholder, which here we use for the data argument.

df %>%
  select(Quantity, Price) %>%
  lm(Price ~ ., .)

## 
## Call:
## lm(formula = Price ~ ., data = .)
## 
## Coefficients:
## (Intercept)     Quantity  
##    4.715993    -0.002627
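A comparable sketch on the built-in mtcars data, assuming magrittr is installed and using base R's subset() in place of select(): the dot inside the formula expands to all the remaining columns, while the dot in data = . is the piped data frame.

```r
library(magrittr)

fit <- mtcars %>%
  subset(select = c(mpg, wt, hp)) %>%  # keep three columns
  lm(mpg ~ ., data = .)                # formula dot: wt + hp; data dot: piped data

names(coef(fit))  # "(Intercept)" "wt" "hp"
```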

We must be careful with grouped data frames, though: the dot placeholder does not refer to each group but to the whole data frame. In this example, therefore, it is not the number of rows of each group that is divided by 2 but the total number of rows of the whole data frame.

df %>%
  group_by(Country) %>%
  filter(., row_number() < nrow(.) / 2)

A tibble: 302339 x 8

Groups: Country [40]

The result is the same as using df instead of . inside nrow().

df %>%
  group_by(Country) %>%
  filter(row_number() < nrow(df) / 2)

A tibble: 302339 x 8

Groups: Country [40]

To keep, within each group, the rows whose index is less than half the group size, we can use n(), which returns the number of rows of the current group.

df %>%
  group_by(Country) %>%
  filter(row_number() < n() / 2)

A tibble: 262698 x 8

Groups: Country [40]
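The difference between nrow(.) and n() can be reproduced on the built-in mtcars data grouped by number of cylinders (a sketch assuming dplyr is installed; mtcars has 11, 7 and 14 cars with 4, 6 and 8 cylinders, 32 in total):

```r
library(dplyr)

# n() is the size of the current group: keeps 5 + 3 + 6 rows
per_group <- mtcars %>%
  group_by(cyl) %>%
  filter(row_number() < n() / 2)

# nrow(.) is the whole data frame (32 rows) in every group, so the
# condition is row_number() < 16 and no group is large enough to be cut
whole_df <- mtcars %>%
  group_by(cyl) %>%
  filter(., row_number() < nrow(.) / 2)

nrow(per_group)  # 14
nrow(whole_df)   # 32
```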

The pipe has other features but, as they are not strictly related to dplyr, they are not discussed here. The reference manual (https://cran.r-project.org/web/packages/magrittr/magrittr.pdf) is a good place to start exploring them.

- other magrittr operators

The pipe operator %>% comes from the magrittr package and is re-exported by dplyr, so it is available as soon as dplyr is loaded. magrittr has other operators, though, and to use them we need to load magrittr itself.

library(magrittr)

- the “tee” pipe %T>%

The “tee” pipe %T>% works like %>% but returns the object on its left instead of the result of the function on its right. The function right after it is therefore called only for its output (a plot, a printout), and the object on the left is “carried” on to the function after that. This is useful when we want an output from a function that the following function could not use: for example, we might want both a graph and a summary table from the same data frame.

df %>%
  filter(Country == "Korea") %>%
  select(Quantity) %T>%
  plot() %>%
  table()

## Quantity
## -48 -12  -8  -6  -5  -4  -3   3   4   5   6   8   9  10  12  24  36  48 
##   1   1   1   3   1   2   1   1   2   2  17   3   1   4  11   8   1   3
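A minimal sketch of what %T>% changes, assuming magrittr is installed and using a hypothetical numeric vector: cat() prints its input but returns NULL, so only with the tee pipe does sum() receive the original vector.

```r
library(magrittr)

x <- c(3, 1, 2)

with_tee    <- x %T>% cat("\n") %>% sum()  # prints x, then sum() receives x
without_tee <- x %>%  cat("\n") %>% sum()  # prints x, then sum() receives NULL

with_tee     # 6
without_tee  # 0
```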

- the “exposition” pipe %$%

If we want to pipe into functions, such as many base R ones, that work with vectors and do not take a data frame as their first argument, we can use the “exposition” pipe %$%. It “exposes” the column names of the data frame so that a function like table() can use them directly.

df %>%
  table(Country)

## Error: object 'Country' not found

df %$%
  table(Country)

## Country
##            Australia              Austria              Bahrain 
##                  654                  537                  107 
##              Belgium              Bermuda               Brazil 
##                 1054                   34                   62 
##               Canada      Channel Islands               Cyprus 
##                   77                  906                  554 
##              Denmark                 EIRE              Finland 
##                  428                 9670                  354 
##               France              Germany               Greece 
##                 5772                 8129                  517 
##            Hong Kong              Iceland               Israel 
##                   76                   71                   74 
##                Italy                Japan                Korea 
##                  731                  224                   63 
##              Lebanon            Lithuania                Malta 
##                   13                  154                  172 
##          Netherlands              Nigeria               Norway 
##                 2769                   32                  369 
##               Poland             Portugal                  RSA 
##                  194                 1101                  111 
##            Singapore                Spain               Sweden 
##                  117                 1278                  902 
##          Switzerland             Thailand United Arab Emirates 
##                 1187                   76                  432 
##       United Kingdom          Unspecified                  USA 
##               485852                  310                  244 
##          West Indies 
##                   54
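The same idea on the built-in mtcars data, as a sketch assuming magrittr is installed: on the right-hand side of %$% the columns can be referred to by name, also by functions that take several vectors, such as cor().

```r
library(magrittr)

cyl_counts <- mtcars %$% table(cyl)    # counts of 4, 6 and 8 cylinders
mpg_wt_cor <- mtcars %$% cor(mpg, wt)  # same as cor(mtcars$mpg, mtcars$wt)

cyl_counts
```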

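Another way around the issue is to extract the column first, either with pull(), which returns a vector, or with select(), which returns a one-column data frame. Note that with pull() the dimension name of the table becomes a dot, because the call table() receives is table(.).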

df %>%
  pull(Country) %>%
  table()

## .
##            Australia              Austria              Bahrain 
##                  654                  537                  107 
##              Belgium              Bermuda               Brazil 
##                 1054                   34                   62 
##               Canada      Channel Islands               Cyprus 
##                   77                  906                  554 
##              Denmark                 EIRE              Finland 
##                  428                 9670                  354 
##               France              Germany               Greece 
##                 5772                 8129                  517 
##            Hong Kong              Iceland               Israel 
##                   76                   71                   74 
##                Italy                Japan                Korea 
##                  731                  224                   63 
##              Lebanon            Lithuania                Malta 
##                   13                  154                  172 
##          Netherlands              Nigeria               Norway 
##                 2769                   32                  369 
##               Poland             Portugal                  RSA 
##                  194                 1101                  111 
##            Singapore                Spain               Sweden 
##                  117                 1278                  902 
##          Switzerland             Thailand United Arab Emirates 
##                 1187                   76                  432 
##       United Kingdom          Unspecified                  USA 
##               485852                  310                  244 
##          West Indies 
##                   54

df %>% 
  select(Country) %>%
  table()

## Country
##            Australia              Austria              Bahrain 
##                  654                  537                  107 
##              Belgium              Bermuda               Brazil 
##                 1054                   34                   62 
##               Canada      Channel Islands               Cyprus 
##                   77                  906                  554 
##              Denmark                 EIRE              Finland 
##                  428                 9670                  354 
##               France              Germany               Greece 
##                 5772                 8129                  517 
##            Hong Kong              Iceland               Israel 
##                   76                   71                   74 
##                Italy                Japan                Korea 
##                  731                  224                   63 
##              Lebanon            Lithuania                Malta 
##                   13                  154                  172 
##          Netherlands              Nigeria               Norway 
##                 2769                   32                  369 
##               Poland             Portugal                  RSA 
##                  194                 1101                  111 
##            Singapore                Spain               Sweden 
##                  117                 1278                  902 
##          Switzerland             Thailand United Arab Emirates 
##                 1187                   76                  432 
##       United Kingdom          Unspecified                  USA 
##               485852                  310                  244 
##          West Indies 
##                   54
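If we prefer not to rely on dplyr for this step, magrittr also provides extract2(), an alias of [[. A sketch on the built-in mtcars data, assuming magrittr is installed; as with pull(), the dimension name of the resulting table is a dot:

```r
library(magrittr)

# extract2("cyl") is mtcars[["cyl"]], so table() receives a plain vector
cyl_table <- mtcars %>% extract2("cyl") %>% table()

cyl_table
```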

- the “assignment” pipe %<>%

This pipe assigns the result of the chain back to the object at its start, in addition to outputting it. It must be the first pipe in the chain.
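In its simplest form (a sketch assuming magrittr is installed, with a hypothetical vector x), x %<>% f() stands for x <- x %>% f():

```r
library(magrittr)

x <- c(4, 9, 16)
x %<>% sqrt()  # equivalent to x <- sqrt(x)

x  # 2 3 4
```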

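library(ggplot2)
UK_clients_plot <- df
# binary + binds more loosely than the pipes, so the ggplot2 layer is added
# inside braces: this way it is part of the chain whose result %<>% assigns
UK_clients_plot %<>%
  filter(Country == "United Kingdom") %>%
  {ggplot(., aes(`Customer ID`)) + geom_bar()}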

## Warning: Removed 106429 rows containing non-finite values (`stat_count()`).

Without it we would have written

UK_clients_plot <- df %>%
  filter(Country == "United Kingdom") %>%
  ggplot(aes(`Customer ID`)) +
  geom_bar()

but that doesn’t show the graph, unless we wrap everything with parentheses.

(UK_clients_plot <- df %>%
  filter(Country == "United Kingdom") %>%
  ggplot(aes(`Customer ID`)) +
  geom_bar())

## Warning: Removed 106429 rows containing non-finite values (`stat_count()`).
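One detail to keep in mind when a %<>% chain ends with an operator such as ggplot2's +: in R, binary + has lower precedence than %any% operators, so it is applied after the assignment pipe has already stored its result. A sketch with hypothetical vectors, assuming magrittr is installed:

```r
library(magrittr)

x <- c(4, 9, 16)
y <- c(4, 9, 16)

# Parsed as (x %<>% sqrt()) + 1: x becomes sqrt(x), the + 1 is NOT stored
x %<>% sqrt() + 1

# Braces keep the whole computation inside the assignment pipe
y %<>% { sqrt(.) + 1 }

x  # 2 3 4
y  # 3 4 5
```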

Be aware that this pipe can be dangerous, because it overwrites the first element of the chain. That is why we first copied df to UK_clients_plot: to avoid overwriting our data frame with a plot.

It can be handy, though, for quickly updating a column. Here is the Country column before the change:

df %$%
  table(Country, useNA = "ifany")

## Country
##            Australia              Austria              Bahrain 
##                  654                  537                  107 
##              Belgium              Bermuda               Brazil 
##                 1054                   34                   62 
##               Canada      Channel Islands               Cyprus 
##                   77                  906                  554 
##              Denmark                 EIRE              Finland 
##                  428                 9670                  354 
##               France              Germany               Greece 
##                 5772                 8129                  517 
##            Hong Kong              Iceland               Israel 
##                   76                   71                   74 
##                Italy                Japan                Korea 
##                  731                  224                   63 
##              Lebanon            Lithuania                Malta 
##                   13                  154                  172 
##          Netherlands              Nigeria               Norway 
##                 2769                   32                  369 
##               Poland             Portugal                  RSA 
##                  194                 1101                  111 
##            Singapore                Spain               Sweden 
##                  117                 1278                  902 
##          Switzerland             Thailand United Arab Emirates 
##                 1187                   76                  432 
##       United Kingdom          Unspecified                  USA 
##               485852                  310                  244 
##          West Indies 
##                   54

df$Country %<>% na_if("Unspecified")
df %$%
  table(Country, useNA = "ifany")

## Country
##            Australia              Austria              Bahrain 
##                  654                  537                  107 
##              Belgium              Bermuda               Brazil 
##                 1054                   34                   62 
##               Canada      Channel Islands               Cyprus 
##                   77                  906                  554 
##              Denmark                 EIRE              Finland 
##                  428                 9670                  354 
##               France              Germany               Greece 
##                 5772                 8129                  517 
##            Hong Kong              Iceland               Israel 
##                   76                   71                   74 
##                Italy                Japan                Korea 
##                  731                  224                   63 
##              Lebanon            Lithuania                Malta 
##                   13                  154                  172 
##          Netherlands              Nigeria               Norway 
##                 2769                   32                  369 
##               Poland             Portugal                  RSA 
##                  194                 1101                  111 
##            Singapore                Spain               Sweden 
##                  117                 1278                  902 
##          Switzerland             Thailand United Arab Emirates 
##                 1187                   76                  432 
##       United Kingdom                  USA          West Indies 
##               485852                  244                   54 
##                 <NA> 
##                  310

instead of writing

df <- df %>%
  mutate(Country = na_if(Country, "Unspecified"))

- the R pipe |>

Since version 4.1.0, base R also has its own pipe operator, |>. Its purpose is the same as that of the magrittr pipe %>%, but it has fewer features: for example, it does not support the dot placeholder (R 4.2.0 added a more limited _ placeholder, which must be passed to a named argument). More information is available at this link: https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/
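A short sketch of the native pipe on the built-in mtcars data, requiring no packages but R 4.1.0 or later (and R 4.2.0 or later for the _ placeholder):

```r
first_three <- mtcars |> head(3)         # same as head(mtcars, 3)
fit <- mtcars |> lm(mpg ~ wt, data = _)  # `_` must be passed to a named argument

# Unlike %>%, the right-hand side must be a call: `5.4 |> floor` is a syntax error
rounded <- 5.4 |> floor()
```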