With the bind family of functions, consisting of bind_rows() and bind_cols(), we can merge data frames.

- bind_rows()

bind_rows() stacks data frames vertically, by the order they are specified in the function.

(df_first_rows <- slice_head(df, n = 5))
A tibble: 5 x 8
(df_second_rows <- slice_tail(df, n = 5))
A tibble: 5 x 8
bind_rows(df_first_rows, 
          df_second_rows)
A tibble: 10 x 8

bind_rows() uses their names to bind the columns with each other, so if there are columns’ names that are not shared, like Invoice_copy in df_first_rows for example, that column will get NAs in the rows of the data frame where it is not present.

bind_rows(df_first_rows %>%
            mutate(Invoice_Copy = Invoice, .after = "Invoice"), 
          df_second_rows)
A tibble: 10 x 9

One frequent occasion of this event is when one data frame has less columns.

bind_rows(df_first_rows %>% 
            select(2:8), 
          df_second_rows)
A tibble: 10 x 8
bind_rows(df_first_rows, 
          df_second_rows %>%
            select(2:8))
A tibble: 10 x 8

Notice how in the first example the missing column has been moved last: this is because the output’s columns’ order is taken from the first data frame.

In the most extreme case, where there are no shared columns’ names, we will still get an output even if the data frames don’t share any column.

bind_rows(df_first_rows %>% 
            select(Country),
          df_second_rows %>% 
            select(Invoice))
A tibble: 10 x 2

The type of columns must be the same to do the merging.

bind_rows(df_first_rows %>% 
            mutate(`Customer ID` = as.character(`Customer ID`)), 
          df_second_rows)
Error in `bind_rows()`:
! Can't combine `..1$Customer ID` <character> and `..2$Customer ID` <double>.

As already seen, the order of the columns is inherited from the first data frame.

bind_rows(df_first_rows %>% 
            select(8:1), 
          df_second_rows)
A tibble: 10 x 8
bind_rows(df_first_rows, 
          df_second_rows  %>% 
            select(8:1)) 
A tibble: 10 x 8

bind_rows() can also be used to merge together a list of data frames.

group_split(df, Country) %>%
  bind_rows()
A tibble: 525461 x 8

We can also merge data frames with vectors

bind_rows(df_first_rows,
          c(Country = "222", StockCode = "222222"))
A tibble: 6 x 8

or vectors with vectors,

bind_rows(c(Country = "111", StockCode = "111111"),
          c(Country = "222", StockCode = "222222"))
A tibble: 2 x 2

just as long as they are named though.

bind_rows(df_first_rows,
          c("222", "222222"))
Error in `bind_rows()`:
! Argument 2 must be a data frame or a named atomic vector.
bind_rows(c("111", "111111"),
          c("222", "222222"))
Error in `bind_rows()`:
! Argument 1 must be a data frame or a named atomic vector.

If they don’t share names we will get NAs for the non-matching columns, as with data frames.

bind_rows(df_first_rows,
          c(Country = "222", StockPosition = "222222"))
A tibble: 6 x 9
bind_rows(df_first_rows,
          c(StockPosition = "222222"))
A tibble: 6 x 9
bind_rows(c(Country = "111", StockPosition = "111111"),
          c(Country = "222", StockCode = "222222"))
A tibble: 2 x 3

- .id

We can identify from which data frame every row comes from with the .id argument, which will create a new column with a name of our choice.

bind_rows(df_first_rows, df_second_rows, .id = "N_of_df")
A tibble: 10 x 9

The default way to differentiate the data frames is a progressive number but if the arguments of bind_rows() are named those will be used instead.

bind_rows("number_one" = df_first_rows, 
          "number_two" = df_second_rows, .id = "N_of_df")
A tibble: 10 x 9

- bind_cols()

bind_cols() merges data frames horizontally.

(df_first_cols <- select(df, 1:4))
A tibble: 525461 x 4
(df_second_cols <- select(df, 5:8))
A tibble: 525461 x 4
bind_cols(df_first_cols, 
          df_second_cols)
A tibble: 525461 x 8

With this function the matching happens by row indexes, meaning that row index 1 of df_first_cols will bind with row index 1 of df_second_cols.

Consequently, the number of rows must either be the same or at least one data frame’s number of rows must be equal to one, in which case that row will be recycled.

bind_cols(df_first_cols %>% 
            slice(1), 
          df_second_cols)
A tibble: 525461 x 8

Divisors are not recycled.

bind_cols(df_first_cols %>% 
            slice(1:2), 
          df_second_cols %>% 
            slice(1:4))
Error in `bind_cols()`:
! Can't recycle `..1` (size 2) to match `..2` (size 4).

If we want to bind_cols() two data frames that have the same columns’ names, the function will add numeric suffixes to differentiate.

bind_cols(df_first_rows %>% 
            slice(1:4),
          df_second_rows %>% 
            slice(1:4))
New names:
• `Invoice` -> `Invoice...1`
• `StockCode` -> `StockCode...2`
• `Description` -> `Description...3`
• `Quantity` -> `Quantity...4`
• `InvoiceDate` -> `InvoiceDate...5`
• `Price` -> `Price...6`
• `Customer ID` -> `Customer ID...7`
• `Country` -> `Country...8`
• `Invoice` -> `Invoice...9`
• `StockCode` -> `StockCode...10`
• `Description` -> `Description...11`
• `Quantity` -> `Quantity...12`
• `InvoiceDate` -> `InvoiceDate...13`
• `Price` -> `Price...14`
• `Customer ID` -> `Customer ID...15`
• `Country` -> `Country...16`
A tibble: 4 x 16

- .name_repair

the .name_repair argument can change how this works (check https://tibble.tidyverse.org/reference/as_tibble.html and https://vctrs.r-lib.org/reference/vec_as_names.html for further details).

We can merge a data frame with a column, but that has to be named.

bind_cols(df_first_cols, 
          "Customer ID" = df$`Customer ID`)
A tibble: 525461 x 5

Otherwise the column name will be dictated by the .name_repair default.

bind_cols(df_first_cols, 
          df$`Customer ID`)
New names:
• `` -> `...5`
A tibble: 525461 x 5

- with group_by()

In case of grouped data frames, the output will inherit the groups of the first one.

bind_rows(df_first_rows %>%
            group_by(Country), 
          df_second_rows %>%
            group_by(`Customer ID`))
A tibble: 10 x 8
Groups: Country [1]
bind_cols(df_first_cols %>%
            group_by(StockCode),
          df_second_cols %>%
            group_by(Country))
A tibble: 525461 x 8
Groups: StockCode [4631]