How to Merge Multiple Pandas DataFrames in a Loop


Suppose we have multiple DataFrames sitting in a list.

dfs = [df1,df2,df3]

Example Scenario

We want to merge these three DataFrames into a single DataFrame.

   date   col1
2021-01      1
2021-02      2
2021-03      3
   date   col2
2021-01      4
2021-02      5
2021-03      6
   date   col3
2021-01      7
2021-02      8
2021-03      9

In order to merge these DataFrames, we need a column to merge over. Suppose we want to merge over the date column.

   date  col1  col2  col3
2021-01     1     4     7
2021-02     2     5     8
2021-03     3     6     9

Merge using reduce()

We can merge using reduce(), which will apply some function to an iterable and reduce it to a single cumulative value.

from functools import reduce
merged_df = reduce(lambda l, r: pd.merge(l, r, on='date', how='inner'), dfs)

The reduce() function will apply the provided lambda function on each list element. We’ll notice that the function takes two arguments l and r. We are passing in the current list element along with the result of the previous iteration. It becomes a running total of all previous iterations.

DataFrame merging steps with reduce()

Suppose we had n DataFrames to merge.

  1. In the 1st iteration, the first 2 DataFrames will merge.
  2. In the 2nd iteration, the third DataFrame will merge with the result of the 1st iteration (i.e. the merge of the first 2 DataFrames)
  3. In the 3rd iteration, the fourth DataFrame will merge with the result of the 2nd iteration (i.e. the merge of the first 3 DataFrames)
  4. In the nth iteration, the (n+1)th DataFrame will merge with the result the (n-1)th iteration (i.e. the merge of the first n DataFrames)