How to Merge Multiple Pandas DataFrames in a Loop
Suppose we have multiple DataFrames sitting in a list.
dfs = [df1,df2,df3]
Example Scenario
We want to merge these three DataFrames into a single DataFrame.
date col1
2021-01 1
2021-02 2
2021-03 3
date col2
2021-01 4
2021-02 5
2021-03 6
date col3
2021-01 7
2021-02 8
2021-03 9
In order to merge these DataFrames, we need a column to merge over. Suppose we want to merge over the date
column.
date col1 col2 col3
2021-01 1 4 7
2021-02 2 5 8
2021-03 3 6 9
Merge using reduce()
We can merge using reduce()
, which will apply some function to an iterable and reduce it to a single cumulative value.
from functools import reduce
merged_df = reduce(lambda l, r: pd.merge(l, r, on='date', how='inner'), dfs)
The reduce()
function will apply the provided lambda
function on each list element. We’ll notice that the function takes two arguments l
and r
. We are passing in the current list element along with the result of the previous iteration. It becomes a running total of all previous iterations.
DataFrame merging steps with reduce()
Suppose we had n
DataFrames to merge.
- In the
1st
iteration, the first2
DataFrames will merge. - In the
2nd
iteration, the third DataFrame will merge with the result of the1st
iteration (i.e. the merge of the first2
DataFrames) - In the
3rd
iteration, the fourth DataFrame will merge with the result of the2nd
iteration (i.e. the merge of the first3
DataFrames) - In the
nth
iteration, the(n+1)th
DataFrame will merge with the result the(n-1)th
iteration (i.e. the merge of the firstn
DataFrames)