How to Fix "ValueError" While Merging DataFrames in Pandas
Suppose we’re trying to merge multiple DataFrames.
Example Scenario
We have DataFrame df1:
year col1
0 2010 1
1 2011 2
2 2012 3
3 2013 4
4 2014 5
And DataFrame df2:
year col2
0 2010 6
1 2011 7
2 2012 8
3 2013 9
4 2014 10
We’ll try merging them on the year column.
merged_df = df1.merge(df2, on=['year'], how='inner')
We might end up with an error message like this.
ValueError: You are trying to merge on object and int64 columns.
If you wish to proceed you should use pd.concat
Change the column type
The error message is telling us that the column we are merging on has a different dtype in each DataFrame: object (typically strings) on one side and int64 on the other.
The first step is to look at the columns listed in on=[] in our call to merge() and check the dtypes for those columns.
print(df1.dtypes)
print(df2.dtypes)
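In our scenario, maybe year is an int in df1, but in df2, year is a str (which pandas usually reports as object). The message names the left DataFrame's dtype first, so the exact wording depends on which side holds the strings.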
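To see the mismatch in isolation, here is a minimal sketch that rebuilds the two DataFrames from the example. It assumes that df2 holds its years as strings (for instance, after being read from a text file); the literal values are taken from the tables above, and the try/except is only there so the script keeps running after the error.

```python
import pandas as pd

# df1 stores year as integers.
df1 = pd.DataFrame({"year": [2010, 2011, 2012, 2013, 2014],
                    "col1": [1, 2, 3, 4, 5]})

# Assumption: df2 stores year as strings, which causes the mismatch.
df2 = pd.DataFrame({"year": ["2010", "2011", "2012", "2013", "2014"],
                    "col2": [6, 7, 8, 9, 10]})

# Compare the dtype of the merge key on each side.
print(df1["year"].dtype)
print(df2["year"].dtype)

# Attempt the merge and capture the error instead of crashing.
try:
    df1.merge(df2, on=["year"], how="inner")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```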
We can first cast that column to some common datatype, and then merge.
df1['year'] = df1['year'].astype(int)
df2['year'] = df2['year'].astype(int)
Note that we would need to check every column we’re merging on, not just the first one.
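When there are several merge keys, the check can be done in one pass. The sketch below uses a hypothetical helper, mismatched_keys (not part of pandas), to list the keys whose dtypes differ, casts only those, and then merges. It assumes every mismatched key holds values that convert cleanly to int; astype(int) raises if a value such as "n/a" is present.

```python
import pandas as pd

def mismatched_keys(left, right, keys):
    """Hypothetical helper: map each key with differing dtypes to (left dtype, right dtype)."""
    return {
        key: (left[key].dtype, right[key].dtype)
        for key in keys
        if left[key].dtype != right[key].dtype
    }

df1 = pd.DataFrame({"year": [2010, 2011, 2012, 2013, 2014],
                    "col1": [1, 2, 3, 4, 5]})
# Assumption: year arrives as strings in df2.
df2 = pd.DataFrame({"year": ["2010", "2011", "2012", "2013", "2014"],
                    "col2": [6, 7, 8, 9, 10]})

keys = ["year"]

# Find the keys that would trigger the ValueError.
before = mismatched_keys(df1, df2, keys)
print(before)

# Cast only the mismatched keys to a common dtype on both sides.
for key in before:
    df1[key] = df1[key].astype(int)
    df2[key] = df2[key].astype(int)

# With matching dtypes, the merge goes through.
merged_df = df1.merge(df2, on=keys, how="inner")
print(merged_df)
```

Casting both sides to the same target, rather than only the side that looks wrong, keeps the loop simple and guarantees the dtypes match afterwards.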