How to Remove Duplicate Columns on Join in a Spark DataFrame
How can we perform a join between two Spark DataFrames without any duplicate columns?
Suppose we have two DataFrames, df1 and df2, both with a column named col. We want to join df1 and df2 over column col, so we might run a join like this:
joined = df1.join(df2, df1.col == df2.col)
Join DataFrames without duplicate columns
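Joining on an expression such as df1.col == df2.col keeps col from both sides, so the result contains two columns with the same name. To prevent this, we can specify the join column by name, using either a string or a list of strings. Spark then keeps a single copy of col in the output.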
joined = df1.join(df2, ["col"])
# OR
joined = df1.join(df2, "col")