How to Keep Certain Columns in a Pandas DataFrame
Let’s see how we can keep specific columns in a Pandas DataFrame (while dropping the rest).
Suppose we have a DataFrame df that contains the following columns in this order: drop1, keep1, drop2, keep2.
   drop1  keep1  drop2  keep2
0      1      2      3      4
1      5      6      7      8
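To follow along, here is one way to build this example DataFrame. It is a minimal sketch: the values are taken from the table above, and the construction itself is an assumption about how df was created.

```python
import pandas as pd

# Build the two-row example frame. The column order matters later,
# because the index-based examples rely on it.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["drop1", "keep1", "drop2", "keep2"],
)

print(df)
```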
Let’s say we want to keep keep1 and keep2 and drop drop1 and drop2.
We can either keep the columns we need or drop the columns we don’t need (whichever is simpler for our use case).
1. Keep columns we need
1.1. Keep columns by name
We have multiple ways to keep columns by name in a DataFrame. Each line below is an independent alternative, so we only need one of them.
df = df[['keep1', 'keep2']]
df = df[df.columns[df.columns.isin(['keep1', 'keep2'])]]
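The two approaches above behave differently in ways that are easy to miss. The sketch below rebuilds the example frame and compares them. The column name "missing" is hypothetical and only there to show what happens when a requested column does not exist.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["drop1", "keep1", "drop2", "keep2"],
)

# Selecting with a list of names returns the columns in the order we list them.
by_list = df[["keep2", "keep1"]]

# The isin() mask keeps the DataFrame's own column order and silently
# skips names that are not present ("missing" is a made-up name).
wanted = ["keep2", "keep1", "missing"]
by_mask = df[df.columns[df.columns.isin(wanted)]]

# Selecting with a list that contains an unknown name raises a KeyError instead.
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

print(list(by_list.columns))
print(list(by_mask.columns))
print(raised)
```

So the list form is the better fit when a missing column should be treated as an error, while the isin() form is more forgiving.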
1.2. Keep columns by index
We can also use indexes to specify the columns to keep. Column indexes are zero-based, so based on the order specified above, we would keep the columns at index 1 and 3.
df = df[df.columns[[1, 3]]]
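As an alternative sketch, we could select by position directly with iloc, which skips the lookup through column names. For a frame with unique column names, such as our example, both forms should give the same result.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["drop1", "keep1", "drop2", "keep2"],
)

# Translate positions to labels first, then select by label (the form used above).
by_labels = df[df.columns[[1, 3]]]

# Select purely by position: all rows, columns at positions 1 and 3.
by_position = df.iloc[:, [1, 3]]

print(by_position.equals(by_labels))
```

If a DataFrame had duplicate column names, the label-based form could return more columns than intended, so iloc is arguably the safer choice there.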
2. Drop columns we don’t need
2.1. Drop columns by name
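We have multiple ways to drop columns by name in a DataFrame. Each line below is an independent alternative. Note that the first modifies df in place, while the others reassign df.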
df.drop(['drop1', 'drop2'], axis=1, inplace=True)
df = df[df.columns[~df.columns.isin(['drop1', 'drop2'])]]
df = df[df.columns.difference(['drop1', 'drop2'])] # Reorders columns alphabetically
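Here is a sketch of a few variations on the approaches above. It uses the columns keyword of drop in place of axis=1, the errors="ignore" option to tolerate names that are not present, and sort=False on difference to avoid the alphabetical reordering. The name "missing" is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["drop1", "keep1", "drop2", "keep2"],
)

# Without inplace=True, drop() returns a new DataFrame and leaves df untouched.
dropped = df.drop(columns=["drop1", "drop2"])

# errors="ignore" skips names that are not in the frame instead of raising.
tolerant = df.drop(columns=["drop1", "drop2", "missing"], errors="ignore")

# difference() sorts its result by default; sort=False keeps the original order.
# Dropping drop1 and keep2 makes the difference in ordering visible.
sorted_cols = list(df.columns.difference(["drop1", "keep2"]))
original_order = list(df.columns.difference(["drop1", "keep2"], sort=False))

print(list(dropped.columns))
print(sorted_cols)
print(original_order)
```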
2.2. Drop columns by index
We can also drop columns by index. Based on the order specified above, we would drop the columns at index 0 and 2.
df.drop(df.columns[[0, 2]], axis=1, inplace=True)
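If we would rather not modify df in place, a possible variation is to let drop return a new DataFrame, or to compute the positions to keep and select them with iloc. The variable names below are our own and only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["drop1", "keep1", "drop2", "keep2"],
)

# Non-mutating version of the drop above: df itself is left unchanged.
dropped = df.drop(columns=df.columns[[0, 2]])

# Purely positional version: keep every position that is not in the drop set.
drop_positions = {0, 2}
keep_positions = [i for i in range(df.shape[1]) if i not in drop_positions]
by_position = df.iloc[:, keep_positions]

print(list(dropped.columns))
print(by_position.equals(dropped))
```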