How to Keep Certain Columns in a Pandas DataFrame


Let’s see how we can keep specific columns in a Pandas DataFrame (while dropping the rest).

Suppose we have a DataFrame df that contains the following columns in this order: drop1, keep1, drop2, keep2.

   drop1  keep1  drop2  keep2
0      1      2      3      4
1      5      6      7      8
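
To follow along, the example DataFrame can be rebuilt as in the minimal sketch below. The values are the ones shown in the table above; nothing else is assumed.

```python
import pandas as pd

# Build the example DataFrame with the columns in the stated order.
df = pd.DataFrame(
    {
        "drop1": [1, 5],
        "keep1": [2, 6],
        "drop2": [3, 7],
        "keep2": [4, 8],
    }
)

print(df)
```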

Let’s say we want to keep keep1 and keep2 and drop drop1 and drop2.

We can either keep the columns we need or drop the columns we don’t need (whichever is simpler for our use case).

1. Keep columns we need

1.1. Keep columns by name

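We have multiple ways to keep columns by name in a DataFrame. Each line below is an alternative, so we only need one of them.

df = df[['keep1', 'keep2']] # Select by an explicit list of names
df = df[df.columns[df.columns.isin(['keep1', 'keep2'])]] # Select with a boolean mask over the column names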
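
The two approaches behave differently when a requested name is not in the DataFrame. The sketch below illustrates this, assuming a recent pandas version; the name "missing" is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"drop1": [1, 5], "keep1": [2, 6], "drop2": [3, 7], "keep2": [4, 8]})

# "missing" is a hypothetical column name that does not exist in df.
wanted = ["keep1", "keep2", "missing"]

# Selecting with a list of names raises KeyError if any name is absent.
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

# The isin mask keeps only the names that exist, in the DataFrame's own order.
kept = df[df.columns[df.columns.isin(wanted)]]

print(raised)
print(list(kept.columns))
```

The list form is the safer choice when a missing column should be treated as an error; the mask form suits cases where some columns are optional.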

1.2. Keep columns by index

We can also specify the columns to keep by their zero-based position. Based on the order specified above, we would keep the columns at positions 1 and 3.

df = df[df.columns[[1, 3]]]
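
Position-based selection can also be written with iloc, which selects by integer position directly rather than looking the names up first. A minimal sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame({"drop1": [1, 5], "keep1": [2, 6], "drop2": [3, 7], "keep2": [4, 8]})

# All rows (:), columns at zero-based positions 1 and 3.
by_position = df.iloc[:, [1, 3]]

print(list(by_position.columns))
```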

2. Drop columns we don’t need

2.1. Drop columns by name

We have multiple ways to drop columns by name in a DataFrame. Each line below is an alternative, so we only need one of them.

df.drop(['drop1', 'drop2'], axis=1, inplace=True) # Modifies df in place
df = df[df.columns[~df.columns.isin(['drop1', 'drop2'])]] # Keeps the original column order
df = df[df.columns.difference(['drop1', 'drop2'])] # Reorders columns alphabetically
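
If we would rather not modify df in place, drop can return a new DataFrame instead. The sketch below also shows the errors parameter of drop; "not_there" is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"drop1": [1, 5], "keep1": [2, 6], "drop2": [3, 7], "keep2": [4, 8]})

# drop returns a new DataFrame unless inplace=True is passed; df is unchanged.
trimmed = df.drop(columns=["drop1", "drop2"])

# errors="ignore" skips names that are absent instead of raising KeyError.
# "not_there" is a hypothetical name that does not exist in df.
lenient = df.drop(columns=["drop1", "drop2", "not_there"], errors="ignore")

print(list(trimmed.columns))
print(list(lenient.columns))
print(list(df.columns))
```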

2.2. Drop columns by index

We can also drop columns by their zero-based position. Based on the order specified above, we would drop the columns at positions 0 and 2.

df.drop(df.columns[[0, 2]], axis=1, inplace=True)
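
One thing to watch for when dropping by position: the positions shift after each drop. The sketch below compares a single call, where both positions refer to the original layout, with two separate calls, where the second position is resolved against the already-shortened DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"drop1": [1, 5], "keep1": [2, 6], "drop2": [3, 7], "keep2": [4, 8]})

# One call: positions 0 and 2 both refer to the original column order.
df_a = df.copy()
df_a.drop(df_a.columns[[0, 2]], axis=1, inplace=True)

# Two calls: after dropping position 0 the columns are keep1, drop2, keep2,
# so position 2 now points at keep2 rather than drop2.
df_b = df.copy()
df_b.drop(df_b.columns[0], axis=1, inplace=True)
df_b.drop(df_b.columns[2], axis=1, inplace=True)

print(list(df_a.columns))
print(list(df_b.columns))
```

Passing all positions in a single call avoids this pitfall.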