How to Get Column Substring in a Pandas DataFrame


Suppose we want to create a new column in our DataFrame that is simply a substring of another column in that DataFrame.

Or maybe we want to update a single column with the substring of its own contents.

We can achieve this using the .str accessor, which applies string operations to every value in a column.

Substring with str

Suppose we only want the first n characters of each string in a column.

We can create a new column with either approach below.

df['new_col'] = df['col'].str[:n]
df['new_col'] = df['col'].str.slice(0,n) # Same output
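
To make the two approaches concrete, here is a minimal sketch on a small made-up DataFrame. The sample values, the value of n, and the second column name new_col_slice are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not from any real dataset
df = pd.DataFrame({"col": ["apple", "banana", "kiwi"]})
n = 3  # assumed number of leading characters to keep

# Python-style slicing through the .str accessor
df["new_col"] = df["col"].str[:n]

# Equivalent call using the explicit slice method
df["new_col_slice"] = df["col"].str.slice(0, n)

print(df)
```

Both new columns should hold the same values, so the choice between the two forms is mostly a matter of readability.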

We can update a column in place by assigning the result back to the same column on the left-hand side.

df['col'] = df['col'].str[:n]
df['col'] = df['col'].str.slice(0,n)

Ensure the column is a string

We may not be able to run the substring operation if the column dtype is not a string, since the .str accessor only works on string values.

In those scenarios, we’ll need to cast first, and then run the operation above.

df['col'] = df['col'].astype(str).str[:n]
df['col'] = df['col'].astype(str).str.slice(0,n)
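
As a rough illustration of why the cast matters, the sketch below uses a hypothetical integer column. The sample values and n are assumptions; the point is only that .str fails on a numeric column and works once the column is cast.

```python
import pandas as pd

# Hypothetical integer column (for example, dates stored as numbers)
df = pd.DataFrame({"col": [20231105, 20240217]})
n = 4  # assumed number of leading characters to keep

# Using .str on a numeric column raises an AttributeError
try:
    df["col"].str[:n]
    raised = False
except AttributeError:
    raised = True

# Casting to str first makes the substring operation work
df["col"] = df["col"].astype(str).str[:n]

print(raised)
print(df)
```

Note that the result is still a column of strings, so a further cast is needed if numeric values are wanted afterwards.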