How to Get Column Substring in a Pandas DataFrame
Suppose we want to create a new column in our DataFrame that is simply a substring of another column in that DataFrame.
Or maybe we want to update a single column with the substring of its own contents.
We can achieve this using the .str accessor.
Substring with str
Suppose we only want the first n characters of each string in a column.
We can create a new column with either approach below.
df['new_col'] = df['col'].str[:n]
df['new_col'] = df['col'].str.slice(0,n) # Same output
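As a quick check that the two forms agree, here is a minimal sketch on a made-up DataFrame. The sample values, the extra column name new_col_slice, and n = 3 are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"col": ["apple", "banana", "hi"]})
n = 3

# Python-style slicing through the .str accessor
df["new_col"] = df["col"].str[:n]

# Equivalent call using the explicit slice method
df["new_col_slice"] = df["col"].str.slice(0, n)

print(df)
```

Strings shorter than n are returned unchanged rather than raising an error, as with ordinary Python slicing.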
We can update a column in place by assigning the result back to the same column on the left-hand side.
df['col'] = df['col'].str[:n]
df['col'] = df['col'].str.slice(0,n)
Ensure the column is a string
We may not be able to run the substring operation if the column dtype is not a string, since the .str accessor only works on string values.
In those scenarios, we’ll need to cast first, and then run the operation above.
df['col'] = df['col'].astype(str).str[:n]
df['col'] = df['col'].astype(str).str.slice(0,n)
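The sketch below illustrates the cast on an integer column. The sample values and the year column name are hypothetical; the point is only that .str fails on the integer column and works after astype(str).

```python
import pandas as pd

# Hypothetical integer column holding dates as YYYYMMDD numbers
df = pd.DataFrame({"col": [20231105, 20240117, 19991231]})
n = 4

# Using .str directly on an integer column raises AttributeError
try:
    df["col"].str[:n]
except AttributeError as err:
    print(f"Cannot slice directly: {err}")

# Cast to string first, then take the first n characters
df["year"] = df["col"].astype(str).str[:n]

print(df)
```

One caveat worth keeping in mind: astype(str) also converts missing values into the literal string "nan", so it may be worth handling missing values before casting.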