How to Get the First Row Meeting a Condition in Pandas
How can we get the first row in a Pandas DataFrame that meets some condition or criteria?
Let’s say we have this DataFrame
     id  year period  value
0  000e  1976    M01    7.3
1  000e  1976    M02    7.3
2  000e  1976    M03    7.3
3  000f  1976    M04    720
4  000f  1976    M05    710
Suppose we want the index of the first row whose id ends with an f (so, in the DataFrame above, we want an index of 3).
Create the filtering logic
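Let’s create our filtering logic to get all rows whose id ends with f. The boolean mask df.id.str.endswith('f') is True for exactly those rows, so indexing df with it keeps only the matches.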
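As a minimal sketch of that filter, we can rebuild the example DataFrame and apply the mask. The column types here (id and period as strings, value as a number) are assumptions, since only the printed table is given.

```python
import pandas as pd

# Example data reconstructed from the table above (dtypes are assumed).
df = pd.DataFrame({
    "id": ["000e", "000e", "000e", "000f", "000f"],
    "year": [1976, 1976, 1976, 1976, 1976],
    "period": ["M01", "M02", "M03", "M04", "M05"],
    "value": [7.3, 7.3, 7.3, 720, 710],
})

# Boolean Series: True where the id ends with "f".
mask = df["id"].str.endswith("f")

# Keep only the rows where the mask is True.
matches = df[mask]
print(matches)
```

Note that the filtered rows keep their original index labels, which is what lets us read off the index in the next step.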
Get the index
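We can get the row index by taking the first element of the filtered DataFrame's index. Without the [0], we would get the index labels of every matching row rather than just the first.
index = df[df.id.str.endswith('f')].index[0]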
We could also use iloc to select the first matching row by position.
row = df[df.id.str.endswith('f')].iloc[0]
id        000f
year      1976
period     M04
value      720
Name: 3, dtype: object
This will give us the first row that meets our condition. We can obtain the actual index by accessing the name attribute of that row.
index = df[df.id.str.endswith('f')].iloc[0].name
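Both approaches raise an IndexError when no row matches, so in practice we may want a guard. Here is a small sketch under the same assumed example data; the variable names first_label and first_row are just illustrative.

```python
import pandas as pd

# Example data reconstructed from the table above (dtypes are assumed).
df = pd.DataFrame({
    "id": ["000e", "000e", "000e", "000f", "000f"],
    "year": [1976, 1976, 1976, 1976, 1976],
    "period": ["M01", "M02", "M03", "M04", "M05"],
    "value": [7.3, 7.3, 7.3, 720, 710],
})

matches = df[df["id"].str.endswith("f")]

if not matches.empty:
    # Option 1: first label of the filtered index.
    first_label = matches.index[0]
    # Option 2: first row by position; its .name is the same label.
    first_row = matches.iloc[0]
    assert first_row.name == first_label
else:
    # No row met the condition.
    first_label = None
    first_row = None

print(first_label)  # 3
```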
Get all rows until that index
If we wanted to, we could get all rows up until that index that we obtained earlier.
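One way to sketch this, again on the assumed example data: label-based slicing with loc includes the matching row, while position-based slicing with iloc stops just before it. Which one we want depends on whether the matching row itself should be kept.

```python
import pandas as pd

# Example data reconstructed from the table above (dtypes are assumed).
df = pd.DataFrame({
    "id": ["000e", "000e", "000e", "000f", "000f"],
    "year": [1976, 1976, 1976, 1976, 1976],
    "period": ["M01", "M02", "M03", "M04", "M05"],
    "value": [7.3, 7.3, 7.3, 720, 710],
})

# Index label of the first row whose id ends with "f".
index = df[df["id"].str.endswith("f")].index[0]

# loc slices by label and includes the end label.
rows_through = df.loc[:index]

# iloc slices by position and excludes the end position.
# get_loc converts the label to a position (assumes a unique index).
pos = df.index.get_loc(index)
rows_before = df.iloc[:pos]

print(rows_before)
```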
If we’re working with a large DataFrame, it might be wasteful to apply a filter on the entire DataFrame just to extract the first row.
Ideally, we want to return the first row that meets the criteria without iterating or scanning through the other rows.
If we know that the row meeting the criteria will be one of the first ~10k rows, then a simple for loop might be more performant than the original solution.
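def get_first_row_with_condition(condition, df):
    # Check rows in order and stop at the first one that matches.
    for i in range(len(df)):
        if condition(df.iloc[i]):
            return i  # position of the first matching row
    return None  # no row met the condition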
Then, we can use this function like so:
index = get_first_row_with_condition(lambda x: x.id.endswith('f'), df)
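As a variation on the same idea, here is a sketch that iterates with itertuples, which is usually cheaper per row than repeated df.iloc[i] lookups. The function name first_index_where is hypothetical, and the example data is reconstructed from the table above. Unlike the positional loop, this version returns the index label rather than the row position.

```python
import pandas as pd

# Example data reconstructed from the table above (dtypes are assumed).
df = pd.DataFrame({
    "id": ["000e", "000e", "000e", "000f", "000f"],
    "year": [1976, 1976, 1976, 1976, 1976],
    "period": ["M01", "M02", "M03", "M04", "M05"],
    "value": [7.3, 7.3, 7.3, 720, 710],
})


def first_index_where(condition, df):
    """Return the index label of the first row satisfying condition, or None."""
    # itertuples yields one namedtuple per row; .Index holds the row label.
    for row in df.itertuples():
        if condition(row):
            return row.Index  # stop as soon as a match is found
    return None


index = first_index_where(lambda row: row.id.endswith("f"), df)
missing = first_index_where(lambda row: row.id.endswith("z"), df)

print(index)    # 3
print(missing)  # None
```

Whether the loop actually beats the vectorized filter depends on how early the match occurs, so it is worth timing both on real data.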