How to Get the First Row Meeting a Condition in Pandas
How can we get the first row in a Pandas DataFrame that meets some condition or criteria?
Let’s say we have this DataFrame df.
     id  year period  value
0  000e  1976    M01    7.3
1  000e  1976    M02    7.3
2  000e  1976    M03    7.3
3  000f  1976    M04    720
4  000f  1976    M05    710
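To follow along, the DataFrame above can be rebuilt by hand. This is a minimal sketch: the column dtypes are a guess, since the article only shows the printed values, and the index is assumed to be the default 0-based RangeIndex.

```python
import pandas as pd

# Rebuild the example DataFrame. The dtypes are an assumption; with these
# literals, "value" becomes a float column, so 720 prints as 720.0.
df = pd.DataFrame({
    "id": ["000e", "000e", "000e", "000f", "000f"],
    "year": [1976, 1976, 1976, 1976, 1976],
    "period": ["M01", "M02", "M03", "M04", "M05"],
    "value": [7.3, 7.3, 7.3, 720, 710],
})
print(df)
```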
Suppose we want the index of the first row whose id ends with an f (so we want an index of 3).
Create the filtering logic
Let’s create our filtering logic to get all rows whose id ends with f.
df[df.id.str.endswith('f')]
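It can help to see the two steps hidden in that one-liner: building a boolean mask, then indexing with it. The sketch below uses a cut-down copy of the example DataFrame with only the id column; the names mask and matches are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": ["000e", "000e", "000e", "000f", "000f"]})

# Step 1: a boolean Series that is True where the id ends with "f".
mask = df["id"].str.endswith("f")

# Step 2: boolean indexing keeps only the True rows and preserves
# their original index labels.
matches = df[mask]

print(mask.tolist())           # [False, False, False, True, True]
print(matches.index.tolist())  # [3, 4]
```

If the id column could contain missing values, passing na=False to str.endswith is one way to treat them as non-matches.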
Get the index
Using index
We can get the row index using .index[0].
index = df[df.id.str.endswith('f')].index[0]
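One thing to watch for: if no row meets the condition, the filtered DataFrame is empty and .index[0] raises an IndexError. Below is a minimal sketch of a guard, assuming we would rather get None back in that case. The helper name first_matching_label is hypothetical and not part of pandas.

```python
import pandas as pd

def first_matching_label(df, mask):
    """Return the index label of the first True in mask, or None if there is none."""
    # Index the labels with a plain boolean array built from the mask.
    labels = df.index[mask.to_numpy()]
    return labels[0] if len(labels) > 0 else None

df = pd.DataFrame({"id": ["000e", "000e", "000e", "000f", "000f"]})

print(first_matching_label(df, df["id"].str.endswith("f")))  # 3
print(first_matching_label(df, df["id"].str.endswith("z")))  # None
```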
Using iloc
We could also use iloc[0], which returns the first matching row itself rather than its index.
row = df[df.id.str.endswith('f')].iloc[0]
id        000f
year      1976
period     M04
value      720
Name: 3, dtype: object
This will give us the first row that meets our condition. We can obtain the actual index by accessing the name attribute.
index = df[df.id.str.endswith('f')].iloc[0].name
Get all rows until that index
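If we wanted to, we could get all rows before the index that we obtained earlier. Note that iloc slices by position, so this relies on the DataFrame having the default integer index, where labels and positions coincide.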
df.iloc[:index,:]
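When the index is not the default 0, 1, 2, ... range, a label and a position are no longer interchangeable. The sketch below uses a made-up DataFrame with labels 10, 20, 30, 40 to show one way to handle that: Index.get_loc converts a label into a position for iloc, while loc slices by label directly and includes the end label.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": ["000e", "000e", "000f", "000f"]},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

label = df[df["id"].str.endswith("f")].index[0]  # label of the first match: 30
position = df.index.get_loc(label)               # its integer position: 2

before = df.iloc[:position]  # rows strictly before the match
through = df.loc[:label]     # label slicing includes the matching row

print(before.index.tolist())   # [10, 20]
print(through.index.tolist())  # [10, 20, 30]
```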
Alternative approaches
If we’re working with a large DataFrame, it might be wasteful to apply a filter on the entire DataFrame just to extract the first row.
Ideally, we want to return the first row that meets the criteria without iterating or scanning through the other rows.
If we know that the row meeting the criteria will be one of the first ~10k rows, then a simple for loop might be more performant than the original solution.
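def get_first_row_with_condition(condition, df):
    # Scan rows in order and stop at the first one that satisfies the condition.
    for i in range(len(df)):
        if condition(df.iloc[i]):
            return i
    # No row matched (or the DataFrame is empty).
    return None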
Then, we can use this function like so:
index = get_first_row_with_condition(lambda x: x.id.endswith('f'), df)
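Building a full row with df.iloc[i] on every iteration has its own overhead. When the condition only depends on one column, a variation is to iterate over that column alone and stop at the first hit. This is a sketch of that idea, not a benchmarked replacement: the helper name first_label_where is hypothetical, and whether it beats the vectorized filter depends on how early the match appears.

```python
import pandas as pd

def first_label_where(df, column, predicate):
    """Return the index label of the first row whose `column` value
    satisfies `predicate`, or None if no row does."""
    # Series.items() yields (label, value) pairs lazily, so the loop
    # stops as soon as a match is found.
    for label, value in df[column].items():
        if predicate(value):
            return label
    return None

df = pd.DataFrame({"id": ["000e", "000e", "000e", "000f", "000f"]})
index = first_label_where(df, "id", lambda s: s.endswith("f"))
print(index)  # 3
```

Unlike the positional loop above, this returns the index label, which is what .index[0] returns in the original solution.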