How to Drop First n Rows of a Column Group in a Pandas DataFrame
How can we drop any number of rows of a column group in a Pandas DataFrame?
Example Scenario
Suppose we’re dealing with a DataFrame with month and value columns.
month value
0 1 1.0
1 1 2.0
2 1 3.0
3 2 4.0
4 2 5.0
5 2 6.0
6 3 7.0
7 3 8.0
8 3 9.0
9 4 10.0
10 4 11.0
11 4 12.0
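To follow along, the table above can be rebuilt with a few lines. This is only a sketch of one way to construct it; the variable name df matches the snippets in this article, and the values are taken from the table.

```python
import pandas as pd

# Rebuild the example table: months 1 to 4, three rows each, values 1.0 to 12.0.
df = pd.DataFrame({
    "month": [m for m in range(1, 5) for _ in range(3)],
    "value": [float(v) for v in range(1, 13)],
})

print(df)
```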
We want to get the first value of each unique month.
We can achieve this using groupby().
Get first row of each group
Let’s use nth(0) to get just the first row of each group.
df.groupby('month').nth(0)
On pandas versions before 2.0, this gives output like the following, with month as the index.
month value
1 1.0
2 4.0
3 7.0
4 10.0
If we want month back as a regular column, we can use reset_index().
df.groupby('month').nth(0).reset_index()
As expected, this restores the month column.
month value
0 1 1.0
1 2 4.0
2 3 7.0
3 4 10.0
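One caveat: starting with pandas 2.0, nth() behaves as a filter, so it already keeps month as a column and preserves the original row labels. The sketch below is one way to get the same final table on either side of that change; the check on the column names is an assumption made for illustration, not something the pandas API requires.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [m for m in range(1, 5) for _ in range(3)],
    "value": [float(v) for v in range(1, 13)],
})

first_rows = df.groupby("month").nth(0)

if "month" in first_rows.columns:
    # Newer pandas: month is already a column, so just renumber the rows.
    first_rows = first_rows.reset_index(drop=True)
else:
    # Older pandas: month is the index, so move it back into a column.
    first_rows = first_rows.reset_index()

print(first_rows)
```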
Some sources suggest using df.groupby('month').first(), but first() returns the first value in each column that is not NaN. On the other hand, nth(0) returns the first row as-is, even if its value is NaN, which is usually the desired behavior.
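The difference is easiest to see on data where a group starts with a missing value. The small frame below (df_nan, a made-up example that is not part of the original data) is a minimal sketch of that case.

```python
import pandas as pd

# Hypothetical data: the first row of month 1 has a missing value.
df_nan = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "value": [float("nan"), 2.0, 3.0, 4.0],
})

# first() skips NaN and takes the first non-missing value per column.
first_vals = df_nan.groupby("month").first()["value"].tolist()

# nth(0) takes the first row of each group as-is, NaN included.
nth_vals = df_nan.groupby("month").nth(0)["value"].tolist()

print(first_vals)  # [2.0, 3.0]
print(nth_vals)    # [nan, 3.0]
```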
Get first n rows of each group
We can use head() to get the first n rows of each group.
df.groupby('month').head(2)
This will give us something like this. Note that head() keeps each row’s original index label.
month value
0 1 1.0
1 1 2.0
3 2 4.0
4 2 5.0
6 3 7.0
7 3 8.0
9 4 10.0
10 4 11.0
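To drop, rather than keep, the first n rows of each group, one possible approach is to number the rows within each group with cumcount() and filter on that position. The sketch below shows head() next to this complement; the names n, kept, position, and dropped are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [m for m in range(1, 5) for _ in range(3)],
    "value": [float(v) for v in range(1, 13)],
})

n = 2

# Keep the first n rows of each group; original row labels are preserved.
kept = df.groupby("month").head(n)

# cumcount() numbers rows within each group starting at 0,
# so position >= n selects everything after the first n rows.
position = df.groupby("month").cumcount()
dropped = df[position >= n]

print(kept)
print(dropped)
```

With three rows per month and n = 2, kept holds two rows per month and dropped holds the remaining one.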
Get the nth row of each group
What if we just wanted the third row of each group?
df.groupby('month').nth(2)
We just need to remember that the argument to nth() is zero-indexed, so nth(2) returns the third row of each group.
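A short check of the zero-based indexing, as a sketch on the same example data. It also shows that asking for a position no group has (here the fourth row, when every month has only three) yields an empty result.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [m for m in range(1, 5) for _ in range(3)],
    "value": [float(v) for v in range(1, 13)],
})

# nth(2) is the third row of each group because counting starts at 0.
third_vals = df.groupby("month").nth(2)["value"].tolist()
print(third_vals)  # [3.0, 6.0, 9.0, 12.0]

# Every group has exactly three rows, so there is no row at position 3.
fourth_rows = df.groupby("month").nth(3)
print(len(fourth_rows))  # 0
```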