How to Drop First n Rows of a Column Group in a Pandas DataFrame
How can we drop any number of rows of a column group in a Pandas DataFrame?
Example Scenario#
Suppose we’re dealing with a DataFrame with a month column and a value column.
    month  value
0       1    1.0
1       1    2.0
2       1    3.0
3       2    4.0
4       2    5.0
5       2    6.0
6       3    7.0
7       3    8.0
8       3    9.0
9       4   10.0
10      4   11.0
11      4   12.0
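For readers who want to follow along, here is a minimal sketch that rebuilds the example DataFrame. The variable name df matches the snippets in this article; the construction itself is an assumption, since the article does not show how the data was created.

```python
import pandas as pd

# Rebuild the example data: four months with three values each.
# (How the original data was created is not shown; this is one way.)
df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

print(df)
```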
We want to get the first value of each unique month.
We can achieve this using groupby().
Get first row of each group#
Let’s use nth(0) to get just the first row of each group.
df.groupby('month').nth(0)
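In pandas versions before 2.0, this will give us an output that looks like this, with month as the index. From pandas 2.0 onward, nth() acts as a filter instead: it returns the matching rows with their original index (0, 3, 6, 9) and with month still a regular column.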
month value
1 1.0
2 4.0
3 7.0
4 10.0
If we want month as a regular column rather than as the index, we can use reset_index().
df.groupby('month').nth(0).reset_index()
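As expected, this will restore the month column. (In pandas 2.0 and later, where month is never moved to the index, reset_index() would instead add the old row labels as an extra index column; reset_index(drop=True) avoids that.)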
month value
0 1 1.0
1 2 4.0
2 3 7.0
3 4 10.0
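Because the shape of the nth(0) result depends on the pandas version, the following sketch normalizes it to the table above in either case. The check on the columns is an illustrative assumption, not something the article prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

first_rows = df.groupby("month").nth(0)

if "month" in first_rows.columns:
    # pandas >= 2.0: month is still a column; just renumber the rows.
    first_rows = first_rows.reset_index(drop=True)
else:
    # pandas < 2.0: month is the index; move it back into a column.
    first_rows = first_rows.reset_index()

print(first_rows)
```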
Some sources suggest using df.groupby('month').first(), but it returns the first value in each column that is not NaN, which may come from a later row. On the other hand, nth(0) returns the actual first row of each group, even if its value is NaN, which is usually the desired behavior.
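The difference only shows up when a group starts with a missing value. The sketch below uses a small hypothetical DataFrame, df_nan, which is not part of the article's data, to illustrate it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of month 1 has a missing value.
df_nan = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "value": [np.nan, 2.0, 3.0, 4.0],
})

# first() skips NaN and picks the first non-null value in each group.
first_values = df_nan.groupby("month").first()["value"]

# nth(0) takes the first row of each group as-is, NaN included.
nth_values = df_nan.groupby("month").nth(0)["value"]

print(first_values.tolist())  # month 1 yields 2.0
print(nth_values.tolist())    # month 1 yields nan
```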
Get first n rows of each group#
We can use head() to get the first n rows of each group.
df.groupby('month').head(2)
This will give us something like this. Note that head() keeps each row’s original index.
    month  value
0       1    1.0
1       1    2.0
3       2    4.0
4       2    5.0
6       3    7.0
7       3    8.0
9       4   10.0
10      4   11.0
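As a quick check of how head() treats row labels, the following sketch runs it on the example data and then renumbers the result. The reset_index(drop=True) step is an optional addition, not something the article requires.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

# head(2) keeps the first two rows of every month, in original order.
first_two = df.groupby("month").head(2)

# The original row labels are kept; renumber them only if needed.
first_two_renumbered = first_two.reset_index(drop=True)

print(first_two)
```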
Get the nth row of each group#
What if we just wanted the third row of each group?
df.groupby('month').nth(2)
We just need to remember that the argument to nth() is zero-indexed, so nth(2) selects the third row of each group.
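To drop, rather than keep, the first n rows of each group, one possible approach is to number the rows within each group with cumcount() and filter on that position. This is a sketch of one option, not a method taken from the sections above; the names n, position_in_group, and remaining are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

n = 2  # number of leading rows to drop from each month

# cumcount() numbers the rows inside each group starting from 0.
position_in_group = df.groupby("month").cumcount()

# Keep only rows whose position within the group is n or later.
remaining = df[position_in_group >= n]

print(remaining)
```

With n = 2 and three rows per month, only the third row of each month remains, which are the same rows that nth(2) selects.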