How to Drop First n Rows of a Column Group in a Pandas DataFrame


How can we select, or drop, the first n rows of each group in a Pandas DataFrame?

Example Scenario

Suppose we’re dealing with a DataFrame with month and value columns.

     month  value
0    1      1.0
1    1      2.0
2    1      3.0
3    2      4.0
4    2      5.0
5    2      6.0
6    3      7.0
7    3      8.0
8    3      9.0
9    4      10.0
10   4      11.0
11   4      12.0
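
To follow along, the DataFrame above can be rebuilt with a few lines. This is a minimal sketch; the variable name df is chosen to match the snippets used throughout this article.

```python
import pandas as pd

# Build the example DataFrame: four months, three values per month.
df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

print(df)
```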

We want to get the first value of each unique month.

We can achieve this using groupby().

Get first row of each group

Let’s use nth(0) to get just the first row of each group.

df.groupby('month').nth(0)

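In pandas versions before 2.0, where the group key becomes the index, this gives us output that looks like this. (From pandas 2.0 on, nth() returns the matching rows unchanged, with month as a regular column and the original row index.)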

month  value
1      1.0
2      4.0
3      7.0
4      10.0

If we want month as a regular column rather than the index, we can use reset_index().

df.groupby('month').nth(0).reset_index()

As expected, this will restore the month column.

     month  value
0    1      1.0
1    2      4.0
2    3      7.0
3    4      10.0
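
The shape of the nth() result depends on the pandas version: before 2.0, month becomes the index, while from 2.0 on the original rows come back as they are. One alternative that behaves the same in both is head(1), which keeps month as a column and preserves the original row labels. A minimal sketch, rebuilding the example DataFrame so the snippet runs on its own (the names first_rows and first_rows_renumbered are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

# head(1) keeps the first row of each group, with month still a column
# and the original row labels (0, 3, 6, 9) intact.
first_rows = df.groupby("month").head(1)

# Optional: renumber the rows 0..3, discarding the original labels.
first_rows_renumbered = first_rows.reset_index(drop=True)
print(first_rows_renumbered)
```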

Some sources suggest using df.groupby('month').first(), but first() returns the first non-NaN value in each column, so its result can mix values from different rows. nth(0), on the other hand, returns the actual first row of each group, even if it contains NaN, which is usually the desired behavior.
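
The difference only shows up when a group starts with a missing value. The sketch below uses a small, hypothetical DataFrame (df_nan, not part of the example above) whose first row for month 1 is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of month 1 has a missing value.
df_nan = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "value": [np.nan, 2.0, 3.0, 4.0],
})

# first() skips NaN and returns the first non-missing value per group.
by_first = df_nan.groupby("month").first()

# nth(0) returns the actual first row of each group, NaN included.
by_nth = df_nan.groupby("month").nth(0)

print(by_first["value"].tolist())  # [2.0, 3.0]
print(by_nth["value"].tolist())    # [nan, 3.0]
```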

Get first n rows of each group

We can use head() to get the first n rows of each group.

df.groupby('month').head(2)

head() keeps the original row index, so this will give us something like this.

     month  value
0    1      1.0
1    1      2.0
3    2      4.0
4    2      5.0
6    3      7.0
7    3      8.0
9    4      10.0
10   4      11.0
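
The same grouping can also be turned around to drop the first n rows of each group instead of keeping them. The sketch below shows one possible way to do that with cumcount(), which numbers the rows within each group starting at 0. The cumcount() step is an illustration added here, not the only approach, and the names n, first_n and rest are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

n = 2

# Keep the first n rows of each group; the original index is preserved.
first_n = df.groupby("month").head(n)
print(first_n.index.tolist())  # [0, 1, 3, 4, 6, 7, 9, 10]

# Drop the first n rows of each group instead: keep only the rows whose
# position within their group is n or greater.
rest = df[df.groupby("month").cumcount() >= n]
print(rest["value"].tolist())  # [3.0, 6.0, 9.0, 12.0]
```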

Get the nth row of each group

What if we just wanted the third row of each group?

df.groupby('month').nth(2)

We just need to remember that the argument to nth() is zero-indexed, so nth(2) selects the third row.
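
As a quick check of the zero-based positions, the sketch below applies nth(2) to the example data and then to a shortened copy (the hypothetical variable short) in which month 4 has only two rows. A group with no third row simply contributes nothing to the result.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": [float(v) for v in range(1, 13)],
})

# nth(2) selects the row at position 2 (the third row) of each group.
third_rows = df.groupby("month").nth(2)
print(third_rows["value"].tolist())  # [3.0, 6.0, 9.0, 12.0]

# Drop the last row so that month 4 has only two rows left.
short = df.iloc[:11]
third_rows_short = short.groupby("month").nth(2)
print(third_rows_short["value"].tolist())  # [3.0, 6.0, 9.0]
```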