GroupBy Operations — Pandas vs Polars

By Jeferson Peter
Polars & Pandas

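Imagine you have a dataset of sales and want to know the total revenue per product.
Both Pandas and Polars support this with a group-by operation, though the syntax differs slightly: Pandas spells the method groupby, while Polars spells it group_by (the older groupby spelling is deprecated and no longer available in Polars 1.0 and later).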


Example data

import pandas as pd
import polars as pl

data = {"product": ["A", "A", "B", "B"], "sales": [10, 20, 30, 40]}
df_pd = pd.DataFrame(data)
df_pl = pl.DataFrame(data)

GroupBy in Pandas

result_pd = df_pd.groupby("product")["sales"].sum().reset_index()
print(result_pd)

#   product  sales
# 0       A     30
# 1       B     70
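
When more than one statistic per group is needed, Pandas also offers named aggregation. The following is a minimal sketch on the same example data; the output column names total_sales, mean_sales, and n_rows are chosen here for illustration and do not come from the original example.

```python
import pandas as pd

df_pd = pd.DataFrame({"product": ["A", "A", "B", "B"], "sales": [10, 20, 30, 40]})

# Named aggregation: each keyword is an output column (names are illustrative),
# defined as a tuple of (input column, aggregation function).
# as_index=False keeps "product" as a regular column, so no reset_index() is needed.
summary_pd = df_pd.groupby("product", as_index=False).agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
    n_rows=("sales", "count"),
)
print(summary_pd)
```

Passing as_index=False is an alternative to calling reset_index() afterwards; both leave the grouping key as an ordinary column.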

GroupBy in Polars

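# group_by replaces the deprecated groupby spelling in current Polars releases.
# maintain_order=True keeps groups in order of first appearance; without it,
# the row order of the result is not guaranteed.
result_pl = df_pl.group_by("product", maintain_order=True).agg(pl.col("sales").sum())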
print(result_pl)

# shape: (2, 2)
# ┌─────────┬───────┐
# │ product ┆ sales │
# │ ---     ┆ ---   │
# │ str     ┆ i64   │
# ╞═════════╪═══════╡
# │ A       ┆ 30    │
# │ B       ┆ 70    │
# └─────────┴───────┘

Conclusion

  • Pandas: groupby("col")["value"].sum() is the common pattern.
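  • Polars: uses .group_by(...).agg(...) with column expressions; pass maintain_order=True or sort the result if row order matters.
  • Both are efficient for aggregating, but Polars often runs faster on large datasets because it executes aggregations in parallel across CPU cores and can optimize queries through its lazy API. The actual gain depends on the data and the workload.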