GroupBy Operations — Pandas vs Polars
By Jeferson Peter
Polars & Pandas
Imagine you have a dataset of sales and want to know the total revenue per product.
Both Pandas and Polars provide a convenient group-by method (groupby in Pandas, group_by in Polars), though the syntax differs slightly.
Example data
import pandas as pd
import polars as pl
data = {"product": ["A", "A", "B", "B"], "sales": [10, 20, 30, 40]}
df_pd = pd.DataFrame(data)
df_pl = pl.DataFrame(data)
GroupBy in Pandas
result_pd = df_pd.groupby("product")["sales"].sum().reset_index()
print(result_pd)
#   product  sales
# 0       A     30
# 1       B     70
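If you want the group key back as a regular column without calling reset_index(), or need several aggregations at once, Pandas offers as_index=False and named aggregation. A minimal sketch on the same example data; the output column names total_sales, mean_sales, and n_rows are illustrative choices, not part of the original example.

```python
import pandas as pd

data = {"product": ["A", "A", "B", "B"], "sales": [10, 20, 30, 40]}
df_pd = pd.DataFrame(data)

# as_index=False keeps "product" as a regular column, so reset_index() is not needed
totals = df_pd.groupby("product", as_index=False)["sales"].sum()

# Named aggregation: each keyword becomes an output column,
# specified as (input column, aggregation function name)
summary = df_pd.groupby("product", as_index=False).agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
    n_rows=("sales", "count"),
)
print(summary)
```

Named aggregation avoids the MultiIndex columns you would get from passing a list of functions to agg().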
GroupBy in Polars
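# group_by replaced the deprecated groupby in Polars 0.19;
# maintain_order=True returns groups in first-seen order (otherwise the order is not guaranteed)
result_pl = df_pl.group_by("product", maintain_order=True).agg(pl.col("sales").sum())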
print(result_pl)
# shape: (2, 2)
# ┌─────────┬───────┐
# │ product ┆ sales │
# │ ---     ┆ ---   │
# │ str     ┆ i64   │
# ╞═════════╪═══════╡
# │ A       ┆ 30    │
# │ B       ┆ 70    │
# └─────────┴───────┘
Conclusion
- Pandas: groupby("col")["value"].sum() is the common pattern.
- Polars: group_by(...).agg(...) with column expressions.
- Both handle aggregation efficiently, but Polars often runs faster on large datasets because its engine is multithreaded.