Merge/Join Operations — Pandas vs Polars
By Jeferson Peter
Polars & Pandas
Imagine you have two datasets: one with customers and another with their orders.
To analyze them together, you’ll need to perform a join. Let’s see how Pandas and Polars handle this.
Example data
import pandas as pd
import polars as pl

# Pandas DataFrames: customers and their orders share the "id" key column
customers = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
orders = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})

# The same data as Polars DataFrames
customers_pl = pl.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
orders_pl = pl.DataFrame({"id": [1, 2], "amount": [100, 200]})
Merge in Pandas
merged_pd = pd.merge(customers, orders, on="id")
print(merged_pd)
#    id   name  amount
# 0   1  Alice     100
# 1   2    Bob     200
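The call above relies on the default how="inner", which keeps only ids present in both frames. As a minimal sketch of another join type, assume a hypothetical third customer, Carol, who has no orders (she is not part of the example data above). With how="left" her row is kept and the missing amount is filled with NaN:

```python
import pandas as pd

# Hypothetical data: Carol (id 3) is added for illustration and has no orders.
customers_ext = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})
orders = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})

# Inner join (the default): only ids found in both frames survive.
inner_pd = pd.merge(customers_ext, orders, on="id")

# Left join: every customer is kept; unmatched amounts become NaN.
left_pd = pd.merge(customers_ext, orders, on="id", how="left")

print(len(inner_pd))                      # 2
print(len(left_pd))                       # 3
print(left_pd["amount"].isna().tolist())  # [False, False, True]
```

Because NaN is a floating-point value, the amount column is converted from integer to float in the left-joined result.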
Join in Polars
merged_pl = customers_pl.join(orders_pl, on="id")
print(merged_pl)
# shape: (2, 3)
# ┌─────┬───────┬────────┐
# │ id  ┆ name  ┆ amount │
# │ --- ┆ ---   ┆ ---    │
# │ i64 ┆ str   ┆ i64    │
# ╞═════╪═══════╪════════╡
# │ 1   ┆ Alice ┆ 100    │
# │ 2   ┆ Bob   ┆ 200    │
# └─────┴───────┴────────┘
Conclusion
- Pandas: uses pd.merge() with many options (on, how, etc.).
- Polars: uses .join() with similar parameters.
- Both are flexible, and both default to an inner join. Polars joins are often faster on large data, in part because its engine runs in parallel across CPU cores.