Merge/Join Operations — Pandas vs Polars

By Jeferson Peter
Polars & Pandas

Imagine you have two datasets: one with customers and another with their orders.
To analyze them together, you need to join them on a shared key column (here, id). Let’s see how Pandas and Polars handle this.


Example data

import pandas as pd
import polars as pl

customers = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
orders = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})

customers_pl = pl.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
orders_pl = pl.DataFrame({"id": [1, 2], "amount": [100, 200]})

Merge in Pandas

merged_pd = pd.merge(customers, orders, on="id")
print(merged_pd)

#    id   name  amount
# 0   1  Alice     100
# 1   2    Bob     200
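
The example above relies on the default join type, which is an inner join. As a minimal sketch of what the how option changes, the snippet below uses slightly different, made-up data (the extra customer "Cara" and the unmatched order id 4 are assumptions for illustration, not part of the original example):

```python
import pandas as pd

# Hypothetical data: customer 3 has no order, and order id 4 has no customer.
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Cara"]})
orders = pd.DataFrame({"id": [1, 2, 4], "amount": [100, 200, 50]})

# Default is how="inner": only ids present in both frames are kept.
inner = pd.merge(customers, orders, on="id")

# how="left" keeps every customer; customers without an order get NaN.
left = pd.merge(customers, orders, on="id", how="left")

print(inner)
print(left)
```

With how="left", the amount column is converted to floats because the missing value is stored as NaN.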

Join in Polars

merged_pl = customers_pl.join(orders_pl, on="id")
print(merged_pl)

# shape: (2, 3)
# ┌─────┬───────┬────────┐
# │ id  ┆ name  ┆ amount │
# │ --- ┆ ---   ┆ ---    │
# │ i64 ┆ str   ┆ i64    │
# ╞═════╪═══════╪════════╡
# │ 1   ┆ Alice ┆ 100    │
# │ 2   ┆ Bob   ┆ 200    │
# └─────┴───────┴────────┘

Conclusion

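  • Pandas: uses pd.merge() (or DataFrame.merge()) with options such as on, how, left_on, right_on, and suffixes.
  • Polars: uses .join() with similar parameters (on, how, left_on, right_on, suffix).
  • Both default to an inner join and are flexible, but Polars joins are often faster on large data, in part because Polars runs on multiple threads by default.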