Generators with yield: Lazy Iteration Made Simple

By Jeferson Peter
2 min read
Python

When working with large datasets, loading everything into memory is often unnecessary and inefficient. Python generators solve this by producing values one at a time instead of building entire collections upfront.

The key to this behavior is the yield keyword. It turns a normal function into a lazy iterator.


What Is a Generator?

A generator is a function that uses yield instead of return to produce a sequence of values over time.

Unlike lists, generators do not store all values in memory. They generate each value only when requested.

Consider this list comprehension:

nums = [n for n in range(1_000_000)]

This creates one million integers in memory immediately.

Now compare it to a generator expression:

nums = (n for n in range(1_000_000))

This produces numbers only when iterated.
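One way to see the difference is to compare memory footprints with sys.getsizeof (a small sketch; exact numbers vary by Python version and platform):

```python
import sys

nums_list = [n for n in range(1_000_000)]
nums_gen = (n for n in range(1_000_000))

print(sys.getsizeof(nums_list))  # several megabytes: every value is stored
print(sys.getsizeof(nums_gen))   # a few hundred bytes: just the generator's state
```

The generator's size stays constant no matter how many values it can produce, because it holds only its current execution state.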


How yield Works

When Python encounters yield, it:

  1. Returns the current value
  2. Pauses the function
  3. Saves its internal state
  4. Resumes execution on the next iteration

Example:

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

for num in count_up_to(5):
    print(num)

Each iteration resumes exactly where it stopped. No extra memory is allocated for unused values.
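The pause-and-resume behavior can also be observed directly by driving the generator with next() (repeating the definition above so the sketch is self-contained):

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

counter = count_up_to(3)

print(next(counter))  # 1 — runs until the first yield, then pauses
print(next(counter))  # 2 — resumes after the yield, loops once more
print(next(counter))  # 3 — resumes again

# A further next(counter) raises StopIteration: the function has finished
```

A for loop does exactly this behind the scenes, calling next() repeatedly until StopIteration is raised.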


Why Generators Matter

The primary benefit is lazy evaluation. Data is generated only when needed.

This is particularly useful for:

  • Processing large files
  • Streaming database records
  • Handling API responses
  • Building data pipelines

In ETL workflows, generators allow each transformation step to consume and produce data progressively. This keeps memory stable and improves composability.


Chaining Generators

Generators can be composed to form pipelines:

def read_lines(file):
    with open(file) as f:
        for line in f:
            yield line.strip()

def filter_lines(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

for log in filter_lines(read_lines("system.log")):
    print(log)

Each function processes data lazily. Nothing is loaded entirely into memory.


Best Practices

  • Use generators for large or infinite data sources
  • Prefer generator expressions for simple transformations
  • Avoid mixing complex state logic inside generators
  • Remember that generators can only be iterated once
  • Use tools like next(), any(), and sum() effectively
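To illustrate the last two points, built-ins like sum(), any(), and next() consume generators without ever materializing a list (a small sketch; the specific values follow from the ranges shown):

```python
squares = (n * n for n in range(10))

print(sum(squares))  # 285 — values consumed one at a time
print(sum(squares))  # 0 — already exhausted: generators iterate only once

# any() short-circuits as soon as a match is found
evens = (n for n in range(1_000_000) if n % 2 == 0)
print(any(n > 10 for n in evens))  # True — stops after reaching 12

# next() grabs just the first matching value
print(next(n for n in range(100) if n % 7 == 0 and n > 0))  # 7
```

If you need to iterate the same data twice, rebuild the generator or fall back to a list.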

Final Take

Generators are not just a memory optimization technique. They encourage a streaming mindset and modular data processing.

When you need controlled, incremental iteration, yield provides a clean and expressive solution.
