Generators with yield: Lazy Iteration Made Simple
When working with large datasets, loading everything into memory is often unnecessary and inefficient. Python generators solve this by producing values one at a time instead of building entire collections upfront.
The key to this behavior is the yield keyword. Any function containing yield becomes a generator function: calling it does not run the body, but returns a lazy iterator.
What Is a Generator?
A generator is a function that uses yield instead of return to produce a sequence of values over time.
Unlike lists, generators do not store all values in memory. They generate each value only when requested.
Consider this list comprehension:
```python
nums = [n for n in range(1_000_000)]
```
This creates one million integers in memory immediately.
Now compare it to a generator expression:
```python
nums = (n for n in range(1_000_000))
```
This produces numbers only when iterated.
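The memory difference is easy to observe. This sketch compares the size of the list object against the generator object using sys.getsizeof (note that it measures only the container, which is exactly the point: the generator is a tiny fixed-size wrapper no matter how large the range is):

```python
import sys

# The list comprehension materializes every element up front.
nums_list = [n for n in range(1_000_000)]

# The generator expression stores only its iteration state.
nums_gen = (n for n in range(1_000_000))

print(sys.getsizeof(nums_list))  # several megabytes
print(sys.getsizeof(nums_gen))   # a couple hundred bytes
```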
How yield Works
When a generator function reaches a yield statement, Python:
- Hands the yielded value back to the caller
- Pauses the function at that point
- Saves its local state (variables and position in the code)
- Resumes from that exact point when the next value is requested
Example:
```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

for num in count_up_to(5):
    print(num)
```
Each iteration resumes exactly where it stopped. No extra memory is allocated for unused values.
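The pause-and-resume behavior can also be observed directly by driving the generator with next() instead of a for loop, a sketch using the same count_up_to function:

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

counter = count_up_to(3)

# Each next() call resumes the function body where it last paused.
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3

# A fourth call would raise StopIteration, signalling exhaustion.
```

A for loop does exactly this under the hood: it calls next() repeatedly and stops cleanly when StopIteration is raised.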
Why Generators Matter
The primary benefit is lazy evaluation. Data is generated only when needed.
This is particularly useful for:
- Processing large files
- Streaming database records
- Handling API responses
- Building data pipelines
In ETL workflows, generators allow each transformation step to consume and produce data progressively. This keeps memory stable and improves composability.
Chaining Generators
Generators can be composed to form pipelines:
```python
def read_lines(file):
    with open(file) as f:
        for line in f:
            yield line.strip()

def filter_lines(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

for log in filter_lines(read_lines("system.log")):
    print(log)
```
Each function processes data lazily. Nothing is loaded entirely into memory.
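For a simple filter like this, the filtering step can also be written inline as a generator expression. A self-contained sketch (it writes a small sample system.log so it can run as-is; the log contents are invented for illustration):

```python
# Write a small sample log so the example is self-contained.
with open("system.log", "w") as f:
    f.write("startup ok\nERROR: disk full\nheartbeat\nERROR: timeout\n")

def read_lines(path):
    # Lazily yield stripped lines from a file.
    with open(path) as f:
        for line in f:
            yield line.strip()

# Equivalent to a filter_lines function, written as a generator expression.
errors = (line for line in read_lines("system.log") if "ERROR" in line)

for log in errors:
    print(log)  # prints only the two ERROR lines
```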
Best Practices
- Use generators for large or infinite data sources
- Prefer generator expressions for simple transformations
- Avoid mixing complex state logic inside generators
- Remember that generators can only be iterated once
- Use tools like next(), any(), and sum() effectively
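Those built-ins pair naturally with generators because they consume values one at a time and, in some cases, stop early. A short sketch:

```python
squares = (n * n for n in range(10))

# next() pulls a single value on demand.
first = next(squares)  # 0

# any() short-circuits: it stops consuming as soon as the condition holds.
has_big = any(n * n > 50 for n in range(10))  # True, stops at n = 8

# sum() folds a generator without building an intermediate list.
total = sum(n * n for n in range(10))  # 285
```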
Final Take
Generators are not just a memory optimization technique. They encourage a streaming mindset and modular data processing.
When you need controlled, incremental iteration, yield provides a clean and expressive solution.