Mastering Python Iteration Protocols: Iterators and Generators

The Python Iteration Protocol

In Python, the iteration protocol defines how a for loop, or any other consumer, pulls items from an object one at a time. The protocol distinguishes between two kinds of object: iterables and iterators.

  • Iterable: An object that can return an iterator via its __iter__ method. Examples include lists, tuples, and dictionaries.
  • Iterator: An object that represents a stream of data. It must implement both __iter__ (returning itself) and __next__ (returning the next item, or raising StopIteration when none remain). Iterators are lazy: each value is produced only when it is requested.

The abstract base classes in collections.abc make the distinction easy to check:

from collections.abc import Iterable, Iterator

dataset = [10, 20, 30]
print(isinstance(dataset, Iterable)) # True
print(isinstance(dataset, Iterator)) # False

To obtain an explicit iterator from an iterable, pass it to the built-in iter() function:

source = ['apple', 'banana']
iterator_obj = iter(source)
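
A for loop performs these same steps implicitly. As a minimal sketch (the helper name manual_for is hypothetical, introduced only for illustration), the function below calls iter() once and then next() repeatedly until StopIteration is raised:

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using the protocol directly."""
    collected = []
    iterator = iter(iterable)      # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: request the next value
        except StopIteration:      # step 3: exhaustion ends the loop
            break
        collected.append(item)
    return collected


fruits = manual_for(['apple', 'banana'])
print(fruits)  # ['apple', 'banana']
```

The try/except around next() is the only part a for loop hides; everything else is ordinary control flow.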

Building a Custom Iterator

While built-in types work out of the box, you may need to create custom classes to handle specific traversal logic. Below is a class that iterates through a collection of IDs sequentially.

class RecordStream:
    def __init__(self, records):
        self._records = records
        self._cursor = 0

    def __iter__(self):
        """Must return the iterator object itself."""
        return self

    def __next__(self):
        """Return the next item or raise StopIteration."""
        try:
            current_record = self._records[self._cursor]
        except IndexError:
            # Past the last record: signal exhaustion as the protocol requires.
            raise StopIteration from None
        self._cursor += 1
        return current_record

# Usage
if __name__ == "__main__":
    user_ids = RecordStream(["user_01", "user_02", "user_03"])

    print(next(user_ids))      # user_01
    # The for loop resumes where next() left off.
    for record in user_ids:
        print(record)          # user_02, then user_03
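
Because RecordStream returns itself from __iter__, it can be traversed only once. If repeated traversal is needed, one option is to keep the container and the iterator separate. The sketch below assumes a hypothetical RecordCollection class whose __iter__ hands out a fresh iterator on every call:

```python
class RecordCollection:
    """Hypothetical iterable container: each __iter__ call returns a new iterator."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Delegate to the list's own iterator, so every loop starts at the beginning.
        return iter(self._records)


users = RecordCollection(["user_01", "user_02"])
first_pass = list(users)
second_pass = list(users)          # a fresh iterator, so the data is seen again
print(first_pass == second_pass)   # True

single_pass = iter(users)          # an iterator, by contrast, is used up after one pass
list(single_pass)
remaining = list(single_pass)
print(remaining)                   # []
```

This mirrors how the built-in list behaves: the list is the iterable, and each loop over it gets its own short-lived iterator.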

Generator Functions

A simpler alternative to writing a full class is a generator function. Any function whose body contains the yield keyword returns a generator object when called. The generator implements the iterator protocol automatically: its body pauses at each yield and resumes from that point on the next call to next().

def count_sequence(limit):
    for i in range(limit):
        yield i

generator_instance = count_sequence(3)
print(next(generator_instance))  # 0
for value in generator_instance:
    print(value)                 # 1, then 2
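
To make the pause-and-resume behaviour visible, the following sketch records when each part of a generator body actually runs. The function traced_steps and the events list are hypothetical names used only for this illustration:

```python
def traced_steps(events):
    """Hypothetical generator that logs when each part of its body executes."""
    events.append("started")
    yield "first"
    events.append("resumed")
    yield "second"
    events.append("finished")


events = []
gen = traced_steps(events)     # creates the generator; the body has not run yet
print(events)                  # []

first = next(gen)              # runs the body up to the first yield
print(first, events)           # first ['started']

second = next(gen)             # resumes right after the first yield
print(second, events)          # second ['started', 'resumed']

leftover = next(gen, "done")   # body finishes; the default replaces StopIteration
print(leftover, events)        # done ['started', 'resumed', 'finished']
```

Passing a default as the second argument to next() is a convenient way to detect exhaustion without a try/except block.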

Optimizing Recursive Algorithms with Generators

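Classic recursive implementations of mathematical sequences, such as the Fibonacci numbers, are costly. The naive version below recomputes the same subproblems repeatedly, so its running time grows exponentially with n, and its call depth grows linearly with n. A generator can compute the same sequence iteratively while keeping only two values in memory.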

# Standard recursive approach (uses call stack)
def get_fib_recursive(n):
    if n <= 2:
        return 1
    return get_fib_recursive(n - 1) + get_fib_recursive(n - 2)

Using a generator avoids deep recursion:

def generate_fibonacci(count):
    """Yield the first `count` Fibonacci numbers: 1, 1, 2, 3, ..."""
    prev, curr = 0, 1
    for _ in range(count):
        yield curr
        prev, curr = curr, prev + curr

# Stream the first 10 numbers
fib_stream = generate_fibonacci(10)
for val in fib_stream:
    print(val)
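
Since a generator only computes values on demand, the sequence does not need a fixed length at all. As a sketch, the hypothetical fibonacci_stream below never terminates on its own, and itertools.islice from the standard library takes as many values as the caller wants:

```python
from itertools import islice


def fibonacci_stream():
    """Hypothetical unbounded variant: yields 1, 1, 2, 3, 5, ... without end."""
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr


# islice stops pulling values after ten items, so the infinite loop is safe.
first_ten = list(islice(fibonacci_stream(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Moving the stopping condition to the consumer keeps the generator itself free of counting logic.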

Processing Large Files Efficiently

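Reading massive datasets (e.g., 500GB) line by line is standard practice, because iterating over a file object yields one newline-terminated line at a time. If the records are separated by a custom marker instead of newlines, line-based iteration no longer matches the record boundaries, and a file with no newlines at all would be returned as a single enormous line. A generator can instead read fixed-size chunks and split them into logical units on a custom separator.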

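def custom_chunk_reader(file, delimiter):
    """Yield delimiter-separated segments from an open text-mode file object."""
    buffer = ""
    while True:
        # Emit every complete segment already sitting in the buffer.
        while delimiter in buffer:
            split_index = buffer.find(delimiter)
            yield buffer[:split_index]
            buffer = buffer[split_index + len(delimiter):]

        # Read the next fixed-size chunk; I/O errors propagate to the caller.
        chunk = file.read(8192)

        if not chunk:
            # End of file: whatever remains is the final segment.
            if buffer:
                yield buffer
            break
        buffer += chunk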

with open("large_data.bin") as f:
    for segment in custom_chunk_reader(f, "{|}"):
        print(segment.strip())

This approach ensures the entire file is never loaded into memory at once. Data is processed lazily as chunks become available, and the buffer holds at most one unfinished segment plus the most recent chunk.
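
One case worth checking is a delimiter that arrives split across two reads. The sketch below is a compact variant of the reader above, with a hypothetical chunk_size parameter added so the behaviour can be exercised on a small in-memory file (io.StringIO stands in for the real data file):

```python
import io


def read_segments(file, delimiter, chunk_size=8192):
    """Compact variant of the reader; chunk_size is a hypothetical extra parameter."""
    buffer = ""
    while True:
        # Emit complete segments; partition splits on the first delimiter only.
        while delimiter in buffer:
            segment, _, buffer = buffer.partition(delimiter)
            yield segment
        chunk = file.read(chunk_size)
        if not chunk:
            if buffer:
                yield buffer   # final segment without a trailing delimiter
            return
        buffer += chunk


# With 3-character reads, the first "{|}" arrives as "{" in one chunk and "|}"
# in the next, so it is only recognised once both pieces are in the buffer.
fake_file = io.StringIO("alpha{|}beta{|}gamma")
segments = list(read_segments(fake_file, "{|}", chunk_size=3))
print(segments)  # ['alpha', 'beta', 'gamma']
```

Because the delimiter search always runs against the accumulated buffer, not the individual chunk, the chunk size affects only performance and never the result.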

Tags: python iterator protocol generator functions memory optimization file processing

Posted on Mon, 15 Jun 2026 16:32:18 +0000 by l_evans