The Python Iteration Protocol
In Python, the iteration protocol defines how an object is traversed one item at a time. It distinguishes between two kinds of object: an iterable and an iterator.
- Iterable: An object that can return an iterator via its __iter__ method. Examples include lists, tuples, and dictionaries.
- Iterator: An object that represents a stream of data. It must implement both __iter__ (returning itself) and __next__ (returning the next item or raising StopIteration). Iterators utilize lazy evaluation, generating values only when requested.
from collections.abc import Iterable, Iterator
dataset = [10, 20, 30]
print(isinstance(dataset, Iterable)) # True
print(isinstance(dataset, Iterator)) # False
To obtain an explicit iterator from an iterable, pass it to the built-in iter() function:
source = ['apple', 'banana']
iterator_obj = iter(source)
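As a rough illustration of what a for loop does with iter() and next(), the sketch below walks an iterable by hand. The helper name manual_for is hypothetical, and the code only approximates the interpreter's behavior.

```python
def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, then next() until exhausted."""
    collected = []
    iterator = iter(iterable)      # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # request the next value
        except StopIteration:      # the iterator signals that it is exhausted
            break
        collected.append(item)
    return collected


source = ['apple', 'banana']
print(manual_for(source))  # ['apple', 'banana']
```

The loop never indexes into the list; it relies only on the two protocol calls, which is why the same pattern works for any iterable.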
Building a Custom Iterator
While built-in types work out of the box, you may need to create custom classes to handle specific traversal logic. Below is a class that iterates through a collection of IDs sequentially.
class RecordStream:
    def __init__(self, records):
        self._records = records
        self._cursor = 0

    def __iter__(self):
        """Must return the iterator object itself."""
        return self

    def __next__(self):
        """Return the next item or raise StopIteration."""
        try:
            current_record = self._records[self._cursor]
        except IndexError:
            # Past the last record: signal exhaustion without chaining the IndexError.
            raise StopIteration from None
        self._cursor += 1
        return current_record


# Usage
if __name__ == "__main__":
    user_ids = RecordStream(["user_01", "user_02", "user_03"])
    print(next(user_ids))    # user_01
    for record in user_ids:  # continues after the consumed item: user_02, user_03
        print(record)
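One consequence of __iter__ returning self is that a RecordStream can be traversed only once. A possible alternative, sketched here with a hypothetical RecordCollection class, is to keep the data in an iterable whose __iter__ hands out a fresh iterator on every call.

```python
class RecordCollection:
    """Hypothetical iterable wrapper: each call to __iter__ starts a new traversal."""

    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Delegate to the list's own iterator so every loop starts from the beginning.
        return iter(self._records)


users = RecordCollection(["user_01", "user_02"])
first_pass = list(users)
second_pass = list(users)   # not empty: a new iterator was created

one_shot = iter(["user_01", "user_02"])  # a plain iterator, by contrast
drained = list(one_shot)
leftover = list(one_shot)   # empty: the iterator is already exhausted
```

Whether a single-pass iterator or a re-iterable container is the better fit depends on whether the data needs to be traversed more than once.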
Generator Functions
A simpler alternative to writing a full class is a generator function. Calling a function that contains the yield keyword returns a generator object without running the body. Each call to next() runs the body until the next yield, pauses there, and resumes from that point on the following call.
def count_sequence(limit):
    for i in range(limit):
        yield i


generator_instance = count_sequence(3)
print(next(generator_instance))   # 0
for value in generator_instance:  # resumes after the first value: 1, 2
    print(value)
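To connect this back to the protocol described earlier, the following sketch checks that a generator object is itself an iterator. The function name countdown is illustrative only.

```python
from collections.abc import Iterator


def countdown(start):
    """Yield start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(2)
print(isinstance(gen, Iterator))  # True
print(iter(gen) is gen)           # True: __iter__ returns the generator itself
print(next(gen))                  # 2
print(next(gen))                  # 1
# A further next(gen) would raise StopIteration because the function body has finished.
```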
Optimizing Recursive Algorithms with Generators
Classic recursive implementations of mathematical sequences, such as the Fibonacci numbers, recompute the same subproblems repeatedly and add a stack frame for every nested call. A generator can produce the same stream iteratively, keeping only the last two values between steps.
# Standard recursive approach (uses call stack)
def get_fib_recursive(n):
    if n <= 2:
        return 1
    return get_fib_recursive(n - 1) + get_fib_recursive(n - 2)
Using a generator avoids deep recursion:
def generate_fibonacci(count):
    prev, curr = 0, 1
    # Only the last two values are kept between yields.
    for _ in range(count):
        yield curr
        prev, curr = curr, prev + curr


# Stream the first 10 numbers
fib_stream = generate_fibonacci(10)
for val in fib_stream:
    print(val)
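Because values are produced only on request, a generator does not need a fixed count at all. The sketch below is one possible variant: fibonacci_forever is a hypothetical name, and itertools.islice from the standard library takes a bounded slice of the unbounded stream.

```python
from itertools import islice


def fibonacci_forever():
    """Yield Fibonacci numbers indefinitely, starting from 1."""
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr


# islice stops requesting values after ten items, so the loop above never runs away.
first_ten = list(islice(fibonacci_forever(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```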
Processing Large Files Efficiently
Reading massive datasets (e.g., 500GB) line by line is standard practice. However, if the data is separated by a custom marker rather than by newlines, line-based iteration breaks down: a file without newlines is returned as one enormous line. A generator can instead read fixed-size chunks and split them into logical units on the custom separator.
def custom_chunk_reader(file, delimiter):
    """Yield delimiter-separated segments from an open text file object."""
    buffer = ""
    while True:
        # Emit every complete segment already in the buffer.
        while delimiter in buffer:
            split_index = buffer.find(delimiter)
            yield buffer[:split_index]
            buffer = buffer[split_index + len(delimiter):]
        chunk = file.read(8192)
        if not chunk:
            # End of file: flush whatever follows the last delimiter.
            if buffer:
                yield buffer
            break
        buffer += chunk


with open("large_data.bin") as f:
    for segment in custom_chunk_reader(f, "{|}"):
        print(segment.strip())
This approach ensures the entire file is never loaded into memory at once. Data is processed lazily as chunks become available, and the buffer holds at most one unfinished segment plus the most recent chunk.
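A quick way to sanity-check this kind of reader is to run it on an in-memory stream with a deliberately tiny chunk size, so that the delimiter is split across reads. The sketch below is a standalone variant for that purpose; the name split_stream and the chunk_size parameter are assumptions, not part of the code above.

```python
import io


def split_stream(stream, delimiter, chunk_size=8192):
    """Yield delimiter-separated segments from a text stream, reading chunk_size characters at a time."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete segment; a partial delimiter stays in the buffer.
        while delimiter in buffer:
            segment, _, buffer = buffer.partition(delimiter)
            yield segment
    if buffer:
        yield buffer  # trailing data after the last delimiter


sample = io.StringIO("alpha{|}beta{|}gamma")
segments = list(split_stream(sample, "{|}", chunk_size=4))
print(segments)  # ['alpha', 'beta', 'gamma']
```

With chunk_size=4 the marker "{|}" arrives in pieces, yet the segments come out intact because any incomplete delimiter simply waits in the buffer for the next read.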