Implementing Custom Iterators and Iterables
When processing large datasets or fetching data from remote APIs, loading everything into memory at once wastes memory and delays the first result. A lazy approach that fetches data item by item is usually preferable, and it can be achieved by implementing the iterator protocol (the __iter__ and __next__ methods). The following example defines a custom iterator that fetches each data object from a remote source only when it is requested.
import requests
from collections.abc import Iterable, Iterator


class DataStreamIterator(Iterator):
    '''Iterator that fetches data items one by one.'''

    def __init__(self, identifiers):
        self.identifiers = identifiers
        self.cursor = 0  # index of the next identifier to fetch

    def fetch_item(self, item_id):
        '''Simulates fetching a single item from an API.'''
        response = requests.get(f'https://api.service.com/data/{item_id}')
        data = response.json()  # decode the JSON response body
        return f"{item_id}: {data['status']}"

    def __next__(self):
        # Signal the end of iteration once every identifier has been used
        if self.cursor >= len(self.identifiers):
            raise StopIteration
        current_id = self.identifiers[self.cursor]
        self.cursor += 1
        return self.fetch_item(current_id)


class DataStreamIterable(Iterable):
    '''Iterable container that returns an iterator instance.'''

    def __init__(self, identifiers):
        self.identifiers = identifiers

    def __iter__(self):
        # A new iterator per call, so the container can be looped over repeatedly
        return DataStreamIterator(self.identifiers)


if __name__ == '__main__':
    ids = ['user-01', 'user-02', 'user-03']
    for item in DataStreamIterable(ids):
        print(item)
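The lazy behavior is easier to observe without a network connection. The sketch below exercises the same protocol with a stand-in fetch function that records each call. It is only an illustration: fake_fetch, LazyLookupIterator, LazyLookup and the calls list are hypothetical names introduced here, not part of the example above.

```python
from collections.abc import Iterable, Iterator

calls = []  # records every simulated fetch so laziness is observable


def fake_fetch(item_id):
    """Hypothetical stand-in for a remote call; records the request."""
    calls.append(item_id)
    return f"{item_id}: ok"


class LazyLookupIterator(Iterator):
    """Fetches one item per next() call."""

    def __init__(self, identifiers):
        self.identifiers = identifiers
        self.cursor = 0

    def __next__(self):
        if self.cursor >= len(self.identifiers):
            raise StopIteration
        current_id = self.identifiers[self.cursor]
        self.cursor += 1
        return fake_fetch(current_id)


class LazyLookup(Iterable):
    """Returns a fresh iterator on every __iter__ call."""

    def __init__(self, identifiers):
        self.identifiers = identifiers

    def __iter__(self):
        return LazyLookupIterator(self.identifiers)


source = LazyLookup(['user-01', 'user-02', 'user-03'])
it = iter(source)
first = next(it)                 # only now is the first item fetched
fetched_after_one = list(calls)  # snapshot: a single fetch so far

remaining = list(it)             # drains the iterator: two more fetches
second_pass = list(source)       # a new iterator starts from the beginning
```

The iterator itself is single-pass, while the iterable hands out a new iterator each time it is looped over. That is the reason for keeping the two classes separate.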
Creating Iterable Classes with Generator Functions
Python allows the __iter__ method to be implemented as a generator function. This simplifies the code by removing the need to define a separate iterator class and manually track the iteration state. This is particularly useful for generating sequences based on specific numerical criteria.
class EvenNumbers:
    def __init__(self, min_val, max_val):
        self.min_val = min_val
        self.max_val = max_val

    def is_even(self, val):
        return val % 2 == 0

    def __iter__(self):
        for num in range(self.min_val, self.max_val + 1):
            if self.is_even(num):
                yield num


if __name__ == '__main__':
    for n in EvenNumbers(1, 10):
        print(n)
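One practical consequence of putting the generator inside __iter__ is that the object can be iterated more than once, because each call creates a new generator. A bare generator function, by contrast, produces a one-shot object. The following sketch contrasts the two; even_numbers is a hypothetical helper added only for the comparison.

```python
class EvenNumbers:
    def __init__(self, min_val, max_val):
        self.min_val = min_val
        self.max_val = max_val

    def __iter__(self):
        for num in range(self.min_val, self.max_val + 1):
            if num % 2 == 0:
                yield num


def even_numbers(min_val, max_val):
    """Plain generator function: yields the same values, but only once."""
    for num in range(min_val, max_val + 1):
        if num % 2 == 0:
            yield num


evens = EvenNumbers(1, 10)
first_pass = list(evens)    # [2, 4, 6, 8, 10]
second_pass = list(evens)   # same values again: __iter__ made a new generator

gen = even_numbers(1, 10)
gen_first = list(gen)       # [2, 4, 6, 8, 10]
gen_second = list(gen)      # []: the generator is already exhausted
```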
Implementing Bidirectional Iteration
For custom sequence types, it is often necessary to iterate in both forward and reverse directions. The __reversed__ special method allows an object to define its behavior when used with the reversed() built-in function. The following class generates a range of floating-point numbers and supports both traversal directions.
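class FloatInterval:
    def __init__(self, start, end, step=0.1):
        self.start = start
        self.end = end
        self.step = step

    def __iter__(self):
        # Repeated addition accumulates floating-point rounding error, so for
        # steps such as 0.1 the final value may differ slightly from end.
        current = self.start
        while current <= self.end:
            yield current
            current += self.step

    def __reversed__(self):
        # Counts down from end, so this mirrors the forward pass only when
        # (end - start) is a whole multiple of step.
        current = self.end
        while current >= self.start:
            yield current
            current -= self.step


if __name__ == '__main__':
    print("Forward:")
    for x in FloatInterval(1.0, 3.0, 0.5):
        print(x)

    print("Reverse:")
    for x in reversed(FloatInterval(1.0, 3.0, 0.5)):
        print(x)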
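The step of 0.5 used above is exactly representable in binary floating point, so the output is clean. With a step such as 0.1, repeated addition drifts: adding 0.1 ten times gives 0.9999999999999999 rather than 1.0. One possible way around this is to compute each value from an integer index instead of accumulating. The sketch below is a hypothetical variant, not a replacement prescribed by the original class; the name SteppedInterval, the _value helper and the 1e-9 tolerance are assumptions made for illustration.

```python
import math


class SteppedInterval:
    """Hypothetical variant of FloatInterval that derives values from an index."""

    def __init__(self, start, end, step=0.1):
        if step <= 0:
            raise ValueError("step must be positive")
        self.start = start
        self.end = end
        self.step = step
        # Number of values that fit in [start, end]. The small tolerance
        # absorbs rounding in the division (an assumption that suits
        # ordinary step sizes, not a general guarantee).
        self.count = max(0, math.floor((end - start) / step + 1e-9) + 1)

    def _value(self, i):
        # One multiplication per value, so errors do not accumulate
        return self.start + i * self.step

    def __iter__(self):
        for i in range(self.count):
            yield self._value(i)

    def __reversed__(self):
        # Walk the same indices backwards: exactly the forward values, reversed
        for i in reversed(range(self.count)):
            yield self._value(i)


forward = list(SteppedInterval(0.0, 1.0, 0.1))
backward = list(reversed(SteppedInterval(0.0, 1.0, 0.1)))
```

Because both directions use the same index-to-value mapping, the reverse traversal yields the forward values in opposite order, even when the interval length is not a whole multiple of the step.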
Slicing Iterators
Iterators do not support indexing or the standard slicing syntax, so itertools.islice is used to select a range of elements. It returns a new iterator that consumes the original one lazily, which keeps memory use low for large streams. Items that come before the start index are still consumed and discarded.
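from itertools import islice

# Stand-in for a file stream or large list. A range can be iterated
# repeatedly, so each islice call below starts from the beginning;
# a true iterator would continue from wherever the previous call stopped.
log_stream = range(1000)

# Get lines 50 to 59 (the stop index 60 is exclusive)
lines = islice(log_stream, 50, 60)
for line in lines:
    print(line)

# Get the first 30 lines
head = islice(log_stream, 30)

# Skip the first 100 lines and get the rest
tail = islice(log_stream, 100, None)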
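The example above slices a range, which can be iterated repeatedly. With a genuine one-shot iterator, each islice call advances the shared underlying stream. The short sketch below illustrates this; the variable names are chosen here for demonstration only.

```python
from itertools import islice

stream = iter(range(1000))   # a one-shot iterator, unlike the range itself

window = list(islice(stream, 50, 60))   # skips 0..49, then takes 50..59
next_item = next(stream)                # iteration resumes at 60
head = list(islice(stream, 3))          # 61, 62, 63: not the first three items
```

If the same stream has to be sliced from the beginning more than once, it needs to be recreated or its contents stored, since consumed items cannot be replayed.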
Iterating Over Multiple Iterables
When working with related datasets stored in separate containers, you may need to iterate over them simultaneously (parallel) or sequentially (serial). Python provides zip for parallel iteration and itertools.chain for serial iteration.
Parallel Iteration with zip
from random import randint

# Student grades for different subjects
math_scores = [randint(60, 100) for _ in range(5)]
physics_scores = [randint(60, 100) for _ in range(5)]
chem_scores = [randint(60, 100) for _ in range(5)]

# Calculate the average score for each student
averages = []
for m, p, c in zip(math_scores, physics_scores, chem_scores):
    averages.append((m + p + c) / 3)
print(averages)
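The lists above all have the same length. When the inputs differ in length, zip silently stops at the shortest one. The sketch below uses small made-up score lists to show that behavior alongside itertools.zip_longest, which pads the shorter inputs instead.

```python
from itertools import zip_longest

math_scores = [90, 80, 70]
physics_scores = [85, 75]    # one score is missing

# zip stops at the shorter input, so the third math score is dropped
pairs = list(zip(math_scores, physics_scores))

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(math_scores, physics_scores, fillvalue=None))
```

From Python 3.10 onward, zip also accepts strict=True, which raises a ValueError on a length mismatch instead of truncating.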
Serial Iteration with itertools.chain
from random import randint
from itertools import chain

# Scores from different class groups
group_a = [randint(50, 100) for _ in range(30)]
group_b = [randint(50, 100) for _ in range(32)]
group_c = [randint(50, 100) for _ in range(28)]

# Count high scores across all groups
high_score_count = 0
for score in chain(group_a, group_b, group_c):
    if score >= 90:
        high_score_count += 1
print(high_score_count)
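When the groups are already collected in a single list (or arrive from another iterable), chain.from_iterable avoids having to unpack them as separate arguments. The following sketch uses fixed, made-up scores so the result is reproducible.

```python
from itertools import chain

# Each inner list is one class group (illustrative values)
groups = [[95, 72, 88], [91, 60], [100, 89, 90]]

# Flatten lazily: no combined list is built in memory
all_scores = chain.from_iterable(groups)

# Count scores of 90 or above across every group
high_score_count = sum(1 for score in all_scores if score >= 90)
print(high_score_count)
```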