import itertools
Complete Guide to Python’s itertools Module
Introduction
The itertools module is one of Python’s most powerful standard library modules for creating iterators and performing functional programming operations. It provides a collection of tools for creating iterators that are building blocks for efficient loops and data processing pipelines.
The itertools module provides three categories of iterators:
- Infinite iterators: Generate infinite sequences
- Finite iterators: Work with finite sequences
- Combinatorial iterators: Generate combinations and permutations
Other necessary imports
import math
Why Use itertools?
- Memory Efficient: Creates iterators that generate values on-demand
- Functional Programming: Enables elegant functional programming patterns
- Performance: Many operations are implemented in C for speed
- Composability: Functions can be easily combined to create complex iterations
Categories of itertools Functions
The itertools module is organized into three main categories:
- Infinite Iterators: Generate infinite sequences
- Finite Iterators: Terminate based on input sequences
- Combinatorial Iterators: Generate combinations and permutations
1. Infinite Iterators
count(start=0, step=1)
Creates an infinite arithmetic sequence starting from start with increments of step.
import itertools

# Basic counting
counter = itertools.count(1)
print(list(itertools.islice(counter, 5)))  # [1, 2, 3, 4, 5]

# Counting with step
counter = itertools.count(0, 2)
print(list(itertools.islice(counter, 5)))  # [0, 2, 4, 6, 8]

# Counting with floats
counter = itertools.count(0.5, 0.1)
print(list(itertools.islice(counter, 3)))  # [0.5, 0.6, 0.7]
[1, 2, 3, 4, 5]
[0, 2, 4, 6, 8]
[0.5, 0.6, 0.7]
Use Case: Generating IDs, pagination, or any sequence that needs infinite counting.
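As a minimal sketch of the ID use case, the snippet below pairs an infinite counter with a finite list. The record names and the starting value 1001 are made up for illustration. The last line shows the multiplicative form that the standard library documentation suggests for float steps, since repeated addition can accumulate rounding error.

```python
import itertools

# Hypothetical record names, used only for illustration.
names = ["alpha", "beta", "gamma"]

# zip() stops at the shorter input, so the infinite counter is safe here.
ids = itertools.count(start=1001)
records = list(zip(ids, names))
print(records)  # [(1001, 'alpha'), (1002, 'beta'), (1003, 'gamma')]

# For float steps, computing start + step * i for each index avoids
# the drift that repeated addition can introduce over long runs.
float_steps = (0.5 + 0.1 * i for i in itertools.count())
first_three = list(itertools.islice(float_steps, 3))
```

Because count() never terminates on its own, it should always be bounded by something finite such as zip(), islice(), or takewhile().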
cycle(iterable)
Infinitely repeats the elements of an iterable.
colors = itertools.cycle(['red', 'green', 'blue'])
print(list(itertools.islice(colors, 8)))
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green']

# Practical example: Round-robin assignment
tasks = ['task1', 'task2', 'task3', 'task4']
workers = itertools.cycle(['Alice', 'Bob', 'Charlie'])

assignments = list(zip(tasks, workers))
print(assignments)
# [('task1', 'Alice'), ('task2', 'Bob'), ('task3', 'Charlie'), ('task4', 'Alice')]
['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green']
[('task1', 'Alice'), ('task2', 'Bob'), ('task3', 'Charlie'), ('task4', 'Alice')]
repeat(object, times=None)
Repeats an object either infinitely or a specified number of times.
# Infinite repeat
ones = itertools.repeat(1)
print(list(itertools.islice(ones, 5)))  # [1, 1, 1, 1, 1]

# Finite repeat
zeros = itertools.repeat(0, 3)
print(list(zeros))  # [0, 0, 0]

# Practical example: Creating default values
default_config = {'debug': False, 'timeout': 30}
configs = list(itertools.repeat(default_config, 5))
print(len(configs))  # 5
[1, 1, 1, 1, 1]
[0, 0, 0]
5
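One caveat with the default-values example: repeat() yields the same object every time, so a list built from a mutable default holds several references to one dict. The sketch below shows the aliasing and one possible workaround (a shallow copy per entry); the variable names shared and independent are chosen here for illustration.

```python
import itertools

default_config = {"debug": False, "timeout": 30}

# repeat() hands back the very same dict each time.
shared = list(itertools.repeat(default_config, 3))
shared[0]["debug"] = True
print(shared[1]["debug"])  # True: all entries are one object

# Reset, then make a shallow copy per entry instead.
default_config["debug"] = False
independent = [dict(cfg) for cfg in itertools.repeat(default_config, 3)]
independent[0]["debug"] = True
print(independent[1]["debug"])  # False: each entry is its own dict
```

A shallow copy is enough here because the values are immutable; nested mutable values would need copy.deepcopy().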
2. Finite Iterators
accumulate(iterable, func=operator.add, initial=None)
Returns running totals or results of binary functions.
import operator
# Running sum (default)
numbers = [1, 2, 3, 4, 5]
print(list(itertools.accumulate(numbers)))  # [1, 3, 6, 10, 15]
# Running product
print(list(itertools.accumulate(numbers, operator.mul))) # [1, 2, 6, 24, 120]
# Running maximum
print(list(itertools.accumulate([3, 1, 4, 1, 5], max))) # [3, 3, 4, 4, 5]
# With initial value (Python 3.8+)
print(list(itertools.accumulate([1, 2, 3], initial=100))) # [100, 101, 103, 106]
[1, 3, 6, 10, 15]
[1, 2, 6, 24, 120]
[3, 3, 4, 4, 5]
[100, 101, 103, 106]
chain(*iterables)
Flattens multiple iterables into a single sequence.
# Basic chaining
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [7, 8, 9]

chained = itertools.chain(list1, list2, list3)
print(list(chained))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Chain from iterable
nested_lists = [[1, 2], [3, 4], [5, 6]]
flattened = itertools.chain.from_iterable(nested_lists)
print(list(flattened))  # [1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6]
compress(data, selectors)
Filters data based on corresponding boolean values in selectors.
data = ['A', 'B', 'C', 'D', 'E']
selectors = [1, 0, 1, 0, 1]

filtered = itertools.compress(data, selectors)
print(list(filtered))  # ['A', 'C', 'E']

# Practical example: Filtering based on conditions
names = ['Alice', 'Bob', 'Charlie', 'David']
ages = [25, 17, 30, 16]
adults = [age >= 18 for age in ages]

adult_names = itertools.compress(names, adults)
print(list(adult_names))  # ['Alice', 'Charlie']
['A', 'C', 'E']
['Alice', 'Charlie']
dropwhile(predicate, iterable)
Drops elements from the beginning while predicate is true.
numbers = [1, 3, 5, 8, 9, 10, 12]
result = itertools.dropwhile(lambda x: x < 8, numbers)
print(list(result))  # [8, 9, 10, 12]

# Practical example: Skip header lines
lines = ['# Comment', '# Another comment', 'data1', 'data2', '# inline comment']
data_lines = itertools.dropwhile(lambda line: line.startswith('#'), lines)
print(list(data_lines))  # ['data1', 'data2', '# inline comment']

# Practical example: Processing log entries
log_entries = [
    "INFO: Starting application",
    "DEBUG: Loading config",
    "ERROR: Database connection failed",
    "INFO: Retrying connection",
    "INFO: Connection successful"
]

# Skip INFO messages at the beginning
important_logs = itertools.dropwhile(
    lambda x: x.startswith("INFO"), log_entries
)
print(list(important_logs))
[8, 9, 10, 12]
['data1', 'data2', '# inline comment']
['DEBUG: Loading config', 'ERROR: Database connection failed', 'INFO: Retrying connection', 'INFO: Connection successful']
takewhile(predicate, iterable)
Returns elements from the beginning while predicate is true.
numbers = [1, 3, 5, 8, 9, 10, 12]
result = itertools.takewhile(lambda x: x < 8, numbers)
print(list(result))  # [1, 3, 5]

# Practical example: Read until delimiter
data = ['apple', 'banana', 'STOP', 'cherry', 'date']
before_stop = itertools.takewhile(lambda x: x != 'STOP', data)
print(list(before_stop))  # ['apple', 'banana']
[1, 3, 5]
['apple', 'banana']
filterfalse(predicate, iterable)
Returns elements where predicate is false (opposite of filter).
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
odds = itertools.filterfalse(lambda x: x % 2 == 0, numbers)
print(list(odds))  # [1, 3, 5, 7, 9]

# Compare with regular filter
evens = filter(lambda x: x % 2 == 0, numbers)
print(list(evens))  # [2, 4, 6, 8, 10]
[1, 3, 5, 7, 9]
[2, 4, 6, 8, 10]
groupby(iterable, key=None)
Groups consecutive elements by a key function.
# Basic grouping
data = [1, 1, 2, 2, 2, 3, 1, 1]
grouped = itertools.groupby(data)

for key, group in grouped:
    print(f"{key}: {list(group)}")
# 1: [1, 1]
# 2: [2, 2, 2]
# 3: [3]
# 1: [1, 1]
# Grouping with key function
words = ['apple', 'banana', 'apricot', 'blueberry', 'cherry']

# First sort by first letter, then group
sorted_words = sorted(words, key=lambda x: x[0])
grouped_words = itertools.groupby(sorted_words, key=lambda x: x[0])

for letter, group in grouped_words:
    print(f"{letter}: {list(group)}")
# a: ['apple', 'apricot']
# b: ['banana', 'blueberry']
# c: ['cherry']
# Grouping sorted data
students = [
    ('Alice', 'A'),
    ('Bob', 'B'),
    ('Charlie', 'A'),
    ('David', 'B'),
    ('Eve', 'A')
]

# Sort first, then group
students_sorted = sorted(students, key=lambda x: x[1])
by_grade = itertools.groupby(students_sorted, key=lambda x: x[1])
for grade, group in by_grade:
    names = [student[0] for student in group]
    print(f"Grade {grade}: {names}")
1: [1, 1]
2: [2, 2, 2]
3: [3]
1: [1, 1]
a: ['apple', 'apricot']
b: ['banana', 'blueberry']
c: ['cherry']
Grade A: ['Alice', 'Charlie', 'Eve']
Grade B: ['Bob', 'David']
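To make the "consecutive elements" behaviour concrete, the sketch below runs groupby() on unsorted input and then shows a dictionary-based alternative for cases where sorting is not wanted. The defaultdict approach is a swapped-in technique, not part of itertools, and the sample data is illustrative.

```python
import itertools
from collections import defaultdict

students = [("Alice", "A"), ("Bob", "B"), ("Charlie", "A")]

# Without sorting, groupby starts a new group every time the key changes,
# so grade "A" shows up twice.
unsorted_keys = [key for key, _ in itertools.groupby(students, key=lambda s: s[1])]
print(unsorted_keys)  # ['A', 'B', 'A']

# Alternative that needs no sorting: collect names into a dict of lists.
by_grade = defaultdict(list)
for name, grade in students:
    by_grade[grade].append(name)
print(dict(by_grade))  # {'A': ['Alice', 'Charlie'], 'B': ['Bob']}
```

The dictionary version holds every group in memory at once, while sorted input plus groupby() lets each group be processed and discarded in turn.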
islice(iterable, start, stop, step)
Returns selected elements from the iterable (like list slicing but for iterators).
numbers = range(20)
# islice(iterable, stop)
print(list(itertools.islice(numbers, 5))) # [0, 1, 2, 3, 4]
# islice(iterable, start, stop)
print(list(itertools.islice(numbers, 5, 10))) # [5, 6, 7, 8, 9]
# islice(iterable, start, stop, step)
print(list(itertools.islice(numbers, 0, 10, 2))) # [0, 2, 4, 6, 8]
# Practical example: Pagination
def paginate(iterable, page_size):
    iterator = iter(iterable)
    while True:
        page = list(itertools.islice(iterator, page_size))
        if not page:
            break
        yield page

data = range(25)
for page_num, page in enumerate(paginate(data, 10), 1):
    print(f"Page {page_num}: {page}")
[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[0, 2, 4, 6, 8]
Page 1: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Page 2: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Page 3: [20, 21, 22, 23, 24]
starmap(function, iterable)
Applies function to arguments unpacked from each item in iterable.
# Basic usage
points = [(1, 2), (3, 4), (5, 6)]
distances = itertools.starmap(lambda x, y: (x**2 + y**2)**0.5, points)
print(list(distances))  # [2.236..., 5.0, 7.810...]

# Practical example: Multiple argument functions
import operator
pairs = [(2, 3), (4, 5), (6, 7)]
products = itertools.starmap(operator.mul, pairs)
print(list(products))  # [6, 20, 42]

# Compare with map
regular_map = map(operator.mul, [2, 4, 6], [3, 5, 7])
print(list(regular_map))  # [6, 20, 42]

# map passes each tuple as a single argument
# starmap unpacks each tuple as separate arguments
def add(x, y):
    return x + y

pairs = [(1, 2), (3, 4), (5, 6)]
result = list(itertools.starmap(add, pairs))
print(result)  # [3, 7, 11]

# Practical example: Applying operations to coordinate pairs
coordinates = [(1, 2), (3, 4), (5, 6)]
distances_from_origin = list(itertools.starmap(
    lambda x, y: math.sqrt(x**2 + y**2), coordinates
))
print(distances_from_origin)
[2.23606797749979, 5.0, 7.810249675906654]
[6, 20, 42]
[6, 20, 42]
[3, 7, 11]
[2.23606797749979, 5.0, 7.810249675906654]
tee(iterable, n=2)
Splits an iterable into n independent iterators.
data = [1, 2, 3, 4, 5]
iter1, iter2 = itertools.tee(data)

print(list(iter1))  # [1, 2, 3, 4, 5]
print(list(iter2))  # [1, 2, 3, 4, 5]

# Practical example: Processing data in multiple ways
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens_iter, odds_iter = itertools.tee(numbers)

evens = filter(lambda x: x % 2 == 0, evens_iter)
odds = filter(lambda x: x % 2 == 1, odds_iter)

print(f"Evens: {list(evens)}")  # [2, 4, 6, 8, 10]
print(f"Odds: {list(odds)}")  # [1, 3, 5, 7, 9]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
Evens: [2, 4, 6, 8, 10]
Odds: [1, 3, 5, 7, 9]
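The examples above pass lists to tee(), which can be re-iterated anyway. The more typical case is a one-shot iterator, sketched below. The standard library documentation advises against touching the original iterator once tee() has been called, because advancing it directly would hide items from the tee objects.

```python
import itertools

source = iter([1, 2, 3, 4])   # a one-shot iterator, not a list
a, b = itertools.tee(source)

# From here on, use only a and b; advancing source directly would
# make the tee iterators skip those items.
first_a = next(a)  # 1
first_b = next(b)  # 1 as well: b is unaffected by a having advanced

rest_a = list(a)   # [2, 3, 4]
rest_b = list(b)   # [2, 3, 4]
print(first_a, first_b, rest_a, rest_b)
```

If one branch is going to be consumed completely before the other starts, building a list once is usually simpler than tee().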
zip_longest(*iterables, fillvalue=None)
Zips iterables but continues until the longest is exhausted.
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c', 'd', 'e']
# Regular zip stops at shortest
print(list(zip(list1, list2))) # [(1, 'a'), (2, 'b'), (3, 'c')]
# zip_longest continues to longest
print(list(itertools.zip_longest(list1, list2)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd'), (None, 'e')]
# With custom fillvalue
print(list(itertools.zip_longest(list1, list2, fillvalue='X')))
# [(1, 'a'), (2, 'b'), (3, 'c'), ('X', 'd'), ('X', 'e')]
[(1, 'a'), (2, 'b'), (3, 'c')]
[(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd'), (None, 'e')]
[(1, 'a'), (2, 'b'), (3, 'c'), ('X', 'd'), ('X', 'e')]
3. Combinatorial Iterators
product(*iterables, repeat=1)
Cartesian product of input iterables.
# Basic product
colors = ['red', 'blue']
sizes = ['S', 'M', 'L']

combinations = itertools.product(colors, sizes)
print(list(combinations))
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# With repeat
dice_rolls = itertools.product(range(1, 7), repeat=2)
print(list(itertools.islice(dice_rolls, 10)))
# [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4)]

# Practical example: Grid coordinates
grid = itertools.product(range(3), range(3))
print(list(grid))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
[('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
permutations(iterable, r=None)
Returns r-length permutations of elements.
# All permutations
letters = ['A', 'B', 'C']
perms = itertools.permutations(letters)
print(list(perms))
# [('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

# r-length permutations
perms_2 = itertools.permutations(letters, 2)
print(list(perms_2))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# Practical example: Anagrams
def find_anagrams(word, length=None):
    if length is None:
        length = len(word)
    return [''.join(p) for p in itertools.permutations(word, length)]

print(find_anagrams('CAT', 2))  # ['CA', 'CT', 'AC', 'AT', 'TC', 'TA']
[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
['CA', 'CT', 'AC', 'AT', 'TC', 'TA']
combinations(iterable, r)
Returns r-length combinations without replacement.
# Basic combinations
numbers = [1, 2, 3, 4]
combos = itertools.combinations(numbers, 2)
print(list(combos))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Practical example: Team selection
players = ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
teams = itertools.combinations(players, 3)
print(list(itertools.islice(teams, 5)))
# [('Alice', 'Bob', 'Charlie'), ('Alice', 'Bob', 'David'), ('Alice', 'Bob', 'Eve'), ('Alice', 'Charlie', 'David'), ('Alice', 'Charlie', 'Eve')]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
[('Alice', 'Bob', 'Charlie'), ('Alice', 'Bob', 'David'), ('Alice', 'Bob', 'Eve'), ('Alice', 'Charlie', 'David'), ('Alice', 'Charlie', 'Eve')]
combinations_with_replacement(iterable, r)
Returns r-length combinations with replacement allowed.
# Basic combinations with replacement
numbers = [1, 2, 3]
combos = itertools.combinations_with_replacement(numbers, 2)
print(list(combos))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

# Practical example: Coin flips allowing same outcome
outcomes = ['H', 'T']
two_flips = itertools.combinations_with_replacement(outcomes, 2)
print(list(two_flips))
# [('H', 'H'), ('H', 'T'), ('T', 'T')]
[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
[('H', 'H'), ('H', 'T'), ('T', 'T')]
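Combinatorial iterators grow quickly, so it can help to know how many results to expect before materializing them. The sketch below compares the observed counts for four items taken two at a time against the standard counting formulas, using math.perm and math.comb (both assume Python 3.8 or later). The sample items are arbitrary.

```python
import itertools
import math

items = ["A", "B", "C", "D"]
n, r = len(items), 2

# Each entry pairs the number of tuples actually produced with the
# closed-form count for that iterator.
counts = {
    "product": (len(list(itertools.product(items, repeat=r))), n ** r),
    "permutations": (len(list(itertools.permutations(items, r))), math.perm(n, r)),
    "combinations": (len(list(itertools.combinations(items, r))), math.comb(n, r)),
    "combinations_with_replacement": (
        len(list(itertools.combinations_with_replacement(items, r))),
        math.comb(n + r - 1, r),
    ),
}

for name, (observed, expected) in counts.items():
    print(f"{name}: {observed} (formula gives {expected})")
```

For n = 4 and r = 2 the four counts are 16, 12, 6, and 10.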
Grouping and Filtering
Advanced groupby() Examples
# Group by multiple criteria
data = [
    {'name': 'Alice', 'age': 25, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'New York'},
    {'name': 'Charlie', 'age': 30, 'city': 'Boston'},
    {'name': 'David', 'age': 30, 'city': 'Boston'},
    {'name': 'Eve', 'age': 25, 'city': 'Boston'}
]

# Group by age and city
key_func = lambda x: (x['age'], x['city'])
sorted_data = sorted(data, key=key_func)
for key, group in itertools.groupby(sorted_data, key=key_func):
    age, city = key
    names = [person['name'] for person in group]
    print(f"Age {age}, City {city}: {names}")
Age 25, City Boston: ['Eve']
Age 25, City New York: ['Alice', 'Bob']
Age 30, City Boston: ['Charlie', 'David']
Custom Filtering Patterns
# Filter consecutive duplicates
def remove_consecutive_duplicates(iterable):
    return [key for key, _ in itertools.groupby(iterable)]

data = [1, 1, 2, 2, 2, 3, 1, 1, 1, 4]
result = remove_consecutive_duplicates(data)
print(result)  # [1, 2, 3, 1, 4]

# Filter with multiple conditions
numbers = range(1, 21)
# Even numbers not divisible by 4
filtered = itertools.filterfalse(
    lambda x: x % 2 != 0 or x % 4 == 0, numbers
)
print(list(filtered))  # [2, 6, 10, 14, 18]
[1, 2, 3, 1, 4]
[2, 6, 10, 14, 18]
Advanced Patterns and Recipes
Recipe: Flatten Nested Iterables
def flatten(nested_iterable):
    """Flatten one level of nesting."""
    return itertools.chain.from_iterable(nested_iterable)

# Usage
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]

def deep_flatten(nested_iterable):
    """Recursively flatten deeply nested iterables."""
    for item in nested_iterable:
        if hasattr(item, '__iter__') and not isinstance(item, (str, bytes)):
            yield from deep_flatten(item)
        else:
            yield item

# Usage
deeply_nested = [1, [2, [3, 4]], 5, [6, [7, [8, 9]]]]
flat = list(deep_flatten(deeply_nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Recipe: Sliding Window
def sliding_window(iterable, n):
    """Create a sliding window of size n."""
    iterators = itertools.tee(iterable, n)
    for i, it in enumerate(iterators):
        # Advance each iterator by i positions
        for _ in range(i):
            next(it, None)
    return zip(*iterators)

# Usage
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
windows = list(sliding_window(data, 3))
print(windows)  # [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10)]
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10)]
Recipe: Roundrobin
def roundrobin(*iterables):
    """Take elements from iterables in round-robin fashion."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        for it in iterators[:]:
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)

# Usage
result = list(roundrobin('ABC', '12345', 'xyz'))
print(result)  # ['A', '1', 'x', 'B', '2', 'y', 'C', '3', 'z', '4', '5']
['A', '1', 'x', 'B', '2', 'y', 'C', '3', 'z', '4', '5']
Recipe: Unique Elements (Preserving Order)
def unique_everseen(iterable, key=None):
    """List unique elements, preserving order."""
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in itertools.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

# Usage
data = [1, 2, 3, 2, 4, 1, 5, 3, 6]
unique = list(unique_everseen(data))
print(unique)  # [1, 2, 3, 4, 5, 6]

# With key function
words = ['apple', 'Banana', 'cherry', 'Apple', 'banana']
unique_words = list(unique_everseen(words, key=str.lower))
print(unique_words)  # ['apple', 'Banana', 'cherry']
[1, 2, 3, 4, 5, 6]
['apple', 'Banana', 'cherry']
Practical Examples and Use Cases
1. Data Processing Pipeline
import itertools
import operator
# Sample data (already ordered by quarter, which groupby requires)
sales_data = [
    ('Q1', 'Product A', 100),
    ('Q1', 'Product B', 150),
    ('Q2', 'Product A', 120),
    ('Q2', 'Product B', 180),
    ('Q3', 'Product A', 110),
    ('Q3', 'Product B', 160),
]

# Group by quarter and calculate totals
sales_by_quarter = itertools.groupby(sales_data, key=lambda x: x[0])

for quarter, sales in sales_by_quarter:
    total = sum(sale[2] for sale in sales)
    print(f"{quarter}: {total}")
Q1: 250
Q2: 300
Q3: 270
2. Batch Processing
def batch_process(iterable, batch_size):
    """Process items in batches"""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        yield batch

# Example usage
data = range(25)
for batch in batch_process(data, 10):
    print(f"Processing batch: {batch}")
Processing batch: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Processing batch: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Processing batch: [20, 21, 22, 23, 24]
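Python 3.12 added itertools.batched, which covers the same batching pattern but yields tuples rather than lists. The sketch below uses it when available and falls back to the islice loop otherwise; the wrapper name batches is hypothetical and exists only for this example.

```python
import itertools

def batches(iterable, size):
    """Yield tuples of up to `size` items (hypothetical helper name)."""
    if hasattr(itertools, "batched"):
        # Python 3.12+: use the built-in implementation.
        yield from itertools.batched(iterable, size)
    else:
        # Older versions: same idea as the islice loop above.
        iterator = iter(iterable)
        while True:
            batch = tuple(itertools.islice(iterator, size))
            if not batch:
                break
            yield batch

print(list(batches(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```

The last batch may be shorter than the requested size in both code paths.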
3. Round-Robin Scheduler
def round_robin_scheduler(tasks, workers):
    """Distribute tasks among workers in round-robin fashion"""
    worker_cycle = itertools.cycle(workers)
    return list(zip(tasks, worker_cycle))

tasks = ['task1', 'task2', 'task3', 'task4', 'task5']
workers = ['Alice', 'Bob', 'Charlie']

schedule = round_robin_scheduler(tasks, workers)
for task, worker in schedule:
    print(f"{task} -> {worker}")
task1 -> Alice
task2 -> Bob
task3 -> Charlie
task4 -> Alice
task5 -> Bob
4. Sliding Window
def sliding_window(iterable, window_size):
    """Create sliding window of specified size"""
    iterators = itertools.tee(iterable, window_size)
    iterators = [itertools.islice(iterator, i, None)
                 for i, iterator in enumerate(iterators)]
    return zip(*iterators)

# Example usage
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
windows = sliding_window(data, 3)
for window in windows:
    print(window)
# (1, 2, 3)
# (2, 3, 4)
# (3, 4, 5)
# ...
(1, 2, 3)
(2, 3, 4)
(3, 4, 5)
(4, 5, 6)
(5, 6, 7)
(6, 7, 8)
(7, 8, 9)
(8, 9, 10)
5. Pairwise Iteration
def pairwise(iterable):
    """Return successive overlapping pairs"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

# Example usage
numbers = [1, 2, 3, 4, 5]
pairs = pairwise(numbers)
for pair in pairs:
    print(pair)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
(1, 2)
(2, 3)
(3, 4)
(4, 5)
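Since Python 3.10 the standard library ships itertools.pairwise, so the hand-written recipe is only needed on older versions. A small sketch of an import that prefers the built-in and otherwise falls back to the tee-based version:

```python
import itertools

try:
    # Available from Python 3.10 onward.
    from itertools import pairwise
except ImportError:
    def pairwise(iterable):
        """Fallback: return successive overlapping pairs."""
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

print(list(pairwise([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
print(list(pairwise([1])))           # []: fewer than two items gives no pairs
```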
Performance Tips
1. Memory Efficiency
# Bad: Creates entire list in memory
large_range = list(range(1000000))
squared = [x**2 for x in large_range]

# Good: Uses iterators
large_range = range(1000000)
squared = map(lambda x: x**2, large_range)
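One rough way to see the difference is to compare container sizes with sys.getsizeof. This is only a sketch: getsizeof reports the size of the list or generator object itself, not of the integers it refers to, so the real gap is larger than what is measured here. The size n is arbitrary.

```python
import sys

n = 100_000

squares_list = [x**2 for x in range(n)]   # all values stored up front
squares_lazy = (x**2 for x in range(n))   # values produced on demand

list_bytes = sys.getsizeof(squares_list)  # grows with n
lazy_bytes = sys.getsizeof(squares_lazy)  # small and independent of n

print(list_bytes > lazy_bytes)            # True

# Both forms give the same values when consumed.
same_total = sum(squares_lazy) == sum(squares_list)
print(same_total)                         # True
```

The lazy version can only be consumed once; after sum() has run, squares_lazy is exhausted.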
2. Lazy Evaluation
# Itertools functions are lazy - they don't compute until needed
data = range(1000000)
filtered = itertools.filterfalse(lambda x: x % 2 == 0, data)
# No computation happens here yet

# Only compute what you need
first_10_odds = list(itertools.islice(filtered, 10))
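Laziness can be observed directly by recording which values the predicate is asked about. In this sketch the tested list and the is_even function are illustrative names; the point is that nothing runs when the pipeline is built, and only as many inputs as needed are examined afterwards.

```python
import itertools

tested = []  # records every value the predicate is called with

def is_even(x):
    tested.append(x)
    return x % 2 == 0

odds = itertools.filterfalse(is_even, range(1_000_000))
calls_before = len(tested)        # 0: building the pipeline runs nothing

first_three = list(itertools.islice(odds, 3))
print(first_three)                # [1, 3, 5]
print(calls_before, len(tested))  # 0 6
```

Only the inputs 0 through 5 were examined, out of a million available.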
3. Chaining Operations
# Chain multiple itertools operations for complex processing
data = range(100)
result = itertools.takewhile(
    lambda x: x < 50,
    itertools.filterfalse(
        lambda x: x % 3 == 0,
        itertools.accumulate(data)
    )
)
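Nested calls read inside-out, which can be hard to follow. The same pipeline can be written as named stages, as in this sketch; the intermediate names running_totals and not_multiple_of_3 are introduced here for readability.

```python
import itertools

data = range(100)

# Stage 1: running totals 0, 1, 3, 6, 10, 15, 21, 28, ...
running_totals = itertools.accumulate(data)

# Stage 2: drop totals that are multiples of 3
not_multiple_of_3 = itertools.filterfalse(lambda x: x % 3 == 0, running_totals)

# Stage 3: stop at the first remaining total that is 50 or more
result = itertools.takewhile(lambda x: x < 50, not_multiple_of_3)

values = list(result)
print(values)  # [1, 10, 28]
```

Each stage is still lazy; nothing is computed until list() pulls values through the chain.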
Common Patterns and Recipes
1. Flatten Nested Iterables
def flatten(nested_iterable):
    """Completely flatten a nested iterable"""
    for item in nested_iterable:
        if hasattr(item, '__iter__') and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

# Example
nested = [1, [2, 3], [4, [5, 6]], 7]
print(list(flatten(nested)))  # [1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7]
2. Unique Elements (Preserving Order)
def unique_everseen(iterable, key=None):
    """List unique elements, preserving order"""
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in itertools.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

# Example
data = [1, 2, 3, 2, 1, 4, 3, 5]
print(list(unique_everseen(data)))  # [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
3. Consume Iterator
import collections

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n
        next(itertools.islice(iterator, n, n), None)
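A short usage sketch for consume follows. The function is repeated so the block runs on its own, and the sample ranges are arbitrary.

```python
import collections
import itertools

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(itertools.islice(iterator, n, n), None)

# Skip the first three items, then continue from the fourth.
numbers = iter(range(10))
consume(numbers, 3)
after_skip = next(numbers)
print(after_skip)  # 3

# With no count, the iterator is drained completely.
letters = iter("abc")
consume(letters)
leftover = next(letters, "exhausted")
print(leftover)  # exhausted
```

consume works on iterators, not on re-iterable containers: passing a list directly would advance a temporary iterator and leave the list untouched.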
Real-World Examples
Example 1: Data Processing Pipeline
# Processing CSV-like data
def process_sales_data(data):
    """Process sales data with itertools."""
    # Filter out header and empty lines
    clean_data = itertools.filterfalse(
        lambda x: x.startswith('Date') or not x.strip(),
        data
    )

    # Parse each line
    parsed = (line.split(',') for line in clean_data)

    # Group by month
    by_month = itertools.groupby(
        sorted(parsed, key=lambda x: x[0][:7]),  # Sort by year-month
        key=lambda x: x[0][:7]
    )

    # Calculate monthly totals
    monthly_totals = {}
    for month, sales in by_month:
        total = sum(float(sale[2]) for sale in sales)
        monthly_totals[month] = total

    return monthly_totals

# Sample data
sales_data = [
    "Date,Product,Amount",
    "2023-01-15,Widget,100.50",
    "2023-01-20,Gadget,75.25",
    "2023-02-10,Widget,120.00",
    "2023-02-15,Gadget,85.75",
    "",
    "2023-01-25,Widget,95.00"
]

result = process_sales_data(sales_data)
print(result)
Example 2: Configuration Generator
# Generate all possible configurations
def generate_configurations(options):
    """Generate all possible configuration combinations."""
    keys = list(options.keys())
    values = list(options.values())

    for combo in itertools.product(*values):
        yield dict(zip(keys, combo))

# Usage
server_options = {
    'cpu': ['2-core', '4-core', '8-core'],
    'memory': ['4GB', '8GB', '16GB'],
    'storage': ['SSD', 'HDD'],
    'os': ['Linux', 'Windows']
}

configs = list(generate_configurations(server_options))
print(f"Total configurations: {len(configs)}")
for config in configs[:3]:  # Show first 3
    print(config)
Example 3: Batch Processing
def batch_process(items, batch_size, process_func):
    """Process items in batches."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        yield process_func(batch)

def sum_batch(batch):
    return sum(batch)

# Usage
large_numbers = range(1000)
batch_sums = list(batch_process(large_numbers, 100, sum_batch))
print(f"Batch sums: {batch_sums[:5]}...")  # Show first 5 batch sums
Best Practices
- Use itertools for memory-efficient processing: When working with large datasets, itertools can help avoid loading everything into memory.
- Combine with other functional programming tools: itertools works well with map(), filter(), and functools.reduce().
- Remember lazy evaluation: Most itertools functions return iterators, not lists. Use list() when you need to materialize the results.
- Profile your code: While itertools is generally efficient, measure performance for your specific use case. For performance-critical applications, check whether itertools or another approach (like NumPy) is faster.
- Consider readability: Sometimes a simple loop is clearer than a complex itertools chain.
- Use type hints: When writing functions that use itertools, consider adding type hints for better code documentation.
- Sort before grouping: groupby() only groups consecutive elements that share a key, so sort your data by that key first if needed.
- Use tee() carefully: tee() has to buffer every item that one of its iterators has consumed and another has not, which can consume significant memory if the iterators advance at different rates.
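As an illustration of the type-hints suggestion, here is one possible way to annotate an itertools-based generator. The function name chunked and the choice of the typing module's generic aliases are assumptions made for this sketch, not something the module prescribes.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items (hypothetical helper name)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk

# The annotations tell readers and type checkers that any iterable is
# accepted and that results arrive lazily, one list at a time.
word_chunks = list(chunked(["a", "b", "c", "d", "e"], 2))
print(word_chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Annotating the return type as Iterator rather than List signals to callers that the result is lazy and can be consumed only once.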
Conclusion
The itertools module provides powerful tools for creating efficient, memory-friendly iterators. By mastering these functions, you can write more elegant and performant Python code, especially when dealing with large datasets or complex iteration patterns. The key is understanding when and how to use each function effectively in your specific use cases.
Remember that itertools excels at functional programming patterns and can often replace complex loops with more readable and efficient iterator chains. Practice with these examples and experiment with combining different itertools functions to solve your specific problems.