The Question Mark - blog by Mark Volkmann

Python collections module

Built-in Collection Classes

Python provides four built-in collection classes that can be used without importing anything.

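| Collection Name | Mutable? | Ordered? | Allows Duplicates? | Can Index? | Syntax |
|---|---|---|---|---|---|
| list | yes | yes | yes | yes | [e1, e2, ...] |
| tuple | no | yes | yes | yes | (e1, e2, ...) |
| set | yes | no | no | no | {e1, e2, ...} |
| dict | yes | yes | yes (values only; keys are unique) | no | {k1: v1, k2: v2, ...} |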
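
The properties in the table can be checked directly. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
# One instance of each built-in collection type.
a_list = ['a', 'b', 'a']
a_tuple = ('a', 'b', 'a')
a_set = {'a', 'b', 'a'}          # the duplicate 'a' is discarded
a_dict = {'k1': 'a', 'k2': 'a'}  # duplicate values are fine; keys are unique

# Lists are mutable, tuples are not.
a_list[0] = 'z'
try:
    a_tuple[0] = 'z'
except TypeError as e:
    print('tuple is immutable:', e)

# Lists and tuples support positional indexing.
print(a_list[1], a_tuple[1])  # b b

# Sets do not support indexing.
try:
    a_set[0]
except TypeError as e:
    print('set cannot be indexed:', e)

print(len(a_set))    # 2
# A dict is accessed by key rather than by position.
print(a_dict['k2'])  # a
```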

For details on the list and tuple classes, see here.

For details on the set class, see here.

For details on the dict class, see here.

collections Module Classes

The collections module in the Python standard library defines nine additional collection types that must be imported before use. Eight are classes; namedtuple is a factory function that creates tuple subclasses.

| Collection Name | Description |
|---|---|
| namedtuple | tuple subclass with named fields |
| deque | list-like container with fast appends and pops on both ends |
| ChainMap | dict-like view of multiple mappings |
| Counter | dict subclass for counting hashable values |
| OrderedDict | dict subclass that remembers insertion order |
| defaultdict | dict subclass that calls a function to get missing values |
| UserDict | wrapper around a dict object for easier subclassing |
| UserList | wrapper around a list object for easier subclassing |
| UserString | wrapper around a string object for easier subclassing |

namedtuple

A namedtuple has all the features of a tuple, but differs in that each position has an associated name.

Here is an example of creating and using namedtuple instances:

from collections import namedtuple

# Define a namedtuple.
DogNT = namedtuple('Dog', 'breed, name')

# Create an instance with positional arguments.
dog1 = DogNT('Whippet', 'Comet')
print(dog1)

# Create an instance with named arguments.
dog2 = DogNT(name='Oscar', breed='German Shorthaired Pointer')
print(dog2)

# Access a field by name.
print(dog1.name) # Comet

# All methods from the tuple class are inherited.

# Get the type name of a namedtuple from an instance.
print(type(dog1).__name__) # Dog

# Make a new instance from an iterable.
data = ['Native American Indian Dog', 'Ramsay']
dog3 = DogNT._make(data) # underscore prefix avoids clashes with field names
print(dog3)

# Get a dict from a namedtuple.
print(dog1._asdict())

# Create a new instance from an existing one
# where some values are replaced.
print(dog2._replace(name='Oscar Wilde'))

# Get tuple of field names from an instance.
print(dog1._fields) # ('breed', 'name')
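
Because a namedtuple is still a tuple, the usual tuple operations also apply. Here is a short sketch of indexing, unpacking, comparison, and immutability; it redefines a Dog type so that it runs on its own.

```python
from collections import namedtuple

Dog = namedtuple('Dog', 'breed, name')
dog = Dog('Whippet', 'Comet')

# Index and unpack like any tuple.
print(dog[1])  # Comet
breed, name = dog
print(breed)   # Whippet

# A namedtuple compares equal to a plain tuple with the same values.
print(dog == ('Whippet', 'Comet'))  # True

# Fields cannot be reassigned; use _replace to get a modified copy.
try:
    dog.name = 'Reece'
except AttributeError as e:
    print('cannot assign:', e)

renamed = dog._replace(name='Reece')
print(renamed)  # Dog(breed='Whippet', name='Reece')
```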

deque

A deque (pronounced “deck”) is a double-ended queue. Unlike the list class, where inserting or removing at the front takes time proportional to the length of the list, a deque supports efficient appends and pops at both ends.

Here is an example of creating and using deque instances:

from collections import deque

# Create an empty deque and then append values to both ends.
dogs = deque()
dogs.append('Oscar')
dogs.appendleft('Ramsay')
dogs.append('Comet')
dogs.appendleft('Maisey')
print(dogs) # deque(['Maisey', 'Ramsay', 'Oscar', 'Comet'])

# Create a deque with initial values.
dogs = deque(['Maisey', 'Ramsay', 'Oscar', 'Comet'])
print(dogs) # deque(['Maisey', 'Ramsay', 'Oscar', 'Comet'])

# Remove the first and last elements.
dogs.pop()
dogs.popleft()
print(dogs) # deque(['Ramsay', 'Oscar'])

# Add multiple elements at the beginning (left) and end (right).
dogs.extendleft(['Snoopy', 'Spot']) # added in reverse order
dogs.extend(['Speed', 'Trixie'])
print(dogs) # deque(['Spot', 'Snoopy', 'Ramsay', 'Oscar', 'Speed', 'Trixie'])

# Insert an element before a given index.
index = 3
dogs.insert(index, 'Reece')
print(dogs) # deque(['Spot', 'Snoopy', 'Ramsay', 'Reece', 'Oscar', 'Speed', 'Trixie'])

# Remove the element that was just inserted.
del dogs[index]
print(dogs) # deque(['Spot', 'Snoopy', 'Ramsay', 'Oscar', 'Speed', 'Trixie'])

# Remove all elements.
dogs.clear()
print(dogs) # deque([])
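
A common use of the operations above is a first-in, first-out queue: append on the right and pop from the left. This is a minimal sketch, and the task names are made up for illustration.

```python
from collections import deque

# Add tasks at the right end in arrival order.
queue = deque()
for task in ['feed', 'walk', 'groom']:
    queue.append(task)

# Take tasks from the left end, so the oldest is handled first.
processed = []
while queue:
    processed.append(queue.popleft())

print(processed)   # ['feed', 'walk', 'groom']
print(len(queue))  # 0
```

Using pop() instead of popleft() would turn the same deque into a last-in, first-out stack.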

ChainMap

A ChainMap provides a single view over multiple dict objects. A lookup searches the dicts in order and returns the first match.

Here is an example of creating and using a ChainMap:

from collections import ChainMap

dict1 = {
    'Maisey': 'Treeing Walker Coonhound',
    'Ramsay': 'Native American Indian Dog'
}
dict2 = {
    'Oscar': 'German Shorthaired Pointer',
    'Comet': 'Whippet'
}

# Create a ChainMap that provides a view of the two dicts above.
cm = ChainMap(dict1, dict2)
print(cm) # ChainMap({'Maisey': 'Tree...', 'Ramsay': 'Native...'}, {'Oscar': 'German...', 'Comet': 'Whippet'})

# Get values of some keys from the ChainMap.
print(cm['Ramsay']) # Native American Indian Dog
print(cm['Comet']) # Whippet

# Modify one of the dicts and verify that the ChainMap sees the change.
dict2['Oscar'] = 'GSP'
print(cm['Oscar']) # GSP

# Add data to one of the dicts and verify that the ChainMap sees it.
dict1['Snoopy'] = 'Beagle'
print(cm['Snoopy']) # Beagle

# Get the dicts used by the ChainMap.
dicts = cm.maps # returns list containing dict1 and dict2
print(dicts)

# Add a key/value pair to the ChainMap,
# which adds it to the first associated dict.
# It's clearer to add directly to the intended dict.
cm['Reece'] = 'Whippet'

# Remove a key/value pair from the ChainMap,
# which only removes from the first associated dict
# and raises KeyError if not found there.
# It's clearer to remove directly from the intended dict.
del cm['Reece']
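
When the same key appears in more than one dict, the first dict that contains it wins. The sketch below layers hypothetical overrides on top of defaults; the dict names and keys are assumptions made for illustration.

```python
from collections import ChainMap

defaults = {'color': 'brown', 'size': 'medium'}
overrides = {'size': 'large'}

# The first mapping is searched first, so overrides take precedence.
settings = ChainMap(overrides, defaults)
print(settings['size'])   # large
print(settings['color'])  # brown

# Writes through the ChainMap go only to the first mapping.
settings['color'] = 'black'
print(overrides)  # {'size': 'large', 'color': 'black'}
print(defaults)   # {'color': 'brown', 'size': 'medium'}

# Iterating yields each key once, even if several dicts contain it.
print(sorted(settings))  # ['color', 'size']
```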

Counter

A Counter is a special dict that counts occurrences of unique values in an iterable.

Here is an example of creating and using a Counter:

from collections import Counter

fruits = [
    'apple', 'banana', 'cherry', 'grape', 'orange', 'strawberry',
    'banana', 'apple', 'apple', 'grape', 'banana', 'banana'
]
counter = Counter(fruits)
print(counter)
# Counter({'banana': 4, 'apple': 3, 'grape': 2,
#          'cherry': 1, 'orange': 1, 'strawberry': 1})

# Get the number of occurrences of banana.
print(counter['banana']) # 4

# Remove all occurrences of grape.
del counter['grape']

# Get a list of all the elements, each repeated by its count,
# in the order first encountered (alphabetical here only by coincidence).
print(list(counter.elements()))
# ['apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'banana',
#  'cherry', 'orange', 'strawberry']

# Get a list of tuples where each contains a unique value
# and its count (number of occurrences)
# where the tuples are sorted by descending count.
print(counter.most_common())
# [('banana', 4), ('apple', 3), ('cherry', 1), ('orange', 1), ('strawberry', 1)]

# Subtract some occurrences. A key whose count reaches zero
# stays in the Counter but is skipped by elements().
counter.subtract({'banana': 2, 'strawberry': 1})
print(list(counter.elements()))
# ['apple', 'apple', 'apple', 'banana', 'banana', 'cherry', 'orange']
print(counter.most_common())
# [('apple', 3), ('banana', 2), ('cherry', 1), ('orange', 1), ('strawberry', 0)]
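
A few other Counter behaviors are worth knowing: a missing key reports a count of zero instead of raising KeyError, update adds counts, and unary plus returns a new Counter without zero or negative counts. A small sketch with made-up data:

```python
from collections import Counter

counter = Counter(['apple', 'banana', 'banana'])

# A missing key has a count of zero and is not added by the lookup.
print(counter['grape'])    # 0
print('grape' in counter)  # False

# update adds to the existing counts.
counter.update(['banana', 'cherry'])
print(counter['banana'])   # 3

# subtract can leave a key with a count of zero.
counter.subtract(['apple'])
print(counter['apple'])    # 0
print('apple' in counter)  # True

# Unary plus builds a new Counter holding only positive counts.
positive = +counter
print('apple' in positive)  # False
```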

OrderedDict

An OrderedDict is a special dict that preserves the order in which keys were added. Starting in Python 3.7, the dict class is also guaranteed to do this (CPython 3.6 already did so as an implementation detail). One reason to keep using OrderedDict when order must be preserved is that code relying on dict order will break when run in versions of Python before 3.7. Another is that OrderedDict provides order-specific operations such as move_to_end and popitem(last=False), which dict does not.

Here is an example of creating and using an OrderedDict:

from collections import OrderedDict

# Create and populate an OrderedDict.
dogs = OrderedDict()
dogs['Maisey'] = 'Treeing Walker Coonhound'
dogs['Ramsay'] = 'Native American Indian Dog'
dogs['Oscar'] = 'German Shorthaired Pointer'
dogs['Comet'] = 'Whippet'

print(dogs)
# OrderedDict([
#  ('Maisey', 'Treeing Walker Coonhound'),
#  ('Ramsay', 'Native American Indian Dog'),
#  ('Oscar', 'German Shorthaired Pointer'),
#  ('Comet', 'Whippet')])

# Change the value of a key,
# noting that its position does not change.
dogs['Oscar'] = 'GSP'

# Move "Comet" to the beginning.
dogs.move_to_end('Comet', last=False)

# Move "Ramsay" to the end.
dogs.move_to_end('Ramsay')

print(dogs)
# OrderedDict([
#  ('Comet', 'Whippet'),
#  ('Maisey', 'Treeing Walker Coonhound'),
#  ('Oscar', 'GSP'),
#  ('Ramsay', 'Native American Indian Dog')])

# Remove the first key/value pair.
dogs.popitem(last=False)

# Remove the last key/value pair.
dogs.popitem()

print(dogs)
# OrderedDict([
#  ('Maisey', 'Treeing Walker Coonhound'),
#  ('Oscar', 'GSP')])
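
One further difference from dict is that equality between two OrderedDict objects takes order into account. A brief sketch:

```python
from collections import OrderedDict

pairs = [('Maisey', 'Treeing Walker Coonhound'), ('Comet', 'Whippet')]

# Plain dicts with the same items are equal regardless of order.
d1 = dict(pairs)
d2 = dict(reversed(pairs))
print(d1 == d2)  # True

# OrderedDicts with the same items in a different order are not equal.
od1 = OrderedDict(pairs)
od2 = OrderedDict(reversed(pairs))
print(od1 == od2)  # False

# Comparing an OrderedDict with a plain dict ignores order.
print(od1 == d2)  # True
```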

defaultdict

A defaultdict is a special dict that can provide a value for missing keys rather than raise a KeyError. This is useful because it enables writing code that doesn’t need to check for missing keys.

The defaultdict constructor is passed a callable that takes no arguments and returns the default value to use for missing keys. This can be the constructor of a built-in class, a custom class, a lambda function, or a normal function. Unfortunately the callable is not passed the key that is missing, so the key cannot be used to determine the value to return.

from collections import defaultdict

class MyClass: # placeholder class for the example below
    pass

# Each of the examples below indicates
# the value that is used for missing keys.
my_dict = defaultdict(bool) # False
my_dict = defaultdict(int) # 0
my_dict = defaultdict(float) # 0.0
my_dict = defaultdict(str) # empty string
my_dict = defaultdict(list) # empty list
my_dict = defaultdict(tuple) # empty tuple
my_dict = defaultdict(dict) # empty dict
my_dict = defaultdict(set) # empty set
my_dict = defaultdict(MyClass) # an instance of MyClass
my_dict = defaultdict(lambda: 'missing') # the string "missing"
my_dict = defaultdict(lambda: 7) # the number 7

# This function is only called once for each missing key.
# The missing key and the value this returns are
# added to the defaultdict so it is no longer missing.
def get_missing():
    print('in get_missing')
    return 'mixed'

dogs = defaultdict(get_missing) # returns "mixed" for missing keys

Here is an example of creating and using a defaultdict:

from collections import defaultdict

names_by_breed = defaultdict(list)

def add_dog(name, breed):
    # Don't need to check for missing breed key
    # because an empty list will be supplied.
    names_by_breed[breed].append(name)

add_dog('Comet', 'Whippet')
add_dog('Maisey', 'Treeing Walker Coonhound')
add_dog('Reece', 'Whippet')

print('Whippet names are', names_by_breed['Whippet']) # ['Comet', 'Reece']
print('Beagle names are', names_by_breed['Beagle']) # []

fruit_counts = defaultdict(int)

def add_fruit(name):
    # Don't need to check for missing fruit key
    # because zero will be supplied.
    fruit_counts[name] += 1

add_fruit('banana')
add_fruit('apple')
add_fruit('banana')

print('banana count is', fruit_counts['banana']) # 2
print('grape count is', fruit_counts['grape']) # 0
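
As noted earlier, the function given to defaultdict is not passed the missing key. When the default must depend on the key, one option is a dict subclass that defines the __missing__ method, which Python calls with the key when a lookup using square brackets fails. The class name and message below are hypothetical, chosen only for illustration.

```python
class KeyAwareDict(dict):
    """Hypothetical dict subclass whose default depends on the missing key."""

    def __missing__(self, key):
        value = f'unknown breed for {key}'
        self[key] = value  # store it so the key is no longer missing
        return value

breeds = KeyAwareDict({'Comet': 'Whippet'})
print(breeds['Comet'])     # Whippet
print(breeds['Snoopy'])    # unknown breed for Snoopy
print('Snoopy' in breeds)  # True

# The get method does not call __missing__.
print(breeds.get('Spot'))  # None
```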

UserDict, UserList, and UserString

These classes are useful as the base class of custom classes that need the functionality of a dict, list, or str. In older versions of Python it was not possible to inherit directly from dict, list, and str, but doing so is now supported. However, it can still be easier to inherit from these “User” classes instead because the underlying dict, list, or str is available as the data attribute.
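
Here is a minimal sketch of a UserDict subclass. The class name and its lowercasing behavior are hypothetical; the point is that the wrapped dict is reachable through the data attribute and that update goes through the overridden __setitem__.

```python
from collections import UserDict

class LowerKeyDict(UserDict):
    """Hypothetical mapping that stores every key in lowercase."""

    def __setitem__(self, key, value):
        # self.data is the plain dict that UserDict wraps.
        self.data[key.lower()] = value

    def __getitem__(self, key):
        return self.data[key.lower()]

dogs = LowerKeyDict()
dogs['Comet'] = 'Whippet'
dogs.update({'OSCAR': 'German Shorthaired Pointer'})

print(dogs['COMET'])  # Whippet
print(dogs.data)
# {'comet': 'Whippet', 'oscar': 'German Shorthaired Pointer'}
```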