by Anand | tags: [ python list lists datastructure programming ]
Fun with Python Lists
Lists are one of the most versatile data structures in Python. Being heterogeneous, ordered and mutable, they are one of the most used Python types apart from dictionaries.
In today’s post, I am hoping to introduce you to some interesting problems using Python lists. You may have seen most of the things we are going to discuss here in the Internet in many other places. However I am hoping there will be a surprise or two along the way and you may learn a few new things!
This post will be written as a list (no pun intended) of tasks and we will look at ways to solve the tasks.
1. Flatten a list of depth 2
Let us say you got a list like the following with a depth of two. In other words a two-level nested list.
l1 = [[1,2,3], [4,5,6], [7,8,9]
How will you flatten it to,
l2 = [1,2,3,4,5,6,7,8,9]
A quick solution that comes to mind first is,
l2=[]
for x in l1:
if type(x) is list:
l2.extend(x)
else:
l2.append(x)
However, the following is more concise.
l2=[]
map(l2.extend, l1)
# Result is in l2
There are many answers to this problem as discussed in here. However one of the most interesting ones is,
l2 = sum(l1, [])
In this case the list acts as a Monoid using +
operator in a general sense to unfold and append the sub-lists together.
A slight rewrite of this using functools.reduce
would be,
l2 = functools.reduce(operator.iadd, l1, [])
And a more raw
rewrite using the built-in reduce
function (only in Python 2.x) would be,
l2 = reduce(lambda x,y: x+y, l1, [])
Note that to flatten a list of arbitrary depth, you will need a recursive solution.
2. Transpose a Matrix
In this case, our matrix
is represented as a list of lists. In fact we can reuse the same list from previous example here.
l1 = [[1,2,3], [4,5,6], [7,8,9]
This represents the matrix
$$\left[ \begin{array}{c} 1 & 2 & 3 & \newline 4 & 5 & 6 & \newline 7 & 8 & 9 & \newline \end{array} \right]$$
The transpose of which is,
$$\left[ \begin{array}{c} 1 & 4 & 7 & \newline 2 & 5 & 8 & \newline 3 & 6 & 9 & \newline \end{array} \right]$$
There is a simple and beautiful one-liner for this.
transpose = zip(*matrix)
For example,
>>> matrix=[[1,2,3], [4,5,6], [7,8,9]]
>>> zip(*matrix)
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
zip
is a built-in function which combines two or more sequences taking elements by index across them one by one and building the result as a final list. In this case, when you feed the matrix as a de-referenced
list by expanding it using positional argument syntax *matrix
, this becomes equivalent to feeding zip with each of the sub-lists (rows) one by one. So the result becomes equivalent to,
>>> zip([1,2,3], [4,5,6], [7,8,9])
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
which is same as the transpose.
3. Convert a list to a dictionary
This is a fairly common use case, that of creating a dictionary whose keys are members of the list. There are quite a few solutions here with the most common one being,
d = {}.fromkeys(l)
which returns a dictionary with keys as items of the list and values as None
.
A few other approaches are,
# Using enumerate.
d = dict(enumerate(l))
# Using a dict comprehension
d = {k:None for k in l}
# Using zip
d = dict(zip(l, l))
Which one of these do you think is the fastest ? I suggest you to time them and figure it out as an exercise.
4. Frequency dictionary
This task involves returning a frequency dictionary from a list containing repeated elements. In other words you return a dictionary of (element, count)
pairs for every element in the list where count
is the number of times it occurs in the list. This is a very common problem that occurs in programming.
If you are new to Python you may write something like the following as the first-cut.
# brute-force
d = {}
for item in l:
d[item] = l.count(item)
However you will find that this scales very badly for large lists as the count
method will be un-necessarily called multiple times for the same item again and again.
If you know your collections
module, you will instead write something like,
# using collections.defaultdict
d = collections.defaultdict(int)
for item in l:
d[item] += 1
This scales up very nicely as there is no real penalty for over-riding a repeating dictionary key as will happen here when an element repeats. This performs close to O(n)
where n is the size of the list.
Here is another method using itertools
and its groupby
function.
# using itertools.groupby
d = {k:len(tuple(g)) for k,g in itertools.groupby(sorted(l), hash)}
I am not going to explain how this one works. It is an exercise left to the informed or enterprising reader.
In my tests the defaultdict
solution performed the best. The groupby
one performed similar but it was approx an order of 5X times slower. The brute-force one is about 1000 times slower than both of them.
And last but not the least, there is a data-structure built-in in Python for this exact use-case, namely the Counter
class in collections
module. So much so that it is a one-liner.
d = collections.Counter(l)
Though this looks very succinct, it performs slightly worse than the solution using defaultdict
.
5. Common elements across two lists
The last of this post, but not the least. Here the problem is to find common elements across two lists, a problem so common that there is a standard solution which almost every Python programmer seems to be aware of:
# using sets
common = sorted(set(l1).intersection(set(l2)))
However there are at least two other ways of doing this.
# using a dictionary
d = {}.fromkeys(l1)
common = sorted({item:1 for item in l2 if item in d})
Here we convert one of the lists to a dictionary. Then we go through the second list and create a dictionary of elements if they are found in the dictionary.
The third way is to convert both the list to a dictionary. This may be more optimal or less optimal than the above solution depending on how many elements repeat in the dictionaries and their relative sizes.
# using two dictionaries
d1 = {}.fromkeys(l1)
d2 = {}.fromkeys(l2)
common = sorted([k for k in d1 if k in d2])
These three solutions perform very similarly in order of magnitude though there may be slight variation depending on the relative sizes of the lists and the repetition of elements in them. On average, the first two solutions perform similar with the last one performing slightly worse.
Note that name and e-mail are required for posting comments