Numerical Python Primer#

OpenPNM uses two very common python data structures, so it’s best to get comfortable with them right away. In this example we’ll cover:

Topics Covered

  • Python’s list

  • Numpy’s ndarray

  • Python’s dict or dictionary

  • Subclassing a dict

Python Lists: Flexible but Slow#

First lets quickly look at Python’s version of an array, which is called a list. It is indicated by square brackets:

L = [0, 2, 4, 6, 8]

Note

Note that round brackets indicate a “tuple” which is basically the same as a list except they are immutable, meaning their values cannot be changed.

You can read and write values in a list as follows:

L[0] = L[2]*L[4]
print(L)
[32, 2, 4, 6, 8]

Note

Python uses 0-indexing, and also that square brackets are used to index into a sequence while round brackets are used in function calls.

You can make the list longer by appending a single item:

L.append(100)
print(L)
[32, 2, 4, 6, 8, 100]

extending it with another list of items:

L.extend([200, 300])
print(L)
[32, 2, 4, 6, 8, 100, 200, 300]

Or removing items:

L.pop(2)
print(L)
[32, 2, 6, 8, 100, 200, 300]

However, the list is not very good at math:

try:
    print(L + 2)
except TypeError:
    print('Adding to a list assumes you are joining 2 lists')
print(L + [2, 3])
Adding to a list assumes you are joining 2 lists
[32, 2, 6, 8, 100, 200, 300, 2, 3]

And multiplication assumes you want to duplicate the list N times:

print(L*2)
[32, 2, 6, 8, 100, 200, 300, 32, 2, 6, 8, 100, 200, 300]

The reason the list is not ideal for numerical operations is that anything can be stored in each location:

L[0] = 'str'
print(L)
['str', 2, 6, 8, 100, 200, 300]

This is why it’s not possible to add or multiply a list, since python has no way of knowing the meaning of adding an integer and a string (i.e. ‘one’ + 1.0).

For a more detailed introduction to lists there are many resources on the internet, such as the official docs and other tutorials like this one

Numpy ndarray: Optimized for Numerics#

Given the above noted limitations of python for math, it is natural to wonder why use python for scientific work. The answer to this question is Numpy. Numpy has been around almost as long as python, and it is used almost exclusively in scientific python applications because as discussed above the native list is not very fast. Numpy arrays, or ndarrays, on the other hand are actually “C” arrays behind the scenes so are very fast. The downside is that you must learn a ‘mini-language’ to use them. The following few code blocks illustrate this.

See also

There are many resources for learning and understanding Numpy arrays, such as the official numpy user guide and this fun illustrated guide.

import numpy as np
a = np.arange(0, 100, 15)
print(a)
[ 0 15 30 45 60 75 90]

This is an example of the ‘mini-language’ that you need to learn, since arange is the Numpy version of range. The Numpy package has hundreds of functions available, and to be proficient with Numpy you need to at least be aware of most of them. You can see a list by typing dir(np).

Like the list you can index into Numpy arrays for reading and writing:

print(a[2])
30
a[0] = 999
print(a)
[999  15  30  45  60  75  90]

You can also use what is called ‘fancy indexing’, which allows you to index into an ndarray with another array:

b = [0, 2, 4]
print(a[b])
[999  30  60]

You can set multiple locations with a single value:

a[b] = -100
print(a)
[-100   15 -100   45 -100   75   90]

Or an array of values:

a[[1, 3, 5]] = [111, 222, 333]
print(a)
[-100  111 -100  222 -100  333   90]

You can also use “masks” of boolean values, which is interpreted to mean any index where the mask is True:

mask = a < 0
a[mask] = 0
print(a)
[  0 111   0 222   0 333  90]

Vectorization#

And of course, math makes sense to an ndarray since all the elements are assured to be numbers. This means that an ndarray can be multiplied by 2 or by another array (of the same length). This is called vectorization, since the array are operated on elementwise as you would do when adding two vectors. The alternative would be matrix operations, which are possible but are not the default behavior (unlike in Matlab where all operations are assumed to be matrix operations).

print(a*2)
[  0 222   0 444   0 666 180]
print(a + 100)
[100 211 100 322 100 433 190]
print(a*a)
[     0  12321      0  49284      0 110889   8100]

Methods versus Functions#

One of the tricks when first learning numpy is recognizing the difference (of lack thereof) between the methods attached to numpy objects, and the functions offered at the top level of the numpy package. For instance, consider finding the sum of the following array:

arr = np.array([1, 2, 3, 4])

This can be done by passing arr to the numpy sum function:

print(np.sum(arr))
10

Or by calling the sum method that is attached to the arr object:

print(arr.sum())
10

These are both equally valid and computationally identical behind the scenes. The first option may be more familiar if you have used matlab. The first option also offers a several more functions since some require two arrays as arguments. The second option is really just for readability and for people who prefer to work with “objects”.

Dictionaries: Holding Things Together#

The last piece of the puzzle is Python’s built-in dict which is much like a list, in the sense that it can act as a container for any datatype, but items are addressed by name instead of index number.

d = dict()
d['arr'] = a
d['list'] = L
print(d)
{'arr': array([  0, 111,   0, 222,   0, 333,  90]), 'list': ['str', 2, 6, 8, 100, 200, 300]}

You can retrieve any element or value by name, which is called the key:

print(d['arr'])
[  0 111   0 222   0 333  90]

'arr' and [  0 111   0 222   0 333  90] are called “key-value” pairs. You can get a list of all keys defined on a dict using:

d.keys()
dict_keys(['arr', 'list'])
d.values()
dict_values([array([  0, 111,   0, 222,   0, 333,  90]), ['str', 2, 6, 8, 100, 200, 300]])

And adding new items is easy:

d['test'] = 1.0
print(d)
{'arr': array([  0, 111,   0, 222,   0, 333,  90]), 'list': ['str', 2, 6, 8, 100, 200, 300], 'test': 1.0}

Deleting values is also possible:

del d['test']
print(d.keys())
dict_keys(['arr', 'list'])

or if you want to remove the value and catch it for use:

arr = d.pop('arr')
print(d)
{'list': ['str', 2, 6, 8, 100, 200, 300]}

See also

For a more detailed overview of using dicts, there are many internet tutorials such as this one, this one, or this one.

Subclassing dict#

This is may seem like an intimidating concept at first, but it’s actually beautifully simple once you get a feel for how it works. It’s also relevant to learning OpenPNM since it uses dicts extensively, and these are often subclassed to augment their functionality. Subclassing means “taking the basic functionality of the dict, then adding to and/or changing it”. For a deep dive into the process of subclassing a dict refer to this tutorial, but here we’ll give a more basic introduction.

To illustrate the idea of subclasses, as it pertains to OpenPNM, let’s change how the reading and writing of items works. Whenever you use the square brackets [ ] to index into a dict, this actually calls the __getitem__ and __setitem__ methods. The double underscores indicate that these are intrinsic Python methods which the user should not call directly (sometimes called magic methods), but they do the bulk of the work. So let’s try it out by creating a class that tells us what is going on each time we read and write something:

# Create a new class which is a dict PLUS the extra functionality we will add
class NewDict(dict): 
    
    def __setitem__(self, key, value):
        # This is our customization
        print("The key being written is:", key)
        print("The value being written is:", value)
        # Now we call the setitem on the actual dict class
        super().__setitem__(key, value)
        
    def __getitem__(self, key):
        print("The key being retrieved is:", key)
        return super().__getitem__(key)
dnew = NewDict()

The following will trigger the __setitem__ method since we are “setting” the value of 1.0:

dnew['test'] = 1.0
The key being written is: test
The value being written is: 1.0

The following will trigger the __getitem__ method since we are “getting” the value stored in 'test':

dnew['test']
The key being retrieved is: test
1.0

In OpenPNM, both the __setitem__ and __getitem__ methods are overloaded so that several checks are performed. For instance, we enforce that all dictionary keys start with either 'pore' or 'throat':

import openpnm as op
pn = op.network.Demo()
try:
    pn['foo.bar'] = 1.0
except Exception as e:
    print(e)
All dict names must start with pore, throat, or param

In the above cell block the statement within the try block raises and exception, which is caught in the except block, and the error message is printed indicating clearly what the problem is.

We can implement this behavior in our NewDict class above as follows:

class NewDict(dict): 
    
    def __setitem__(self, key, value):
        if not (key.startswith('pore') or key.startswith('throat')):
            raise Exception('Key must start with either pore, or throat')
        super().__setitem__(key, value)
dnew = NewDict()
try:
    dnew['foo.test'] = 1.0
except Exception as e:
    print(e)
dnew['pore.test'] = 1.0
print(dnew.keys())
Key must start with either pore, or throat
dict_keys(['pore.test'])

Another useful feature of subclassing dicts is the ability to add new functions, not just overwriting the old ones. Let’s add a method that prints all the pore properties, but not the throat properties:

class NewDict(dict): 
    
    def __setitem__(self, key, value):
        if not (key.startswith('pore') or key.startswith('throat')):
            raise Exception('Key must start with either pore, or throat')
        super().__setitem__(key, value)
    
    def poreprops(self):
        print('The following pore properties were found:')
        for item in self.keys():
            if item.startswith('pore'):
                print('-> ' + item)
dnew = NewDict()
dnew['pore.test'] = 1.0
dnew['pore.bar'] = 100
dnew['throat.blah'] = 2.0
dnew.poreprops()
The following pore properties were found:
-> pore.test
-> pore.bar

And finally, you can add attributes to a subclassed dict, such as a name:

dnew.name = 'bob'
print(dnew.name)
bob

You can also add other dictionaries!

dnew.settings = {}
dnew.settings['name'] = 'bob'
print(dnew.settings)
{'name': 'bob'}

So in summary, dicts are extremely versatile and handy “data containers”. You can alter their built-in behavior to do things like enforcing that all data are numpy arrays; you can add new methods the allow users interact with the data in specific ways; and you can add data attributes to help users get quick access to needed information.