Numerical Python Primer#
OpenPNM uses two very common python data structures, so it’s best to get comfortable with them right away. In this example we’ll cover:
Topics Covered
Python’s
list
Numpy’s
ndarray
Python’s
dict
or dictionarySubclassing a
dict
Python Lists: Flexible but Slow#
First lets quickly look at Python’s version of an array, which is called a list
. It is indicated by square brackets:
L = [0, 2, 4, 6, 8]
Note
Note that round brackets indicate a “tuple” which is basically the same as a list except they are immutable, meaning their values cannot be changed.
You can read and write values in a list as follows:
L[0] = L[2]*L[4]
print(L)
[32, 2, 4, 6, 8]
Note
Python uses 0-indexing, and also that square brackets are used to index into a sequence while round brackets are used in function calls.
You can make the list
longer by append
ing a single item:
L.append(100)
print(L)
[32, 2, 4, 6, 8, 100]
extend
ing it with another list of items:
L.extend([200, 300])
print(L)
[32, 2, 4, 6, 8, 100, 200, 300]
Or removing items:
L.pop(2)
print(L)
[32, 2, 6, 8, 100, 200, 300]
However, the list
is not very good at math:
try:
print(L + 2)
except TypeError:
print('Adding to a list assumes you are joining 2 lists')
print(L + [2, 3])
Adding to a list assumes you are joining 2 lists
[32, 2, 6, 8, 100, 200, 300, 2, 3]
And multiplication assumes you want to duplicate the list N times:
print(L*2)
[32, 2, 6, 8, 100, 200, 300, 32, 2, 6, 8, 100, 200, 300]
The reason the list is not ideal for numerical operations is that anything can be stored in each location:
L[0] = 'str'
print(L)
['str', 2, 6, 8, 100, 200, 300]
This is why it’s not possible to add or multiply a list, since python has no way of knowing the meaning of adding an integer and a string (i.e. ‘one’ + 1.0).
For a more detailed introduction to list
s there are many resources on the internet, such as the official docs and other tutorials like this one
Numpy ndarray
: Optimized for Numerics#
Given the above noted limitations of python for math, it is natural to wonder why use python for scientific work. The answer to this question is Numpy. Numpy has been around almost as long as python, and it is used almost exclusively in scientific python applications because as discussed above the native list
is not very fast. Numpy arrays, or ndarray
s, on the other hand are actually “C” arrays behind the scenes so are very fast. The downside is that you must learn a ‘mini-language’ to use them. The following few code blocks illustrate this.
See also
There are many resources for learning and understanding Numpy arrays, such as the official numpy user guide and this fun illustrated guide.
import numpy as np
a = np.arange(0, 100, 15)
print(a)
[ 0 15 30 45 60 75 90]
This is an example of the ‘mini-language’ that you need to learn, since arange
is the Numpy version of range
. The Numpy package has hundreds of functions available, and to be proficient with Numpy you need to at least be aware of most of them. You can see a list by typing dir(np)
.
Like the list
you can index into Numpy arrays for reading and writing:
print(a[2])
30
a[0] = 999
print(a)
[999 15 30 45 60 75 90]
You can also use what is called ‘fancy indexing’, which allows you to index into an ndarray
with another array:
b = [0, 2, 4]
print(a[b])
[999 30 60]
You can set multiple locations with a single value:
a[b] = -100
print(a)
[-100 15 -100 45 -100 75 90]
Or an array of values:
a[[1, 3, 5]] = [111, 222, 333]
print(a)
[-100 111 -100 222 -100 333 90]
You can also use “masks” of boolean values, which is interpreted to mean any index where the mask is True
:
mask = a < 0
a[mask] = 0
print(a)
[ 0 111 0 222 0 333 90]
Vectorization#
And of course, math makes sense to an ndarray
since all the elements are assured to be numbers. This means that an ndarray
can be multiplied by 2 or by another array (of the same length). This is called vectorization, since the array are operated on elementwise as you would do when adding two vectors. The alternative would be matrix operations, which are possible but are not the default behavior (unlike in Matlab where all operations are assumed to be matrix operations).
print(a*2)
[ 0 222 0 444 0 666 180]
print(a + 100)
[100 211 100 322 100 433 190]
print(a*a)
[ 0 12321 0 49284 0 110889 8100]
Methods versus Functions#
One of the tricks when first learning numpy is recognizing the difference (of lack thereof) between the methods attached to numpy objects, and the functions offered at the top level of the numpy package. For instance, consider finding the sum of the following array:
arr = np.array([1, 2, 3, 4])
This can be done by passing arr
to the numpy sum
function:
print(np.sum(arr))
10
Or by calling the sum
method that is attached to the arr
object:
print(arr.sum())
10
These are both equally valid and computationally identical behind the scenes. The first option may be more familiar if you have used matlab. The first option also offers a several more functions since some require two arrays as arguments. The second option is really just for readability and for people who prefer to work with “objects”.
Dictionaries: Holding Things Together#
The last piece of the puzzle is Python’s built-in dict
which is much like a list, in the sense that it can act as a container for any datatype, but items are addressed by name instead of index number.
d = dict()
d['arr'] = a
d['list'] = L
print(d)
{'arr': array([ 0, 111, 0, 222, 0, 333, 90]), 'list': ['str', 2, 6, 8, 100, 200, 300]}
You can retrieve any element or value
by name, which is called the key
:
print(d['arr'])
[ 0 111 0 222 0 333 90]
'arr'
and [ 0 111 0 222 0 333 90]
are called “key-value” pairs. You can get a list of all keys defined on a dict using:
d.keys()
dict_keys(['arr', 'list'])
d.values()
dict_values([array([ 0, 111, 0, 222, 0, 333, 90]), ['str', 2, 6, 8, 100, 200, 300]])
And adding new items is easy:
d['test'] = 1.0
print(d)
{'arr': array([ 0, 111, 0, 222, 0, 333, 90]), 'list': ['str', 2, 6, 8, 100, 200, 300], 'test': 1.0}
Deleting values is also possible:
del d['test']
print(d.keys())
dict_keys(['arr', 'list'])
or if you want to remove the value and catch it for use:
arr = d.pop('arr')
print(d)
{'list': ['str', 2, 6, 8, 100, 200, 300]}
See also
For a more detailed overview of using dict
s, there are many internet tutorials such as this one, this one, or this one.
Subclassing dict
#
This is may seem like an intimidating concept at first, but it’s actually beautifully simple once you get a feel for how it works. It’s also relevant to learning OpenPNM since it uses dict
s extensively, and these are often subclassed to augment their functionality. Subclassing means “taking the basic functionality of the dict
, then adding to and/or changing it”. For a deep dive into the process of subclassing a dict
refer to this tutorial, but here we’ll give a more basic introduction.
To illustrate the idea of subclasses, as it pertains to OpenPNM, let’s change how the reading and writing of items works. Whenever you use the square brackets [ ]
to index into a dict
, this actually calls the __getitem__
and __setitem__
methods. The double underscores indicate that these are intrinsic Python methods which the user should not call directly (sometimes called magic methods), but they do the bulk of the work. So let’s try it out by creating a class that tells us what is going on each time we read and write something:
# Create a new class which is a dict PLUS the extra functionality we will add
class NewDict(dict):
def __setitem__(self, key, value):
# This is our customization
print("The key being written is:", key)
print("The value being written is:", value)
# Now we call the setitem on the actual dict class
super().__setitem__(key, value)
def __getitem__(self, key):
print("The key being retrieved is:", key)
return super().__getitem__(key)
dnew = NewDict()
The following will trigger the __setitem__
method since we are “setting” the value of 1.0:
dnew['test'] = 1.0
The key being written is: test
The value being written is: 1.0
The following will trigger the __getitem__
method since we are “getting” the value stored in 'test'
:
dnew['test']
The key being retrieved is: test
1.0
In OpenPNM, both the __setitem__
and __getitem__
methods are overloaded so that several checks are performed. For instance, we enforce that all dictionary keys start with either 'pore'
or 'throat'
:
import openpnm as op
pn = op.network.Demo()
try:
pn['foo.bar'] = 1.0
except Exception as e:
print(e)
All dict names must start with pore, throat, or param
In the above cell block the statement within the try
block raises and exception, which is caught in the except
block, and the error message is printed indicating clearly what the problem is.
We can implement this behavior in our NewDict
class above as follows:
class NewDict(dict):
def __setitem__(self, key, value):
if not (key.startswith('pore') or key.startswith('throat')):
raise Exception('Key must start with either pore, or throat')
super().__setitem__(key, value)
dnew = NewDict()
try:
dnew['foo.test'] = 1.0
except Exception as e:
print(e)
dnew['pore.test'] = 1.0
print(dnew.keys())
Key must start with either pore, or throat
dict_keys(['pore.test'])
Another useful feature of subclassing dict
s is the ability to add new functions, not just overwriting the old ones. Let’s add a method that prints all the pore properties, but not the throat properties:
class NewDict(dict):
def __setitem__(self, key, value):
if not (key.startswith('pore') or key.startswith('throat')):
raise Exception('Key must start with either pore, or throat')
super().__setitem__(key, value)
def poreprops(self):
print('The following pore properties were found:')
for item in self.keys():
if item.startswith('pore'):
print('-> ' + item)
dnew = NewDict()
dnew['pore.test'] = 1.0
dnew['pore.bar'] = 100
dnew['throat.blah'] = 2.0
dnew.poreprops()
The following pore properties were found:
-> pore.test
-> pore.bar
And finally, you can add attributes to a subclassed dict
, such as a name:
dnew.name = 'bob'
print(dnew.name)
bob
You can also add other dictionaries!
dnew.settings = {}
dnew.settings['name'] = 'bob'
print(dnew.settings)
{'name': 'bob'}
So in summary, dict
s are extremely versatile and handy “data containers”. You can alter their built-in behavior to do things like enforcing that all data are numpy arrays; you can add new methods the allow users interact with the data in specific ways; and you can add data attributes to help users get quick access to needed information.