Storage of Network Data and Topology#
As discussed in the first tutorial, OpenPNM uses Numpy ndarrays
to store all data, and these arrays are all stored in dict
s, because all OpenPNM objects are subclassed python dictionaries. In this tutorial will take a deeper dive into the details of the OpenPNM data storage scheme. The sections to be covered are:
The Spreadsheet Analogy
Maintaining Data Integrity
Representing Topology
Spreadsheet Analogy#
The best analogy for explaining data storage in OpenPNM is the humble spreadsheet. According to this analogy you can imagine all data is stored in a table; two tables actually, one for pore data and one for throat data. Each pore (or throat) corresponds to a row and each property corresponds to a column. Consider the following network with 4 pores, 3 throats:
import openpnm as op
import numpy as np
op.visualization.set_mpl_style()
np.random.seed(0)
pn = op.network.Demo(shape=[4, 1, 1])
Let’s use pandas to express the geometric properties as a ‘spreadsheet’ and pull out all pore properties that are as wide as a single column.
import pandas as pd
pore_data_sheet = pd.DataFrame({k: pn[k] for k in pn.props(element='pore') if pn[k].ndim == 1})
We can now view this ‘spreadsheet’:
pore_data_sheet
pore.coordination_number | pore.diameter | pore.max_size | pore.seed | pore.volume | |
---|---|---|---|---|---|
0 | 1.0 | 0.474407 | 1.0 | 0.474407 | 0.055905 |
1 | 2.0 | 0.557595 | 1.0 | 0.557595 | 0.090773 |
2 | 2.0 | 0.501382 | 1.0 | 0.501382 | 0.065994 |
3 | 1.0 | 0.472442 | 1.0 | 0.472442 | 0.055213 |
The column names are the properties such as pore.volume, and the rows correspond to the pore index, so pore 0
has a volume of 0.055905.
One could also extract an entire column using:
column = pore_data_sheet['pore.volume']
print(column)
0 0.055905
1 0.090773
2 0.065994
3 0.055213
Name: pore.volume, dtype: float64
Or access individual elements:
print(pore_data_sheet['pore.volume'][0])
0.05590507143096387
Warning
The spreadsheet analogy is very apt, but it breaks down slightly when one considers data that is multiple columns wide such as ‘pore.coords’. The is stored as an Np-by-3 array in the network dictionary, but spreadsheets usually require that all data is only a single column wide.
Rules to Maintain Data Integrity#
Several rules have been implemented to control the integrity of the data. This is implemented by subclassing the __setitem__
method of the dict
. Each time data is assigned to a dictionary key, OpenPNM first checks the data type and dictionary key to ensure the rules are followed. Each of these checks are explored below.
All Values are Converted to Numpy Arrays#
Only Numpy arrays can be stored in an OpenPNM object, and any data that is written to an OpenPNM object (i.e. dictionary) will be converted to a Numpy array. This is done to ensure that all mathematical operations throughout the code can be consistently done using vectorization.
pn['throat.list'] = [1, 2, 3]
print(type(pn['throat.list']))
<class 'numpy.ndarray'>
This illustrates that the basic python list-type has been converted to a Numpy array when stored in the dictionary
Dictionary Keys Must Start With ‘pore’ or ‘throat’#
All array names must begin with either ‘pore.’ or ‘throat.’ which serves to identify the type of information they contain.
try:
pn['foo.bar'] = 0
except:
print('This will throw an exception since the dict name cannot start with foo')
This will throw an exception since the dict name cannot start with foo
The pore or throat prefix is enforced so that OpenPNM knows how long an array should be. Attempting to write an array of the wrong length will result in an error:
try:
pn['pore.test'] = [0, 0, 0]
except:
print('This will throw an exception since there are 4 pores, hence all pore arrays should be 4 elements long')
This will throw an exception since there are 4 pores, hence all pore arrays should be 4 elements long
Any Scalars are Expanded to a Full-Length Vector#
For the sake of consistency only arrays of length Np or Nt are allowed in the dictionary. Assigning a scalar value to a dictionary results in the creation of a full length vector, either Np or Nt long, depending on the name of the array. This effectively applies the scalar value to all locations in the network.
pn['pore.test'] = 0
print(pn['pore.test'])
[0 0 0 0]
Note how the scalar value has been cast to an array of 4 elements long, one for each pore in the network.
Nested Dictionary Names are Allowed#
It’s possible to create nested properties by assigning a dictionary containing numpy arrays
pn['pore.concentration'] = {'species_A': 0, 'species_B': 1}
print(pn['pore.concentration'])
{'species_A': array([0, 0, 0, 0]), 'species_B': array([1, 1, 1, 1])}
The above rule about expanding the scalar values to a numpy array have been applied.
Requesting the top level of dictionary key returns both concentrations, but they can be accessed directly too:
print(pn['pore.concentration.species_A'])
[0 0 0 0]
You can also retrieve a dictionary by requesting only the element (‘pore’ or ‘throat’) and the propname (‘concentration’):
pn['pore.concentration']
{'species_A': array([0, 0, 0, 0]), 'species_B': array([1, 1, 1, 1])}
The above returns a ‘subdictionary’ which can be indexed into:
pn['pore.concentration']['species_A']
array([0, 0, 0, 0])
Boolean Arrays are Treated as Labels, Numerical Arrays are Treated as Properties#
Any Boolean data will be treated as a label while all other numerical data is treated as a property. OpenPNM uses labels extensively as will be illustrated in a later tutorial. For this tutorial, we’ll just introduce the idea. We can create a new label by assigning False
to a new key. This will expand the single False
value into every pore, which means that NO pores have that label (yet):
pn['pore.label'] = False
print(pn.labels(element='pore'))
['pore.label', 'pore.left', 'pore.right', 'pore.surface', 'pore.xmax', 'pore.xmin']
You can see that 'pore.label'
shows up in this list automatically since it is of Boolean type. However, if you ask the network which pores have that label, the answer is none:
pn.pores('label')
array([], dtype=int64)
If we then set some locations to True
, this will change:
pn['pore.label'][[0, 1, 2]] = True
print(pn.pores('label'))
[0 1 2]
OpenPNM objects include a method for creating a label and applying it to locations all in one step using the set_label
method.
pn.set_label('another', pores=[2, 3])
print(pn.pores('another'))
[2 3]
The param
prefix#
Expanding all scalar values to arrays that are either Np or Nt long is done as a convenience or shorthand to mean “place this value of X in all pores”. It means that you can later ask “what is the value of X in pore Y” and get an answer. This implies that X might take on different values in each pore/throat. However, there are properties which we know are constant, such as the molecular weight or critical temperature of a pure species. In V3 we have introduced the params
attribute to all object, such that pn.params
is a dictionary containing such scalar values. For convenience we have also added the ability to read and write to the params
dictionary via the main get/set method. This is illustrated below:
pn.params['lattice_connectivity'] = 6
print(pn.params)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Parameters Value
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
dimensionality (3,)
lattice_connectivity 6
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Tip
Use the “param” prefix: It is possible use the param
prefix when reading and writing to the object’s dictionary. This intercepts the given key and dispatches the request to the params
attribute. This feature is particularly useful when passing arguments to pore-scale models (to be discussed in a later tutorial) since you can pass either arg='param.value'
or arg='pore.value'
and OpenPNM will be able to fetch the requested data by doing value = pn[arg]
and both options will work.
pn['param.test'] = 2
print(pn.params)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Parameters Value
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
dimensionality (3,)
lattice_connectivity 6
test 2
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Representing Topology#
Topology was introduced in the last tutorial, but not discussed in depth. Actually, the next tutorial will explain topology in full detail, but here we can reveal a little more about it as it pertains to data storage.
Consider the following simple random network:
np.random.seed(0)
dn = op.network.Delaunay(points=4, shape=[1, 1, 0])
op.visualization.plot_tutorial(dn);
The basic premise of how OpenPNM stores topology can be stated in 1 sentence:
The pores on either end of a throat are just another property to be stored, along with diameter, length, etc.
In other words, referring to the above diagram, throat No. 5 has pores 1 and 4 on its ends. Using the spreadsheet analogy, this implies a new column that stores the pair of pores connected by each throat. OpenPNM calls this property 'throat.conns'
:
print(dn['throat.conns'])
[[0 1]
[0 2]
[0 3]
[1 3]
[2 3]]
Inspection of this Nt-by-2 array shows that each row contains the pore indices corresponding to that throat.
Note
Topology Rules: OpenPNM requires that the following be true about the network:
A throat is a connection between exactly two pores, no more and no less
Throats are non-directional, meaning that flow in either direction is equal Other general, but non-essential rules are:
Pores can have an arbitrary number of throats, including zero; however, pores with zero throats lead to singular matrices and other problems so should be avoided.
Two pores are generally connected by no more than one throat. It is technically possible in OpenPNM to have multiple throats between a pair of pores, but it is not rigorously supported so unintended results may arise. For instance, when converting to CSR format the dual throats are merged into one, so re-converting back results in loss of such throats.
Adjacency Matrices: Dense vs Sparse#
The topology storage scheme described above is actually an adjacency matrix, in a sparse storage format known as IJV or COO.
An adjacency matrix is a Np-by-Np matrix with non-zero values at location (i, j) indicating that pores i and j are connected.
OpenPNM Network objects have a method that generates the sparse adjacency matrix:
am = dn.create_adjacency_matrix(weights=dn.Ts, triu=True)
print(am)
<COOrdinate sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 0
(0, 2) 1
(0, 3) 2
(1, 3) 3
(2, 3) 4
This can be converted to a dense array and visualized:
am = dn.create_adjacency_matrix()
print(am.todense())
[[0 1 1 1]
[1 0 0 1]
[1 0 0 1]
[1 1 1 0]]
In dense format the Np-by-Np shape is evident. The above matrix is symmetrical, so connections between pore i and j as well as between j and i are shown.
An important feature of the adjacency matrix is that it is highly sparse (mostly zeros) and can be stored with a variety of sparse storage schemes offered by Scipy. OpenPNM stores the adjacency matrix in the ‘COO’ format, which essentially stores the coordinates (I,J) of the nonzero elements in an two-column wide array.
Thus the throat property called 'throat.conns'
is an Nt-by-2 array that gives the index of the two pores on either end of a given throat.
Note
Additional Thoughts on Sparse Storage:
In pore networks there is (usually) no difference between traversing from pore i to pore j or from pore j to pore i, so a 1 is also found at location (j, i) and the matrix is symmetrical.
Since the adjacency matrix is symmetric, it is redundant to store the entire matrix when only the upper triangular part is necessary. The
'throat.conns'
array only stores the upper triangular information, and i is always less than j.Although this storage scheme is widely known as IJV, the
scipy.sparse
module calls this the Coordinate or COO storage scheme.Some tasks are best performed on other types of storages scheme, such as CSR or LIL. OpenPNM converts between these internally as necessary, but users can generate a desired format using the
create_adjacency_matrix
method which accepts the storage type as an argument (i.e.'csr'
,'lil'
, etc). For a discussion of sparse storage schemes and the respective merits, see this Wikipedia article.
conduit
data#
A conduit in OpenPNM refers to a ‘pore-throat-pore’ connection, where the pores and throats are each called elements. These elements act as resistors (or conductors) in series. Each conduit therefor contains exactly 1 throat and 2 pores. Transport within the network occurs via these interconnected conduits, so the geometric properties of these elements are often needed for computing things like diffusive conductance. This is common enough that OpenPNM V3 includes a new helper function for obtaining this data, as illustrated below:
D = pn.get_conduit_data('diameter')
print(D)
[[0.47440675 0.23720338 0.55759468]
[0.55759468 0.25069084 0.50138169]
[0.50138169 0.2362208 0.47244159]]
The above array contains 3 columns, with the center column representing the throat diameter:
print(pn['throat.diameter'])
[0.23720338 0.25069084 0.2362208 ]
And the left and right columns represent the diameter of the pores on each end of the throat. The order of “left” vs “right” reflects the pore indices for the 'throat.conns'
array.
print(pn.conns)
[[0 1]
[1 2]
[2 3]]
In fact, the values for the pore elements can be obtained using numpy’s fancy indexing:
R1_R2 = pn['pore.diameter'][pn.conns]
print(R1_R2)
[[0.47440675 0.55759468]
[0.55759468 0.50138169]
[0.50138169 0.47244159]]
Which can be seen to match the diameter values in D
above.