pygraphblas

pygraphblas is a Python wrapper around the GraphBLAS API.

Installation

pygraphblas requires the
SuiteSparse:GraphBLAS
library. Once you have these installed, pygraphblas can be installed
with:

python setup.py install

An installation script for Ubuntu 18.04 is provided in the install-ubuntu.sh file.

Docker

pygraphblas is distributed as two different docker images on Docker
Hub

. The "minimal" image, containing only the library and
ipython and can be run with the command:

docker run --rm -it graphblas/pygraphblas-minimal ipython

You can run a "full" Jupyter notebook server
with docker and try the example Notebooks use the command:

docker run --rm -it graphblas/pygraphblas-notebook

Open up the URL printed on your terminal screen, typically something
liker http://127.0.0.1:8888/tree/pygraphblas/demo and see the
following Notebooks:

Tests

To run the tests checkout pygraphblas and use:

$ ./test.sh

Summary

pygraphblas is a python extension that bridges The GraphBLAS
API
with the Python
programming language. It uses the
CFFI library to wrap the low
level GraphBLAS API and provides high level Matrix and Vector Python
types that make GraphBLAS simple and easy.

GraphBLAS is a sparse linear algebra API optimized for processing
graphs encoded as sparse matrices and vectors. In addition to common
real/integer matrix algebra operations, GraphBLAS supports over a
thousand different Semiring
algebra operations, that can be used as basic building blocks to
implement a wide variety of graph algorithms. See
Applications
from Wikipedia for some specific examples.

pygraphblas leverages the expertise in the field of sparse matrix
programming by The GraphBLAS Forum and uses
the
SuiteSparse:GraphBLAS
API implementation. SuiteSparse:GraphBLAS is brought to us by the work
of Dr. Tim Davis,
professor in the Department of Computer Science and Engineering at
Texas A&M University. News and
information
can provide
you with a lot more background information, in addition to the
references below.

While it is my goal to make it so that pygraphblas works with any
GraphBLAS implementation, it currently only works with SuiteSparse.
SuiteSparse is currently the only realistically usable GraphBLAS
implementation, and additionally it provides several "extension"
features and pre-packaged objects that are very useful for
pygraphblas. If there is a GraphBLAS implementation you would like to
see support for in pygraphblas, please consider creating an issue for
it for discussion and/or sending me a pull request.

Introduction to Graphs and Matrices

GraphBLAS uses matrices and Linear Algebra to represent graphs, as
described in this mathmatical introduction to
GraphBLAS

by Dr. Jeremy Kepner head and founder
of MIT Lincoln Laboratory Supercomputing
Center
.

There are two useful matrix representations of graphs: Adjacency
Matrices
and
Incidence Matrices.
For this introduction we will focus on the adjacency type as they are
simpler, but the same ideas apply to both, both are suported by
GraphBLAS and pygraphblas, and it is easy to switch back and forth
between them.

On the left is a graph, and on the right, the adjacency matrix that
represents it. The matrix has a row and column for every node in the
graph. If there is an edge going from node A to B, then there will be
a value present in the intersection of As row with Bs column. How it
differs from many other matrix representations is that the matrix is
sparse, nothing is stored in computer memory where there are unused
elements.

Sparsity is important because one practical problem with
matrix-encoding graphs is that most real-world graphs tend to be
sparse, as above, only 7 of 36 possible elements have a value. Those
that have values tend to be scattered randomly across the matrix
(for "typical" graphs), so dense linear algebra libraries like BLAS or
numpy do not encode or operate on them efficiently, as the relevant
data is mostly empty memory with actual data elements spaced far
apart. This wastes memory and CPU resources, and defeats CPU caching
mechanisms.

For example, suppose a fictional social network has 1 billion users,
and each user has about 100 friends, which means there are about 100
billion (1e+11) connections in the graph. A dense matrix large enough
to hold this graph would need (1 billion)^2 or
(1,000,000,000,000,000,000), a "quintillion" elements, but only 1e+11
of them would have meaningful values, leaving only 0.0000001th of the
matrix being utilized.

By using a sparse matrix instead of dense, only the elements used are
actually stored in memory. The parts of the matrix with no value are
interpreted, but not necessarily stored, as an identity value, which
may or may not be the actual number zero, but possibly other values
like positive or negative infinity depending on the particular
semiring operations applied to the matrix.

Semirings encapsulate different algebraic operations and identities
that can be used to multiply matrices and vectors. Anyone who has
multiplied matrices has used at least one Semiring before, typically
referred to as "plus_times". This is the common operation of
multiplying two matrices containing real numbers, the corresponding row
and column entries are multipled and the results are summed for the
final value.

GitHub