## pygraphblas

pygraphblas is a Python wrapper around the GraphBLAS API.

## Installation

pygraphblas requires the

SuiteSparse:GraphBLAS

library. Once you have these installed, pygraphblas can be installed

with:

python setup.py install

An installation script for Ubuntu 18.04 is provided in the `install-ubuntu.sh`

file.

## Docker

pygraphblas is distributed as two different docker images on Docker

Hub

. The "minimal" image, containing only the library and

ipython and can be run with the command:

```
docker run --rm -it graphblas/pygraphblas-minimal ipython
```

You can run a "full" Jupyter notebook server

with docker and try the example Notebooks use the command:

```
docker run --rm -it graphblas/pygraphblas-notebook
```

Open up the URL printed on your terminal screen, typically something

liker `http://127.0.0.1:8888/tree/pygraphblas/demo`

and see the

following Notebooks:

# Tests

To run the tests checkout pygraphblas and use:

```
$ ./test.sh
```

# Summary

pygraphblas is a python extension that bridges The GraphBLAS

API with the Python

programming language. It uses the

CFFI library to wrap the low

level GraphBLAS API and provides high level Matrix and Vector Python

types that make GraphBLAS simple and easy.

GraphBLAS is a sparse linear algebra API optimized for processing

graphs encoded as sparse matrices and vectors. In addition to common

real/integer matrix algebra operations, GraphBLAS supports over a

thousand different Semiring

algebra operations, that can be used as basic building blocks to

implement a wide variety of graph algorithms. See

Applications

from Wikipedia for some specific examples.

pygraphblas leverages the expertise in the field of sparse matrix

programming by The GraphBLAS Forum and uses

the

SuiteSparse:GraphBLAS

API implementation. SuiteSparse:GraphBLAS is brought to us by the work

of Dr. Tim Davis,

professor in the Department of Computer Science and Engineering at

Texas A&M University. News and

information can provide

you with a lot more background information, in addition to the

references below.

While it is my goal to make it so that pygraphblas works with any

GraphBLAS implementation, it currently only works with SuiteSparse.

SuiteSparse is currently the only realistically usable GraphBLAS

implementation, and additionally it provides several "extension"

features and pre-packaged objects that are very useful for

pygraphblas. If there is a GraphBLAS implementation you would like to

see support for in pygraphblas, please consider creating an issue for

it for discussion and/or sending me a pull request.

# Introduction to Graphs and Matrices

GraphBLAS uses matrices and Linear Algebra to represent graphs, as

described in this mathmatical introduction to

GraphBLAS

by Dr. Jeremy Kepner head and founder

of MIT Lincoln Laboratory Supercomputing

Center.

There are two useful matrix representations of graphs: Adjacency

Matrices and

Incidence Matrices.

For this introduction we will focus on the adjacency type as they are

simpler, but the same ideas apply to both, both are suported by

GraphBLAS and pygraphblas, and it is easy to switch back and forth

between them.

On the left is a graph, and on the right, the adjacency matrix that

represents it. The matrix has a row and column for every node in the

graph. If there is an edge going from node A to B, then there will be

a value present in the intersection of As row with Bs column. How it

differs from many other matrix representations is that the matrix is

sparse, nothing is stored in computer memory where there are unused

elements.

Sparsity is important because one practical problem with

matrix-encoding graphs is that most real-world graphs tend to be

sparse, as above, only 7 of 36 possible elements have a value. Those

that have values tend to be scattered randomly across the matrix

(for "typical" graphs), so dense linear algebra libraries like BLAS or

numpy do not encode or operate on them efficiently, as the relevant

data is mostly empty memory with actual data elements spaced far

apart. This wastes memory and CPU resources, and defeats CPU caching

mechanisms.

For example, suppose a fictional social network has 1 billion users,

and each user has about 100 friends, which means there are about 100

billion (1e+11) connections in the graph. A dense matrix large enough

to hold this graph would need (1 billion)^2 or

(1,000,000,000,000,000,000), a "quintillion" elements, but only 1e+11

of them would have meaningful values, leaving only 0.0000001th of the

matrix being utilized.

By using a sparse matrix instead of dense, only the elements used are

actually stored in memory. The parts of the matrix with no value are

*interpreted*, but not necessarily stored, as an identity value, which

may or may not be the actual number zero, but possibly other values

like positive or negative infinity depending on the particular

semiring operations applied to the matrix.

Semirings encapsulate different algebraic operations and identities

that can be used to multiply matrices and vectors. Anyone who has

multiplied matrices has used at least one Semiring before, typically

referred to as "plus_times". This is the common operation of

multiplying two matrices containing real numbers, the corresponding row

and column entries are multipled and the results are summed for the

final value.