Statistical tests for the sequential locality of graphs

You can assess the statistical significance of the sequential locality of an adjacency matrix (graph + vertex sequence) using sequential_locality.py.

This file also includes ORGM.py that generates an instance of the ordered random graph model (ORGM) [1] and spectral.py that yields an optimized vertex sequence based on the spectral ordering algorithms.

Please find Ref. [1] for the details of the statistical tests.

sequential_locality.py

sequential_locality.py executes statistical tests with respect to the sequential locality.

Simple example

import numpy as np
import igraph
import sequential_locality as seq

s = seq.SequentialLocality(
		g = igraph.Graph.Erdos_Renyi(n=20,m=80), 
		sequence = np.arange(20)
		)
s.H1()

{'H1': 1.0375,
 'z1': 0.5123475382979811,
 'H1 p-value (ER/ORGM)': 0.6957960998835012,
 'H1 p-value (random)': 0.7438939644617626,
 'bandwidth_opt': None}

Please find Demo.ipynb for more examples.

SequentialLocality

This is a class to be instantiated to assess the sequential locality.

Input parameters

Either g or edgelist must be provided as an input.

Parameter Value Default Description
g graph None Graph (undirected, unweighted, no self-loops) in igraph or graph-tool.
edgelist list of tuples None Edgelist as a list of tuples.
sequence 1-dim array None Array (list or ndarray) indicating the vertex ordering. If provided, the vertex indices in the graph will be replaced based on sequence . If sequence is None, the intrinsic vertex indices in the graph or edgelist will be used as the sequence .
format 'igraph' or 'graph-tool' 'igraph' Input graph format
simple Boolean True If True, the graph is assumed to be a simple graph, otherwise the graph is assumed to be a multigraph.

H1

This is a method that returns H1 and z1 test statistics and p-values of the input data.

Input parameters

Parameter Value Default Description
random_sequence 'analytical' or 'empirical' 'analytical' If 'analytical' is selected, the p-value based on the normal approximation will be returned for the test of vertex sequence H1 p-value (random). If 'empirical' is selected, the p-value based on random sequences specified by samples will be returned.
n_samples Integer 10,000 Number of samples to be drawn as a set of random sequences. This is used only when random_sequence = 'empirical'.
in_envelope Boolean False If False, the p-value based on the ER model will be returned. If True, the p-value based on the ORGM will be returned. That is, the matrix elements outside of the bandwidth r will be ignored.
r Integer None An integer between 1 and N-1. If provided, r will be used as the bandwidth when in_envelope=True.

Output parameters

Parameter Description
H1 H1 test statistic of the input data (graph & vertex sequence)
z1 z1 test statistic of the input data
H1 p-value (ER/ORGM) p-value under the null hypothesis of the ER random graph (when in_envelope=False) or the ORGM (when in_envelope=True).
H1 p-value (random) p-value under the null hypothesis of random sequences
bandwidth_opt Maximum likelihood estimate (MLE) of the bandwidth (when r=None in the input) or the input bandwidth r

HG

This is a method that returns HG and zG test statistics and p-values of the input data.

  • There is no in_envelope option for the test based on HG.
  • random_sequence = 'analytical' can be computationally demanding.

Input parameters

Parameter Value Default Description
random_sequence 'analytical' or 'empirical' 'empirical' If 'analytical' is selected, the p-value based on the normal approximation will be returned for the test of vertex sequence H1 p-value (random). If 'empirical' is selected, the p-value based on random sequences specified by samples will be returned.
n_samples Integer 10,000 Number of samples to be drawn as a set of random sequences. This is used only when random_sequence = 'empirical'.

Output parameters

Parameter Description
HG HG test statistic of the input data (graph & vertex sequence)
zG zG test statistic of the input data
HG p-value (ER) p-value under the null hypothesis of the ER random graph.
HG p-value (random) p-value under the null hypothesis of random sequences

ORGM.py

ORGM.py is a random graph generator.
It generates an ORGM [1] instance that has a desired strength of sequentially lcoal structure.

Simple example

import ORGM as orgm

edgelist, valid = orgm.ORGM(
	N=20, M=80, bandwidth=10, epsilon=0.25
	)

Input parameters

Parameter Value Default Description
N Integer required input Number of vertices
M Integer required input Number of edges
bandwidth Integer required input Bandwidth of the ORGM
epsilon Float (in [0,1]) required input Density ratio between the adjacency matrix elements inside & outside of the envelope. When epsilon=1, the ORGM becomes a uniform model. When epsilon=0, the nonzero matrix elements are strictly confined in the envelope.
simple Boolean True If True, the graph is constrained to be simple. If False, the graph is allowed to have multiedges.

spectral.py

spectral.py is an implementation of the spectral ordering [2].

Simple example

import graph_tool.all as gt
import spectral

g_real = gt.collection.ns['karate/77']
inferred_sequence = spectral.spectral_sequence(
	g= g_real, 
	format='graph-tool'
	)
Parameter Value Default Description
g graph required input graph (undirected, unweighted, no self-loops) in igraph or graph-tool
normalized Boolean True Normalized Laplacian (True) vs unnormalized (combinatorial) Laplacian (False)
format 'igraph' or 'graph-tool' 'igraph' Input graph format

Citation

Please use Ref. [1] for the citation of the present code.

References

  • [1] Tatsuro Kawamoto and Teruyoshi Kobayashi, “Sequential locality of graphs and its hypothesis testing,” arXiv:2111.11267 (2021).
  • [2] Chris Ding and Xiaofeng He, “Linearized Cluster Assignment via Spectral Ordering,” Proceedings of the Twenty-First International Conference on Machine Learning (ICML) (2004).

GitHub

View Github