lazy

Python library for rapidly developing lazy interfaces. This is currently a prototype built for playing with the paradigm.

By deferring the execution of your code until the last possible moment (when you actually request the data with .get())
you can optimize its execution while preserving simple imperative semantics.

Optimizations include things like

  • Minimal execution by tracing dependencies and executing only the operations needed to produce the data
  • Automatic output caching and invalidation
  • Automatic parallelization of the induced dataflow graph

How it works

This library works by modifying annotated functions to record when they were called and their inputs and outputs.
Once .get() is invoked on an output, a minimal dataflow graph is generated by inspecting
all of its dependencies (including cached outputs). This dataflow graph can optionally be automatically parallelized.
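The recording-and-deferring idea described above can be sketched in a few lines. This is only an illustration, not the library's implementation: the Data class, the synchronous decorator and the calls log below are hypothetical stand-ins written for this example, and the sketch covers deferred execution and output caching only (no invalidation, no parallelism).

```python
import functools


class Data:
    """Hypothetical stand-in for a lazy output node (not the library's class)."""

    _UNSET = object()

    def __init__(self, func, args):
        self.func = func          # function that produces this value
        self.args = args          # inputs; may themselves be Data nodes
        self.value = Data._UNSET  # cached result once computed

    def get(self):
        # Compute only on first request; later calls reuse the cached value.
        if self.value is Data._UNSET:
            resolved = [a.get() if isinstance(a, Data) else a for a in self.args]
            self.value = self.func(*resolved)
        return self.value


def synchronous(func):
    """Record the call and its inputs instead of running the function."""
    @functools.wraps(func)
    def wrapper(*args):
        return Data(func, args)
    return wrapper


calls = []  # names of the functions that actually executed, in order


@synchronous
def Square(x):
    calls.append("Square")
    return x ** 2


@synchronous
def Mul(x, y):
    calls.append("Mul")
    return x * y


@synchronous
def Add(x, y):
    calls.append("Add")
    return x + y


a = Square(2)
b = Square(3)
c = Mul(a, b)
d = Add(a, b)

assert calls == []            # nothing has run yet
result_c = c.get()            # runs Square, Square, Mul; Add is skipped
calls_after_c = list(calls)
result_d = d.get()            # reuses cached a and b, runs only Add
```

In this sketch the dependency walk happens implicitly through recursive get() calls; a real implementation would more likely build an explicit graph first so it can be inspected, drawn or scheduled.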

A key requirement of this library is that all annotated functions be stateless and synchronous.

See the execution example at the bottom for details, or try it out yourself!

Usage

Decorate stateless and synchronous functions with @lazy.synchronous

import time

import lazy

@lazy.synchronous
def Square(x):
    time.sleep(0.1)
    return x ** 2

@lazy.synchronous
def Mul(x, y):
    time.sleep(0.1)
    return x * y

@lazy.synchronous
def Add(x, y):
    time.sleep(0.1)
    return x + y

Write your program and access the output of annotated functions with .get()

a = Square(2)
b = Square(3)
c = Mul(a, b)
d = Add(a, b)

t = time.time()
# The code isn't run until you call .get()
print(c.get())
print(time.time() - t)

t = time.time()
print(d.get())
print(time.time() - t)

Run things in parallel automatically with lazy.parallelize = True

lazy.parallelize = True

a = Square(2)
b = Square(3)
c = Mul(a, b)

t = time.time()
print(c.get())
print(time.time() - t)
# Should take about 0.2s instead of 0.3s thanks to automatic parallelism
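One way such a scheduler could work is to run the graph in waves: every node whose inputs are already available is executed concurrently, then the next wave is computed. The sketch below uses the standard library's ThreadPoolExecutor; Node and run_parallel are hypothetical names for this example and say nothing about how lazy.parallelize is actually implemented. A threading.Barrier stands in for time.sleep so the example demonstrates real overlap without depending on timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical dataflow node: a function plus its inputs (nodes or constants)."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs


def run_parallel(target, max_workers=4):
    # Collect every node that the target depends on, including the target itself.
    pending, seen, stack = [], set(), [target]
    while stack:
        node = stack.pop()
        if isinstance(node, Node) and id(node) not in seen:
            seen.add(id(node))
            pending.append(node)
            stack.extend(node.inputs)

    results = {}  # id(node) -> computed value

    def value_of(x):
        return results[id(x)] if isinstance(x, Node) else x

    def is_ready(node):
        return all(not isinstance(i, Node) or id(i) in results for i in node.inputs)

    def call(node):
        return node.func(*[value_of(i) for i in node.inputs])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # All nodes in one wave are independent, so they can run together.
            ready = [n for n in pending if is_ready(n)]
            for node, out in zip(ready, pool.map(call, ready)):
                results[id(node)] = out
            pending = [n for n in pending if id(n) not in results]
    return results[id(target)]


barrier = threading.Barrier(2, timeout=5)


def square(x):
    barrier.wait()  # only passes if both squares are running at the same time
    return x ** 2


def mul(x, y):
    return x * y


a = Node(square, 2)
b = Node(square, 3)
c = Node(mul, a, b)
result = run_parallel(c)
```

Scheduling in waves avoids submitting work from inside worker threads, which can deadlock a bounded pool. The trade-off is that a slow node holds up the whole wave even if later nodes do not depend on it.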

Asynchronous execution can be made synchronous with locking primitives.
Functions annotated with @lazy.asynchronous are fed an extra input t
of type Task which has a spin primitive. See below:

import random

@lazy.asynchronous
def Recv(t, ptr):
    # Breaks after about 11 spins on average (randint is inclusive on both ends)
    for _ in t.spin():
        r = random.randint(0, 10)
        if r == 7:
            break
    return ptr  # pretend we actually received something from the network

ptr = 0x123123
d = Recv(ptr)
print(d.get())  # prints 1192227 (0x123123), since Recv returns ptr

The idea here is that spin will periodically run the body of the loop until it is broken.
The rate at which spin loops is determined by the runtime.
After the same function has run a few times,
the runtime can track how many spins it typically takes for the lock condition to be met and tune the spin rate accordingly.
For example, if the network takes 100ms on average to respond, the first spin can wait exactly 100ms and all subsequent spins can be sped up.
This frees up cycles to work on other tasks in parallel.
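The adaptive spin rate described above could look roughly like the following sketch. Everything here is an assumption made for illustration: this Task class, its fast_interval parameter, and the done() hook that a runtime would call after the function returns are hypothetical and are not the library's Task type. The sleep function is injectable so the example can record delays instead of actually waiting.

```python
import time


class Task:
    """Hypothetical task handle with a history-aware spin generator."""

    def __init__(self, fast_interval=0.01, sleep=time.sleep):
        self.fast_interval = fast_interval  # delay between follow-up spins
        self.sleep = sleep                  # injectable so tests need not wait
        self.waits = []                     # total delay recorded per finished run
        self._current = 0.0                 # delay accumulated in the current run

    def expected_wait(self):
        # Average delay seen so far; fall back to the fast interval.
        if not self.waits:
            return self.fast_interval
        return sum(self.waits) / len(self.waits)

    def spin(self):
        self._current = 0.0
        delay = self.expected_wait()  # first spin waits as long as is typical
        while True:
            self.sleep(delay)
            self._current += delay
            yield
            delay = self.fast_interval  # later spins poll quickly

    def done(self):
        # Assumed to be called by the runtime once the function has returned.
        self.waits.append(self._current)


slept = []  # delays requested, recorded instead of actually sleeping
task = Task(fast_interval=0.01, sleep=slept.append)


def recv(t, ready_after):
    # Simulation: the lock condition is met on spin number `ready_after`.
    for i, _ in enumerate(t.spin(), start=1):
        if i == ready_after:
            break
    t.done()


recv(task, ready_after=10)  # first run: ten short spins
first_run = list(slept)
slept.clear()
recv(task, ready_after=1)   # second run: one long spin sized from history
second_run = list(slept)
```

In the second run a single wait of roughly 0.1s replaces ten short polls, which is the effect the paragraph above describes: fewer wake-ups for the same response, leaving the worker free for other tasks in between.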

TODO

  • Support functions that operate in-place and have multiple outputs
  • Support maximal trace length (to automatically force calls to get())

Execution example

The graphs below were generated with calls to lazy.draw().

Before calling c.get() in the above example we can see that only the input data is valid

After calling c.get() we can see that only Mul was invoked (and not Add)

Once we call d.get() Add is executed using the cached intermediate values calculated when we called c.get()

Other small things

  • data.dump_cf() to get the calculated control-flow graph (networkx format) of data (i.e. what needs to be executed to generate it)
  • data.executor = func to set a specific executor for the node. The executor must be of the form func(data : Data) -> None
  • lazy.dump() to get the full known dataflow graph (networkx format)
  • lazy.draw() to draw the full known dataflow graph (with colors as in the above example)

If you really want to play with this I'd recommend attacking most ideas with networkx.
As an example, to get a subgraph of all the data dependencies of d (with networkx imported as nx) you can simply do

graph = lazy.dump()
subgraph = graph.subgraph(nx.ancestors(graph, d))
