Python library for rapidly developing lazy interfaces. This is currently a prototype built for playing with the paradigm.
By deferring the execution of your code until the last possible moment (when you actually request the data with .get()),
you can optimize its execution while preserving simple imperative semantics.
Optimizations include things like
- Minimal execution by tracing dependencies and only executing the operations needed to produce the data
- Automatic output caching and invalidation
- Automatic parallelization of the induced dataflow graph
How it works
This library works by modifying annotated functions to record when they were called and their inputs and outputs.
When .get() is invoked on an output, a minimal dataflow graph is generated by inspecting
all of its dependencies (including cached outputs). This dataflow graph can optionally be automatically parallelized.
A key requirement of this library is that all annotated functions be stateless and synchronous.
See the execution example at the bottom for details, or try it out yourself!
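To make the mechanism above concrete, here is a minimal, self-contained sketch of the idea. It is not this library's implementation: the Lazy class, the synchronous decorator and the calls list are hypothetical stand-ins written only for illustration. Calling a decorated function returns a placeholder, and the work only happens (once) when .get() walks the dependencies.

```python
import functools


class Lazy:
    """Placeholder for a value that is computed only when .get() is called."""

    def __init__(self, func, args):
        self.func = func
        self.args = args      # may contain other Lazy nodes (the dependencies)
        self.cached = False
        self.value = None

    def get(self):
        # Evaluate dependencies first, then this node; reuse cached results.
        if not self.cached:
            inputs = [a.get() if isinstance(a, Lazy) else a for a in self.args]
            self.value = self.func(*inputs)
            self.cached = True
        return self.value


calls = []  # names of the functions that actually executed, in order


def synchronous(func):
    """Hypothetical stand-in for @lazy.synchronous: defer the call, log execution."""

    @functools.wraps(func)
    def traced(*args):
        calls.append(func.__name__)
        return func(*args)

    @functools.wraps(func)
    def wrapper(*args):
        return Lazy(traced, args)

    return wrapper


@synchronous
def Square(x):
    return x ** 2


@synchronous
def Mul(x, y):
    return x * y


@synchronous
def Add(x, y):
    return x + y


a, b = Square(2), Square(3)
c, d = Mul(a, b), Add(a, b)
assert calls == []   # nothing has executed yet
print(c.get())       # runs Square, Square, Mul (Add is skipped)
print(d.get())       # runs only Add; a and b come from the cache
```

The sketch leaves out invalidation and graph export entirely; it only shows why deferring execution makes minimal execution and caching fall out naturally.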
Decorate stateless and synchronous functions with @lazy.synchronous:

    import time

    import lazy

    @lazy.synchronous
    def Square(x):
        time.sleep(0.1)
        return x ** 2

    @lazy.synchronous
    def Mul(x, y):
        time.sleep(0.1)
        return x * y

    @lazy.synchronous
    def Add(x, y):
        time.sleep(0.1)
        return x + y
Write your program and access the output of annotated functions with .get():

    a = Square(2)
    b = Square(3)
    c = Mul(a, b)
    d = Add(a, b)

    t = time.time()
    # The code isn't run until you call .get()
    print(c.get())
    print(time.time() - t)

    t = time.time()
    print(d.get())
    print(time.time() - t)
Run things in parallel automatically with lazy.parallelize = True:
    lazy.parallelize = True

    a = Square(2)
    b = Square(3)
    c = Mul(a, b)

    t = time.time()
    print(c.get())
    # Should only take 0.2s instead of 0.3s by automatic parallelism
    print(time.time() - t)
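For intuition on how a dataflow graph can be parallelized, here is a small sketch that runs every node whose inputs are ready in the same wave on a thread pool. The Node class and run_parallel function are hypothetical and are not how this library schedules work. Instead of timing, the sketch uses a threading.Barrier: both square calls have to be in flight at the same moment for it to release, so a serial scheduler would time out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Node:
    """A deferred call whose inputs may be other Node objects."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def run_parallel(target, max_workers=4):
    """Evaluate target, running the independent nodes of each wave concurrently."""
    # Collect the dependency closure of target (the minimal graph to execute).
    pending, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in pending:
            pending.add(node)
            stack.extend(a for a in node.args if isinstance(a, Node))

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A node is ready once every Node input already has a result.
            ready = [n for n in pending
                     if all(a in results for a in n.args if isinstance(a, Node))]
            futures = {
                n: pool.submit(n.func, *[results[a] if isinstance(a, Node) else a
                                         for a in n.args])
                for n in ready
            }
            for n, future in futures.items():
                results[n] = future.result()
            pending.difference_update(ready)
    return results[target]


# Both square calls must be running at the same time to pass this barrier;
# a serial scheduler would time out and raise BrokenBarrierError.
both_running = threading.Barrier(2)


def square(x):
    both_running.wait(timeout=5)
    return x ** 2


def mul(x, y):
    return x * y


c = Node(mul, Node(square, 2), Node(square, 3))
result = run_parallel(c)
print(result)
```

Waiting for a whole wave before starting the next one is the simplest correct schedule, not the fastest; a real runtime could start each node as soon as its own inputs finish.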
Asynchronous execution can be made synchronous with locking primitives.
Functions annotated with @lazy.asynchronous are fed an extra input, Task, which has a spin primitive. See below:
    import random

    @lazy.asynchronous
    def Recv(t, ptr):
        # Around 10 spins before we break
        for _ in t.spin():
            r = random.randint(0, 10)
            if r == 7:
                break
        return ptr  # pretend we actually received something from network

    ptr = 0x123123
    d = Recv(ptr)
    print(d.get())  # prints ptr (0x123123 == 1192227)
The idea here is that spin will periodically run the body of the loop until it is broken.
The rate at which spin loops is determined by the runtime.
After a couple of iterations of the same function, we can actually track how many spins it typically takes for the lock condition to be met and further optimize the rate at which spins happen.
As an example, if it takes on average 100ms for the network to respond, we can make the first spin take exactly 100ms and speed up all subsequent spins.
This frees up cycles to work on other tasks in parallel.
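The adaptive spin rate described above can be sketched roughly as follows. Everything here is an assumption made for illustration (the Task and Runtime classes, sleep_ms, the FakeClock used to keep the demo deterministic, and the choice of a plain mean as the estimate); it is not the library's runtime. The first wait of a call is the average wait observed in earlier calls, and every later wait uses a short polling interval.

```python
import time


def sleep_ms(ms):
    """Default sleeper: real wall-clock sleep, in milliseconds."""
    time.sleep(ms / 1000)


class Task:
    """Hypothetical stand-in for the Task passed to asynchronous functions."""

    def __init__(self, first_wait, poll_interval, sleep):
        self.first_wait = first_wait
        self.poll_interval = poll_interval
        self.sleep = sleep
        self.waited = 0  # total milliseconds slept during this call

    def spin(self):
        # Sleep once for the expected latency, then poll at a short interval.
        delay = self.first_wait
        while True:
            self.sleep(delay)
            self.waited += delay
            yield
            delay = self.poll_interval


class Runtime:
    """Tracks how long earlier calls waited and tunes the first spin."""

    def __init__(self, poll_interval=10, sleep=sleep_ms):
        self.poll_interval = poll_interval
        self.sleep = sleep
        self.history = []  # total wait (ms) of each earlier call

    def run(self, func, *args):
        # First wait = mean of past waits, or the poll interval if none yet.
        if self.history:
            first = sum(self.history) / len(self.history)
        else:
            first = self.poll_interval
        task = Task(first, self.poll_interval, self.sleep)
        result = func(task, *args)
        self.history.append(task.waited)
        return result


class FakeClock:
    """Simulated clock so the demo runs instantly and deterministically."""

    def __init__(self):
        self.now = 0

    def sleep(self, ms):
        self.now += ms


clock = FakeClock()
runtime = Runtime(poll_interval=10, sleep=clock.sleep)
checks = []  # loop iterations needed by each call


def recv(task, ready_after_ms):
    # Pretend the network answers ready_after_ms after the call starts.
    start, n = clock.now, 0
    for _ in task.spin():
        n += 1
        if clock.now - start >= ready_after_ms:
            break
    checks.append(n)
    return n


runtime.run(recv, 100)  # no history yet: polls every 10 ms
runtime.run(recv, 100)  # first wait is 100 ms, so the first check succeeds
print(checks)
```

In the demo the second call needs a single check instead of ten, which is the "frees up cycles" effect: fewer wake-ups for the same response latency.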
- Support functions that operate in-place and have multiple outputs
- Support maximal trace length (to automatically force calls to .get())
Below was generated with calls to lazy.draw().
Before calling c.get() in the above example we can see that only the input data is valid.
After calling c.get() we can see that only Mul was invoked (and not Add).
Once we call d.get(), Add is executed using the cached intermediate values calculated when we called c.get().
Other small things
- data.dump_cf() to get the calculated control-flow graph (networkx format) of data (i.e. what needs to be executed to generate it)
- data.executor = func to set a specific executor for the node. The executor must be of the form func(data : Data) -> None
- lazy.dump() to get the full known dataflow graph (networkx format)
- lazy.draw() to draw the full known dataflow graph (with colors as in the above example)
If you really want to play with this, I'd recommend attacking most ideas with networkx.
As an example, to get a subgraph of all the data dependencies of d (with networkx imported as nx) you can simply do

    graph = lazy.dump()
    subgraph = graph.subgraph(nx.ancestors(graph, d))