lazycluster is a Python library intended to liberate data scientists and machine learning engineers by abstracting away cluster management and configuration so that they are able to focus on their actual tasks. Especially, the easy and convenient cluster setup with Python for various distributed machine learning frameworks is emphasized.
- High-Level API for starting clusters:
- More lazyclusters (e.g. Ray, PyTorch, Tensorflow, Horovod, Spark) to come ...
- Lower-level API for:
- Managing Runtimes or RuntimeGroups to:
- A-/synchronously execute RuntimeTasks by leveraging the power of ssh
- Expose services (e.g. a DB) from or to a
Runtimeor in a whole
- Command line interface (CLI)
- List all available
- Add a
- Delete a
Concept Definition: Runtime
Runtimeis the logical representation of a remote host. Typically, the host is another server or a virtual machine / container on another server. This python class provides several methods for utilizing remote resources such as the port exposure from / to a
Runtimeas well as the execution of RuntimeTasks. A
Runtimehas a working directory. Usually, the execution of a
RuntimeTaskis conducted relatively to this directory if no other path is explicitly given. The working directory can be manually set during the initialization. Otherwise, a temporary directory gets created that might eventually be removed.
Concept Definition: RuntimeGroup
RuntimeGroupis the representation of logically related
Runtimesand provides convenient methods for managing those related
Runtimes. Most methods are wrappers around their counterparts in the
Runtimeclass. Typical usage examples are exposing a port (i.e. a service such as a DB) in the
RuntimeGroup, transfer files, or execute a
Runtimes. Additionally, all concrete RuntimeCluster (e.g. the HyperoptCluster) implementations rely on
Concept Definition: Manager
managerrefers to the host where you are actually using the lazycluster library, since all desired lazycluster entities are managed from here. Caution: It is not to be confused with the RuntimeManager class.
Concept Definition: RuntimeTask
RuntimeTaskis a composition of multiple elemantary task steps, namely
run function(python). A
RuntimeTaskcan be executed on a remote host either by handing it over to a
Runtimeobject or standalone by handing over a fabric Connection object to the execute method of the
RuntimeTask. Consequently, all invididual task steps are executed sequentially. Moreover, a
RuntimeTaskobject captures the output (stdout/stderr) of the remote execution in its execution log. An example for a
RuntimeTaskcould be to send a csv file to a
Runtime, execute a python function that is transforming the csv file and finally get the file back.
pip install lazycluster
# Most up-to-date development version pip install --upgrade git+https://github.com/ml-tooling/[email protected]
For lazycluster usage on the manager:
Unix based OS
Python >= 3.6
ssh client (e.g. openssh-client)
Passwordless ssh access to the
Runtime hosts (recommended)
Configure passwordless ssh access (click to expand...)
- Create a key pair on the manager as described here or use an existing one
- Install lazycluster on the manager
- Create the ssh configuration for each host to be used as Runtime by using the lazycluster CLI command
lazycluster add-runtimeas described here and do not forget to specify the
- Finally, enable the passwordless ssh access by copying the public key to each Runtime as descibed here
Runtime host requirements:
- Unix based OS
- Python >= 3.6
- ssh server (e.g. openssh-server)
Passwordless ssh needs to be setup for the hosts to be used as Runtimes for the most convenient user experience. Otherwise, you need to pass the connection details to Runtime.__init__ via connection_kwargs. These parameters will be passed on to the fabric.Connection.
Usage example high-level API
Start a Dask cluster.
from lazycluster import RuntimeManager from lazycluster.cluster.dask_cluster import DaskCluster # Automatically generate a group based on the ssh configuration runtime_manager = RuntimeManager() runtime_group = runtime_manager.create_group() # Start the Dask cluster instances using the RuntimeGroup dask_cluster = DaskCluster(runtime_group) dask_cluster.start() # => Now, you can start using the running Dask cluster # Get Dask client to interact with the cluster # Note: This will give you a dask.distributed.Client which is not # a lazycluster cluster but a Dask one instead client = cluster.get_client()
Usage example lower-level API
Execute a Python function on a remote host and access the return data.
from lazycluster import RuntimeTask, Runtime # Define a Python function which will be executed remotely def hello(name:str): return 'Hello ' + name + '!' # Compose a `RuntimeTask` task = RuntimeTask('my-first_task').run_command('echo Hello World!') \ .run_function(hello, name='World') # Actually execute it remotely in a `Runtime` task = Runtime('host-1').execute_task(task, execute_async=False) # The stdout from from the executing `Runtime` can be accessed # via the execution log of the `RuntimeTask` task.print_log() # Print the return of the `hello()` call generator = task.function_returns print(next(generator))