A tool for creating data science pipelines

Orchest

Orchest is a web-based data science tool that works on top of your filesystem, so you can keep using your editor of choice. With Orchest you can focus on visually building and iterating on your pipeline ideas. Under the hood, Orchest runs a collection of containers to provide a scalable platform that can run on your laptop as well as on a large-scale cloud cluster.

Orchest lets you

  • Interactively build data science pipelines through its visual interface.
  • Automatically run your pipelines in parallel.
  • Develop your code in your favorite editor. Everything is filesystem based.
  • Tag the notebook cells you want to skip when running a pipeline. Perfect for prototyping, as you
    do not have to maintain a perfectly clean notebook.
  • Run experiments by parametrizing your pipeline. Easily try out all of your modeling ideas.
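To make the idea of parametrized experiments concrete, here is a minimal sketch of expanding a parameter space into one run per combination. It is not Orchest's API: the parameter names, the values, and the `expand_grid` helper are all assumptions made up for illustration.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
param_space = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5, 7],
}


def expand_grid(space):
    """Return one dict per combination of parameter values."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


runs = expand_grid(param_space)
for index, params in enumerate(runs):
    # Each dict would parametrize one pipeline run.
    print(f"run {index}: {params}")
```

Two learning rates and three depths give six runs, each of which could be handed to the same pipeline with different settings.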

Installation

Requirements

  • Docker (tested on 19.03.9)

Linux/macOS/Windows (through WSL 2)

git clone https://github.com/orchest/orchest.git
cd orchest
./orchest.sh start

Note: on Windows, Docker should be configured to use WSL 2. Make sure you clone the repository
inside the Linux environment. More information about Docker and WSL 2 can be found here:
https://docs.docker.com/docker-for-windows/wsl/.

Quickstart

Please refer to our docs for a more comprehensive
quickstart tutorial.

Build your pipeline.

Each pipeline step executes a file (.ipynb, .py, .R, .sh) in a containerized environment.
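One way to picture a pipeline of file-backed steps is as a dependency graph in which steps without unmet dependencies can run side by side. The sketch below is an assumption-laden illustration, not Orchest's internal format: the `pipeline` dict, the step names, and the `parallel_batches` helper are hypothetical. Only the four file extensions come from the text above.

```python
from pathlib import Path

# File types a step may execute, as listed above.
SUPPORTED = {".ipynb", ".py", ".R", ".sh"}

# Hypothetical pipeline: each step names its file and the steps it waits for.
pipeline = {
    "load": {"file": "load.py", "after": []},
    "clean": {"file": "clean.ipynb", "after": ["load"]},
    "features": {"file": "features.R", "after": ["load"]},
    "train": {"file": "train.py", "after": ["clean", "features"]},
}


def parallel_batches(steps):
    """Group step names into batches whose members could run in parallel."""
    for name, spec in steps.items():
        if Path(spec["file"]).suffix not in SUPPORTED:
            raise ValueError(f"step {name!r} has an unsupported file type")
    remaining = {name: set(spec["after"]) for name, spec in steps.items()}
    batches = []
    while remaining:
        # A step is ready once all of its dependencies have been scheduled.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("pipeline contains a cycle or an unknown dependency")
        batches.append(ready)
        for name in ready:
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches


batches = parallel_batches(pipeline)
print(batches)
```

Here `clean` and `features` land in the same batch because both depend only on `load`, while `train` waits for both of them.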

Write your code.

Iteratively edit and run your code for each pipeline step with an interactive JupyterLab session.
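Notebook steps edited this way can carry cell tags, which is what makes it possible to skip tagged cells during a pipeline run. The following sketch filters cells by tag using a simplified notebook structure. The tag name `skip` and the `runnable_cells` helper are assumptions for illustration. The source does not state which tag Orchest looks for or how it filters cells.

```python
# Simplified in-memory notebook; real .ipynb files carry more fields per cell.
notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "df = load()"},
        {"cell_type": "code", "metadata": {"tags": ["skip"]}, "source": "df.plot()"},
        {"cell_type": "markdown", "metadata": {}, "source": "Notes"},
        {"cell_type": "code", "metadata": {}, "source": "save(df)"},
    ]
}

SKIP_TAG = "skip"  # hypothetical tag name, chosen for this example


def runnable_cells(nb, skip_tag=SKIP_TAG):
    """Return the code cells that do not carry the skip tag."""
    return [
        cell
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
        and skip_tag not in cell.get("metadata", {}).get("tags", [])
    ]


sources = [cell["source"] for cell in runnable_cells(notebook)]
print(sources)
```

The exploratory `df.plot()` cell stays in the notebook for interactive work but is left out of the list of cells to execute.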

Run your pipeline and see the results come in.

Outputs (both stdout and stderr) are directly viewable and stored on disk.
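As a rough illustration of keeping both output streams on disk, the sketch below runs a command and writes its stdout and stderr to separate log files. The `run_step` helper and the log file names are assumptions. The source does not describe how or where Orchest stores step output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_step(command, log_dir):
    """Run a command, save its stdout and stderr to files, return the exit code."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(command, capture_output=True, text=True)
    # Hypothetical file names; any layout on disk would work the same way.
    (log_dir / "stdout.log").write_text(result.stdout)
    (log_dir / "stderr.log").write_text(result.stderr)
    return result.returncode


log_dir = Path(tempfile.mkdtemp())
step = [sys.executable, "-c", "import sys; print('ok'); print('warn', file=sys.stderr)"]
exit_code = run_step(step, log_dir)
print("exit code:", exit_code)
print("stdout:", (log_dir / "stdout.log").read_text().strip())
print("stderr:", (log_dir / "stderr.log").read_text().strip())
```

Keeping the two streams in separate files makes it easy to show errors apart from regular output when reviewing a run later.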
