Earth observation framework for scaled-up processing in Python.
Analyzing Earth Observation (EO) data is complex, and solutions often require custom-tailored algorithms. In the EO domain, most problems come with an additional challenge: how do we apply the solution on a larger scale?
Working with EO data is made easy by the eo-learn package, while the eo-grow package takes care of running the solutions at a large scale. In eo-grow, an EOWorkflow-based solution is wrapped in a pipeline object, which takes care of parametrization, logging, storage, multi-processing, EOPatch management and more. However, pipelines are not necessarily bound to EOWorkflow execution and can also be used for other tasks, such as training ML models.
- Direct use of
- Parametrizing workflows by using validated configuration files, making executions easy to reproduce and adjust.
- Easy use of both local and AWS S3 storage with no required code adaptation.
- Splitting large areas of interest into grids and defining collections of EOPatches.
- Workflows can be run single-process, multi-process, or even on multiple machines (by using Ray clusters).
- Execution reports and customizable logging.
- Options for skipping already processed data when re-running a pipeline.
- Offers a command-line interface (CLI) for running pipelines, validating configuration files, and generating templates.
- A collection of basic pipelines with methods that can be overridden to tailor them to a wide range of use cases.
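Splitting a large area of interest into a grid can be illustrated with a minimal, dependency-free sketch. It assumes a rectangular area given as a bounding box in a projected CRS and a square tile size in the same units; the function name `split_into_grid` and the `tile_<row>_<col>` naming scheme are hypothetical and not part of the eo-grow API. The actual area managers may handle geometries, CRSs and tile buffers differently.

```python
import math
from typing import Dict, Tuple

# Bounding box as (min_x, min_y, max_x, max_y), assumed to be in a projected CRS (e.g. metres)
BBox = Tuple[float, float, float, float]


def split_into_grid(bbox: BBox, tile_size: float) -> Dict[str, BBox]:
    """Split `bbox` into square tiles; tiles on the right/top edges are clipped to the box."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    min_x, min_y, max_x, max_y = bbox
    n_cols = math.ceil((max_x - min_x) / tile_size)
    n_rows = math.ceil((max_y - min_y) / tile_size)
    tiles: Dict[str, BBox] = {}
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * tile_size
            y0 = min_y + row * tile_size
            # Hypothetical naming scheme: one EOPatch name per grid cell
            tiles[f"tile_{row}_{col}"] = (x0, y0, min(x0 + tile_size, max_x), min(y0 + tile_size, max_y))
    return tiles
```

For example, `split_into_grid((0, 0, 25, 10), 10)` yields three tiles in a single row, the last one clipped to `(20, 0, 25, 10)`. Each named cell can then serve as one EOPatch in a collection.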
General Structure Overview
The core object of eo-grow is the Pipeline. Each pipeline has a run_procedure method, which is executed after the pipeline is set up. By default, the run_procedure executes an EOWorkflow, which is built by the (user-defined)
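The pattern described above can be sketched in plain Python: a base class whose `run_procedure` builds a workflow through an overridable hook and then executes it once per EOPatch. This is a hypothetical illustration, not eo-grow's `Pipeline` class; apart from `run_procedure`, all class and method names are invented, and a plain function stands in for a real EOWorkflow.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    """Hypothetical stand-in illustrating the pipeline pattern; not eo-grow's Pipeline class."""

    def __init__(self, patch_names: List[str]) -> None:
        self.patch_names = patch_names

    def build_toy_workflow(self) -> Callable[[str], str]:
        """User-overridable hook returning a callable that processes a single patch."""
        raise NotImplementedError("Subclasses must define the workflow")

    def run_procedure(self) -> Dict[str, str]:
        """Default procedure: build the workflow once, then execute it for every patch."""
        workflow = self.build_toy_workflow()
        return {name: workflow(name) for name in self.patch_names}


class UppercasePipeline(ToyPipeline):
    """Example user pipeline whose 'workflow' simply upper-cases the patch name."""

    def build_toy_workflow(self) -> Callable[[str], str]:
        return str.upper
```

The design keeps the execution logic in the base class, so a user only has to describe *what* to compute per patch. Calling `UppercasePipeline(["tile_0_0", "tile_0_1"]).run_procedure()` returns `{"tile_0_0": "TILE_0_0", "tile_0_1": "TILE_0_1"}`.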
Each pipeline is linked to so-called managers:
- StorageManager handles loading and saving of files,
- AreaManager defines the area of interest and how it should be split into EOPatches,
- EOPatchManager takes care of listing EOPatches and handling their storage details,
- LoggingManager provides control over logging.
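Listing EOPatches, combined with the option of skipping already processed data, can be sketched with `pathlib`. The sketch assumes that each EOPatch is stored as a folder named after the patch and that a finished patch leaves a (hypothetical) output file in a matching output folder; the helper names are invented, and this is not how eo-grow's managers are implemented.

```python
from pathlib import Path
from typing import Iterable, List


def list_patches(patch_folder: Path) -> List[str]:
    """List EOPatch names, assuming each EOPatch is a sub-folder of `patch_folder`."""
    return sorted(p.name for p in patch_folder.iterdir() if p.is_dir())


def patches_to_process(patch_names: Iterable[str], output_folder: Path, output_file: str) -> List[str]:
    """Keep only patches whose output file does not exist yet, so re-runs skip finished work."""
    return [name for name in patch_names if not (output_folder / name / output_file).exists()]
```

If `input/` contains `tile_0_0` and `tile_0_1` and only `output/tile_0_0/result.npy` exists, a re-run would process just `tile_0_1`.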
Managers and pipelines usually require a large number of parameters (setting storage paths, configuring log parameters, etc.), which are provided in .json configuration files. Each eo-grow object contains a special Schema class, which is a pydantic model describing the parameters of the object. Config files are then validated before execution to catch issues early. Templates for config files can be generated with the eogrow-template CLI command.
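To show what validating a config before execution buys, here is a sketch that swaps pydantic for a standard-library dataclass so it runs without dependencies. It is not how eo-grow defines its Schema classes; the schema class and its fields `output_folder_key` and `workers` are hypothetical. A real pydantic model would also perform type coercion and produce richer error messages.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ToyPipelineSchema:
    """Hypothetical parameter schema; eo-grow itself uses pydantic models instead."""

    output_folder_key: str
    workers: int = 1

    def __post_init__(self) -> None:
        # Dataclasses do not check types, so the checks are written out by hand
        if not isinstance(self.output_folder_key, str):
            raise TypeError("output_folder_key must be a string")
        if isinstance(self.workers, bool) or not isinstance(self.workers, int) or self.workers < 1:
            raise ValueError("workers must be a positive integer")


def validate_config(raw_json: str) -> ToyPipelineSchema:
    """Parse a JSON config and fail before execution on unknown, missing or invalid parameters."""
    data: Dict[str, Any] = json.loads(raw_json)
    unknown = set(data) - {f.name for f in fields(ToyPipelineSchema)}
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    # A missing required field raises TypeError from the generated __init__
    return ToyPipelineSchema(**data)
```

A typo such as `"wrokers"` or a value such as `"workers": 0` is rejected before any processing starts, which is the point of validating up front.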
To make config files easier to write, eo-grow uses a simple config language that supports importing other configs, variables, and more.
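The idea behind imports and variables can be sketched in a few lines: an imported base config is merged with an overriding config, and placeholders are then replaced with variable values. The `${name}` placeholder syntax (via Python's `string.Template`), the merge rules and the example keys are assumptions for illustration only; the actual eo-grow config language has its own syntax, described in its documentation.

```python
from string import Template
from typing import Any, Dict


def merge_configs(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (override wins on conflicts)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


def substitute_variables(config: Any, variables: Dict[str, str]) -> Any:
    """Replace ${name} placeholders in all string values (toy syntax, not eo-grow's)."""
    if isinstance(config, dict):
        return {key: substitute_variables(value, variables) for key, value in config.items()}
    if isinstance(config, list):
        return [substitute_variables(value, variables) for value in config]
    if isinstance(config, str):
        # Template.substitute raises KeyError for undefined variables, surfacing mistakes early
        return Template(config).substitute(variables)
    return config
```

With a base of `{"storage": {"project_folder": "${root}/project"}, "workers": 1}`, an override of `{"workers": 8}` and `{"root": "/data"}` as variables, the resolved config is `{"storage": {"project_folder": "/data/project"}, "workers": 8}`.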
eo-learn 1.0.0 release.
The eo-grow package requires Python version >= 3.8 and can be installed with:
pip install eo-grow
Command Line Interface
Running pipelines is easiest with the CLI provided by eo-grow. For all options, use the --help flag with each command.
- eogrow <config> executes the pipeline defined in <config>.
- eogrow-validate <config> only performs validation of <config>.
- eogrow-test <config> initializes the pipeline/object but does not run it. Useful for testing whether managers are set up correctly or for generating area-split grids.
- eogrow-ray <cluster> <config> executes the pipeline defined in <config> on the active Ray cluster defined by <cluster>.
- eogrow-template <import path> <template> generates a template config for the object specified by <import path> and saves it to the <template> file (or outputs it directly if <template> is not provided).
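When many configs have to be processed, these commands can also be scripted. A minimal sketch using `subprocess`, assuming eo-grow is installed so the commands are on `PATH`, and assuming (not confirmed here) that `eogrow-validate` exits with a non-zero status when validation fails. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Commands taking a single config argument, as listed above
COMMANDS = {"run": "eogrow", "validate": "eogrow-validate", "test": "eogrow-test"}


def build_command(action: str, config: Path) -> List[str]:
    """Build the argument list for one of the single-config eo-grow CLI commands."""
    if action not in COMMANDS:
        raise ValueError(f"Unknown action {action!r}, expected one of {sorted(COMMANDS)}")
    return [COMMANDS[action], str(config)]


def validate_then_run(config: Path) -> None:
    """Validate a config first and only run the pipeline if validation succeeds."""
    # check=True raises CalledProcessError if a command exits with a non-zero status
    subprocess.run(build_command("validate", config), check=True)
    subprocess.run(build_command("run", config), check=True)
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command before anything is launched.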
Explanatory examples can be found here.
More details on the config language used by eo-grow can be found here.
Questions and Issues