Demo for Hydra

About Hydra

Hydra is a Python tool to manage complex configurations in your data science projects.

How to Run the Project

  1. Clone this repository:
git clone https://github.com/khuyentran1401/hydra_demo.git
  2. Install Poetry
  3. Set up the environment:
make setup

Introduction to Hydra

Folders

Folders shown in the video:

Short Summary

Imagine your YAML configuration file looks like this:

process:
  keep_columns:
      - Income
      - Recency
      - NumWebVisitsMonth
      - Complain
      - age
      - total_purchases
      - enrollment_years
      - family_size

  remove_outliers_threshold:
    age: 90
    Income: 600000

To access the list under process.keep_columns in the configuration file, simply add the @hydra.main decorator to the function that uses the configuration:

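import hydra
from omegaconf import DictConfig


# config_path is resolved relative to this file; config_name omits the .yaml extension
@hydra.main(config_path="../config", config_name="main")
def process_data(config: DictConfig):
    # Hydra loads the configuration and passes it in as `config`
    print(config.process.keep_columns)


if __name__ == "__main__":
    process_data()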

Output:

['Income', 'Recency', 'NumWebVisitsMonth', 'Complain', 'age', 'total_purchases', 'enrollment_years', 'family_size']

Group Configuration Files

TODO
