G-Research-Crypto-Competition

Project for passing the ML exam. Dataset took from the competition on the kaggle https://www.kaggle.com/c/g-research-crypto-forecasting

In this repository you can find an example of using SnakeMake to solve ML tasks.

The workflow automation system Snakemake is a tool for creating reproducible and scalable pipelines. Pipelines
are described using a human-readable Python-based language. They can be easily scaled for server, cluster, network and cloud environments
without having to change the workflow definition. Finally, Snakemake workflows can include a description of the necessary
software that will be automatically deployed in any runtime environment.

Getting Started:

  • Create virtual environment for development:
$ conda env create -f devenv.yaml
  • Activate virtual environment:
$ conda activate G-Research-Crypto-Competition
  • Start snakemake pipelines with 8 cores:
$ snakemake --cores 8

Project Organization

├── README.rst         <- The top-level readme for developers.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- Technical documentation.
│
├── models             <- Trained and serialized models, model predictions,
│                         or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number
│                         (for ordering), the creator's initials, and a
│                         short `-` delimited description, e.g.
│                         `001-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other
│                         explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in
│                         reporting.
│
├── devenv.yaml        <- The environment file for reproducing the analysis
│                         environment, e.g. generated with
│                         `conda env export --from-history > devenv.yaml`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python package.
│   │
│   ├── data           <- Scripts to download or generate data.
│   │
│   ├── features       <- Scripts to turn raw data into features for
│   │                     modeling.
│   │
│   ├── models         <- Scripts to train models and then use trained
│   │                     models to make predictions.
│   │
│   └── reports        <- Scripts to create exploratory reports and results
│                         oriented visualizations.
│
├── workflow           <- Snakemake workflow storage.
│   ├── envs           <- Conda environments for snakemake rules.
│   │   └── default.yaml
│   │
│   ├── rules          <- Rules as modules.
│   │   └── clean.smk
│   │
│   ├── scripts        <- Support functions for using in snakemake workflow.
│   │
│   ├── config.yaml    <- Parameters for workflow in yaml format.
│   │
│   └── Snakefile      <- Entrypoint of the workflow (it will be
│                         automatically discovered when running snakemake
│                         from the root of above structure).
│
└── .env.example       <- Example of file for environment variables, like
                          MinIO access or Postgresql credentials. It is
                          necessary to create an `.env` file based on it.

GitHub

View Github