Template for Data Science Project
This repo aims to give a robust starting point for any Data Science project.
It contains a ready-made tool setup, so you can start adding dependencies and writing code right away.
To get familiar with the tools used here, watch my talk on Data Science project setup (in Russian).
If you use this repo as a template, please leave a star, because template usages don't count as forks.
Experiments and technology discovery are usually performed in Jupyter Notebooks. The
notebooks directory is reserved for them. More info on working with notebooks can be found in
More mature parts of the pipeline (functions, classes, etc.) are stored in
.py files in the main package directory (by default
What to change?
- project name (default:
- main project directory (
- test in
- line length (default: 90) Why 90?
How to set up an environment?
This template uses poetry to manage the dependencies of your project. They
First you need to install poetry.
Then, if you use conda (recommended) to manage environments (to use a regular virtualenv, just skip this step):
Tell poetry not to create a new virtualenv for you (poetry will use the currently activated virtualenv):
poetry config virtualenvs.create false
Create a new conda environment for your project (change the env name to your desired one):
conda create -n ds_project python=3.9
conda activate ds_project
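Because poetry will install into whichever environment is currently active, it can help to confirm that the right one is active before adding dependencies. The following is a minimal sketch, not part of the template: the function name `environment_problems` is hypothetical, and it assumes the default env name `ds_project` and Python 3.9 from the commands above. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```python
import os
import sys


def environment_problems(expected_env, expected_python, environ=None, version=None):
    """Return a list of human-readable problems; an empty list means all is well.

    `environ` and `version` can be passed explicitly to make the check testable;
    by default the real process environment and interpreter version are used.
    """
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info if version is None else version)

    problems = []

    # conda exports CONDA_DEFAULT_ENV with the name of the active environment.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        problems.append(
            f"active conda env is {active_env!r}, expected {expected_env!r}"
        )

    # Compare only major.minor, e.g. (3, 9).
    if version[:2] != tuple(expected_python):
        found = ".".join(str(part) for part in version[:2])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python is {found}, expected {wanted}")

    return problems


if __name__ == "__main__":
    # Assumed defaults, taken from the conda commands above.
    for problem in environment_problems("ds_project", (3, 9)):
        print("WARNING:", problem)
```

If the script prints nothing, the active environment matches the one created above. If you renamed the environment or chose another Python version, change the two arguments accordingly.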
Now you are ready to add dependencies to your project. For this, use:
poetry add scikit-learn torch
Run poetry install to check that your final state matches the configs.
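As an extra sanity check, you can confirm from Python that the added packages are visible in the active environment. This is a rough sketch rather than part of the template: `missing_packages` is a hypothetical helper, and the package names are simply the ones from the poetry add example above. It uses the standard-library `importlib.metadata` module, which expects distribution names (`scikit-learn`), not import names (`sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # version() raises PackageNotFoundError for unknown distributions.
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names match the poetry add example; adjust to your own dependencies.
    for name in missing_packages(["scikit-learn", "torch"]):
        print(f"{name} is not installed in the active environment")
```

No output means every listed distribution was found in the environment that is running the script.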
After that, add the changes to git and commit them:
git add pyproject.toml poetry.lock
Finally, add the pre-commit hooks to git:
At this point you are ready to write clean, reproducible code!