A Python library for high performance ensemble learning

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory efficient, maximally parallelized ensemble networks in as few lines of codes as possible.

ML-Ensemble is thread safe as long as base learners are and can fall back on memory mapped multiprocessing for memory-neutral process-based concurrency. For tutorials and full documentation, visit the project website.

Ensembles as computational graphs

An ensemble is built on top of a computational graph, allowing users great design freedom. Ensembles can be built with recursion, dynamic evaluation (e.g. if-else) and much more. A high-level API wraps common ensemble architectures into Scikit-learn estimators.

Example computational graph of a layer in an ensemble

Memory efficient parallelized learning

ML-Ensemble is optimized for speed and minimal memory consumption. No serialization of data takes place, regardless of whether multithreading or multiprocessing is used. Additionally, multithreading is pickle-free.

Ease of use

Ready-made ensembles are built by adding layers to an instance. No matter how complex the ensemble, to train it call the fit method:

ensemble = Subsemble()

# First layer
ensemble.add(list_of_estimators)

# Second layer
ensemble.add(list_of_estimators)

# Final meta estimator
ensemble.add_meta(estimator)

# Train ensemble
ensemble.fit(X, y)

Similarly, it's straightforward to modify an existing ensemble:

# Remove layer
ensemble.remove(2)

# Change layer parameters
ensemble.replace(0, new_list_of_estimators)

And to create differentiated preprocessing pipelines for different subsets of estimators within a given layer, simple pass a mapping to the add method:

preprocessing = {'pipeline-1': list_of_transformers_1,
                 'pipeline-2': list_of_transformers_2}

estimators = {'pipeline-1': list_of_estimators_1,
              'pipeline-2': list_of_estimators_2}

ensemble.add(estimators, preprocessing)

Dedicated diagnostics

ML Ensemble implements a dedicated diagnostics and model selection suite for intuitive and speedy ensemble evaluation. The Evaluator allows you to evaluate several preprocessing pipelines and several estimators in one go, giving you birds-eye view of how different candidates for the ensemble perform.

Moreover, entire ensembles can be used as a preprocessing pipeline, to leverage model selection for higher-level layers. Simply set model_selection to True in the ensemble (don't forget to turn it off when done).

preprocessing_dict = {'pipeline-1': list_of_transformers_1,
                      'pipeline-2': list_of_transformers_2}


evaluator = Evaluator(scorer=score_func)
evaluator.fit(X, y, list_of_estimators, param_dists_dict, preprocessing_dict)

All ensembles and model selection instances provides summary statistics in tabular format. You can find fit and prediction times in any ensemble by calling data, and cv-scores if you passed a scorer. For the model selection suite, the results attribute gives you the outcome of an evaluation::

              fit_time_mean  fit_time_std  train_score_mean  train_score_std  test_score_mean  test_score_std               params
prep-1 est-1       0.001353      0.001316          0.957037         0.005543         0.960000        0.032660                   {}
       est-2       0.000447      0.000012          0.980000         0.004743         0.966667        0.033333  {'n_neighbors': 15}
prep-2 est-1       0.001000      0.000603          0.957037         0.005543         0.960000        0.032660                   {}
       est-2       0.000448      0.000036          0.965185         0.003395         0.960000        0.044222   {'n_neighbors': 8}
prep-3 est-1       0.000735      0.000248          0.791111         0.019821         0.780000        0.133500                   {}
       est-2       0.000462      0.000143          0.837037         0.014815         0.800000        0.126491   {'n_neighbors': 9}

Install

PyPI

ML-Ensemble is available on PyPI. Install with

pip install mlens

Bleeding edge

Fork the GitHub repository:

git clone https://github.com/flennerhag/mlens.git; cd mlens;
python setup.py install

Citation

For scientific publication, ML-Ensemble can be cited as

@misc{flennerhag:2017mlens,
  author = {Flennerhag, Sebastian},
  title  = {ML-Ensemble},
  month  = nov,
  year   = 2017,
  doi    = {10.5281/zenodo.1042144},
  url    = {https://dx.doi.org/10.5281/zenodo.1042144}
}

GitHub

GitHub - flennerhag/mlens: ML-Ensemble – high performance ensemble learning
ML-Ensemble – high performance ensemble learning. Contribute to flennerhag/mlens development by creating an account on GitHub.