A scikit-learn-compatible module for estimating prediction intervals

MAPIE

MAPIE allows you to easily estimate prediction intervals on single-output data using your favourite scikit-learn-compatible regressor.

Prediction intervals output by MAPIE encompass both aleatoric and epistemic uncertainty and are backed by strong theoretical guarantees [1].

Requirements

Python 3.7+

MAPIE stands on the shoulders of giant.

Its only internal dependency is scikit-learn.

Installation

Install via pip:

pip install mapie

To install directly from the github repository :

pip install git+https://github.com/simai-ml/MAPIE

Quickstart

Let us start with a basic regression problem. Here, we generate one-dimensional noisy data that we fit with a linear model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

regressor = LinearRegression()
X, y = make_regression(n_samples=500, n_features=1, noise=20, random_state=59)

Since MAPIE is compliant with the standard scikit-learn API, we follow the standard sequential fit and predict process like any scikit-learn regressor. We set two values for alpha to estimate prediction intervals at approximately one and two standard deviations from the mean.

from mapie.estimators import MapieRegressor
mapie = MapieRegressor(regressor)
mapie.fit(X, y)
y_preds = mapie.predict(X)

MAPIE returns a np.ndarray of shape (n_samples, 3, len(alpha)) giving the predictions, as well as the lower and upper bounds of the prediction intervals for the target quantile for each desired alpha value. The estimated prediction intervals can then be plotted as follows.

from matplotlib import pyplot as plt
from mapie.metrics import coverage_score
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(X, y, alpha=0.3)
plt.plot(X, y_preds[:, 0], color='C1')
order = np.argsort(X[:, 0])
plt.fill_between(X[order].ravel(), y_preds[:, 1][order], y_preds[:, 2][order], alpha=0.3)
plt.title(
    f"Target coverage = 0.9; Effective coverage = {coverage_score(y, y_preds[:, 1], y_preds[:, 2])}"
)
plt.show()

The title of the plot compares the target coverages with the effective coverages. The target coverage, or the confidence interval, is the fraction of true labels lying in the prediction intervals that we aim to obtain for a given dataset. It is given by the alpha parameter defined in MapieRegressor, here equal to 0.05 and 0.32, thus giving target coverages of 0.95 and 0.68. The effective coverage is the actual fraction of true labels lying in the prediction intervals.

quickstart_1