Gradient-Free-Optimizers

Simple and reliable optimization with local, global, population-based and sequential techniques in numerical search spaces.

Gradient-Free-Optimizers provides a collection of easy to use optimization techniques,
whose objective function only requires an arbitrary score that gets maximized.
This makes gradient-free methods capable of solving various optimization problems, including:

  • Optimizing arbitrary mathematical functions.
  • Fitting multiple gauss-distributions to data.
  • Hyperparameter-optimization of machine-learning methods.

Gradient-Free-Optimizers is the optimization backend of Hyperactive (in v3.0.0 and higher) but it can also be used by itself as a leaner and simpler optimization toolkit.

Main features

  • Easy to use:

    Simple API-design

    You can optimize anything that can be defined in a python function. For example a simple parabola function:

    def objective_function(para):
        score = para["x1"] * para["x1"]
        return -score
    

    Define where to search via numpy ranges:

    search_space = {
        "x": np.arange(0, 5, 0.1),
    }
    

    That`s all the information the algorithm needs to search for the maximum in the objective function:

    from gradient_free_optimizers import RandomSearchOptimizer
    
    opt = RandomSearchOptimizer(search_space)
    opt.search(objective_function, n_iter=100000)
    
    Receive prepared information about ongoing and finished optimization runs

    During the optimization you will receive ongoing information in a progress bar:

    • current best score
    • the position in the search space of the current best score
    • the iteration when the current best score was found
    • other information about the progress native to tqdm
  • High performance:

    Modern optimization techniques

    Gradient-Free-Optimizers provides not just meta-heuristic optimization methods but also sequential model based optimizers like bayesian optimization, which delivers good results for expensive objetive functions like deep-learning models.

    Lightweight backend

    Even for the very simple parabola function the optimization time is about 60% of the entire iteration time when optimizing with random search. This shows, that (despite all its features) Gradient-Free-Optimizers has an efficient optimization backend without any unnecessary slowdown.

    Save time with memory dictionary

    Per default Gradient-Free-Optimizers will look for the current position in a memory dictionary before evaluating the objective function.

    • If the position is not in the dictionary the objective function will be evaluated and the position and score is saved in the dictionary.

    • If a position is already saved in the dictionary Gradient-Free-Optimizers will just extract the score from it instead of evaluating the objective function. This avoids reevaluating computationally expensive objective functions (machine- or deep-learning) and therefore saves time.

  • High reliability:

    Extensive testing

    Gradient-Free-Optimizers is extensivly tested with more than 400 tests in 2500 lines of test code. This includes the testing of:

    • Each optimization algorithm
    • Each optimization parameter
    • All attributes that are part of the public api
    Performance test for each optimizer

    Each optimization algorithm must perform above a certain threshold (for selected objetive functions) to be included. Poorly performing algorithms are reworked or scraped.


Optimization strategies:

Gradient-Free-Optimizers supports a variety of optimization algorithms, which can make choosing the right algorithm a tedious endeavor. The gifs in this section give a visual representation how the different optimization algorithms explore the search space and exploit the collected information about the search space for a convex and non-convex objective function.

Optimization Strategy Convex Function Non-convex Function
Hill Climbing

Evaluates the score of n neighbours in an epsilon environment and moves to the best one.
Repulsing Hill Climbing

Hill climbing iteration + increases epsilon by a factor if no better neighbour was found.
Simulated Annealing

Hill climbing iteration + accepts moving to worse positions with decreasing probability over time (transition probability).
Random Search

Moves to random positions in each iteration.
Random Restart Hill Climbing

Hill climbing + moves to a random position after n iterations.
Random Annealing

Hill Climbing + large epsilon that decreases over time.
Parallel Tempering

Population of n simulated annealers, which occasionally swap transition probabilities.
Particle Swarm Optimization

Population of n particles attracting each other and moving towards the best particle.
Evolution Strategy

Population of n hill climbers occasionally mixing positional information.
Bayesian Optimization

Gaussian process fitting to explored positions and predicting promising new positions.
Tree of Parzen Estimators

Kernel density estimators fitting to good and bad explored positions and predicting promising new positions.
Decision Tree Optimizer

Ensemble of decision trees fitting to explored positions and predicting promising new positions.

Installation

PyPI version

The most recent version of Gradient-Free-Optimizers is available on PyPi:

pip install gradient-free-optimizers

Examples

Convex function
import numpy as np
from gradient_free_optimizers import RandomSearchOptimizer


def parabola_function(para):
    loss = para["x"] * para["x"]
    return -loss


search_space = {"x": np.arange(-10, 10, 0.1)}

opt = RandomSearchOptimizer(search_space)
opt.search(parabola_function, n_iter=100000)
Non-convex function
import numpy as np
from gradient_free_optimizers import RandomSearchOptimizer


def ackley_function(pos_new):
    x = pos_new["x1"]
    y = pos_new["x2"]

    a1 = -20 * np.exp(-0.2 * np.sqrt(0.5 * (x * x + y * y)))
    a2 = -np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
    score = a1 + a2 + 20
    return -score


search_space = {
    "x1": np.arange(-100, 101, 0.1),
    "x2": np.arange(-100, 101, 0.1),
}

opt = RandomSearchOptimizer(search_space)
opt.search(ackley_function, n_iter=30000)
Machine learning example
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_wine

from gradient_free_optimizers import HillClimbingOptimizer


data = load_wine()
X, y = data.data, data.target


def model(para):
    gbc = GradientBoostingClassifier(
        n_estimators=para["n_estimators"],
        max_depth=para["max_depth"],
        min_samples_split=para["min_samples_split"],
        min_samples_leaf=para["min_samples_leaf"],
    )
    scores = cross_val_score(gbc, X, y, cv=3)

    return scores.mean()


search_space = {
    "n_estimators": np.arange(20, 120, 1),
    "max_depth": np.arange(2, 12, 1),
    "min_samples_split": np.arange(2, 12, 1),
    "min_samples_leaf": np.arange(1, 12, 1),
}

opt = HillClimbingOptimizer(search_space)
opt.search(model, n_iter=50)

Basic API reference

General optimization arguments

The following arguments can be passed to each optimization class:

  • search_space

    • Pass the search_space to the optimizer class to define the space were the optimization algorithm can search for the best parameters for the given objective function.

      example:

      ...
      
      search_space = {
        "x1": numpy.arange(-10, 31, 0.3),
        "x2": numpy.arange(-10, 31, 0.3),
      }
      
      opt = HillClimbingOptimizer(search_space)
      
      ...
      
  • initialize={"grid": 8, "vertices": 8, "random": 4, "warm_start": []}

    • (dict, None)

    • The initialization dictionary automatically determines a number of parameters that will be evaluated in the first n iterations (n is the sum of the values in initialize). The initialize keywords are the following:

      • grid

        • Initializes positions in a grid like pattern. Positions that cannot be put into a grid are randomly positioned.
      • vertices

        • Initializes positions at the vertices of the search space. Positions that cannot be put into a vertices are randomly positioned.
      • random

        • Number of random initialized positions
      • warm_start

        • List of parameter dictionaries that marks additional start points for the optimization run.

Optimizer Classes

Each optimization class needs the "search_space" as an input argument. Optionally "initialize" and optimizer-specific parameters can be passed as well. You can read more about each optimization-strategy and its parameters in the Optimization Tutorial.

  • HillClimbingOptimizer
  • RepulsingHillClimbingOptimizer
  • SimulatedAnnealingOptimizer
  • RandomSearchOptimizer
  • RandomRestartHillClimbingOptimizer
  • RandomAnnealingOptimizer
  • ParallelTemperingOptimizer
  • ParticleSwarmOptimizer
  • EvolutionStrategyOptimizer
  • BayesianOptimizer
  • TreeStructuredParzenEstimators
  • DecisionTreeOptimizer

.search(...)
  • objective_function

    • (callable)

    • The objective function defines the optimization problem. The optimization algorithm will try to maximize the numerical value that is returned by the objective function by trying out different parameters from the search space.

      example:

      def objective_function(para):
          score = -(para["x1"] * para["x1"] + para["x2"] * para["x2"])
          return score
      
  • n_iter

    • (int)

    • The number of iterations that will be performed during the optimiation run. The entire iteration consists of the optimization-step, which decides the next parameter that will be evaluated and the evaluation-step, which will run the objective function with the chosen parameter and return the score.

  • max_time=None

    • (float, None)
    • Maximum number of seconds until the optimization stops. The time will be checked after each completed iteration.
  • max_score=None

    • (float, None)
    • Maximum score until the optimization stops. The score will be checked after each completed iteration.
  • memory=True

    • (bool)
    • Whether or not to use the "memory"-feature. The memory is a dictionary, which gets filled with parameters and scores during the optimization run. If the optimizer encounters a parameter that is already in the dictionary it just extracts the score instead of reevaluating the objective function (which can take a long time).
  • memory_warm_start=None

    • (pandas dataframe, None)

    • Pandas dataframe that contains score and paramter information that will be automatically loaded into the memory-dictionary.

      example:

      score x1 x2 x...
      0.756 0.1 0.2 ...
      0.823 0.3 0.1 ...
      ... ... ... ...
      ... ... ... ...
  • verbosity=[
    "progress_bar",
    "print_results",
    "print_times"
    ]

    • (list, False)
    • The verbosity list determines what part of the optimization information will be printed in the command line.
  • random_state=None

    • (int, None)
    • Random state for random processes in the random, numpy and scipy module.

Results from attributes
  • .results

    • Dataframe, that contains information about the score, the value of each parameter and the evaluation and iteration time. Each row shows the information of one optimization iteration.

      example:

      score x1 x2 x... eval_time iter_time
      0.756 0.1 0.2 ... 0.000016 0.0.000034
      0.823 0.3 0.1 ... 0.000017 0.0.000032
      ... ... ... ... ... ...
      ... ... ... ... ... ...
  • .best_score

    • numerical value of the best score, that was found during the optimization run.
  • .best_para

    • parameter dictionary of the best score, that was found during the optimization run.

      example:

      {
        'x1': 0.2, 
        'x2': 0.3,
      }
      

Roadmap

v1.0.0
  • [ ] Final API
  • [ ] add Downhill-simplex-algorithm to optimizers
  • [ ] improve access to parameters of optimizers within population-based-optimizers (e.g. annealing rate of simulated annealing population in parallel tempering)

Gradient Free Optimizers <=> Hyperactive

Gradient-Free-Optimizers was created as the optimization backend of the Hyperactive package. Therefore the algorithms are exactly the same in both packages and deliver the same results.
However you can still use Gradient-Free-Optimizers as a standalone package.
The separation of Gradient-Free-Optimizers from Hyperactive enables multiple advantages:

  • Even easier to use than Hyperactive
  • Other developers can easily use GFOs as an optimizaton backend if desired
  • Separate and more thorough testing
  • Better isolation from the complex information flow in Hyperactive. GFOs only uses positions and scores in a N-dimensional search-space. It returns only the new position after each iteration.
  • a smaller and cleaner code base, if you want to explore my implementation of these optimization techniques.
Gradient-Free-Optimizers Hyperactive
Search space composition only numerical any python object
Multiprocessing not supported yes, via joblib or multiprocessing
Distributed computing not supported yes, via data sharing at runtime

Citation

@Misc{gfo2020,
  author =   {{Simon Blanke}},
  title =    {{Gradient-Free-Optimizers}: Simple and reliable optimization with local, global, population-based and sequential techniques in numerical search spaces.},
  howpublished = {\url{https://github.com/SimonBlanke}},
  year = {since 2020}
}

GitHub