In depth: Gradients

This tutorial shows how to supply gradient information about an objective to simplenlopt in SciPy or NLopt style. One example for modern automatic differentiation via the external package autograd is also included. The studied optimization problem is again the Rosenbrock function. Its objective and partial derivatives are given by

\[\begin{align}
f(x, y) &= (1-x)^2 + 100\,(y-x^2)^2\\
\frac{\partial f}{\partial x} &= -2(1-x) - 400\,x\,(y-x^2)\\
\frac{\partial f}{\partial y} &= 200\,(y-x^2)
\end{align}\]

jac = callable

This is the easiest case, also shown in the quickstart example. Objective and gradient are supplied as two separate functions:

[1]:
import numpy as np
import simplenlopt
from time import time

def rosenbrock(pos):

    x, y = pos
    return (1-x)**2 + 100 * (y - x**2)**2

def rosenbrock_grad(pos):

    x, y = pos
    dx = 2 * x -2 - 400 * x * (y-x**2)
    dy = 200 * (y-x**2)

    return np.array([dx, dy])

x0=np.array([-1.5, 2.25])
%timeit -n 1000 res = simplenlopt.minimize(rosenbrock, x0, jac = rosenbrock_grad)
2.33 ms ± 51.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
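
Before relying on a hand-written gradient it can be worth sanity-checking it against a finite-difference approximation, for example with SciPy's check_grad. This is just a quick verification sketch and not part of the timing comparison:

from scipy.optimize import check_grad

# Returns the norm of the difference between the analytical gradient and a
# finite-difference approximation at x0; a value close to zero indicates
# that rosenbrock_grad is consistent with rosenbrock.
print(check_grad(rosenbrock, rosenbrock_grad, x0))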

jac = True

Taking another look at the objective and its partial derivatives, you can see that the bracketed expressions appear in both the objective and the partial derivatives. If both are calculated in separate functions, these terms are recomputed unnecessarily. This can be avoided by supplying the objective and its gradient in one function. That this function also returns the gradient information is indicated by setting jac=True. Let's see how this works and how it affects performance.

[2]:
def rosenbrock_incl_grad(pos):

    x, y = pos
    first_bracket = 1-x
    second_bracket = y-x*x

    obj = first_bracket*first_bracket+100*second_bracket*second_bracket
    dx = -2*first_bracket-400*x*second_bracket
    dy = 200 * second_bracket

    return obj, np.array([dx, dy])

%timeit -n 1000 res = simplenlopt.minimize(rosenbrock_incl_grad, x0, jac = True)
2.25 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

We see a small performance improvement. For more complicated objective functions, the performance gains can be massive if repeated computations can be avoided.

jac = 'nlopt'

This flag is mostly for former NLopt users. It indicates that the objective and its gradient are supplied in vanilla NLopt style. NLopt requires a different signature for the objective: f(x, grad) instead of f(x). The gradient grad is a NumPy array that must be modified in-place. For the Rosenbrock example this looks like:

[3]:
def rosenbrock_nlopt_style(pos, grad):

    x, y = pos
    first_bracket = 1-x
    second_bracket = y-x*x

    obj = first_bracket*first_bracket+100*second_bracket*second_bracket

    if grad.size > 0:

        dx = -2*first_bracket-400*x*second_bracket
        dy = 200 * second_bracket
        grad[0] = dx
        grad[1] = dy

    return obj

%timeit -n 1000 res = simplenlopt.minimize(rosenbrock_nlopt_style, x0, jac = 'nlopt')
2.14 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The in-place replacement leads to another small performance gain. Side note: while the if statement might seem odd and unnecessary, it is required for some of the optimizers (derivative-free algorithms pass an empty gradient array), so you are on the safe side if you include it in your objective function.

jac = '3-point' / '2-point'

These flags tell simplenlopt which finite difference scheme to use. The finite differencing is borrowed from SciPy. Note that '2-point' requires fewer function evaluations but is less precise and therefore more prone to cause optimization failures.

[4]:
%timeit -n 100 res = simplenlopt.minimize(rosenbrock, x0, jac = '3-point')
7.39 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This example shows that finite differences are not competitive with analytical gradients. For simple cases such as low-dimensional curve fitting they are often still useful. If possible, automatic differentiation represents a powerful alternative.
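
For completeness, the cheaper '2-point' scheme is requested the same way. A minimal sketch (the res.x attribute is assumed to follow SciPy's OptimizeResult convention):

# '2-point' forward differences: fewer function evaluations, lower accuracy
res = simplenlopt.minimize(rosenbrock, x0, jac='2-point')
print(res.x)  # should lie close to the true minimum at (1, 1)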

Autodiff

In recent years, automatic differentiation (autodiff) has become one of the building blocks of machine learning. Many frameworks such as PyTorch and TensorFlow are actually centered around autodiff. Here, we will use the slightly older autograd package to automatically compute the gradient for us and feed it to simplenlopt.

[5]:
import autograd.numpy as anp
from autograd import value_and_grad

rosen_and_grad = value_and_grad(rosenbrock)
%timeit -n 10 res = simplenlopt.minimize(rosen_and_grad, x0, jac = True)
17.3 ms ± 392 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

While autograd results in the worst performance for this example, autodiff shines for high-dimensional problems, where the inaccuracies of finite differences are much more severe. To circumvent autograd's performance issues, another candidate would be autograd's successor JAX, which additionally provides just-in-time compilation.
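
As a rough sketch of how the same pattern could look with JAX (assuming JAX is installed; not benchmarked here). The wrapper that converts the results back to plain NumPy types is only a precaution so that simplenlopt receives the same types as in the autograd example:

import jax
from jax import value_and_grad, jit

jax.config.update("jax_enable_x64", True)  # use double precision like NumPy

def rosenbrock_jax(pos):
    x, y = pos
    return (1 - x)**2 + 100 * (y - x**2)**2

# JIT-compile the fused value-and-gradient computation
rosen_valgrad_jit = jit(value_and_grad(rosenbrock_jax))

def rosen_valgrad_numpy(pos):
    # Convert JAX outputs to a Python float and a NumPy array
    val, grad = rosen_valgrad_jit(pos)
    return float(val), np.asarray(grad)

res = simplenlopt.minimize(rosen_valgrad_numpy, x0, jac=True)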