Verifying numerical rates of convergence

Let’s explore the first application of PEP: numerically verifying the rate of convergence. As an illustrative example, we consider gradient descent (GD) on \(L\)-smooth convex functions. We look for the worst function \(f\) for which, after \(N\) iterations of GD, the objective gap \(f(x_N) - f(x_\star)\) is as large as possible, where \(x_\star\) denotes a minimizer of \(f\). In other words, we obtain the worst-case convergence guarantee of GD by solving the following optimization problem

\[\begin{split}\begin{equation} \begin{split} \text{maximize} \quad & \text{some performance metric} && f(x_N) - f(x_\star) \\ \text{subject to} \quad & \text{$f$ belongs to some function class} && \text{$f$ is $L$-smooth convex} \\ & \text{$\{x_k\}_{k=1}^N$ is generated by an algorithm} \qquad \quad && x_{k+1} = x_k - \alpha \nabla f(x_k), \;\; k=0,\ldots,N-1 \\ & \text{$x_0$ satisfies some initial condition} && \|x_0 - x_\star\|^2 \leq R^2 \\ & \text{$x_\star$ is a minimizer of $f$} && \nabla f(x_\star) = 0. \end{split} \qquad \;\; \tag{PEP} \end{equation}\end{split}\]

We start by setting up a PEPContext object in PEPFlow, which keeps track of all the mathematical objects involved in the analysis, such as points and scalars.

import pepflow as pf
import numpy as np
import matplotlib.pyplot as plt

ctx = pf.PEPContext("gd").set_as_current()

Then, we declare the four ingredients of algorithm analysis:

  • function class: \(L\)-smooth convex functions;

  • algorithm of interest: gradient descent method;

  • initial condition: initial point and optimum are not too far away \(\|x_0 - x_\star\|^2 \leq R^2\);

  • performance metric: objective gap \(f(x_k) - f(x_\star)\).

PEPFlow allows us to describe these objects in a high-level way that mirrors the mathematical setup. We first create a PEPBuilder object, which holds all the information needed to construct and solve PEPs.

pep_builder = pf.PEPBuilder(ctx)

We now define an \(L\)-smooth convex function \(f\) and set a stationary point called x_star.

L = pf.Parameter("L")

# Define function class
f = pf.SmoothConvexFunction(is_basis=True, tags=["f"], L=L)
x_star = f.set_stationary_point("x_star")

Now, we declare the initial condition \(\lVert x_0 - x_\star \rVert^2 \leq R^2\).

R = pf.Parameter("R")

# Set the initial condition
x = pf.Vector(is_basis=True, tags=["x_0"])  # The initial point x_0
pep_builder.add_initial_constraint(
    ((x - x_star) ** 2).le(R**2, name="initial_condition")
)

Next, we generate the GD iterates \(\{x_i\}_{i=1}^{N}\). Note that for each iterate \(x_i\) we add a tag so that it displays nicely and can be easily retrieved from the PEPContext object that tracks it. Furthermore, the Vector object representing \(\nabla f(x)\) can be generated by calling f.grad(x).

N = 8
alpha = 1 / L

# Define the gradient descent method
for i in range(N):
    x = x - alpha * f.grad(x)
    x.add_tag(f"x_{i + 1}")

Finally, we set the performance metric \(f(x_N) - f(x_\star)\). Observe that the Scalar object representing \(f(x)\) can be generated by calling f(x).

# Set the performance metric
x_N = ctx[f"x_{N}"]
pep_builder.set_performance_metric(f(x_N) - f(x_star))

Solving the PEP numerically gives the worst-case value of the chosen performance metric.

L_value = 1
R_value = 1
result = pep_builder.solve(resolve_parameters={"L": L_value, "R": R_value})
print(f"primal PEP optimal value = {result.opt_value:.4f}")
primal PEP optimal value = 0.0294

In this example, the numerical result can be interpreted as: for all \(1\)-smooth convex functions \(f\), \(8\)-step GD starting at \(x_0\) with \(\|x_0 - x_\star\|^2 \leq 1\) satisfies

\[f(x_8) - f(x_\star) \leq 0.0294.\]
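As a quick sanity check, independent of PEPFlow, one can run plain GD on a particular \(1\)-smooth convex function and verify that the observed objective gap after \(8\) steps stays below this certified worst-case value. The sketch below uses an arbitrary random quadratic \(f(x) = \tfrac{1}{2} x^\top A x\) with the eigenvalues of \(A\) in \([0, 1]\), so that \(f\) is \(1\)-smooth convex and minimized at \(x_\star = 0\); the dimension and random seed are illustrative choices.

# Standalone check: run GD on one specific 1-smooth convex quadratic
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag(rng.uniform(0.0, 1.0, 5)) @ Q.T  # symmetric, eigenvalues in [0, 1]

x = rng.standard_normal(5)
x /= np.linalg.norm(x)       # so that ||x_0 - x_star||^2 = 1
for _ in range(8):
    x = x - 1.0 * (A @ x)    # GD step with alpha = 1/L = 1, since grad f(x) = A x
gap = 0.5 * x @ A @ x        # f(x_8) - f(x_star), because f(x_star) = 0
print(gap)                   # stays below the certified 0.0294

Of course, a single example can only illustrate the bound; the PEP value above certifies it for every function in the class.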

To better understand how the error decreases with the number of iterations, we repeat the process for each \(k = 1, 2, \dots, 7\). This gives us a sequence of numerical values that can be compared to the known analytical rate.

opt_values = []
for k in range(1, N):
    x_k = ctx[f"x_{k}"]
    pep_builder.set_performance_metric(f(x_k) - f(x_star))
    result = pep_builder.solve(resolve_parameters={"L": L_value, "R": R_value})
    opt_values.append(result.opt_value)

plt.scatter(range(1, N), opt_values, color="blue", marker="o");
(Figure: scatter plot of the worst-case values of \(f(x_k) - f(x_\star)\) for \(k = 1, \dots, 7\).)

The resulting values visualize the trend of convergence: each scatter point represents a guaranteed bound that holds for all \(L\)-smooth convex functions. In the plot further below, a continuous curve showing the known analytical rate is added for comparison (see, e.g., [1]).

After experimenting with different \(L\), \(R\), and \(N\), we are led to the following conjecture: for all \(L\)-smooth convex functions \(f\), \(N\)-step GD with step size \(\alpha = 1/L\) satisfies

\[f(x_N) - f(x_\star) \leq \frac{L}{4N+2} \|x_0 - x_\star\|_2^2.\]
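As a quick consistency check, for \(L = R = 1\) and \(N = 8\) this bound evaluates to

\[\frac{L R^2}{4N + 2} = \frac{1}{34} \approx 0.0294,\]

which matches the numerical value obtained above.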
The plot below overlays the numerical worst-case values (computed with \(L = R = 1\)) on this bound.

iters = np.arange(1, N)
cont_iters = np.arange(1, N, 0.01)
plt.plot(
    cont_iters,
    1 / (4 * cont_iters + 2),
    "r-",
    label="Analytical bound $\\frac{1}{4k+2}$",
)
plt.scatter(iters, opt_values, color="blue", marker="o", label="Numerical values")
plt.legend();
(Figure: numerical worst-case values compared with the analytical bound \(\frac{1}{4k+2}\).)
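To probe the dependence on \(L\) and \(R\) as well, the same builder can be re-solved with other parameter values. A minimal sketch (the specific \((L, R)\) pairs below are arbitrary) might look like:

# Re-solve the same PEP for a few (L, R) pairs and compare with L * R^2 / (4N + 2)
pep_builder.set_performance_metric(f(ctx[f"x_{N}"]) - f(x_star))
for L_val, R_val in [(1.0, 2.0), (2.0, 1.0), (10.0, 0.5)]:
    result = pep_builder.solve(resolve_parameters={"L": L_val, "R": R_val})
    conjectured = L_val * R_val**2 / (4 * N + 2)
    print(f"L={L_val}, R={R_val}: PEP value = {result.opt_value:.4f}, "
          f"conjectured bound = {conjectured:.4f}")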