Verifying numerical rates of convergence

Let’s explore the first application of PEP: numerically verifying the rate of convergence. As an illustrative example, we consider gradient descent (GD) on \(L\)-smooth convex functions. We look for the worst function \(f\) for which, after \(N\) iterations of GD, the objective gap \(f(x_N) - f(x_\star)\) is as large as possible, where \(x_\star\) denotes a minimizer of \(f\). In other words, we obtain the worst-case convergence guarantee of GD by solving the following optimization problem

\[\begin{split}\begin{equation} \begin{split} \text{maximize} \quad & \text{some performance metric} && f(x_N) - f(x_\star) \\ \text{subject to} \quad & \text{$f$ belongs to some function class} && \text{$f$ is $L$-smooth convex} \\ & \text{$\{x_k\}_{k=1}^N$ is generated by an algorithm} \qquad \quad && x_{k+1} = x_k - \alpha \nabla f(x_k), \;\; k=0,\ldots,N-1 \\ & \text{$x_0$ satisfies some initial condition} && \|x_0 - x_\star\|^2 \leq R^2 \\ & \text{$x_\star$ is a minimizer of $f$} && \nabla f(x_\star) = 0. \end{split} \qquad \;\; \tag{PEP} \end{equation}\end{split}\]

We start by setting up a PEPContext object in PEPFlow, which keeps track of all the mathematical objects involved in the analysis, such as points and scalars.

import pepflow as pf
import numpy as np
import matplotlib.pyplot as plt

ctx = pf.PEPContext("gd").set_as_current()

Then, we declare the four ingredients of algorithm analysis:

  • function class: \(L\)-smooth convex functions;

  • algorithm of interest: gradient descent method;

  • initial condition: initial point and optimum are not too far away \(\|x_0 - x_\star\|^2 \leq R^2\);

  • performance metric: objective gap \(f(x_k) - f(x_\star)\).

PEPFlow allows us to describe these objects in a high-level way that mirrors the mathematical setup. We first create a PEPBuilder object, which holds all the information needed to construct and solve PEPs.

pep_builder = pf.PEPBuilder(ctx)

We now define an \(L\)-smooth convex function \(f\) and set a stationary point called x_star.

L = pf.Parameter("L")

# Define function class
f = pf.SmoothConvexFunction(is_basis=True, tags=["f"], L=L)
x_star = f.set_stationary_point("x_star")

Now, we declare the initial condition \(\lVert x_0 - x_\star \rVert^2 \leq R^2\).

R = pf.Parameter("R")

# Set the initial condition
x = pf.Vector(is_basis=True, tags=["x_0"])  # The initial point x_0
pep_builder.add_initial_constraint(
    ((x - x_star) ** 2).le(R**2, name="initial_condition")
)

Next, we generate the GD iterates \(\{x_i\}_{i=1}^{N}\). Note that for each iterate \(x_i\) we add a tag so that it displays nicely and can be easily retrieved from the PEPContext object that tracks it. Furthermore, the Vector object representing \(\nabla f(x)\) can be generated by calling f.grad(x).

N = 8
alpha = 1 / L

# Define the gradient descent method
for i in range(N):
    x = x - alpha * f.grad(x)
    x.add_tag(f"x_{i + 1}")

Finally, we set the performance metric \(f(x_N) - f(x_\star)\). Observe that the Scalar object representing \(f(x)\) can be generated by calling f(x).

# Set the performance metric
x_N = ctx[f"x_{N}"]
pep_builder.set_performance_metric(f(x_N) - f(x_star))

Solving the PEP numerically gives the worst-case value of the chosen performance metric.

L_value = 1
R_value = 1
result = pep_builder.solve(resolve_parameters={"L": L_value, "R": R_value})
print(f"primal PEP optimal value = {result.opt_value:.4f}")
primal PEP optimal value = 0.0294

In this example, the numerical result can be interpreted as: for all \(1\)-smooth convex functions \(f\), \(8\)-step GD starting at \(x_0\) with \(\|x_0 - x_\star\|^2 \leq 1\) satisfies

\[f(x_8) - f(x_\star) \leq 0.0294.\]
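As a quick sanity check, independent of PEPFlow, one can run plain GD on a particular \(1\)-smooth convex function and verify that the observed objective gap after \(8\) steps stays below this certified worst-case value. The sketch below uses an arbitrary random quadratic \(f(x) = \tfrac{1}{2} x^\top A x\) with the eigenvalues of \(A\) in \([0, 1]\), so that \(f\) is \(1\)-smooth convex and minimized at \(x_\star = 0\); the dimension and random seed are illustrative choices.

# Standalone check: run GD on one specific 1-smooth convex quadratic
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag(rng.uniform(0.0, 1.0, 5)) @ Q.T  # symmetric, eigenvalues in [0, 1]

x = rng.standard_normal(5)
x /= np.linalg.norm(x)       # so that ||x_0 - x_star||^2 = 1
for _ in range(8):
    x = x - 1.0 * (A @ x)    # GD step with alpha = 1/L = 1, since grad f(x) = A x
gap = 0.5 * x @ A @ x        # f(x_8) - f(x_star), because f(x_star) = 0
print(gap)                   # stays below the certified 0.0294

Of course, a single example can only illustrate the bound; the PEP value above certifies it for every function in the class.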

To better understand how the error decreases with the number of iterations, we repeat the process for each \(k = 1, 2, \dots, 7\). This gives us a sequence of numerical values that can be compared to the known analytical rate.

opt_values = []
for k in range(1, N):
    x_k = ctx[f"x_{k}"]
    pep_builder.set_performance_metric(f(x_k) - f(x_star))
    result = pep_builder.solve(resolve_parameters={"L": L_value, "R": R_value})
    opt_values.append(result.opt_value)

plt.scatter(range(1, N), opt_values, color="blue", marker="o");
(Figure: scatter plot of the worst-case values of \(f(x_k) - f(x_\star)\) for \(k = 1, \dots, 7\).)

The resulting values visualize the trend of convergence: each scatter point represents a guaranteed bound that holds for all \(L\)-smooth convex functions. In the plot further below, a continuous curve showing the known analytical rate is added for comparison (see, e.g., [1]).

After experimenting with different \(L\), \(R\), and \(N\), we are led to the following conjecture: for all \(L\)-smooth convex functions \(f\), \(N\)-step GD with step size \(\alpha = 1/L\) satisfies

\[f(x_N) - f(x_\star) \leq \frac{L}{4N+2} \|x_0 - x_\star\|_2^2.\]
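As a quick consistency check, for \(L = R = 1\) and \(N = 8\) this bound evaluates to

\[\frac{L R^2}{4N + 2} = \frac{1}{34} \approx 0.0294,\]

which matches the numerical value obtained above.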
The plot below overlays the numerical worst-case values (computed with \(L = R = 1\)) on this bound.

iters = np.arange(1, N)
cont_iters = np.arange(1, N, 0.01)
plt.plot(
    cont_iters,
    1 / (4 * cont_iters + 2),
    "r-",
    label="Analytical bound $\\frac{1}{4k+2}$",
)
plt.scatter(iters, opt_values, color="blue", marker="o", label="Numerical values")
plt.legend();
(Figure: numerical worst-case values compared with the analytical bound \(\frac{1}{4k+2}\).)
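To probe the dependence on \(L\) and \(R\) as well, the same builder can be re-solved with other parameter values. A minimal sketch (the specific \((L, R)\) pairs below are arbitrary) might look like:

# Re-solve the same PEP for a few (L, R) pairs and compare with L * R^2 / (4N + 2)
pep_builder.set_performance_metric(f(ctx[f"x_{N}"]) - f(x_star))
for L_val, R_val in [(1.0, 2.0), (2.0, 1.0), (10.0, 0.5)]:
    result = pep_builder.solve(resolve_parameters={"L": L_val, "R": R_val})
    conjectured = L_val * R_val**2 / (4 * N + 2)
    print(f"L={L_val}, R={R_val}: PEP value = {result.opt_value:.4f}, "
          f"conjectured bound = {conjectured:.4f}")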