
Commit 6daf52f

wip
1 parent 4dd2c05 commit 6daf52f

4 files changed: +230 -219 lines changed


papers/intro/content.md

Lines changed: 10 additions & 29 deletions
@@ -41,11 +41,8 @@ This tension intensifies in modern applications characterized by high dimensiona
 Researchers have developed various approaches to combine gradient and quasi-Newton directions:
 
 * **Trust Region Methods** [@conn2000trust]: These methods constrain the step size within a region where the quadratic model is trusted to approximate the objective function. While effective, they require solving a constrained optimization subproblem at each iteration.
-
 * **Line Search with Switching** [@morales2000automatic]: Some methods alternate between gradient and quasi-Newton directions based on heuristic criteria, but this can lead to discontinuous behavior and convergence issues.
-
 * **Weighted Combinations** [@biggs1973minimization]: Linear combinations of gradient and quasi-Newton directions have been explored, but selecting appropriate weights remains challenging and often problem-dependent.
-
 * **Adaptive Learning Rates** [@kingma2015adam]: Methods like Adam use adaptive learning rates based on gradient moments but don't directly incorporate second-order curvature information.
 
 We propose quadratic interpolation as a simple geometric solution to this direction combination problem.
@@ -63,9 +60,7 @@ This approach provides several key advantages:
 This paper makes three primary contributions:
 
 1. **The QQN Algorithm**: A novel optimization method that adaptively interpolates between gradient descent and L-BFGS through quadratic paths, achieving robust performance with minimal parameters.
-
 2. **Rigorous Empirical Validation**: Comprehensive evaluation across 62 benchmark problems with statistical analysis, demonstrating QQN's superior robustness and practical utility.
-
 3. **Benchmarking Framework**: A reusable Rust application for optimization algorithm evaluation that promotes reproducible research and meaningful comparisons.
 
 Optimal configurations remain problem-dependent, but QQN's adaptive nature minimizes the need for extensive hyperparameter tuning.
@@ -120,10 +115,9 @@ We formulate the direction combination problem as a geometric interpolation. The
 $\mathbf{d}: [0,1] \rightarrow \mathbb{R}^n$ that traces a path from the current point. We impose three natural boundary conditions:
 
 1. **Initial Position**: $\mathbf{d}(0) = \mathbf{0}$ (the curve starts at the current point)
-
-2. **Initial Tangent**: $\mathbf{d}'(0) = -\nabla f(\mathbf{x}_k)$ (the curve begins tangent to the negative gradient, ensuring descent)
-
+2. **Initial Tangent**: $\mathbf{d}'(0) = -\nabla f(\mathbf{x}_k)$ (the curve begins tangent to the negative gradient, ensuring descent)
 3. **Terminal Position**: $\mathbf{d}(1) = \mathbf{d}_{\text{LBFGS}}$ (the curve ends at the L-BFGS direction)
+
 The second condition is crucial: by ensuring the path starts tangent to the negative gradient, we guarantee that moving along the path initially decreases the objective function, regardless of where the path eventually leads. This provides robustness against poor quasi-Newton directions.
 
 
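The three boundary conditions above determine the quadratic uniquely: writing $\mathbf{d}(t) = \mathbf{a}t^2 + \mathbf{b}t + \mathbf{c}$, they force $\mathbf{c} = \mathbf{0}$, $\mathbf{b} = -\nabla f(\mathbf{x}_k)$, and $\mathbf{a} = \mathbf{d}_{\text{LBFGS}} + \nabla f(\mathbf{x}_k)$, i.e. $\mathbf{d}(t) = t(1-t)\bigl(-\nabla f(\mathbf{x}_k)\bigr) + t^2 \mathbf{d}_{\text{LBFGS}}$. A minimal Rust sketch of evaluating this path is shown below; the function name and the plain-slice vector representation are illustrative assumptions, not code from this commit.

```rust
/// Illustrative sketch (not from this commit): evaluate the quadratic path
/// d(t) = t(1 - t)(-g) + t^2 * d_lbfgs implied by the three boundary conditions,
/// where g = ∇f(x_k). At t = 0 the step is zero with tangent -g; at t = 1 it
/// equals the L-BFGS direction.
fn qqn_path(g: &[f64], d_lbfgs: &[f64], t: f64) -> Vec<f64> {
    g.iter()
        .zip(d_lbfgs.iter())
        .map(|(&gi, &di)| t * (1.0 - t) * (-gi) + t * t * di)
        .collect()
}
```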
@@ -204,6 +198,7 @@ This negative derivative at $t=0$ ensures that for sufficiently small positive $
 When the L-BFGS direction is high-quality (well-aligned with the negative gradient), the optimal parameter $t^*$ will be close to or exceed 1, effectively using the quasi-Newton step. When the L-BFGS direction is poor (misaligned or even pointing uphill), the optimization naturally selects a smaller $t^*$, staying closer to the gradient direction.
 
 This can be visualized as a "trust slider" that automatically adjusts based on the quality of the quasi-Newton approximation:
+
 - Good L-BFGS direction → $t^* \approx 1$ or larger → quasi-Newton-like behavior
 - Poor L-BFGS direction → $t^* \approx 0$ → gradient descent-like behavior
 - Intermediate cases → smooth interpolation between the two
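In practice the slider position is set by a one-dimensional search over $t$ for $\phi(t) = f(\mathbf{x}_k + \mathbf{d}(t))$. The commit does not show the line-search code (the QQN variants named later use Strong Wolfe, bisection, and golden-section searches), so the following is only a hedged golden-section sketch over an assumed bracket $[a, b]$:

```rust
/// Illustrative sketch only: golden-section search for t* ≈ argmin phi(t) on [a, b],
/// where phi(t) = f(x_k + d(t)). The actual QQN line searches are not shown here.
fn golden_section_min<F: Fn(f64) -> f64>(phi: F, mut a: f64, mut b: f64, tol: f64) -> f64 {
    let inv_phi = (5.0_f64.sqrt() - 1.0) / 2.0; // golden-ratio factor ≈ 0.618
    while (b - a).abs() > tol {
        let c = b - inv_phi * (b - a); // interior probe nearer a
        let d = a + inv_phi * (b - a); // interior probe nearer b
        if phi(c) < phi(d) {
            b = d; // minimum bracketed in [a, d]
        } else {
            a = c; // minimum bracketed in [c, b]
        }
    }
    0.5 * (a + b)
}
```

A $t^*$ near 1 (or beyond, if the bracket allows) reproduces the L-BFGS step, while $t^*$ near 0 reduces to a short gradient step, matching the behaviors listed above.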
@@ -233,40 +228,25 @@ The key insight is that the sufficient decrease property:
 $$f(\mathbf{x}_{k+1}) \leq f(\mathbf{x}_k) - c\|\nabla f(\mathbf{x}_k)\|^2$$
 combined with the lower bound on $f$, creates a "budget" of total possible decrease. This budget forces the gradients to become arbitrarily small.
 
-
-
-
-
-
-
-
 *Proof*: See Appendix B.2.2 for the complete convergence analysis using descent lemmas and summability arguments. $\square$
 
 **Theorem 3** (Local Superlinear Convergence): Near a local minimum with positive definite Hessian, if the L-BFGS approximation satisfies standard Dennis-Moré conditions, QQN converges superlinearly.
+
 *Intuition*: Near a minimum where the L-BFGS approximation is accurate, the optimal parameter $t^*$ approaches 1, making QQN steps nearly identical to L-BFGS steps. Since L-BFGS converges superlinearly under these conditions, so does QQN. The beauty is that this happens automatically—no switching logic or parameter tuning required.
+
 The Dennis-Moré condition essentially states that the L-BFGS approximation $\mathbf{H}_k$ becomes increasingly accurate in the directions that matter (the actual steps taken). When this holds:
 $$t^* \to 1 \quad \text{and} \quad \mathbf{x}_{k+1} \approx \mathbf{x}_k - \mathbf{H}_k\nabla f(\mathbf{x}_k)$$
+
 This recovers the quasi-Newton iteration, inheriting its superlinear convergence rate.
 *Proof*: See Appendix B.2.3 for the detailed local convergence analysis showing $t^* = 1 + o(1)$ and the resulting superlinear rate. $\square$
+
 ### Practical Implications of the Theory
+
 The theoretical guarantees translate to practical benefits:
+
 1. **No Hyperparameter Tuning**: The adaptive nature of the quadratic path eliminates the need for trust region radii, switching thresholds, or other parameters that plague hybrid methods.
 2. **Robust Failure Recovery**: When L-BFGS produces a bad direction (e.g., due to numerical errors or non-convexity), QQN automatically takes a more conservative step rather than diverging.
 3. **Smooth Performance Degradation**: As problems become more difficult (higher condition number, more non-convexity), QQN gradually transitions from quasi-Newton to gradient descent behavior, rather than failing catastrophically.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
 4. **Preserved Convergence Rates**: In favorable conditions (near minima with positive definite Hessians), QQN achieves the same superlinear convergence as L-BFGS, so we don't sacrifice asymptotic performance for robustness.
 
 # Benchmarking Methodology
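For reference, the "budget" argument behind Theorem 2 can be made explicit in one step. Assuming, as stated, that $f$ is bounded below by some $f^*$ and that every accepted step satisfies the sufficient decrease property, summing over $k = 0, \dots, K-1$ and telescoping gives

$$c \sum_{k=0}^{K-1} \|\nabla f(\mathbf{x}_k)\|^2 \leq f(\mathbf{x}_0) - f(\mathbf{x}_K) \leq f(\mathbf{x}_0) - f^*,$$

so the sum of squared gradient norms is finite and $\|\nabla f(\mathbf{x}_k)\| \to 0$; the complete argument is the one referenced in Appendix B.2.2.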
@@ -357,6 +337,7 @@ We apply Bonferroni correction for multiple comparisons with adjusted significan
 ## Overall Performance
 
 The comprehensive evaluation across 62 benchmark problems with 25 optimizer variants revealed clear performance hierarchies. QQN variants dominated the results, winning the majority of problems across all categories. Key findings include:
+
 * **QQN variants won 46 out of 62 test problems** (74.2% win rate)
 * **Statistical significance**: Friedman test p-value < 0.001 confirms algorithm performance differences
 * **Top performers**: QQN-StrongWolfe (12 wins), QQN-GoldenSection (11 wins), QQN-Bisection-1 (9 wins)

papers/intro/content.tex

Lines changed: 38 additions & 12 deletions
@@ -56,6 +56,7 @@ \subsection{Previous Approaches to Direction Combination}\label{previous-approac
 Researchers have developed various approaches to combine gradient and quasi-Newton directions:
 
 \begin{itemize}
+\tightlist
 \item
 \textbf{Trust Region Methods} \citep{conn2000trust}: These methods constrain the step size within a region where the quadratic model is trusted to approximate the objective function. While effective, they require solving a constrained optimization subproblem at each iteration.
 \item
@@ -87,6 +88,7 @@ \subsection{Contributions}\label{contributions}}
 
 \begin{enumerate}
 \def\labelenumi{\arabic{enumi}.}
+\tightlist
 \item
 \textbf{The QQN Algorithm}: A novel optimization method that adaptively interpolates between gradient descent and L-BFGS through quadratic paths, achieving robust performance with minimal parameters.
 \item
@@ -155,15 +157,17 @@ \subsection{Algorithm Derivation}\label{algorithm-derivation}}
 
 \begin{enumerate}
 \def\labelenumi{\arabic{enumi}.}
+\tightlist
 \item
 \textbf{Initial Position}: \(\mathbf{d}(0) = \mathbf{0}\) (the curve starts at the current point)
 \item
 \textbf{Initial Tangent}: \(\mathbf{d}'(0) = -\nabla f(\mathbf{x}_k)\) (the curve begins tangent to the negative gradient, ensuring descent)
 \item
 \textbf{Terminal Position}: \(\mathbf{d}(1) = \mathbf{d}_{\text{LBFGS}}\) (the curve ends at the L-BFGS direction)
-The second condition is crucial: by ensuring the path starts tangent to the negative gradient, we guarantee that moving along the path initially decreases the objective function, regardless of where the path eventually leads. This provides robustness against poor quasi-Newton directions.
 \end{enumerate}
 
+The second condition is crucial: by ensuring the path starts tangent to the negative gradient, we guarantee that moving along the path initially decreases the objective function, regardless of where the path eventually leads. This provides robustness against poor quasi-Newton directions.
+
 Following Occam's razor, we seek the lowest-degree polynomial satisfying these constraints.
 A quadratic polynomial \(\mathbf{d}(t) = \mathbf{a}t^2 + \mathbf{b}t + \mathbf{c}\) provides the minimal solution.
 
@@ -251,9 +255,16 @@ \subsubsection{Intuitive Understanding}\label{intuitive-understanding}}
 When the L-BFGS direction is high-quality (well-aligned with the negative gradient), the optimal parameter \(t^*\) will be close to or exceed 1, effectively using the quasi-Newton step. When the L-BFGS direction is poor (misaligned or even pointing uphill), the optimization naturally selects a smaller \(t^*\), staying closer to the gradient direction.
 
 This can be visualized as a ``trust slider'' that automatically adjusts based on the quality of the quasi-Newton approximation:
-- Good L-BFGS direction → \(t^* \approx 1\) or larger → quasi-Newton-like behavior
-- Poor L-BFGS direction → \(t^* \approx 0\) → gradient descent-like behavior
-- Intermediate cases → smooth interpolation between the two
+
+\begin{itemize}
+\tightlist
+\item
+Good L-BFGS direction → \(t^* \approx 1\) or larger → quasi-Newton-like behavior
+\item
+Poor L-BFGS direction → \(t^* \approx 0\) → gradient descent-like behavior
+\item
+Intermediate cases → smooth interpolation between the two
+\end{itemize}
 
 \textbf{3. Convergence Through Sufficient Decrease}
 
@@ -284,21 +295,29 @@ \subsubsection{Formal Theoretical Guarantees}\label{formal-theoretical-guarantee
 \emph{Proof}: See Appendix B.2.2 for the complete convergence analysis using descent lemmas and summability arguments. \(\square\)
 
 \textbf{Theorem 3} (Local Superlinear Convergence): Near a local minimum with positive definite Hessian, if the L-BFGS approximation satisfies standard Dennis-Moré conditions, QQN converges superlinearly.
+
 \emph{Intuition}: Near a minimum where the L-BFGS approximation is accurate, the optimal parameter \(t^*\) approaches 1, making QQN steps nearly identical to L-BFGS steps. Since L-BFGS converges superlinearly under these conditions, so does QQN. The beauty is that this happens automatically---no switching logic or parameter tuning required.
+
 The Dennis-Moré condition essentially states that the L-BFGS approximation \(\mathbf{H}_k\) becomes increasingly accurate in the directions that matter (the actual steps taken). When this holds:
 \[t^* \to 1 \quad \text{and} \quad \mathbf{x}_{k+1} \approx \mathbf{x}_k - \mathbf{H}_k\nabla f(\mathbf{x}_k)\]
+
 This recovers the quasi-Newton iteration, inheriting its superlinear convergence rate.
 \emph{Proof}: See Appendix B.2.3 for the detailed local convergence analysis showing \(t^* = 1 + o(1)\) and the resulting superlinear rate. \(\square\)
-\#\#\# Practical Implications of the Theory
+
+\hypertarget{practical-implications-of-the-theory}{%
+\subsubsection{Practical Implications of the Theory}\label{practical-implications-of-the-theory}}
+
 The theoretical guarantees translate to practical benefits:
-1. \textbf{No Hyperparameter Tuning}: The adaptive nature of the quadratic path eliminates the need for trust region radii, switching thresholds, or other parameters that plague hybrid methods.
-2. \textbf{Robust Failure Recovery}: When L-BFGS produces a bad direction (e.g., due to numerical errors or non-convexity), QQN automatically takes a more conservative step rather than diverging.
-3. \textbf{Smooth Performance Degradation}: As problems become more difficult (higher condition number, more non-convexity), QQN gradually transitions from quasi-Newton to gradient descent behavior, rather than failing catastrophically.
 
 \begin{enumerate}
 \def\labelenumi{\arabic{enumi}.}
-\setcounter{enumi}{3}
 \tightlist
+\item
+\textbf{No Hyperparameter Tuning}: The adaptive nature of the quadratic path eliminates the need for trust region radii, switching thresholds, or other parameters that plague hybrid methods.
+\item
+\textbf{Robust Failure Recovery}: When L-BFGS produces a bad direction (e.g., due to numerical errors or non-convexity), QQN automatically takes a more conservative step rather than diverging.
+\item
+\textbf{Smooth Performance Degradation}: As problems become more difficult (higher condition number, more non-convexity), QQN gradually transitions from quasi-Newton to gradient descent behavior, rather than failing catastrophically.
 \item
 \textbf{Preserved Convergence Rates}: In favorable conditions (near minima with positive definite Hessians), QQN achieves the same superlinear convergence as L-BFGS, so we don't sacrifice asymptotic performance for robustness.
 \end{enumerate}
@@ -435,9 +454,16 @@ \section{Experimental Results}\label{experimental-results}}
 \subsection{Overall Performance}\label{overall-performance}}
 
 The comprehensive evaluation across 62 benchmark problems with 25 optimizer variants revealed clear performance hierarchies. QQN variants dominated the results, winning the majority of problems across all categories. Key findings include:
-* \textbf{QQN variants won 46 out of 62 test problems} (74.2\% win rate)
-* \textbf{Statistical significance}: Friedman test p-value \textless{} 0.001 confirms algorithm performance differences
-* \textbf{Top performers}: QQN-StrongWolfe (12 wins), QQN-GoldenSection (11 wins), QQN-Bisection-1 (9 wins)
+
+\begin{itemize}
+\tightlist
+\item
+\textbf{QQN variants won 46 out of 62 test problems} (74.2\% win rate)
+\item
+\textbf{Statistical significance}: Friedman test p-value \textless{} 0.001 confirms algorithm performance differences
+\item
+\textbf{Top performers}: QQN-StrongWolfe (12 wins), QQN-GoldenSection (11 wins), QQN-Bisection-1 (9 wins)
+\end{itemize}
 
 \hypertarget{evaluation-insights}{%
 \subsection{Evaluation Insights}\label{evaluation-insights}}

papers/intro/paper.pdf

57 Bytes
Binary file not shown.
