Least Squares: Levenberg-Marquardt

A damped nonlinear least squares method between Gauss-Newton and gradient descent

The Levenberg-Marquardt method is one of the most common algorithms for nonlinear least squares [1],[2]. It is popular because it balances two useful behaviors:

It can behave like Gauss-Newton when the local approximation is reliable.
It can behave more like gradient descent when the step needs to be cautious.

That balance makes it especially useful for fitting real experimental data, where the first parameter guess is often imperfect.

The Gauss-Newton problem

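In Gauss-Newton, each iteration linearizes the model around the current parameters. Taking the residuals as $r = y - f(β)$ (data minus model) and $J$ as the Jacobian of the model $f$ with respect to $β$, the step $δ$ solves the normal equations [3],[4]: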

J^{T} J δ = J^{T} r

and then updates:

β^{(k + 1)} = β^{(k)} + δ

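This can be very fast when the current point is already close to the answer. But if the starting point is poor, the linearization behind the step may be inaccurate and $J^{T} J$ may be nearly singular, so the Gauss-Newton step can overshoot and make the fit worse.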
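As a minimal sketch of one Gauss-Newton iteration, the Python below fits a hypothetical two-parameter exponential decay $y = a e^{-b t}$ to synthetic, noise-free data. The model, the data, the starting guess, and all function names are assumptions made for illustration. The residual is taken as data minus model so that the update is $β + δ$, and the 2×2 normal equations are solved in closed form to keep the example dependency-free.

```python
import math

def model(t, a, b):
    """Hypothetical decay model y = a * exp(-b * t)."""
    return a * math.exp(-b * t)

def residuals_and_jacobian(ts, ys, a, b):
    """Residuals r = y - f(t; a, b) and the Jacobian of f w.r.t. (a, b)."""
    r = [y - model(t, a, b) for t, y in zip(ts, ys)]
    J = [(math.exp(-b * t), -a * t * math.exp(-b * t)) for t in ts]
    return r, J

def sse(r):
    """Sum of squared residuals."""
    return sum(ri * ri for ri in r)

def gauss_newton_step(r, J):
    """Solve (J^T J) delta = J^T r for two parameters via Cramer's rule."""
    a11 = sum(j0 * j0 for j0, _ in J)
    a12 = sum(j0 * j1 for j0, j1 in J)
    a22 = sum(j1 * j1 for _, j1 in J)
    g1 = sum(j0 * ri for (j0, _), ri in zip(J, r))
    g2 = sum(j1 * ri for (_, j1), ri in zip(J, r))
    det = a11 * a22 - a12 * a12  # zero if the Jacobian is rank deficient
    return ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)

# Synthetic data generated from a = 2.0, b = 0.5 (assumed "true" values).
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [model(t, 2.0, 0.5) for t in ts]

a, b = 1.8, 0.45  # a starting guess that is already close to the answer
r, J = residuals_and_jacobian(ts, ys, a, b)
sse_before = sse(r)
da, db = gauss_newton_step(r, J)
a, b = a + da, b + db  # beta^(k+1) = beta^(k) + delta
sse_after = sse(residuals_and_jacobian(ts, ys, a, b)[0])
print(sse_before, sse_after, (a, b))
```

From this nearby starting guess a single step already lands close to the generating parameters. Nothing in `gauss_newton_step` protects against a poor start or a singular $J^{T} J$.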

Adding damping

A common Levenberg-style damping form modifies the system by adding a diagonal stabilizing term:

(J^{T} J + λ I) δ = J^{T} r

Here, $λ$ controls the damping and $I$ is the identity matrix [1],[2]. Many descriptions of the Marquardt variant use $λ \, \mathrm{diag}(J^{T} J)$ instead of $λ I$, which scales the damping to each parameter. For this article the idea is the same: add damping so the step becomes safer when the local approximation is unreliable.

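When $λ$ is small, the damping term is negligible and the method behaves like Gauss-Newton. When $λ$ is large, the term $λ I$ dominates $J^{T} J$, so $δ ≈ (1/λ) J^{T} r$: a short step along the steepest-descent direction of the sum of squares. The method becomes more cautious and behaves more like gradient descent.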
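The effect of $λ$ on the step can be checked numerically. The sketch below solves the damped 2×2 system for several values of $λ$. The matrix `A` (standing in for $J^{T} J$) and vector `g` (standing in for $J^{T} r$) are made-up illustrative numbers, and the `marquardt` flag is a hypothetical switch between the $λ I$ and $λ \, \mathrm{diag}(J^{T} J)$ forms.

```python
def damped_step(A, g, lam, marquardt=False):
    """Solve (A + lam * D) delta = g for a 2x2 symmetric system.

    A stands for J^T J and g for J^T r. D is the identity (Levenberg-style)
    or diag(A) (Marquardt-style scaling) when marquardt=True.
    """
    (a11, a12), (_, a22) = A
    d1, d2 = (a11, a22) if marquardt else (1.0, 1.0)
    m11, m22 = a11 + lam * d1, a22 + lam * d2
    det = m11 * m22 - a12 * a12
    return ((m22 * g[0] - a12 * g[1]) / det, (m11 * g[1] - a12 * g[0]) / det)

def norm(v):
    """Euclidean length of a 2-vector."""
    return (v[0] ** 2 + v[1] ** 2) ** 0.5

# Illustrative values for J^T J and J^T r (made up, not from a real fit).
A = ((1.7, -1.9), (-1.9, 6.8))
g = (0.23, -0.02)

gauss_newton = damped_step(A, g, 0.0)     # lam = 0: plain Gauss-Newton step
lightly_damped = damped_step(A, g, 1e-6)  # nearly the same step
heavily_damped = damped_step(A, g, 1e6)   # approximately g / lam

# The step length shrinks steadily as the damping grows.
for lam in (0.0, 0.01, 1.0, 100.0, 1e6):
    print(lam, norm(damped_step(A, g, lam)))
```

With heavy damping the step is both shorter and rotated toward `g`, which is why large $λ$ resembles a small gradient-descent step.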

A practical way to think about lambda

The damping parameter is like a trust control. If the local quadratic approximation is working, the algorithm trusts it more and reduces damping. If a step makes the fit worse, the algorithm becomes more cautious and increases damping.

Interactive example

This panel uses the same kind of nonlinear curve-fitting problem as the Gauss-Newton example, but now the damping parameter changes as the fit improves. While the local approximation is unreliable, the method steps cautiously. Once the approximation starts working, damping drops and the steps become more confident.

[Interactive panel: Levenberg-Marquardt, damped nonlinear least squares. Initial state: iteration 0 / 10, SSE 27.5425, lambda 1.000.]

The algorithmic rhythm

A typical Levenberg-Marquardt iteration looks like this:

Start from current parameters.
Compute residuals and the Jacobian.
Solve the damped system for $δ$.
Try the new parameters.
If the fit improves, accept the step and reduce damping.
If the fit worsens, reject or shrink the step and increase damping.

This makes the method adaptive. It can move quickly when the path is clear, but slow down when the model surface becomes tricky.
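The steps above can be sketched as a short accept/reject loop. This is a minimal illustration, not a production implementation: it assumes a hypothetical exponential decay model $y = a e^{-b t}$, synthetic noise-free data, the $λ I$ damping form, and a factor of 10 for raising and lowering $λ$. All names and constants are chosen for the example.

```python
import math

def model(t, a, b):
    """Hypothetical decay model y = a * exp(-b * t)."""
    return a * math.exp(-b * t)

def sse(ts, ys, a, b):
    """Sum of squared residuals for parameters (a, b)."""
    return sum((y - model(t, a, b)) ** 2 for t, y in zip(ts, ys))

def lm_step(ts, ys, a, b, lam):
    """Solve (J^T J + lam * I) delta = J^T r for the two parameters."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for t, y in zip(ts, ys):
        e = math.exp(-b * t)
        j0, j1 = e, -a * t * e  # d(model)/da, d(model)/db
        ri = y - a * e          # residual: data minus model
        a11 += j0 * j0
        a12 += j0 * j1
        a22 += j1 * j1
        g1 += j0 * ri
        g2 += j1 * ri
    m11, m22 = a11 + lam, a22 + lam
    det = m11 * m22 - a12 * a12  # positive whenever lam > 0
    return ((m22 * g1 - a12 * g2) / det, (m11 * g2 - a12 * g1) / det)

def levenberg_marquardt(ts, ys, a, b, lam=1.0, max_iter=100, tol=1e-12):
    """Accept/reject loop; the factor of 10 for lam is an assumed choice."""
    current = sse(ts, ys, a, b)
    for _ in range(max_iter):
        da, db = lm_step(ts, ys, a, b, lam)
        trial = sse(ts, ys, a + da, b + db)
        if trial < current:          # fit improved: accept, reduce damping
            a, b, current = a + da, b + db, trial
            lam /= 10.0
        else:                        # fit worsened: reject, increase damping
            lam *= 10.0
        if da * da + db * db < tol:  # step is negligible: stop
            break
    return a, b, current, lam

# Synthetic data generated from a = 2.0, b = 0.5 (assumed "true" values).
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [model(t, 2.0, 0.5) for t in ts]

start_a, start_b = 1.0, 1.0  # a deliberately imperfect first guess
initial_sse = sse(ts, ys, start_a, start_b)
a_fit, b_fit, final_sse, final_lam = levenberg_marquardt(ts, ys, start_a, start_b)
print(a_fit, b_fit, final_sse, final_lam)
```

Because a trial step is kept only when it lowers the SSE, the error never increases from one accepted iterate to the next. Multiplying or dividing $λ$ by a fixed factor is only one simple heuristic, and other update rules exist.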

Where it appears in chemistry

Levenberg-Marquardt is commonly used for nonlinear curve fitting:

Exponential decay models.
Michaelis-Menten kinetics.
Spectral peak fitting.
Binding curves.
Calibration models with nonlinear response.

In all these cases, the model parameters have physical meaning, and a stable fitting method matters. A small change in starting values should not send the fit to wildly different parameter estimates.
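Any of these models plugs into the same iteration once its residuals and Jacobian are available. As a hedged sketch, the code below writes the Michaelis-Menten rate law $v = V_{max} S / (K_m + S)$ with its analytic partial derivatives and checks them against central finite differences. The substrate concentrations and parameter guess are hypothetical values in arbitrary units, and the function names are chosen for this example.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v = vmax * s / (km + s)."""
    return vmax * s / (km + s)

def mm_jacobian_row(s, vmax, km):
    """Partial derivatives of the rate with respect to (vmax, km)."""
    return (s / (km + s), -vmax * s / (km + s) ** 2)

def finite_difference_row(s, vmax, km, h=1e-6):
    """Central differences, used only to check the analytic derivatives."""
    d_vmax = (michaelis_menten(s, vmax + h, km)
              - michaelis_menten(s, vmax - h, km)) / (2 * h)
    d_km = (michaelis_menten(s, vmax, km + h)
            - michaelis_menten(s, vmax, km - h)) / (2 * h)
    return (d_vmax, d_km)

# Hypothetical substrate concentrations and parameter guess (arbitrary units).
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
vmax_guess, km_guess = 3.0, 1.5

# Largest disagreement between analytic and numerical derivatives.
max_error = max(
    abs(analytic - numeric)
    for s in substrate
    for analytic, numeric in zip(
        mm_jacobian_row(s, vmax_guess, km_guess),
        finite_difference_row(s, vmax_guess, km_guess),
    )
)
print(max_error)
```

Checking a hand-derived Jacobian this way is cheap insurance, since a sign error in one derivative can make an otherwise stable fit stall or diverge.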

What it gives you

Levenberg-Marquardt does not guarantee perfection. Nonlinear least squares can still have local minima, and the starting point still matters. But compared with plain Gauss-Newton, it is usually more forgiving.

If Gauss-Newton is the confident version of nonlinear least squares, Levenberg-Marquardt is the version that knows when to be careful.

References

[1] Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2), 431-441.

[2] Moré, J. J. (1978). The Levenberg-Marquardt algorithm: Implementation and theory. In Numerical Analysis (pp. 105-116). Springer.

[3] Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.

[4] Bates, D. M., & Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. Wiley.