Least Squares: Gauss-Newton

The normal equations work beautifully when the model is linear in its parameters. But many scientific models are nonlinear in their parameters.

Think of exponential decay, enzyme kinetics, adsorption isotherms, or chemical equilibrium models. Their parameters may appear inside exponentials, denominators, or other nonlinear expressions. In those cases, the model cannot be written as:

f (X, β) = X β

So we need another idea.

The Gauss-Newton method solves nonlinear least squares by repeatedly pretending, locally, that the nonlinear model is linear [1],[2].

The nonlinear least squares problem

The objective is still:

\min_{β} ∥ y - f (X, β) ∥^{2}

The difference is that $f$ is now nonlinear in $β$.

Define the residual vector:

r (β) = y - f (X, β)

We want to make that residual vector as small as possible in the Euclidean norm, that is, to minimize the sum of squared residuals $∥ r (β) ∥^{2}$.
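As a small illustration, the residual vector and the sum of squared errors (SSE) can be written in a few lines of Python with NumPy. The three-parameter decay model `a * exp(-b * x) + c`, the synthetic data, and the function names below are assumptions made for this sketch, not something specified above.

```python
import numpy as np

def model(x, beta):
    """Hypothetical decay model (an assumption): a * exp(-b * x) + c."""
    a, b, c = beta
    return a * np.exp(-b * x) + c

def residual(x, y, beta):
    """Residual vector r(beta) = y - f(x, beta)."""
    return y - model(x, beta)

def sse(x, y, beta):
    """Sum of squared residuals, the quantity least squares minimizes."""
    r = residual(x, y, beta)
    return float(r @ r)

# Noise-free synthetic data generated from known parameters.
x = np.linspace(0.0, 10.0, 25)
beta_true = np.array([5.0, 0.6, 1.0])
y = model(x, beta_true)

print(sse(x, y, beta_true))            # zero at the generating parameters
print(sse(x, y, [3.4, 0.22, 0.25]))    # a rough guess gives a larger value
```

Because the data here are noise-free, the SSE is exactly zero at the generating parameters; with real measurements the minimum would be positive.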

The local linear approximation

At the current parameter estimate $β^{(k)}$, Gauss-Newton approximates the model with a first-order Taylor expansion [1],[2]:

f (X, β) \approx f (X, β^{(k)}) + J (β^{(k)}) (β - β^{(k)})

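The matrix $J$ is the Jacobian of $f$ with respect to $β$. Its entry $J_{ij} = ∂ f_{i} / ∂ β_{j}$ is the partial derivative of the $i$-th model prediction with respect to the $j$-th parameter, so $J$ has one row per prediction and one column per parameter.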

If the model has many outputs and several parameters, the Jacobian tells us how every prediction changes when every parameter changes slightly.

The Jacobian in plain language

The Jacobian is a sensitivity table. Each entry asks: if I nudge this parameter, how much does this predicted value move?
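To make the sensitivity table concrete, here is a minimal sketch that builds the Jacobian for an assumed decay model `a * exp(-b * x) + c` and checks it against central finite differences. The model form, the helper names, and the step size `h` are illustrative assumptions.

```python
import numpy as np

def model(x, beta):
    """Hypothetical decay model (an assumption): a * exp(-b * x) + c."""
    a, b, c = beta
    return a * np.exp(-b * x) + c

def jacobian(x, beta):
    """Analytic Jacobian: one row per data point, one column per parameter."""
    a, b, c = beta
    e = np.exp(-b * x)
    # Columns are df/da, df/db, df/dc.
    return np.column_stack([e, -a * x * e, np.ones_like(x)])

def jacobian_fd(x, beta, h=1e-6):
    """Central finite differences: nudge one parameter at a time."""
    beta = np.asarray(beta, dtype=float)
    cols = []
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = h
        cols.append((model(x, beta + step) - model(x, beta - step)) / (2 * h))
    return np.column_stack(cols)

x = np.linspace(0.0, 10.0, 5)
beta = np.array([3.4, 0.22, 0.25])

J = jacobian(x, beta)
print(J.shape)  # (5, 3)
print(np.max(np.abs(J - jacobian_fd(x, beta))))  # small discrepancy
```

The finite-difference version is literally the "nudge this parameter and see how far each prediction moves" reading of the Jacobian; the analytic version is usually cheaper and more accurate when the derivatives are easy to write down.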

Interactive example

The example below fits a nonlinear exponential curve. Each step builds a local linear approximation around the current parameters, solves a small least squares correction, and updates the curve. Watch how the fit improves quickly when the local approximation is good.

[Interactive figure: "Gauss-Newton: repeatedly linearize the nonlinear model". Initial state at iteration 0 of 10: SSE = 6.0656; parameters = 3.40, 0.22, 0.25.]

The Gauss-Newton step

Instead of solving the nonlinear problem directly, we solve a linear least squares problem for a parameter correction $δ$, with $J$ and $r$ evaluated at the current estimate $β^{(k)}$ [1],[2]:

J^{T} J δ = J^{T} r

Then we update:

β^{(k + 1)} = β^{(k)} + δ
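The equation for $δ$ can be recovered by substituting the first-order Taylor expansion into the objective. The short derivation below is a sketch that writes $δ = β - β^{(k)}$ and abbreviates $r = r (β^{(k)})$ and $J = J (β^{(k)})$:

```latex
\begin{aligned}
\| y - f(X, \beta) \|^{2}
  &\approx \| y - f(X, \beta^{(k)}) - J \delta \|^{2}
   = \| r - J \delta \|^{2}, \\
\nabla_{\delta} \, \| r - J \delta \|^{2}
  &= -2 J^{T} (r - J \delta) = 0
  \;\Longrightarrow\;
  J^{T} J \delta = J^{T} r .
\end{aligned}
```

These are the normal equations of a linear least squares problem in which $J$ plays the role of the design matrix and $r$ plays the role of the data.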

So each iteration says:

Look at the current model.
Build a local linear approximation.
Solve a least squares problem for the correction.
Update the parameters.

Then repeat.
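The loop described above can be sketched in Python with NumPy. This is a minimal illustration, not a production solver: the decay model `a * exp(-b * x) + c`, the noise-free synthetic data, the starting guess, and the stopping rule are all assumptions chosen for the example. The correction is computed with `np.linalg.lstsq` applied to `J` and `r`, which solves the same least squares problem as the normal equations without forming $J^{T} J$ explicitly.

```python
import numpy as np

def model(x, beta):
    """Hypothetical decay model (an assumption): a * exp(-b * x) + c."""
    a, b, c = beta
    return a * np.exp(-b * x) + c

def jacobian(x, beta):
    """Analytic Jacobian of the model with respect to (a, b, c)."""
    a, b, c = beta
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e, np.ones_like(x)])

def gauss_newton(x, y, beta0, max_iter=20, tol=1e-10):
    """Plain Gauss-Newton with full steps and no damping."""
    beta = np.asarray(beta0, dtype=float)
    for k in range(max_iter):
        r = y - model(x, beta)      # look at the current model
        J = jacobian(x, beta)       # build the local linear approximation
        # Solve min ||J delta - r||^2, equivalent to J^T J delta = J^T r.
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        beta = beta + delta         # update the parameters
        # Assumed stopping rule: stop once the step is negligible.
        if np.linalg.norm(delta) < tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta, k + 1

# Noise-free synthetic data and a starting guess close to the truth.
x = np.linspace(0.0, 10.0, 25)
beta_true = np.array([5.0, 0.6, 1.0])
y = model(x, beta_true)

beta_hat, n_iter = gauss_newton(x, y, beta0=[4.5, 0.5, 0.8])
print(beta_hat, n_iter)
```

The starting guess is deliberately close to the generating parameters. Because every step is taken at full length, a distant starting point could make this sketch diverge.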

When it works well

Gauss-Newton works well when the model is not too nonlinear near the solution and the starting guess is already reasonably close. In that situation, the local linear approximation is accurate, and the method can converge quickly [1],[2].

It can struggle when the starting point is poor, the residuals at the solution are large, or the model is strongly nonlinear. Then the step computed from the local linear approximation may point in a sensible direction but be too long, so the update overshoots and the error can increase instead of decrease.

That is where Levenberg-Marquardt becomes useful.

What to remember

Gauss-Newton is not a completely different philosophy from linear least squares. It is more like a clever loop around it. At each iteration, it turns a nonlinear problem into a temporary linear one.

That is why understanding the normal equations first pays off: they quietly reappear inside nonlinear optimization.

References

[1] Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.

[2] Bates, D. M., & Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. Wiley.