
Whittaker Smoothing

The mathematical foundations of this smoother trace back to Edmund Taylor Whittaker (1873—1956), a British mathematician and astronomer who spent most of his career at the University of Edinburgh. In 1923, Whittaker published “On a New Method of Graduation” in the Proceedings of the Edinburgh Mathematical Society (volume 41, pages 63—75). The word “graduation” came from actuarial science, where smoothing mortality tables to remove statistical fluctuations was called graduating the data. Whittaker’s idea was to frame smoothing as a penalized least squares problem: find the curve that balances fidelity to the observed values against a penalty on roughness, measured by the sum of squared finite differences. The formulation was elegant, but in 1923 there were no electronic computers, and solving the resulting linear systems by hand was impractical for all but the smallest datasets. The method was largely forgotten.

Eight decades later, Paul H.C. Eilers at Leiden University Medical Centre rediscovered Whittaker’s framework and showed it was ideally suited to modern computation. His 2003 paper “A Perfect Smoother” in Analytical Chemistry (volume 75, pages 3631—3636) demonstrated that Whittaker’s penalized least squares formulation, combined with sparse matrix techniques, produced a smoother that was fast, trivially easy to implement, and controlled by a single intuitive parameter, λ. Eilers’ key contribution was using difference penalties rather than derivative penalties, which reduced the entire algorithm to solving a banded linear system — just a few lines of code in MATLAB, R, or Python. This was not Eilers’ first encounter with penalized methods: together with Brian Marx, he had introduced P-splines (penalized B-splines) in 1996, a framework that also relied on difference penalties for flexible curve fitting. The Whittaker smoother can be understood as the limiting case of P-splines where every data point has its own basis function.

Eilers’ revival turned an obscure 1923 idea into one of the most practical smoothing tools in modern chemometrics. Today, Whittaker smoothing and its weighted variants underpin methods for baseline correction (Asymmetric Least Squares), signal denoising, and even two-dimensional smoothing of spectral images.

The optimization approach to smoothing

Moving average smoothing is simple and intuitive, but it has limitations. The main problem? It treats smoothing as a purely local operation, averaging nearby points without considering the global structure of the signal. This leads to peak broadening and loss of resolution.

What if instead of mechanically averaging neighbors, we posed smoothing as an optimization problem?

Find the smoothest curve that still fits the data well.

This is exactly what Whittaker smoothing does. As described above, Whittaker framed the problem as penalized least squares in 1923, and Eilers revived it in 2003 as the “perfect smoother” for modern applications.

The core idea: balancing two goals

Whittaker smoothing recognizes that smoothing involves a fundamental tradeoff between two competing objectives:

  1. Fidelity to data — The smoothed signal should be close to the observed (noisy) data
  2. Smoothness — The smoothed signal should be, well, smooth

These goals conflict. Perfect fidelity means no smoothing (keeping all the noise). Perfect smoothness means a straight line (losing all features). We need to balance them.

Mathematically, we minimize:

S = Σ_i (y_i − z_i)² + λ Σ_i (Δ^d z_i)²

Where:

  • y = your noisy observed data (n points)
  • z = the smoothed signal we’re trying to find
  • λ (lambda) = smoothing parameter that controls the tradeoff
  • Δ^d = d-th order difference operator
  • d = difference order (typically 2)

The fidelity term

Σ_i (y_i − z_i)²

This is the sum of squared differences between the observed data and the smooth signal. It’s small when z is close to y (good fit) and large when z differs from y (poor fit).

The smoothness penalty

This penalizes roughness in the smoothed signal:

λ Σ_i (Δ^d z_i)²

The difference operator Δ measures how much the signal changes:

  • First difference (d=1): Δz_i = z_{i+1} − z_i (measures slope)
  • Second difference (d=2): Δ²z_i = z_{i+2} − 2z_{i+1} + z_i (measures curvature)
  • Third difference (d=3): Δ³z_i = z_{i+3} − 3z_{i+2} + 3z_{i+1} − z_i (measures the change in curvature)
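These operators can be checked numerically with NumPy’s np.diff, which applies exactly these d-th order differences (a quick illustrative sketch):

```python
import numpy as np

# A quadratic sequence: z_i = 1 + i(i+1)/2
z = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])

print(np.diff(z, n=1))  # first differences (slope):      [1. 2. 3. 4. 5.]
print(np.diff(z, n=2))  # second differences (curvature): [1. 1. 1. 1.]
print(np.diff(z, n=3))  # third differences:              [0. 0. 0.]
```

Because z here is quadratic in i, its second differences are constant and its third differences vanish; by the same token, a d=2 penalty is zero for any straight line, which is why λ → ∞ drives the solution toward a line.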

The λ parameter: the key to the tradeoff

The smoothing parameter λ controls the balance:

  • λ = 0 → No penalty for roughness → Perfect fit to data (no smoothing)
  • λ = ∞ → Infinite penalty for roughness → Perfectly smooth (straight line)
  • λ = “just right” → Balanced smoothing that reduces noise but preserves features

In practice, λ values typically range from 10^0 to 10^8, which is why it’s often specified on a log scale.

The solution: a simple linear system

Here’s the remarkable thing: even though this looks like a complex optimization problem, it has a closed-form solution. No iterative optimization needed.

Taking the derivative of S with respect to z and setting it to zero gives us:

(I + λDᵀD) z = y

Where:

  • I = identity matrix (n × n)
  • D = difference matrix (implements the Δ^d operator)
  • z = smoothed signal (what we want)
  • y = noisy observed data

This is just a linear system of the form Ax = b, with A = I + λDᵀD and b = y. We can solve it directly with a sparse solver.
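As a sketch of that direct solve using SciPy’s sparse machinery (variable names are illustrative; the full worked example appears in the code section below):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
n = 200
y = np.sin(np.linspace(0, 3 * np.pi, n)) + rng.normal(0, 0.1, n)

# Second-order difference matrix D, shape (n-2, n): rows follow [1, -2, 1]
D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format='csc')

lam = 1e3
A = sparse.eye(n, format='csc') + lam * (D.T @ D)  # I + λ DᵀD (banded, sparse)
z = spsolve(A, y)  # one direct solve; no iteration needed
```

For d=2 the system matrix is pentadiagonal, so the solve costs O(n) time and memory.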

Interactive visualization

Let’s explore how the λ parameter and difference order affect smoothing. The visualization below lets you adjust both parameters and see the results in real time:

Try this:

  1. Start with log(λ) = 2 (λ = 100) and observe moderate smoothing
  2. Increase log(λ) to 6 and watch the signal become very smooth
  3. Decrease log(λ) to 0 and see it follow the noisy data closely
  4. Try different difference orders (d = 1, 2, 3) and notice how they affect the result
  5. Compare the smoothed signal with the true underlying signal (green line)

Notice how you can find a λ value that removes noise while preserving the peak shapes.

What does the difference matrix D look like?

The difference matrix D implements the difference operator. For second differences (d=2) and n=6 points, it looks like:

D = [ 1 -2  1  0  0  0 ]
    [ 0  1 -2  1  0  0 ]
    [ 0  0  1 -2  1  0 ]
    [ 0  0  0  1 -2  1 ]

Each row computes one second difference: Δ²z_i = z_i − 2z_{i+1} + z_{i+2}

For first differences (d=1):

D = [ 1 -1  0  0  0  0 ]
    [ 0  1 -1  0  0  0 ]
    [ 0  0  1 -1  0  0 ]
    [ 0  0  0  1 -1  0 ]
    [ 0  0  0  0  1 -1 ]

Notice how sparse these matrices are—mostly zeros with just a few -1, 1, -2 entries. This is why Whittaker is computationally efficient.
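A compact way to build D for any difference order d is to difference an identity matrix d times; this sketch (assuming SciPy) reproduces the second-difference matrix shown above. For odd d the row signs come out flipped relative to the display above, which is harmless because the penalty squares them:

```python
import numpy as np
from scipy import sparse

def diff_matrix(n, d):
    """Sparse d-th order difference matrix, shape (n - d, n)."""
    D = sparse.eye(n, format='csc')
    for _ in range(d):
        D = D[1:] - D[:-1]  # each pass raises the difference order by one
    return D

# rows follow the [1, -2, 1] pattern shown above
print(diff_matrix(6, 2).toarray())
```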

Choosing λ: how much to smooth?

The key question: how do I choose λ?

There’s no universal answer. The optimal λ depends on:

  • The noise level in your data
  • The feature size you want to preserve
  • Your application (exploratory vs. quantitative analysis)

Practical guidelines

Start with log(λ) in the range 0 to 6:

  • log(λ) = 0-2 (λ = 1-100): Light smoothing, preserves sharp features
  • log(λ) = 2-4 (λ = 100-10,000): Moderate smoothing, good for typical spectra
  • log(λ) = 4-6 (λ = 10,000-1,000,000): Heavy smoothing, for very noisy data
  • log(λ) > 6: Extreme smoothing, may over-smooth

Visual inspection:

  1. Plot the original and smoothed signals
  2. Increase λ gradually
  3. Stop when noise is reduced but peaks still look sharp

Cross-validation (for quantitative models):

  1. Smooth your data with different λ values
  2. Build a calibration model (e.g., PLS regression) on each
  3. Choose the λ that gives the best prediction error on test data

Residual analysis:

  1. Compute residuals: r = y - z (difference between data and smooth)
  2. Plot the residuals
  3. If they look like random noise → good smoothing
  4. If they show structure (peaks, trends) → you’ve over-smoothed
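The residual check can be partly automated with a lag-1 autocorrelation score. This is a heuristic sketch (the whittaker helper here is a minimal reimplementation, not a library call): white-noise residuals score near zero, while over-smoothing leaves signal structure in the residuals and pushes the score toward 1.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker(y, lam, d=2):
    n = len(y)
    D = sparse.eye(n, format='csc')
    for _ in range(d):
        D = D[1:] - D[:-1]
    return spsolve(sparse.eye(n, format='csc') + lam * (D.T @ D), y)

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 300)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

for lam in (1e2, 1e8):
    r = y - whittaker(y, lam)  # residuals
    # Lag-1 autocorrelation: near 0 when residuals look like white noise,
    # strongly positive when signal structure leaked into the residuals
    rho = np.corrcoef(r[:-1], r[1:])[0, 1]
    print(f"lambda={lam:.0e}  lag-1 autocorrelation={rho:+.2f}")
```

With λ = 1e2 the residuals are essentially noise; with λ = 1e8 the fit collapses toward a straight line, the sine ends up in the residuals, and the autocorrelation jumps toward 1.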

Code examples

import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, lam=1e4, d=2):
    """
    Apply Whittaker smoothing to a 1D array.

    Parameters
    ----------
    y : array-like
        Input signal to be smoothed
    lam : float
        Smoothing parameter (lambda). Larger = more smoothing.
        Typical range: 1e0 to 1e8
    d : int
        Order of the difference penalty (typically 2)

    Returns
    -------
    z : array
        Smoothed signal
    """
    n = len(y)
    # Build the d-th order difference matrix D (shape (n - d, n))
    # by differencing the identity matrix d times; this works for any d >= 1
    D = sparse.eye(n, format='csc')
    for _ in range(d):
        D = D[1:] - D[:-1]
    # Solve: (I + λD'D) z = y
    A = sparse.eye(n, format='csc') + lam * (D.T @ D)
    return spsolve(A, y)

# Example: generate a noisy spectrum
np.random.seed(42)
wavelength = np.linspace(400, 800, 400)  # 400-800 nm
true_signal = 2 * np.exp(-((wavelength - 500)**2) / 5000) + \
              1 * np.exp(-((wavelength - 600)**2) / 3000)
noise = np.random.normal(0, 0.1, len(wavelength))
noisy_spectrum = true_signal + noise

# Apply Whittaker smoothing with different λ values
smooth_lam1e2 = whittaker_smooth(noisy_spectrum, lam=1e2, d=2)
smooth_lam1e4 = whittaker_smooth(noisy_spectrum, lam=1e4, d=2)
smooth_lam1e6 = whittaker_smooth(noisy_spectrum, lam=1e6, d=2)

# Plot results
plt.figure(figsize=(10, 6))
plt.plot(wavelength, noisy_spectrum, alpha=0.5, label='Noisy', color='gray')
plt.plot(wavelength, smooth_lam1e2, label='λ=1e2 (light)', linewidth=2)
plt.plot(wavelength, smooth_lam1e4, label='λ=1e4 (moderate)', linewidth=2)
plt.plot(wavelength, smooth_lam1e6, label='λ=1e6 (heavy)', linewidth=2)
plt.plot(wavelength, true_signal, '--', label='True signal',
         color='green', linewidth=2, alpha=0.7)
plt.xlabel('Wavelength (nm)')
plt.ylabel('Absorbance')
plt.title('Whittaker Smoothing')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Advantages of Whittaker smoothing

Excellent peak preservation — Maintains sharp features better than moving average

Theoretically sound — Based on clear optimization principles

Flexible — λ parameter provides precise control over smoothing

Fast — Closed-form solution, no iteration required

Handles unequally spaced data — Works even when x-values aren’t uniform

Smooth derivatives — The smoothed signal has smooth derivatives (unlike moving average)

Well-studied — Extensive literature and proven track record

Disadvantages of Whittaker smoothing

Parameter choice — Requires choosing λ (though this also gives you control)

Less intuitive — Harder to explain than “average your neighbors”

Requires linear algebra — More complex to implement from scratch

Memory for large datasets — Sparse matrices needed for very large n (though still manageable)

Comparison with other methods

| Feature           | Moving Average | Savitzky-Golay     | Whittaker    | Gaussian         |
|-------------------|----------------|--------------------|--------------|------------------|
| Peak preservation | Poor           | Excellent          | Excellent    | Good             |
| Flexibility       | Low            | Moderate           | High         | Low              |
| Theoretical basis | None           | Polynomial fitting | Optimization | Weighted average |
| Parameters        | 1 (window)     | 2 (window, order)  | 1-2 (λ, d)   | 1 (σ)            |
| Speed             | Fastest        | Fast               | Fast         | Fast             |
| Unequal spacing   | No             | No                 | Yes          | No               |
| Best for          | Quick tests    | Spectroscopy       | General use  | Images           |

When to use Whittaker smoothing

Whittaker smoothing is an excellent choice when:

  • You need excellent peak preservation with controlled smoothing
  • You want a theoretically principled approach
  • You’re willing to tune one parameter (λ)
  • You need to smooth unequally spaced data
  • You want smooth derivatives for further analysis
  • You need reproducible, automated smoothing (once λ is chosen)

Avoid Whittaker when:

  • You need the simplest possible explanation (use moving average)
  • You want to compute derivatives directly (use Savitzky-Golay)
  • You have extremely large datasets and limited memory

Real-world applications

Whittaker smoothing is widely used in:

Spectroscopy: NIR, Raman, IR spectra preprocessing

Chromatography: Smoothing chromatographic peaks

Environmental science: Smoothing time series data

Medical imaging: Signal processing of physiological signals

Remote sensing: Smoothing satellite time series data

Extensions: weighted Whittaker

You can assign different weights to different points if some measurements are more reliable:

S = Σ_i w_i (y_i − z_i)² + λ Σ_i (Δ^d z_i)²

Minimizing this gives (W + λDᵀD) z = W y, where W is a diagonal matrix of weights.

This weighted variant is the foundation for more advanced techniques like Asymmetric Least Squares (AsLS) for baseline correction, which you can explore in the baseline correction section.
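A minimal sketch of the weighted smoother (whittaker_weighted is an illustrative name, assuming SciPy). Setting a point’s weight to zero makes the smoother ignore it entirely, so zero-weight runs are bridged by interpolation:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_weighted(y, w, lam=1e4, d=2):
    """Weighted Whittaker smoother: solves (W + lam * D'D) z = W y."""
    n = len(y)
    D = sparse.eye(n, format='csc')
    for _ in range(d):
        D = D[1:] - D[:-1]
    W = sparse.diags(w, format='csc')
    return spsolve(W + lam * (D.T @ D), w * y)

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.05, x.size)

# Zero weights mark points 80-119 as missing; the smoother bridges the gap
w = np.ones(x.size)
w[80:120] = 0.0
z = whittaker_weighted(y, w, lam=1e3)
```

In the zero-weight region the penalty alone determines z, so for d=2 the gap is filled by a smooth (locally cubic) continuation of the surrounding fit; this same mechanism is what AsLS exploits by iteratively downweighting peaks.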

Common mistakes to avoid

λ too small → Not enough smoothing, noise remains

λ too large → Over-smoothing, peaks disappear

Different λ for train/test → Inconsistent preprocessing

Using d=1 → Creates piecewise linear (jagged) results

Not using sparse matrices → Slow for large n

Forgetting to validate → Always check residuals

Next steps

Now that you understand Whittaker smoothing, you can explore related techniques such as Savitzky-Golay smoothing and baseline correction.

For most spectroscopic applications, Whittaker and Savitzky-Golay are the two workhorses of smoothing. Whittaker gives you precise control via λ, while Savitzky-Golay excels at computing smooth derivatives. Try both and see which works better for your data.

Further reading

  • Whittaker, E. T. (1923). “On a New Method of Graduation”. Proceedings of the Edinburgh Mathematical Society, 41, 63-75.
    • The original 1923 paper introducing penalized least squares for smoothing
  • Eilers, P. H. C., & Marx, B. D. (1996). “Flexible smoothing with B-splines and penalties”. Statistical Science, 11(2), 89-121.
    • The P-splines framework that laid the groundwork for Eilers’ revival of Whittaker smoothing
  • Eilers, P. H. C. (2003). “A perfect smoother”. Analytical Chemistry, 75(14), 3631-3636.
    • The modern reference on Whittaker smoothing
  • Eilers, P. H. C., & Boelens, H. F. M. (2005). “Baseline correction with asymmetric least squares smoothing”.
    • Extension to baseline correction (AsLS)