Whittaker Smoothing
The mathematical foundations of this smoother trace back to Edmund Taylor Whittaker (1873–1956), a British mathematician and astronomer who spent most of his career at the University of Edinburgh. In 1923, Whittaker published “On a New Method of Graduation” in the Proceedings of the Edinburgh Mathematical Society (volume 41, pages 63–75). The word “graduation” came from actuarial science, where smoothing mortality tables to remove statistical fluctuations was called graduating the data. Whittaker’s idea was to frame smoothing as a penalized least squares problem: find the curve that balances fidelity to the observed values against a penalty on roughness, measured by the sum of squared finite differences. The formulation was elegant, but in 1923 there were no electronic computers, and solving the resulting linear systems by hand was impractical for all but the smallest datasets. The method was largely forgotten.
Eight decades later, Paul H. C. Eilers at Leiden University Medical Centre rediscovered Whittaker’s framework and showed it was ideally suited to modern computation. His 2003 paper “A Perfect Smoother” in Analytical Chemistry (volume 75, pages 3631–3636) demonstrated that Whittaker’s penalized least squares formulation, combined with sparse matrix techniques, produced a smoother that was fast, trivially easy to implement, and controlled by a single intuitive parameter, λ. Eilers’ key contribution was using difference penalties rather than derivative penalties, which reduced the entire algorithm to solving a banded linear system: just a few lines of code in MATLAB, R, or Python. This was not Eilers’ first encounter with penalized methods: together with Brian Marx, he had introduced P-splines (penalized B-splines) in 1996, a framework that also relied on difference penalties for flexible curve fitting. The Whittaker smoother can be understood as the limiting case of P-splines where every data point has its own basis function.
Eilers’ revival turned an obscure 1923 idea into one of the most practical smoothing tools in modern chemometrics. Today, Whittaker smoothing and its weighted variants underpin methods for baseline correction (Asymmetric Least Squares), signal denoising, and even two-dimensional smoothing of spectral images.
The optimization approach to smoothing
Moving average smoothing is simple and intuitive, but it has limitations. The main problem? It treats smoothing as a purely local operation, averaging nearby points without considering the global structure of the signal. This leads to peak broadening and loss of resolution.
What if instead of mechanically averaging neighbors, we posed smoothing as an optimization problem?
Find the smoothest curve that still fits the data well.
This is exactly what Whittaker smoothing does. As described above, Whittaker framed the problem as penalized least squares in 1923, and Eilers revived it in 2003 as the “perfect smoother” for modern applications.
The core idea: balancing two goals
Whittaker smoothing recognizes that smoothing involves a fundamental tradeoff between two competing objectives:
- Fidelity to data — The smoothed signal should be close to the observed (noisy) data
- Smoothness — The smoothed signal should be, well, smooth
These goals conflict. Perfect fidelity means no smoothing (keeping all the noise). Perfect smoothness means a straight line (losing all features). We need to balance them.
Mathematically, we minimize:

S = Σᵢ (yᵢ − zᵢ)² + λ Σᵢ (Δᵈzᵢ)²

Where:
- y = your noisy observed data (n points)
- z = the smoothed signal we’re trying to find
- λ (lambda) = smoothing parameter that controls the tradeoff
- Δᵈ = d-th order difference operator
- d = difference order (typically 2)
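To make the two competing terms concrete, here is a minimal sketch of the objective in NumPy (the function name `whittaker_objective` and the toy data are illustrative, not from any library):

```python
import numpy as np

def whittaker_objective(y, z, lam, d=2):
    """Penalized least squares objective: fidelity + lambda * roughness."""
    fidelity = np.sum((y - z) ** 2)           # how closely z follows the data
    roughness = np.sum(np.diff(z, n=d) ** 2)  # d-th order differences of z
    return fidelity + lam * roughness

y = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
exact = y.copy()                  # perfect fidelity: only the penalty remains
flat = np.full_like(y, y.mean())  # perfectly smooth: only the data term remains
print(whittaker_objective(y, exact, lam=10))
print(whittaker_objective(y, flat, lam=10))
```

Minimizing S over all possible z finds the compromise between these two extremes.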
The fidelity term
Σᵢ (yᵢ − zᵢ)²

This is the sum of squared differences between the observed data and the smooth signal. It’s small when z is close to y (good fit) and large when z differs from y (poor fit).
The smoothness penalty
λ Σᵢ (Δᵈzᵢ)²

This penalizes roughness in the smoothed signal. The difference operator Δ measures how much the signal changes:
- First difference (d=1): Δzᵢ = zᵢ₊₁ − zᵢ (measures slope)
- Second difference (d=2): Δ²zᵢ = zᵢ₊₂ − 2zᵢ₊₁ + zᵢ (measures curvature)
- Third difference (d=3): Δ³zᵢ = zᵢ₊₃ − 3zᵢ₊₂ + 3zᵢ₊₁ − zᵢ (measures change in curvature)
The λ parameter: the key to the tradeoff
The smoothing parameter λ controls the balance:
- λ = 0 → No penalty for roughness → Perfect fit to data (no smoothing)
- λ = ∞ → Infinite penalty for roughness → Perfectly smooth (straight line)
- λ = “just right” → Balanced smoothing that reduces noise but preserves features
In practice, λ values typically range from 10⁰ to 10⁸, which is why it’s often specified on a log scale.
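The λ → ∞ limit is easy to check numerically. In this illustrative sketch (the minimal `smooth` helper and toy data are my own, not a library API), a very large λ with d = 2 reproduces the ordinary least-squares straight line, since a straight line is the only signal with zero second differences:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def smooth(y, lam, d=2):
    # Minimal Whittaker solver: (I + lam * D'D) z = y
    n = len(y)
    E = sparse.eye(n, format='csc')
    D = E
    for _ in range(d):           # build the d-th order difference matrix
        D = D[1:] - D[:-1]
    return spsolve(E + lam * (D.T @ D), y)

rng = np.random.default_rng(0)
x = np.arange(30.0)
y = 0.5 * x + rng.normal(0, 1, 30)

z = smooth(y, lam=1e10)                    # extreme smoothing, d = 2
line = np.polyval(np.polyfit(x, y, 1), x)  # ordinary least-squares line
print(np.max(np.abs(z - line)))            # essentially zero
```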
The solution: a simple linear system
Here’s the remarkable thing: even though this looks like a complex optimization problem, it has a closed-form solution. No iterative optimization needed.
Taking the derivative of S with respect to z and setting it to zero gives us:

(I + λDᵀD) z = y

Where:
- I = identity matrix (n × n)
- D = difference matrix (implements the Δᵈ operator)
- z = smoothed signal (what we want)
- y = noisy observed data
This is just a linear system Ax = b with a sparse, banded coefficient matrix, which any linear algebra library can solve directly and efficiently.
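For instance, a minimal sketch of that solve using SciPy’s sparse solver (toy data; variable names are illustrative):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, np.pi, 50)) + rng.normal(0, 0.1, 50)

n, lam = len(y), 100.0
E = sparse.eye(n, format='csc')
D = E
for _ in range(2):                 # second-order difference matrix
    D = D[1:] - D[:-1]

z = spsolve(E + lam * (D.T @ D), y)  # solve (I + λDᵀD)z = y in one shot
```

The smoothed `z` comes back from a single direct solve; no iteration, no convergence checks.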
Interactive visualization
Let’s explore how the λ parameter and difference order affect smoothing. The visualization below lets you adjust both parameters and see the results in real time:
Try this:
- Start with log(λ) = 2 (λ = 100) and observe moderate smoothing
- Increase log(λ) to 6 and watch the signal become very smooth
- Decrease log(λ) to 0 and see it follow the noisy data closely
- Try different difference orders (d = 1, 2, 3) and notice how they affect the result
- Compare the smoothed signal with the true underlying signal (green line)
Notice how you can find a λ value that removes noise while preserving the peak shapes.
What does the difference matrix D look like?
The difference matrix D implements the difference operator. For second differences (d=2) and n=6 points, it looks like:
```
D = [ 1 -2  1  0  0  0 ]
    [ 0  1 -2  1  0  0 ]
    [ 0  0  1 -2  1  0 ]
    [ 0  0  0  1 -2  1 ]
```

Each row computes one second difference: Δ²zᵢ = zᵢ − 2zᵢ₊₁ + zᵢ₊₂.
For first differences (d=1):
```
D = [ 1 -1  0  0  0  0 ]
    [ 0  1 -1  0  0  0 ]
    [ 0  0  1 -1  0  0 ]
    [ 0  0  0  1 -1  0 ]
    [ 0  0  0  0  1 -1 ]
```

Notice how sparse these matrices are: mostly zeros with just a few 1, −1, −2 entries. This is why Whittaker smoothing is computationally efficient.
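A quick sanity check that the second-difference matrix really does what the formula says (toy vector, values arbitrary):

```python
import numpy as np
from scipy import sparse

n = 6
# Second-difference matrix for n = 6 points, built from its three diagonals
D = sparse.diags([1.0, -2.0, 1.0], offsets=[0, 1, 2], shape=(n - 2, n))

z = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
print(D @ z)            # matrix-vector product -> [ 5. -6.  7.  0.]
print(np.diff(z, n=2))  # same second differences via np.diff
```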
Choosing λ: how much to smooth?
The key question: how do I choose λ?
There’s no universal answer. The optimal λ depends on:
- The noise level in your data
- The feature size you want to preserve
- Your application (exploratory vs. quantitative analysis)
Practical guidelines
Start with log(λ) in the range 0 to 6:
- log(λ) = 0-2 (λ = 1-100): Light smoothing, preserves sharp features
- log(λ) = 2-4 (λ = 100-10,000): Moderate smoothing, good for typical spectra
- log(λ) = 4-6 (λ = 10,000-1,000,000): Heavy smoothing, for very noisy data
- log(λ) > 6: Extreme smoothing, may over-smooth
Visual inspection:
- Plot the original and smoothed signals
- Increase λ gradually
- Stop when noise is reduced but peaks still look sharp
Cross-validation (for quantitative models):
- Smooth your data with different λ values
- Build a calibration model (e.g., PLS regression) on each
- Choose the λ that gives the best prediction error on test data
Residual analysis:
- Compute residuals: r = y - z (difference between data and smooth)
- Plot the residuals
- If they look like random noise → good smoothing
- If they show structure (peaks, trends) → you’ve over-smoothed
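As an illustrative sketch of that residual check (the minimal `smooth` helper and toy spectrum are my own setup, not a library API), the lag-1 autocorrelation of the residuals is one quick numerical proxy for “looks like random noise”: near zero suggests good smoothing, while a value near 1 means structure has leaked into the residuals, i.e. over-smoothing.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def smooth(y, lam, d=2):
    # Minimal Whittaker solver: (I + lam * D'D) z = y
    n = len(y)
    E = sparse.eye(n, format='csc')
    D = E
    for _ in range(d):
        D = D[1:] - D[:-1]
    return spsolve(E + lam * (D.T @ D), y)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.exp(-((x - 0.5) ** 2) / 0.01) + rng.normal(0, 0.05, x.size)

for lam in (1e1, 1e7):
    r = y - smooth(y, lam)                   # residuals
    lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]  # near 0 -> noise-like residuals
    print(f"lambda={lam:.0e}  lag-1 autocorrelation={lag1:+.2f}")
```

With λ = 10 the residuals are essentially uncorrelated noise; with λ = 10⁷ the flattened peak dominates the residuals and the autocorrelation jumps toward 1.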
Code examples
Python

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.sparse.linalg import spsolve


def whittaker_smooth(y, lam=1e4, d=2):
    """
    Apply Whittaker smoothing to a 1D array.

    Parameters
    ----------
    y : array-like
        Input signal to be smoothed
    lam : float
        Smoothing parameter (lambda). Larger = more smoothing.
        Typical range: 1e0 to 1e8
    d : int
        Order of the difference penalty (typically 2)

    Returns
    -------
    z : array
        Smoothed signal
    """
    n = len(y)

    # Build the d-th order difference matrix by differencing a sparse
    # identity matrix d times (works for any d >= 1)
    E = sparse.eye(n, format='csc')
    D = E
    for _ in range(d):
        D = D[1:] - D[:-1]

    # Solve: (I + λD'D)z = y
    coefMat = E + lam * (D.T @ D)
    z = spsolve(coefMat, y)

    return z


# Example: Generate noisy spectrum
np.random.seed(42)
wavelength = np.linspace(400, 800, 400)  # 400-800 nm
true_signal = 2 * np.exp(-((wavelength - 500)**2) / 5000) + \
              1 * np.exp(-((wavelength - 600)**2) / 3000)
noise = np.random.normal(0, 0.1, len(wavelength))
noisy_spectrum = true_signal + noise

# Apply Whittaker smoothing with different λ values
smooth_lam1e2 = whittaker_smooth(noisy_spectrum, lam=1e2, d=2)
smooth_lam1e4 = whittaker_smooth(noisy_spectrum, lam=1e4, d=2)
smooth_lam1e6 = whittaker_smooth(noisy_spectrum, lam=1e6, d=2)

# Plot results
plt.figure(figsize=(10, 6))
plt.plot(wavelength, noisy_spectrum, alpha=0.5, label='Noisy', color='gray')
plt.plot(wavelength, smooth_lam1e2, label='λ=1e2 (light)', linewidth=2)
plt.plot(wavelength, smooth_lam1e4, label='λ=1e4 (moderate)', linewidth=2)
plt.plot(wavelength, smooth_lam1e6, label='λ=1e6 (heavy)', linewidth=2)
plt.plot(wavelength, true_signal, '--', label='True signal', color='green',
         linewidth=2, alpha=0.7)
plt.xlabel('Wavelength (nm)')
plt.ylabel('Absorbance')
plt.title('Whittaker Smoothing')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```

MATLAB

```matlab
function z = whittaker_smooth(y, lam, d)
    % Apply Whittaker smoothing to a 1D array
    %
    % Parameters:
    %   y   - Input signal to be smoothed (vector)
    %   lam - Smoothing parameter (lambda). Larger = more smoothing.
    %         Typical range: 1e0 to 1e8
    %   d   - Order of the difference penalty (typically 2)
    %
    % Returns:
    %   z - Smoothed signal

    if nargin < 3
        d = 2;      % Default to second-order differences
    end
    if nargin < 2
        lam = 1e4;  % Default lambda
    end

    n = length(y);

    % Create the d-th order difference matrix by differencing a
    % sparse identity matrix d times (works for any d >= 1)
    E = speye(n);
    D = diff(E, d);

    % Solve the linear system: (I + λD'D)z = y
    coefMat = E + lam * (D' * D);
    z = coefMat \ y(:);  % Backslash operator solves the system efficiently
end
```

```matlab
% Example: Generate noisy spectrum
rng(42);  % For reproducibility
wavelength = linspace(400, 800, 400);  % 400-800 nm
true_signal = 2 * exp(-((wavelength - 500).^2) / 5000) + ...
              1 * exp(-((wavelength - 600).^2) / 3000);
noise = 0.1 * randn(size(wavelength));
noisy_spectrum = true_signal + noise;

% Apply Whittaker smoothing with different λ values
smooth_lam1e2 = whittaker_smooth(noisy_spectrum, 1e2, 2);
smooth_lam1e4 = whittaker_smooth(noisy_spectrum, 1e4, 2);
smooth_lam1e6 = whittaker_smooth(noisy_spectrum, 1e6, 2);

% Plot results
figure;
plot(wavelength, noisy_spectrum, 'Color', [0.5 0.5 0.5], ...
     'DisplayName', 'Noisy');
hold on;
plot(wavelength, smooth_lam1e2, 'LineWidth', 2, ...
     'DisplayName', '\lambda=10^2 (light)');
plot(wavelength, smooth_lam1e4, 'LineWidth', 2, ...
     'DisplayName', '\lambda=10^4 (moderate)');
plot(wavelength, smooth_lam1e6, 'LineWidth', 2, ...
     'DisplayName', '\lambda=10^6 (heavy)');
plot(wavelength, true_signal, '--', 'Color', 'g', 'LineWidth', 2, ...
     'DisplayName', 'True signal');
xlabel('Wavelength (nm)');
ylabel('Absorbance');
title('Whittaker Smoothing');
legend('Location', 'best');
grid on;
```

R

```r
whittaker_smooth <- function(y, lambda = 1e4, d = 2) {
  # Apply Whittaker smoothing to a 1D array
  #
  # Parameters:
  #   y      - Input signal to be smoothed (numeric vector)
  #   lambda - Smoothing parameter. Larger = more smoothing.
  #            Typical range: 1e0 to 1e8
  #   d      - Order of the difference penalty (typically 2)
  #
  # Returns:
  #   z - Smoothed signal

  library(Matrix)  # For sparse matrices

  n <- length(y)
  E <- Diagonal(n)  # Sparse identity matrix

  # Create the d-th order difference matrix by differencing the
  # identity d times (works for any d >= 1)
  D <- E
  for (k in seq_len(d)) {
    D <- D[-1, , drop = FALSE] - D[-nrow(D), , drop = FALSE]
  }

  # Solve the linear system: (I + λD'D)z = y
  coefMat <- E + lambda * t(D) %*% D
  z <- as.vector(solve(coefMat, y))

  return(z)
}

# Example: Generate noisy spectrum
set.seed(42)
wavelength <- seq(400, 800, length.out = 400)  # 400-800 nm
true_signal <- 2 * exp(-((wavelength - 500)^2) / 5000) +
  1 * exp(-((wavelength - 600)^2) / 3000)
noise <- rnorm(length(wavelength), mean = 0, sd = 0.1)
noisy_spectrum <- true_signal + noise

# Apply Whittaker smoothing with different λ values
smooth_lam1e2 <- whittaker_smooth(noisy_spectrum, lambda = 1e2, d = 2)
smooth_lam1e4 <- whittaker_smooth(noisy_spectrum, lambda = 1e4, d = 2)
smooth_lam1e6 <- whittaker_smooth(noisy_spectrum, lambda = 1e6, d = 2)

# Plot results
plot(wavelength, noisy_spectrum, type = 'l', col = 'gray',
     xlab = 'Wavelength (nm)', ylab = 'Absorbance',
     main = 'Whittaker Smoothing')
lines(wavelength, smooth_lam1e2, col = 'blue', lwd = 2)
lines(wavelength, smooth_lam1e4, col = 'red', lwd = 2)
lines(wavelength, smooth_lam1e6, col = 'purple', lwd = 2)
lines(wavelength, true_signal, col = 'green', lwd = 2, lty = 2)
legend('topright',
       legend = c('Noisy', 'λ=10² (light)', 'λ=10⁴ (moderate)',
                  'λ=10⁶ (heavy)', 'True signal'),
       col = c('gray', 'blue', 'red', 'purple', 'green'),
       lwd = c(1, 2, 2, 2, 2), lty = c(1, 1, 1, 1, 2))
grid()
```

Advantages of Whittaker smoothing
✅ Excellent peak preservation — Maintains sharp features better than moving average
✅ Theoretically sound — Based on clear optimization principles
✅ Flexible — λ parameter provides precise control over smoothing
✅ Fast — Closed-form solution, no iteration required
✅ Handles unequally spaced data — Works even when x-values aren’t uniform
✅ Smooth derivatives — The smoothed signal has smooth derivatives (unlike moving average)
✅ Well-studied — Extensive literature and proven track record
Disadvantages of Whittaker smoothing
❌ Parameter choice — Requires choosing λ (though this also gives you control)
❌ Less intuitive — Harder to explain than “average your neighbors”
❌ Requires linear algebra — More complex to implement from scratch
❌ Memory for large datasets — Sparse matrices needed for very large n (though still manageable)
Comparison with other methods
| Feature | Moving Average | Savitzky-Golay | Whittaker | Gaussian |
|---|---|---|---|---|
| Peak preservation | Poor | Excellent | Excellent | Good |
| Flexibility | Low | Moderate | High | Low |
| Theoretical basis | None | Polynomial fitting | Optimization | Weighted average |
| Parameters | 1 (window) | 2 (window, order) | 1-2 (λ, d) | 1 (σ) |
| Speed | Fastest | Fast | Fast | Fast |
| Unequal spacing | No | No | Yes | No |
| Best for | Quick tests | Spectroscopy | General use | Images |
When to use Whittaker smoothing
Whittaker smoothing is an excellent choice when:
- You need excellent peak preservation with controlled smoothing
- You want a theoretically principled approach
- You’re willing to tune one parameter (λ)
- You need to smooth unequally spaced data
- You want smooth derivatives for further analysis
- You need reproducible, automated smoothing (once λ is chosen)
Avoid Whittaker when:
- You need the simplest possible explanation (use moving average)
- You want to compute derivatives directly (use Savitzky-Golay)
- You have extremely large datasets and limited memory
Real-world applications
Whittaker smoothing is widely used in:
Spectroscopy: NIR, Raman, IR spectra preprocessing
Chromatography: Smoothing chromatographic peaks
Environmental science: Smoothing time series data
Medical imaging: Signal processing of physiological signals
Remote sensing: Smoothing satellite time series data
Extensions: weighted Whittaker
You can assign different weights to different points if some measurements are more reliable. The objective becomes:

S = Σᵢ wᵢ (yᵢ − zᵢ)² + λ Σᵢ (Δᵈzᵢ)²

and the corresponding linear system is (W + λDᵀD)z = Wy, where W is a diagonal matrix of weights.
This weighted variant is the foundation for more advanced techniques like Asymmetric Least Squares (AsLS) for baseline correction, which you can explore in the baseline correction section.
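A minimal sketch of that weighted solve (the function name `whittaker_weighted` and the toy data are illustrative, not a library API); setting a point’s weight to zero makes the smoother bridge over it:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_weighted(y, w, lam, d=2):
    """Solve (W + lam * D'D) z = W y, with W = diag(w)."""
    n = len(y)
    D = sparse.eye(n, format='csc')
    for _ in range(d):               # d-th order difference matrix
        D = D[1:] - D[:-1]
    W = sparse.diags(w, format='csc')
    return spsolve(W + lam * (D.T @ D), W @ y)

y = np.array([1.0, 2.0, 8.0, 2.5, 3.0, 3.5])  # index 2 is a suspect outlier
w = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 1.0])  # weight 0 -> point is ignored
z = whittaker_weighted(y, w, lam=1.0)
print(np.round(z, 2))  # the smooth bridges over the down-weighted outlier
```

This zero-weight trick is also how the Whittaker framework interpolates missing values.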
Common mistakes to avoid
❌ λ too small → Not enough smoothing, noise remains
❌ λ too large → Over-smoothing, peaks disappear
❌ Different λ for train/test → Inconsistent preprocessing
❌ Using d=1 → Result can look kinked; d=2 is usually a better default
❌ Not using sparse matrices → Slow for large n
❌ Forgetting to validate → Always check residuals
Next steps
Now that you understand Whittaker smoothing, you can explore:
- Moving Average — The simplest smoothing baseline
- Savitzky-Golay smoothing — Polynomial fitting approach
- Gaussian smoothing — Weighted averaging with Gaussian kernel
For most spectroscopic applications, Whittaker and Savitzky-Golay are the two workhorses of smoothing. Whittaker gives you precise control via λ, while Savitzky-Golay excels at computing smooth derivatives. Try both and see which works better for your data.
Further reading
- Whittaker, E. T. (1923). “On a New Method of Graduation”. Proceedings of the Edinburgh Mathematical Society, 41, 63-75.
- The original 1923 paper introducing penalized least squares for smoothing
- Eilers, P. H. C., & Marx, B. D. (1996). “Flexible smoothing with B-splines and penalties”. Statistical Science, 11(2), 89-121.
- The P-splines framework that laid the groundwork for Eilers’ revival of Whittaker smoothing
- Eilers, P. H. C. (2003). “A perfect smoother”. Analytical Chemistry, 75(14), 3631-3636.
- The modern reference on Whittaker smoothing
- Eilers, P. H. C., & Boelens, H. F. M. (2005). “Baseline correction with asymmetric least squares smoothing”. Leiden University Medical Centre report.
- Extension to baseline correction (AsLS)