Savitzky-Golay Smoothing

In 1964, two scientists at the Perkin-Elmer Corporation -- one of the leading analytical instrument manufacturers of the era -- published a paper that would become one of the most cited in the history of analytical chemistry. Abraham Savitzky, a physicist and computer scientist, and Marcel J.E. Golay, a Swiss-born physicist and information theorist, authored "Smoothing and Differentiation of Data by Simplified Least Squares Procedures" in Analytical Chemistry (volume 36, pages 1627--1639). Their timing was not accidental. By the mid-1960s, analytical instruments were producing digital output that could be fed directly into the early minicomputers appearing in laboratories, and practitioners urgently needed practical algorithms for cleaning up noisy spectra without distorting the peaks that carried chemical information.

What made the Savitzky-Golay method revolutionary was its core insight: instead of crudely averaging neighboring points (as a moving average does), you could fit a local polynomial through a sliding window of data and read off the smoothed value from the polynomial at the center point. This preserved peak shapes, heights, and widths far better than simple averaging. Better still, Savitzky and Golay showed that the entire operation could be reduced to a set of precomputed convolution coefficients -- integer weights that could be looked up in a table and applied with minimal computation, a critical advantage on the limited hardware of the 1960s.

The original paper did contain errors in some of the published coefficient tables, a fact corrected by Steinier, Termonia, and Deltour in 1972 and later by others. But neither the errata nor the passage of six decades have diminished the method's dominance: Savitzky-Golay smoothing remains the standard preprocessing tool in spectroscopy, chromatography, and signal processing whenever peak fidelity matters.

The peak preservation problem

You've learned about moving average smoothing and Whittaker smoothing. Both methods reduce noise effectively, but they face a common challenge: how do you smooth data without destroying the peaks you care about?

Moving average has a particularly bad habit: it broadens and flattens peaks. If you have two nearby peaks that are barely resolved, moving average can blur them together into one blob. This is a disaster for spectroscopy, where peak positions, heights, and widths carry crucial chemical information!

Whittaker smoothing does better, but requires matrix operations and parameter tuning. Wouldn't it be nice to have something that:

Preserves peak shapes like Whittaker
Is as simple to understand as moving average
Has a solid mathematical foundation
Can even compute derivatives while smoothing

Enter Savitzky-Golay smoothing — the method that does all of this and more!

The big idea: local polynomial fitting

The Savitzky-Golay (SG) method takes a completely different approach from simple averaging. Instead of just averaging neighbors, it:

Fits a polynomial to each local window, then uses that polynomial to compute the smoothed value.

Let's break this down with an example. Suppose you're at point i and want to smooth it using a window of 5 points (points i-2, i-1, i, i+1, i+2):

Moving average says:

"Just average these 5 values: (y_{i-2} + y_{i-1} + y_i + y_{i+1} + y_{i+2}) / 5"
Simple, but treats all points equally and ignores the local trend

Savitzky-Golay says:

"Fit a polynomial (like y = a + bx + cx²) through these 5 points"
"Evaluate that polynomial at the center point"
"Use that as your smoothed value"

The polynomial captures the local trend, so if you're on a rising slope, the smooth value reflects that slope. If you're at a peak, the polynomial captures the curvature. This is why SG preserves features so well!

A visual analogy

Imagine you're hiking in fog and trying to figure out the terrain:

Moving average: You ask 5 people around you their altitude and take the average. If you're on a hill, this average might be lower than your actual position!
Savitzky-Golay: You ask those 5 people their altitudes, fit a smooth curve through all the points, and read off the curve at your position. This captures whether you're going uphill, downhill, or at a peak.

The mathematics (simplified)

Don't worry — you don't need to fit polynomials manually! The beauty of SG is that it can be computed as a weighted average, just like moving average, but with special weights.

For a window of size w and polynomial order p, there exists a set of coefficients c₋ₘ, ..., c₀, ..., cₘ such that:

\overset{y}{^}_{i} = j = - m \sum m c_{j} \cdot y_{i + j}

where m = (w-1)/2.

The key difference from moving average: These coefficients aren't all equal (like 1/w). Instead, they're computed to give the best polynomial fit. The math behind computing these coefficients involves least squares (sound familiar?), but the coefficients are precomputed and tabulated.

You don't need to compute them yourself — just look them up or use a library!

Connection to least squares

Remember the least squares optimization we studied? SG smoothing uses exactly that principle! For each window:

Set up a design matrix X with polynomial terms [1, x, x², x³, ...]
Solve the least squares problem: minimize ||y - Xβ||²
Evaluate the fitted polynomial at the center point

The SG coefficients are derived by solving this least squares problem analytically and extracting the center point. This is why SG is sometimes called "least squares polynomial smoothing."

Example coefficients

For a window size of 5 and polynomial order 2 (quadratic), the SG coefficients are:

[- 3, 12, 17, 12, - 3] \div 35

So the smoothed value is:

\overset{y}{^}_{i} = \frac{- 3 y _{i - 2} + 12 y _{i - 1} + 17 y _{i} + 12 y _{i + 1} - 3 y _{i + 2}}{35}

Notice:

The center point gets the highest weight (17/35 ≈ 0.49)
Immediate neighbors get good weight (12/35 ≈ 0.34 each)
Edge points actually get negative weight (-3/35)!

This pattern captures the local curvature and is why SG preserves peaks so well.

Parameters: window size and polynomial order

SG smoothing has two parameters to tune:

1. Window size (w)

Just like moving average, the window size controls how many neighbors you use:

Small window (5-7 points): Light smoothing, preserves fine details
Medium window (9-15 points): Moderate smoothing, good for typical spectra
Large window (17-25 points): Heavy smoothing, for very noisy data

Rule of thumb: The window should be larger than the noise fluctuations but smaller than your narrowest feature.

2. Polynomial order (p)

This is new! It controls the flexibility of the fitted polynomial:

Order 2 (quadratic): Most common, captures curvature around peaks
Order 3 (cubic): More flexible, can capture asymmetric peaks
Order 4 (quartic): Very flexible, but can overfit to noise
Order 5+: Rarely needed, high risk of overfitting

Rule of thumb:

Start with order 2 or 3
Higher orders need larger windows (otherwise you're fitting 5 parameters with 5 points = overfitting!)
Typical constraint: w ≥ p + 2 (preferably w ≥ 2p + 1)

Don't overfit!

A polynomial of order p has p+1 coefficients. If your window size is barely larger than p+1, you're fitting almost as many parameters as data points! This defeats the purpose of smoothing.

Example: With window size 5, don't use polynomial order 4 or higher. Stick with order 2 or 3.

Interactive visualization

Coming soon! An interactive tool where you can:

Adjust window size and polynomial order
Compare SG with moving average side-by-side
See how SG preserves peak heights and widths
Visualize the fitted polynomial in each window

Code examples

import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Generate noisy spectrum with sharp peaks
np.random.seed(42)
x = np.linspace(0, 100, 500)
true_signal = (0.8 * np.exp(-((x - 25)**2) / 20) +
               0.6 * np.exp(-((x - 50)**2) / 15) +
               0.9 * np.exp(-((x - 75)**2) / 25))
noise = np.random.normal(0, 0.08, len(x))
noisy_signal = true_signal + noise

# Apply Savitzky-Golay smoothing
# scipy.signal.savgol_filter(data, window_length, polyorder)
sg_smooth = savgol_filter(noisy_signal, window_length=11, polyorder=2)

# Compare different parameters
sg_w5_p2 = savgol_filter(noisy_signal, window_length=5, polyorder=2)
sg_w11_p2 = savgol_filter(noisy_signal, window_length=11, polyorder=2)
sg_w11_p3 = savgol_filter(noisy_signal, window_length=11, polyorder=3)
sg_w21_p2 = savgol_filter(noisy_signal, window_length=21, polyorder=2)

# Plot results
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')
plt.plot(x, sg_w5_p2, label='w=5, p=2', linewidth=2)
plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)
plt.title('Light smoothing (w=5, p=2)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 2)
plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')
plt.plot(x, sg_w11_p2, label='w=11, p=2', linewidth=2)
plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)
plt.title('Moderate smoothing (w=11, p=2)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 3)
plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')
plt.plot(x, sg_w11_p3, label='w=11, p=3', linewidth=2)
plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)
plt.title('Higher polynomial (w=11, p=3)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 2, 4)
plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')
plt.plot(x, sg_w21_p2, label='w=21, p=2', linewidth=2)
plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)
plt.title('Heavy smoothing (w=21, p=2)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example: Computing derivatives while smoothing
# The deriv parameter computes the derivative of the polynomial
first_derivative = savgol_filter(noisy_signal, window_length=11, polyorder=2, deriv=1)
second_derivative = savgol_filter(noisy_signal, window_length=11, polyorder=2, deriv=2)

plt.figure(figsize=(10, 6))
plt.subplot(3, 1, 1)
plt.plot(x, noisy_signal, alpha=0.5, color='gray')
plt.plot(x, sg_smooth, linewidth=2)
plt.ylabel('Signal')
plt.title('Savitzky-Golay: Smoothing and Derivatives')
plt.grid(True, alpha=0.3)

plt.subplot(3, 1, 2)
plt.plot(x, first_derivative, linewidth=2, color='red')
plt.ylabel('1st Derivative')
plt.grid(True, alpha=0.3)

plt.subplot(3, 1, 3)
plt.plot(x, second_derivative, linewidth=2, color='green')
plt.ylabel('2nd Derivative')
plt.xlabel('Data Point')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

% Savitzky-Golay smoothing in MATLAB
% MATLAB has built-in sgolayfilt function

% Generate noisy spectrum with sharp peaks
rng(42);
x = linspace(0, 100, 500);
true_signal = 0.8 * exp(-((x - 25).^2) / 20) + ...
              0.6 * exp(-((x - 50).^2) / 15) + ...
              0.9 * exp(-((x - 75).^2) / 25);
noise = 0.08 * randn(size(x));
noisy_signal = true_signal + noise;

% Apply Savitzky-Golay smoothing
% sgolayfilt(data, polynomial_order, window_length)
sg_smooth = sgolayfilt(noisy_signal, 2, 11);

% Compare different parameters
sg_w5_p2 = sgolayfilt(noisy_signal, 2, 5);
sg_w11_p2 = sgolayfilt(noisy_signal, 2, 11);
sg_w11_p3 = sgolayfilt(noisy_signal, 3, 11);
sg_w21_p2 = sgolayfilt(noisy_signal, 2, 21);

% Plot results
figure('Position', [100 100 1200 800]);

subplot(2, 2, 1);
plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');
hold on;
plot(x, sg_w5_p2, 'LineWidth', 2, 'DisplayName', 'w=5, p=2');
plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');
title('Light smoothing (w=5, p=2)');
legend('Location', 'best');
grid on;

subplot(2, 2, 2);
plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');
hold on;
plot(x, sg_w11_p2, 'LineWidth', 2, 'DisplayName', 'w=11, p=2');
plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');
title('Moderate smoothing (w=11, p=2)');
legend('Location', 'best');
grid on;

subplot(2, 2, 3);
plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');
hold on;
plot(x, sg_w11_p3, 'LineWidth', 2, 'DisplayName', 'w=11, p=3');
plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');
title('Higher polynomial (w=11, p=3)');
legend('Location', 'best');
grid on;

subplot(2, 2, 4);
plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');
hold on;
plot(x, sg_w21_p2, 'LineWidth', 2, 'DisplayName', 'w=21, p=2');
plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');
title('Heavy smoothing (w=21, p=2)');
legend('Location', 'best');
grid on;

% Example: Computing derivatives
% For derivatives, we need to design the SG filter manually
order = 2;
framelen = 11;
[b, g] = sgolay(order, framelen);

% Smoothed signal (0th derivative)
smoothed = conv(noisy_signal, g(:,1), 'same');

% First derivative
% Note: need to account for data spacing (dx)
dx = x(2) - x(1);
first_deriv = conv(noisy_signal, g(:,2), 'same') / dx;

% Second derivative
second_deriv = conv(noisy_signal, g(:,3), 'same') / dx^2;

figure;
subplot(3, 1, 1);
plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5]);
hold on;
plot(x, smoothed, 'LineWidth', 2);
ylabel('Signal');
title('Savitzky-Golay: Smoothing and Derivatives');
grid on;

subplot(3, 1, 2);
plot(x, first_deriv, 'LineWidth', 2, 'Color', 'r');
ylabel('1st Derivative');
grid on;

subplot(3, 1, 3);
plot(x, second_deriv, 'LineWidth', 2, 'Color', 'g');
ylabel('2nd Derivative');
xlabel('Data Point');
grid on;

library(signal)  # For sgolayfilt function

# Generate noisy spectrum with sharp peaks
set.seed(42)
x <- seq(0, 100, length.out = 500)
true_signal <- 0.8 * exp(-((x - 25)^2) / 20) +
               0.6 * exp(-((x - 50)^2) / 15) +
               0.9 * exp(-((x - 75)^2) / 25)
noise <- rnorm(length(x), mean = 0, sd = 0.08)
noisy_signal <- true_signal + noise

# Apply Savitzky-Golay smoothing
# sgolayfilt(x, p = polynomial_order, n = window_length)
sg_smooth <- sgolayfilt(noisy_signal, p = 2, n = 11)

# Compare different parameters
sg_w5_p2 <- sgolayfilt(noisy_signal, p = 2, n = 5)
sg_w11_p2 <- sgolayfilt(noisy_signal, p = 2, n = 11)
sg_w11_p3 <- sgolayfilt(noisy_signal, p = 3, n = 11)
sg_w21_p2 <- sgolayfilt(noisy_signal, p = 2, n = 21)

# Plot results
par(mfrow = c(2, 2))

# Plot 1
plot(x, noisy_signal, type = 'l', col = 'gray',
     main = 'Light smoothing (w=5, p=2)',
     xlab = 'Data Point', ylab = 'Signal')
lines(x, sg_w5_p2, col = 'blue', lwd = 2)
lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)
legend('topright', legend = c('Noisy', 'w=5, p=2', 'True'),
       col = c('gray', 'blue', 'black'), lwd = c(1, 2, 1.5),
       lty = c(1, 1, 2))
grid()

# Plot 2
plot(x, noisy_signal, type = 'l', col = 'gray',
     main = 'Moderate smoothing (w=11, p=2)',
     xlab = 'Data Point', ylab = 'Signal')
lines(x, sg_w11_p2, col = 'red', lwd = 2)
lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)
legend('topright', legend = c('Noisy', 'w=11, p=2', 'True'),
       col = c('gray', 'red', 'black'), lwd = c(1, 2, 1.5),
       lty = c(1, 1, 2))
grid()

# Plot 3
plot(x, noisy_signal, type = 'l', col = 'gray',
     main = 'Higher polynomial (w=11, p=3)',
     xlab = 'Data Point', ylab = 'Signal')
lines(x, sg_w11_p3, col = 'green', lwd = 2)
lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)
legend('topright', legend = c('Noisy', 'w=11, p=3', 'True'),
       col = c('gray', 'green', 'black'), lwd = c(1, 2, 1.5),
       lty = c(1, 1, 2))
grid()

# Plot 4
plot(x, noisy_signal, type = 'l', col = 'gray',
     main = 'Heavy smoothing (w=21, p=2)',
     xlab = 'Data Point', ylab = 'Signal')
lines(x, sg_w21_p2, col = 'purple', lwd = 2)
lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)
legend('topright', legend = c('Noisy', 'w=21, p=2', 'True'),
       col = c('gray', 'purple', 'black'), lwd = c(1, 2, 1.5),
       lty = c(1, 1, 2))
grid()

# Example: Computing derivatives
# R's signal package supports derivatives through the 'm' parameter
par(mfrow = c(3, 1))

# Smoothed signal (m=0, no derivative)
sg_smooth <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 0)
plot(x, noisy_signal, type = 'l', col = 'gray',
     main = 'Savitzky-Golay: Smoothing and Derivatives',
     ylab = 'Signal')
lines(x, sg_smooth, col = 'blue', lwd = 2)
grid()

# First derivative (m=1)
dx <- x[2] - x[1]
first_deriv <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 1) / (dx^1)
plot(x, first_deriv, type = 'l', col = 'red', lwd = 2,
     ylab = '1st Derivative')
grid()

# Second derivative (m=2)
second_deriv <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 2) / (dx^2)
plot(x, second_deriv, type = 'l', col = 'green', lwd = 2,
     xlab = 'Data Point', ylab = '2nd Derivative')
grid()

The derivative bonus: smooth and differentiate simultaneously

Here's where SG becomes truly powerful: it can compute derivatives while smoothing!

Remember from calculus that differentiating amplifies noise. If you differentiate raw noisy data, you get garbage. The traditional approach is:

Smooth the data first
Then compute numerical derivatives

But SG can do both in one step! Since it fits a polynomial to each window, it can:

Evaluate the polynomial → gives smoothed value
Evaluate the polynomial's derivative → gives smoothed derivative
Evaluate the second derivative → gives smoothed second derivative

This is huge for spectroscopy where derivatives enhance peak resolution. We'll cover derivatives in detail in a later article, but know that SG is the standard tool for this.

Advantages of Savitzky-Golay

✅ Excellent peak preservation — Minimal broadening compared to moving average ✅ Mathematically rigorous — Based on local polynomial least squares ✅ Flexible — Two parameters (window, polynomial order) allow fine-tuning ✅ Derivative capability — Can smooth and differentiate simultaneously ✅ Well-established — Widely used and trusted in spectroscopy ✅ Precomputed coefficients — Fast implementation (just a convolution) ✅ Predictable behavior — Clear relationship between parameters and smoothing

Disadvantages of Savitzky-Golay

❌ Two parameters to tune — More complex than single-parameter methods ❌ Can overfit — High polynomial orders with small windows capture noise ❌ Edge effects — Needs special handling at data boundaries ❌ Not optimal for all signals — Assumes locally polynomial behavior ❌ Less intuitive — Harder to explain than "averaging neighbors"

When to use Savitzky-Golay

Savitzky-Golay is the go-to method for spectroscopy when:

✅ You have sharp peaks that must be preserved ✅ You need to compute derivatives (it's the best choice!) ✅ You want a balance between simplicity and quality ✅ You're working with typical spectroscopic data (IR, Raman, UV-Vis, etc.) ✅ You want a method that's widely recognized and accepted

Avoid SG when: ❌ Your signal isn't smooth/continuous (use median filter instead) ❌ You have extreme noise (try Whittaker with high λ) ❌ You need the absolute simplest method (use moving average)

Comparison with other methods

Feature	Moving Average	Savitzky-Golay	Whittaker
Peak preservation	Poor	Excellent	Excellent
Number of parameters	1 (window)	2 (window, order)	1-2 (λ, order)
Computation	Very fast	Fast	Fast
Derivatives	No	Yes!	Not directly
Ease of use	Easiest	Moderate	Moderate
Best for	Quick exploration	Spectroscopy	General smoothing

Choosing parameters: practical guidelines

Step 1: Choose polynomial order

Start with p = 2 (quadratic). This works for 90% of cases.

Use p = 3 if:

Your peaks are asymmetric
You need maximum flexibility
You're using a large window (≥15)

Avoid p ≥ 4 unless you really know what you're doing!

Step 2: Choose window size

Start with w = 11 for typical spectroscopic data.

Adjust based on:

More noise → increase window (15, 21, 25)
Sharper peaks → decrease window (7, 9)
Check the constraint: w ≥ 2p + 1 for safety

Step 3: Validate

Visual check: Does it look smooth without destroying peaks?
Residuals: Plot (original - smoothed). Should look like pure noise.
Peak metrics: Compare peak heights and widths before/after.
Cross-validation: If building a model, does smoothing improve prediction?

Quick reference table

Noise Level	Recommended Settings
Low noise	w = 7-9, p = 2
Moderate noise	w = 11-15, p = 2
High noise	w = 17-25, p = 2-3
Very sharp peaks	w = 5-7, p = 2
Broad peaks	w = 15-21, p = 2-3

Common mistakes to avoid

❌ Using polynomial order too high → Overfitting, captures noise as signal ❌ Window too small for the polynomial → Violates w ≥ 2p + 1 rule ❌ Window too large for features → Broadens peaks unnecessarily ❌ Not checking edge effects → Edges can behave strangely ❌ Smoothing already-smoothed data → Diminishing returns, feature loss ❌ Different parameters for training/test → Inconsistent preprocessing