Savitzky-Golay Smoothing
In 1964, two scientists at the Perkin-Elmer Corporation — one of the leading analytical instrument manufacturers of the era — published a paper that would become one of the most cited in the history of analytical chemistry. Abraham Savitzky, a physicist and computer scientist, and Marcel J.E. Golay, a Swiss-born physicist and information theorist, authored “Smoothing and Differentiation of Data by Simplified Least Squares Procedures” in Analytical Chemistry (volume 36, pages 1627—1639). Their timing was not accidental. By the mid-1960s, analytical instruments were producing digital output that could be fed directly into the early minicomputers appearing in laboratories, and practitioners urgently needed practical algorithms for cleaning up noisy spectra without distorting the peaks that carried chemical information.
What made the Savitzky-Golay method revolutionary was its core insight: instead of crudely averaging neighboring points (as a moving average does), you could fit a local polynomial through a sliding window of data and read off the smoothed value from the polynomial at the center point. This preserved peak shapes, heights, and widths far better than simple averaging. Better still, Savitzky and Golay showed that the entire operation could be reduced to a set of precomputed convolution coefficients — integer weights that could be looked up in a table and applied with minimal computation, a critical advantage on the limited hardware of the 1960s.
The original paper did contain errors in some of the published coefficient tables, a fact corrected by Steinier, Termonia, and Deltour in 1972 and later by others. But neither the errata nor the passage of six decades have diminished the method’s dominance: Savitzky-Golay smoothing remains the standard preprocessing tool in spectroscopy, chromatography, and signal processing whenever peak fidelity matters.
The peak preservation problem
You’ve learned about moving average smoothing and Whittaker smoothing. Both methods reduce noise effectively, but they face a common challenge: how do you smooth data without destroying the peaks you care about?
Moving average has a particularly bad habit: it broadens and flattens peaks. If you have two nearby peaks that are barely resolved, moving average can blur them together into one blob. This is a disaster for spectroscopy, where peak positions, heights, and widths carry crucial chemical information!
Whittaker smoothing does better, but requires matrix operations and parameter tuning. Wouldn’t it be nice to have something that:
- Preserves peak shapes like Whittaker
- Is as simple to understand as moving average
- Has a solid mathematical foundation
- Can even compute derivatives while smoothing
Enter Savitzky-Golay smoothing — the method that does all of this and more!
The big idea: local polynomial fitting
The Savitzky-Golay (SG) method takes a completely different approach from simple averaging. Instead of just averaging neighbors, it:
Fits a polynomial to each local window, then uses that polynomial to compute the smoothed value.
Let’s break this down with an example. Suppose you’re at point i and want to smooth it using a window of 5 points (points i-2, i-1, i, i+1, i+2):
Moving average says:
- “Just average these 5 values:
(y_{i-2} + y_{i-1} + y_i + y_{i+1} + y_{i+2}) / 5” - Simple, but treats all points equally and ignores the local trend
Savitzky-Golay says:
- “Fit a polynomial (like y = a + bx + cx²) through these 5 points”
- “Evaluate that polynomial at the center point”
- “Use that as your smoothed value”
The polynomial captures the local trend, so if you’re on a rising slope, the smooth value reflects that slope. If you’re at a peak, the polynomial captures the curvature. This is why SG preserves features so well!
A visual analogy
Imagine you’re hiking in fog and trying to figure out the terrain:
-
Moving average: You ask 5 people around you their altitude and take the average. If you’re on a hill, this average might be lower than your actual position!
-
Savitzky-Golay: You ask those 5 people their altitudes, fit a smooth curve through all the points, and read off the curve at your position. This captures whether you’re going uphill, downhill, or at a peak.
The mathematics (simplified)
Don’t worry — you don’t need to fit polynomials manually! The beauty of SG is that it can be computed as a weighted average, just like moving average, but with special weights.
For a window of size w and polynomial order p, there exists a set of coefficients c₋ₘ, …, c₀, …, cₘ such that:
where m = (w-1)/2.
The key difference from moving average: These coefficients aren’t all equal (like 1/w). Instead, they’re computed to give the best polynomial fit. The math behind computing these coefficients involves least squares (sound familiar?), but the coefficients are precomputed and tabulated.
You don’t need to compute them yourself — just look them up or use a library!
Example coefficients
For a window size of 5 and polynomial order 2 (quadratic), the SG coefficients are:
So the smoothed value is:
Notice:
- The center point gets the highest weight (17/35 ≈ 0.49)
- Immediate neighbors get good weight (12/35 ≈ 0.34 each)
- Edge points actually get negative weight (-3/35)!
This pattern captures the local curvature and is why SG preserves peaks so well.
Parameters: window size and polynomial order
SG smoothing has two parameters to tune:
1. Window size (w)
Just like moving average, the window size controls how many neighbors you use:
- Small window (5-7 points): Light smoothing, preserves fine details
- Medium window (9-15 points): Moderate smoothing, good for typical spectra
- Large window (17-25 points): Heavy smoothing, for very noisy data
Rule of thumb: The window should be larger than the noise fluctuations but smaller than your narrowest feature.
2. Polynomial order (p)
This is new! It controls the flexibility of the fitted polynomial:
- Order 2 (quadratic): Most common, captures curvature around peaks
- Order 3 (cubic): More flexible, can capture asymmetric peaks
- Order 4 (quartic): Very flexible, but can overfit to noise
- Order 5+: Rarely needed, high risk of overfitting
Rule of thumb:
- Start with order 2 or 3
- Higher orders need larger windows (otherwise you’re fitting 5 parameters with 5 points = overfitting!)
- Typical constraint: w ≥ p + 2 (preferably w ≥ 2p + 1)
Interactive visualization
Coming soon! An interactive tool where you can:
- Adjust window size and polynomial order
- Compare SG with moving average side-by-side
- See how SG preserves peak heights and widths
- Visualize the fitted polynomial in each window
Code examples
import numpy as npfrom scipy.signal import savgol_filterimport matplotlib.pyplot as plt
# Generate noisy spectrum with sharp peaksnp.random.seed(42)x = np.linspace(0, 100, 500)true_signal = (0.8 * np.exp(-((x - 25)**2) / 20) + 0.6 * np.exp(-((x - 50)**2) / 15) + 0.9 * np.exp(-((x - 75)**2) / 25))noise = np.random.normal(0, 0.08, len(x))noisy_signal = true_signal + noise
# Apply Savitzky-Golay smoothing# scipy.signal.savgol_filter(data, window_length, polyorder)sg_smooth = savgol_filter(noisy_signal, window_length=11, polyorder=2)
# Compare different parameterssg_w5_p2 = savgol_filter(noisy_signal, window_length=5, polyorder=2)sg_w11_p2 = savgol_filter(noisy_signal, window_length=11, polyorder=2)sg_w11_p3 = savgol_filter(noisy_signal, window_length=11, polyorder=3)sg_w21_p2 = savgol_filter(noisy_signal, window_length=21, polyorder=2)
# Plot resultsplt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')plt.plot(x, sg_w5_p2, label='w=5, p=2', linewidth=2)plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)plt.title('Light smoothing (w=5, p=2)')plt.legend()plt.grid(True, alpha=0.3)
plt.subplot(2, 2, 2)plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')plt.plot(x, sg_w11_p2, label='w=11, p=2', linewidth=2)plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)plt.title('Moderate smoothing (w=11, p=2)')plt.legend()plt.grid(True, alpha=0.3)
plt.subplot(2, 2, 3)plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')plt.plot(x, sg_w11_p3, label='w=11, p=3', linewidth=2)plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)plt.title('Higher polynomial (w=11, p=3)')plt.legend()plt.grid(True, alpha=0.3)
plt.subplot(2, 2, 4)plt.plot(x, noisy_signal, alpha=0.5, label='Noisy', color='gray')plt.plot(x, sg_w21_p2, label='w=21, p=2', linewidth=2)plt.plot(x, true_signal, '--', label='True', color='black', alpha=0.7)plt.title('Heavy smoothing (w=21, p=2)')plt.legend()plt.grid(True, alpha=0.3)
plt.tight_layout()plt.show()
# Example: Computing derivatives while smoothing# The deriv parameter computes the derivative of the polynomialfirst_derivative = savgol_filter(noisy_signal, window_length=11, polyorder=2, deriv=1)second_derivative = savgol_filter(noisy_signal, window_length=11, polyorder=2, deriv=2)
plt.figure(figsize=(10, 6))plt.subplot(3, 1, 1)plt.plot(x, noisy_signal, alpha=0.5, color='gray')plt.plot(x, sg_smooth, linewidth=2)plt.ylabel('Signal')plt.title('Savitzky-Golay: Smoothing and Derivatives')plt.grid(True, alpha=0.3)
plt.subplot(3, 1, 2)plt.plot(x, first_derivative, linewidth=2, color='red')plt.ylabel('1st Derivative')plt.grid(True, alpha=0.3)
plt.subplot(3, 1, 3)plt.plot(x, second_derivative, linewidth=2, color='green')plt.ylabel('2nd Derivative')plt.xlabel('Data Point')plt.grid(True, alpha=0.3)
plt.tight_layout()plt.show()% Savitzky-Golay smoothing in MATLAB% MATLAB has built-in sgolayfilt function
% Generate noisy spectrum with sharp peaksrng(42);x = linspace(0, 100, 500);true_signal = 0.8 * exp(-((x - 25).^2) / 20) + ... 0.6 * exp(-((x - 50).^2) / 15) + ... 0.9 * exp(-((x - 75).^2) / 25);noise = 0.08 * randn(size(x));noisy_signal = true_signal + noise;
% Apply Savitzky-Golay smoothing% sgolayfilt(data, polynomial_order, window_length)sg_smooth = sgolayfilt(noisy_signal, 2, 11);
% Compare different parameterssg_w5_p2 = sgolayfilt(noisy_signal, 2, 5);sg_w11_p2 = sgolayfilt(noisy_signal, 2, 11);sg_w11_p3 = sgolayfilt(noisy_signal, 3, 11);sg_w21_p2 = sgolayfilt(noisy_signal, 2, 21);
% Plot resultsfigure('Position', [100 100 1200 800]);
subplot(2, 2, 1);plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');hold on;plot(x, sg_w5_p2, 'LineWidth', 2, 'DisplayName', 'w=5, p=2');plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');title('Light smoothing (w=5, p=2)');legend('Location', 'best');grid on;
subplot(2, 2, 2);plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');hold on;plot(x, sg_w11_p2, 'LineWidth', 2, 'DisplayName', 'w=11, p=2');plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');title('Moderate smoothing (w=11, p=2)');legend('Location', 'best');grid on;
subplot(2, 2, 3);plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');hold on;plot(x, sg_w11_p3, 'LineWidth', 2, 'DisplayName', 'w=11, p=3');plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');title('Higher polynomial (w=11, p=3)');legend('Location', 'best');grid on;
subplot(2, 2, 4);plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5], 'DisplayName', 'Noisy');hold on;plot(x, sg_w21_p2, 'LineWidth', 2, 'DisplayName', 'w=21, p=2');plot(x, true_signal, '--k', 'LineWidth', 1.5, 'DisplayName', 'True');title('Heavy smoothing (w=21, p=2)');legend('Location', 'best');grid on;
% Example: Computing derivatives% For derivatives, we need to design the SG filter manuallyorder = 2;framelen = 11;[b, g] = sgolay(order, framelen);
% Smoothed signal (0th derivative)smoothed = conv(noisy_signal, g(:,1), 'same');
% First derivative% Note: need to account for data spacing (dx)dx = x(2) - x(1);first_deriv = conv(noisy_signal, g(:,2), 'same') / dx;
% Second derivativesecond_deriv = conv(noisy_signal, g(:,3), 'same') / dx^2;
figure;subplot(3, 1, 1);plot(x, noisy_signal, 'Color', [0.5 0.5 0.5 0.5]);hold on;plot(x, smoothed, 'LineWidth', 2);ylabel('Signal');title('Savitzky-Golay: Smoothing and Derivatives');grid on;
subplot(3, 1, 2);plot(x, first_deriv, 'LineWidth', 2, 'Color', 'r');ylabel('1st Derivative');grid on;
subplot(3, 1, 3);plot(x, second_deriv, 'LineWidth', 2, 'Color', 'g');ylabel('2nd Derivative');xlabel('Data Point');grid on;library(signal) # For sgolayfilt function
# Generate noisy spectrum with sharp peaksset.seed(42)x <- seq(0, 100, length.out = 500)true_signal <- 0.8 * exp(-((x - 25)^2) / 20) + 0.6 * exp(-((x - 50)^2) / 15) + 0.9 * exp(-((x - 75)^2) / 25)noise <- rnorm(length(x), mean = 0, sd = 0.08)noisy_signal <- true_signal + noise
# Apply Savitzky-Golay smoothing# sgolayfilt(x, p = polynomial_order, n = window_length)sg_smooth <- sgolayfilt(noisy_signal, p = 2, n = 11)
# Compare different parameterssg_w5_p2 <- sgolayfilt(noisy_signal, p = 2, n = 5)sg_w11_p2 <- sgolayfilt(noisy_signal, p = 2, n = 11)sg_w11_p3 <- sgolayfilt(noisy_signal, p = 3, n = 11)sg_w21_p2 <- sgolayfilt(noisy_signal, p = 2, n = 21)
# Plot resultspar(mfrow = c(2, 2))
# Plot 1plot(x, noisy_signal, type = 'l', col = 'gray', main = 'Light smoothing (w=5, p=2)', xlab = 'Data Point', ylab = 'Signal')lines(x, sg_w5_p2, col = 'blue', lwd = 2)lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)legend('topright', legend = c('Noisy', 'w=5, p=2', 'True'), col = c('gray', 'blue', 'black'), lwd = c(1, 2, 1.5), lty = c(1, 1, 2))grid()
# Plot 2plot(x, noisy_signal, type = 'l', col = 'gray', main = 'Moderate smoothing (w=11, p=2)', xlab = 'Data Point', ylab = 'Signal')lines(x, sg_w11_p2, col = 'red', lwd = 2)lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)legend('topright', legend = c('Noisy', 'w=11, p=2', 'True'), col = c('gray', 'red', 'black'), lwd = c(1, 2, 1.5), lty = c(1, 1, 2))grid()
# Plot 3plot(x, noisy_signal, type = 'l', col = 'gray', main = 'Higher polynomial (w=11, p=3)', xlab = 'Data Point', ylab = 'Signal')lines(x, sg_w11_p3, col = 'green', lwd = 2)lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)legend('topright', legend = c('Noisy', 'w=11, p=3', 'True'), col = c('gray', 'green', 'black'), lwd = c(1, 2, 1.5), lty = c(1, 1, 2))grid()
# Plot 4plot(x, noisy_signal, type = 'l', col = 'gray', main = 'Heavy smoothing (w=21, p=2)', xlab = 'Data Point', ylab = 'Signal')lines(x, sg_w21_p2, col = 'purple', lwd = 2)lines(x, true_signal, col = 'black', lty = 2, lwd = 1.5)legend('topright', legend = c('Noisy', 'w=21, p=2', 'True'), col = c('gray', 'purple', 'black'), lwd = c(1, 2, 1.5), lty = c(1, 1, 2))grid()
# Example: Computing derivatives# R's signal package supports derivatives through the 'm' parameterpar(mfrow = c(3, 1))
# Smoothed signal (m=0, no derivative)sg_smooth <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 0)plot(x, noisy_signal, type = 'l', col = 'gray', main = 'Savitzky-Golay: Smoothing and Derivatives', ylab = 'Signal')lines(x, sg_smooth, col = 'blue', lwd = 2)grid()
# First derivative (m=1)dx <- x[2] - x[1]first_deriv <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 1) / (dx^1)plot(x, first_deriv, type = 'l', col = 'red', lwd = 2, ylab = '1st Derivative')grid()
# Second derivative (m=2)second_deriv <- sgolayfilt(noisy_signal, p = 2, n = 11, m = 2) / (dx^2)plot(x, second_deriv, type = 'l', col = 'green', lwd = 2, xlab = 'Data Point', ylab = '2nd Derivative')grid()The derivative bonus: smooth and differentiate simultaneously
Here’s where SG becomes truly powerful: it can compute derivatives while smoothing!
Remember from calculus that differentiating amplifies noise. If you differentiate raw noisy data, you get garbage. The traditional approach is:
- Smooth the data first
- Then compute numerical derivatives
But SG can do both in one step! Since it fits a polynomial to each window, it can:
- Evaluate the polynomial → gives smoothed value
- Evaluate the polynomial’s derivative → gives smoothed derivative
- Evaluate the second derivative → gives smoothed second derivative
This is huge for spectroscopy where derivatives enhance peak resolution. We’ll cover derivatives in detail in a later article, but know that SG is the standard tool for this.
Advantages of Savitzky-Golay
✅ Excellent peak preservation — Minimal broadening compared to moving average ✅ Mathematically rigorous — Based on local polynomial least squares ✅ Flexible — Two parameters (window, polynomial order) allow fine-tuning ✅ Derivative capability — Can smooth and differentiate simultaneously ✅ Well-established — Widely used and trusted in spectroscopy ✅ Precomputed coefficients — Fast implementation (just a convolution) ✅ Predictable behavior — Clear relationship between parameters and smoothing
Disadvantages of Savitzky-Golay
❌ Two parameters to tune — More complex than single-parameter methods ❌ Can overfit — High polynomial orders with small windows capture noise ❌ Edge effects — Needs special handling at data boundaries ❌ Not optimal for all signals — Assumes locally polynomial behavior ❌ Less intuitive — Harder to explain than “averaging neighbors”
When to use Savitzky-Golay
Savitzky-Golay is the go-to method for spectroscopy when:
✅ You have sharp peaks that must be preserved ✅ You need to compute derivatives (it’s the best choice!) ✅ You want a balance between simplicity and quality ✅ You’re working with typical spectroscopic data (IR, Raman, UV-Vis, etc.) ✅ You want a method that’s widely recognized and accepted
Avoid SG when: ❌ Your signal isn’t smooth/continuous (use median filter instead) ❌ You have extreme noise (try Whittaker with high λ) ❌ You need the absolute simplest method (use moving average)
Comparison with other methods
| Feature | Moving Average | Savitzky-Golay | Whittaker |
|---|---|---|---|
| Peak preservation | Poor | Excellent | Excellent |
| Number of parameters | 1 (window) | 2 (window, order) | 1-2 (λ, order) |
| Computation | Very fast | Fast | Fast |
| Derivatives | No | Yes! | Not directly |
| Ease of use | Easiest | Moderate | Moderate |
| Best for | Quick exploration | Spectroscopy | General smoothing |
Choosing parameters: practical guidelines
Step 1: Choose polynomial order
Start with p = 2 (quadratic). This works for 90% of cases.
Use p = 3 if:
- Your peaks are asymmetric
- You need maximum flexibility
- You’re using a large window (≥15)
Avoid p ≥ 4 unless you really know what you’re doing!
Step 2: Choose window size
Start with w = 11 for typical spectroscopic data.
Adjust based on:
- More noise → increase window (15, 21, 25)
- Sharper peaks → decrease window (7, 9)
- Check the constraint: w ≥ 2p + 1 for safety
Step 3: Validate
- Visual check: Does it look smooth without destroying peaks?
- Residuals: Plot (original - smoothed). Should look like pure noise.
- Peak metrics: Compare peak heights and widths before/after.
- Cross-validation: If building a model, does smoothing improve prediction?
Quick reference table
| Noise Level | Recommended Settings |
|---|---|
| Low noise | w = 7-9, p = 2 |
| Moderate noise | w = 11-15, p = 2 |
| High noise | w = 17-25, p = 2-3 |
| Very sharp peaks | w = 5-7, p = 2 |
| Broad peaks | w = 15-21, p = 2-3 |
Common mistakes to avoid
❌ Using polynomial order too high → Overfitting, captures noise as signal ❌ Window too small for the polynomial → Violates w ≥ 2p + 1 rule ❌ Window too large for features → Broadens peaks unnecessarily ❌ Not checking edge effects → Edges can behave strangely ❌ Smoothing already-smoothed data → Diminishing returns, feature loss ❌ Different parameters for training/test → Inconsistent preprocessing
Further reading
Original paper:
- Savitzky, A., & Golay, M. J. E. (1964). “Smoothing and Differentiation of Data by Simplified Least Squares Procedures”. Analytical Chemistry, 36(8), 1627-1639.
Corrections to original tables:
- Steinier, J., Termonia, Y., & Deltour, J. (1972). “Smoothing and differentiation of data by simplified least square procedure”. Analytical Chemistry, 44(11), 1906-1909.
Modern treatments:
- Press, W. H., et al. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). Chapter on “Savitzky-Golay Smoothing Filters”.
- Schafer, R. W. (2011). “What Is a Savitzky-Golay Filter?”. IEEE Signal Processing Magazine, 28(4), 111-117.
Next steps:
- Learn about Gaussian smoothing for weighted averaging
- Explore derivatives to enhance peak resolution
- Understand when to use which smoothing method