Moving Average Smoothing
The simplest smoothing method - averaging neighboring points
The idea of smoothing data by averaging neighboring values is older than the name we give it today. In 1901, the English statistician R. H. Hooker published a study correlating marriage rates with trade prosperity, where he computed what he called "instantaneous averages" -- replacing each year's figure with the mean of itself and its neighbors to reveal long-term trends hidden under year-to-year fluctuations. Eight years later, G. U. Yule described Hooker's technique as "moving-averages" in the Journal of the Royal Statistical Society (1909), and the term entered general use after W. I. King adopted it in his textbook Elements of Statistical Method (1912).
But the practice predates the name by at least a century. Actuaries smoothing mortality tables in the 18th and 19th centuries routinely averaged adjacent age-group death rates to iron out irregularities caused by small sample sizes -- a running mean in all but name. Astronomers did the same when reducing series of observations, averaging consecutive readings to suppress random measurement error long before anyone formalized the procedure.
What makes the moving average so enduring is its simplicity. When digital signal processing emerged in the mid-20th century and engineers needed filters they could implement on the earliest computers, the moving average -- or "boxcar filter," as it became known in that context -- was the obvious first choice. It requires only addition and division, no trigonometric functions, no matrix algebra, no iterative fitting. That simplicity is why it remains the first smoothing method most scientists learn, and a natural starting point before moving to more sophisticated techniques like Savitzky-Golay or Whittaker smoothing.
The noise problem in spectroscopy
Real spectroscopic data is noisy. When you measure a fluorescence emission spectrum, an IR absorption scan, or a Raman spectrum, the signal doesn't come out as a smooth curve. Instead, it wiggles and jumps (sometimes going higher than expected, sometimes lower) even when measuring the exact same sample under identical conditions.
This isn't a flaw in your instrument. It's fundamental physics. Spectroscopic measurements involve counting photons (or electrons, or other discrete events), and these arrivals are inherently random. Even the best instruments produce noisy data. (We discussed the nature of measurement uncertainty and where noise comes from in detail in Least Squares: your first step into chemometrics.)
Your brain can easily see past the noise to identify the underlying smooth pattern: a peak here, a baseline there. But computers just see numbers: 0.421, 0.398, 0.447, 0.389... And this creates real problems for analysis. Random fluctuations can create false peaks or hide real ones, making accurate peak identification difficult. Noise adds uncertainty to measurements of peak heights and areas, which are critical for quantitative work. When building calibration models, regression algorithms may fit noise instead of signal, reducing prediction accuracy. Noise also obscures real differences between samples during spectral comparison and can completely hide subtle spectral features.
To solve this, we need smoothing - methods that remove random fluctuations while preserving the underlying chemical information. This makes downstream analysis more reliable and reproducible.
The core idea: averaging with neighbors
The simplest smoothing method is based on one elegant observation:
Noise is random. Signal, usually, is continuous.
Many spectroscopic signals arise from physical phenomena (molecular vibrations, electronic transitions, fluorescence emission) that tend to change smoothly with wavelength or frequency. A molecule's absorption doesn't jump wildly from one wavelength to the next - it follows continuous, smooth shapes. If the intensity at 500 nm is 0.40, the intensity at 501 nm will be something close to 0.40, not 0.95 or 0.02. This is continuity: neighboring points in a spectrum carry similar information.
Noise, on the other hand, has no such behavior. It's random at every point, independent of its neighbors. If noise pushes the value at 500 nm upward, it's just as likely to push the value at 501 nm downward.
This difference is what we exploit. Imagine the true signal at wavelengths 499, 500, and 501 nm is 0.39, 0.40, and 0.41 - nearly the same, because the signal is continuous. But noise makes your measurements come out as 0.35, 0.46, 0.38 - scattered around those true values. If you average these three measured points, you get 0.397, which is very close to the true value at 500 nm (0.40). The noise, being random (some positive, some negative), tends to cancel itself out. The signal, being nearly the same across neighbors, survives the averaging almost unchanged.
Not all signals are continuous, though. Some spectroscopic techniques (like mass spectrometry or atomic emission) produce sharp, discrete features where neighboring points are not similar. Smoothing methods that assume continuity can distort or destroy these features, so keep your signal's nature in mind before applying any smoothing.
Moving average smoothing (also called running mean or boxcar filter) applies this principle by replacing each point with the average of itself and its neighbors:

$$\hat{y}_i = \frac{1}{w} \sum_{j=i-(w-1)/2}^{i+(w-1)/2} y_j$$

Where:
- ŷi = smoothed value at point i
- w = window size (number of points to average)
- The sum includes the point itself plus neighbors on both sides
That's it! Since noise pushes values randomly higher and lower, averaging nearby points tends to cancel out the randomness, leaving the smooth underlying trend.
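As a minimal sketch of the formula in Python (edge points that lack a full window are simply left unsmoothed here; edge handling is covered below):

def moving_average_simple(y, w):
    """Average each point with its (w - 1) / 2 neighbors on each side.
    Points too close to the ends for a full window are left unsmoothed."""
    half = (w - 1) // 2
    y_smooth = list(y)
    for i in range(half, len(y) - half):
        y_smooth[i] = sum(y[i - half:i + half + 1]) / w
    return y_smooth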
See it in action
Before diving into the math, let's see how moving average actually works:
Try this:
- Click "Start Animation" to watch the window slide across the spectrum
- Use the slider to manually step through point by point
- Try different window sizes (3, 7, 11, 15) and observe the smoothing effect
- Notice how larger windows create smoother curves but blur the peaks
The purple star shows the smoothed value being calculated, while the window (shaded blue) highlights which neighbors are being averaged.
How it works in practice
Now that you've seen the process, let's break down the algorithm step by step.
For a fluorescence spectrum with intensity values at different wavelengths:
1. Choose a window size
This determines how many neighbors each point consults for averaging.
2. For each point i, gather neighboring values
Collect values from $y_{i-(w-1)/2}$ to $y_{i+(w-1)/2}$ (the point itself plus neighbors on both sides).
3. Calculate the average
Sum all values in the window and divide by the window size.
4. Replace the original value
The smoothed value at point i becomes this average.
Example calculation
Suppose you have these intensity values:
Original: [10.2, 15.1, 12.3, 19.8, 18.2, 15.9, 14.1, 18.7, 17.2, 15.3]

With a window size of 3 (each point averaged with its two immediate neighbors):
- Point at index 1: (10.2 + 15.1 + 12.3) / 3 = 12.5
- Point at index 2: (15.1 + 12.3 + 19.8) / 3 = 15.7
- Point at index 3: (12.3 + 19.8 + 18.2) / 3 = 16.8
- Point at index 4: (19.8 + 18.2 + 15.9) / 3 = 18.0

Smoothed: [12.5, 15.7, 16.8, 18.0, 16.1, 16.2, 16.7, 17.1]

Notice how the wild jump from 12.3 to 19.8 becomes a gentler transition. The noise is being averaged away.
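You can verify the arithmetic with NumPy (a quick sketch; mode='valid' keeps only the positions that have a full window):

import numpy as np

original = np.array([10.2, 15.1, 12.3, 19.8, 18.2, 15.9, 14.1, 18.7, 17.2, 15.3])
kernel = np.ones(3) / 3

# mode='valid' returns only positions with a full window (indices 1 through 8)
print(np.round(np.convolve(original, kernel, mode='valid'), 1))
# [12.5 15.7 16.8 18.  16.1 16.2 16.7 17.1]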
The fundamental tradeoff: smoothness vs. detail
The window size determines how aggressively you smooth. Larger windows give you smoother data, but they also blur your features. There's always a tradeoff:
Smaller windows (relative to your peak width):
- ✅ Preserve fine details and sharp peaks
- ✅ Maintain peak heights and widths
- ✅ Can resolve closely-spaced features
- ❌ Leave more noise in the data
Moderate windows (balance point):
- ✅ Good noise reduction
- ✅ Features are preserved reasonably well
- ⚠️ Peaks are slightly broadened
Larger windows (approaching peak width):
- ✅ Very smooth, aesthetically pleasing curves
- ✅ Noise is almost completely gone
- ❌ Peaks are significantly broadened and shortened
- ❌ Close peaks may merge together
- ❌ Small features may disappear
So how do you choose? The key is to think about your data characteristics:
Choosing window size
The window size should be much smaller than your spectral features but large enough to average out noise. Think in terms of percentage of your data or feature width, not absolute numbers.
General principle:
- Window should be 5-20% of your narrowest peak width (in data points)
- Larger windows smooth more but blur features
- The "right" size depends on noise level, feature sharpness, and total data points
Approach: Start small, plot the result, increase gradually while checking that features aren't blurred.
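For example, a quick window scan, assuming a noisy_spectrum array and the moving_average function from the Code implementation section below:

import matplotlib.pyplot as plt

plt.plot(noisy_spectrum, 'o', markersize=2, alpha=0.3, label='raw')
for w in (3, 5, 7, 9, 11):
    plt.plot(moving_average(noisy_spectrum, w), label=f'w = {w}')
plt.legend()
plt.show()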
Example: scaling with data size
For a spectrum with 100 points and peaks spanning ~10-20 points:
- Start with w = 3 or 5 (5-10% of peak width)
- High noise? Try w = 7 or 9
- Very noisy? Up to w = 11 or 15
For a high-resolution spectrum with 1000 points and peaks spanning ~100 points:
- Start with w = 11 or 15 (~10% of peak width)
- High noise? Try w = 21 or 31
- Very noisy? Up to w = 51 or even 71
For ultra-high-resolution data with 50,000 points:
- Scale proportionally - windows of 101, 201, etc. may be appropriate
- The absolute number matters less than the ratio to your feature widths
Why use odd numbers?
Window sizes are typically odd (3, 5, 7, 9, 11) for symmetry. With an odd window, you get the same number of neighbors on both sides of the central point:
- w = 5: 2 left + center + 2 right
- w = 7: 3 left + center + 3 right
With an even window (like 4 or 8), you'd have to offset the window left or right, which subtly shifts your data. Odd windows keep everything centered and symmetric.
Edge effects
At the start and end of your spectrum, there aren't enough neighbors on one side to fill the window. Different approaches handle this differently:
Truncation (most common):
- Only smooth interior points where you have full windows
- Leave the first and last points unsmoothed
- Simple and honest
Edge padding:
- Repeat the first value to create "virtual" neighbors on the left
- Repeat the last value to create "virtual" neighbors on the right
- NumPy's np.pad with mode='edge' does this; note that np.convolve with mode='same' instead pads with zeros, which pulls the smoothed values at the ends toward zero
Reflection:
- Mirror the spectrum at the edges
- Works well for some applications
Reduced window:
- Use a smaller window at the edges
- More complex but creates smooth transitions
For most spectroscopic applications, the interesting features are in the middle of the spectrum, so edge effects aren't critical. But if you need accurate edge values (e.g., for integration), pay attention to how your software handles edges.
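Here is a short sketch of these options in NumPy; in particular, it shows that np.convolve with mode='same' zero-pads rather than edge-pads:

import numpy as np

y = np.array([1.0, 1.1, 0.9, 1.0, 3.0, 5.0, 3.0, 1.0, 1.1, 0.9])
w = 3
kernel = np.ones(w) / w

# Truncation: only positions with a full window (output is shorter than input)
truncated = np.convolve(y, kernel, mode='valid')

# Zero padding: mode='same' treats points beyond the ends as zeros,
# which pulls the first and last smoothed values toward zero
zero_padded = np.convolve(y, kernel, mode='same')

# Edge padding: repeat the end values, then keep the valid part
edge_padded = np.convolve(np.pad(y, w // 2, mode='edge'), kernel, mode='valid')

# Reflection: mirror the signal at the ends
reflected = np.convolve(np.pad(y, w // 2, mode='reflect'), kernel, mode='valid')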
Real-time applications: causal (one-sided) moving average
Everything we've discussed so far assumes you have your complete dataset available. But what if you're collecting data in real time, point by point?
In offline spectroscopy, you collect the entire spectrum first, then apply preprocessing. The centered moving average we've described works perfectly: you can use points both before and after the current position.
But in real-time applications, you can't use future data that hasn't been measured yet. This is critical in:
- PAT (Process Analytical Technology): Real-time NIR or Raman monitoring during pharmaceutical manufacturing
- Online quality control: Continuous monitoring of production streams
- Process monitoring: Chemical reactors, fermentation, polymerization
- Time-series spectroscopy: Kinetic measurements, reaction monitoring
In these scenarios, you need a causal (or one-sided) moving average that only uses the current point and past measurements:

$$\hat{y}_i = \frac{1}{w} \sum_{j=i-w+1}^{i} y_j$$

Instead of centering the window around point i, the window extends backward in time from the current point.
Notice the difference from the centered version above: the window (red dots) only appears to the left of the current point. There are no "future" measurements to use. Try increasing the window size and watch how the smoothed signal lags further behind the true signal - this is the fundamental cost of real-time smoothing.
The tradeoff: lag vs. smoothing
Causal moving average introduces a time lag (or phase shift). The smoothed signal is delayed compared to the true signal by approximately (w - 1)/2 time points.
Example: With w = 9 in a PAT application measuring NIR spectra every 30 seconds:
- Time lag ≈ (9-1)/2 = 4 measurements = 2 minutes delay
- When a process change occurs, the smoothed signal shows it 2 minutes later
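A small simulation makes the lag concrete. This sketch applies a causal average (computed inline) to a noisy step change:

import numpy as np

# Step change at t = 50, plus noise
rng = np.random.default_rng(0)
signal = np.where(np.arange(100) < 50, 0.0, 1.0) + rng.normal(0, 0.05, 100)

# Causal moving average with w = 9: mean of the current and previous 8 points
w = 9
smoothed = [signal[max(0, i - w + 1):i + 1].mean() for i in range(len(signal))]

# The smoothed signal first rises above 0.5 roughly (w - 1)/2 = 4 points
# after the true step at t = 50
print(next(i for i, v in enumerate(smoothed) if v > 0.5))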
Practical implications:
- Larger windows give better noise reduction but longer delays
- In process control, delays can affect feedback loop stability
- For alarm systems, delayed detection could be critical
Choosing window size for real-time:
The optimal window size depends on your process dynamics, noise level, and acceptable lag time. Consider:
- Fast processes (reactions, mixing): Small windows (e.g., w = 3 to 5) minimize lag but provide less noise reduction
- Slow processes (fermentation, curing): Larger windows (e.g., w = 7 to 11) acceptable if process changes slowly
- Critical alarms: Prioritize fast detection over smoothing - use minimal window sizes
- Trending/reporting: Lag is less critical, so stronger smoothing is acceptable
Balance noise reduction against the acceptable delay for your specific application.
Implementation for real-time
def causal_moving_average(y, window_size):
    """
    One-sided moving average for real-time applications.
    Only uses current and past data points.
    """
    smoothed = []
    for i in range(len(y)):
        # Window extends backward from current point
        start = max(0, i - window_size + 1)
        window_values = y[start:i+1]
        smoothed.append(sum(window_values) / len(window_values))
    return smoothed
# Real-time PAT example: NIR spectra acquired every 30 seconds
spectra = acquire_nir_spectrum() # Latest measurement
smoothed_spectrum = causal_moving_average(spectra, window_size=5)
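The function above recomputes the window mean for every point. For true point-by-point acquisition, a running-sum version updates in constant time per measurement. Here is a sketch (the class name is illustrative):

from collections import deque

class StreamingAverage:
    """Causal moving average updated one point at a time.
    A running sum makes each update O(1) regardless of window size."""
    def __init__(self, window_size):
        self.window = deque()
        self.window_size = window_size
        self.total = 0.0

    def update(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.window_size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

smoother = StreamingAverage(window_size=5)
# As each new measurement arrives: smoothed_value = smoother.update(new_value)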
Real-time vs. offline smoothing
Never use centered moving average in real-time applications. It requires future data that doesn't exist yet. Always use causal (one-sided) moving average for:
- Online process monitoring
- Real-time quality control
- Feedback control systems
- Time-series measurements
Use centered moving average for:
- Batch analysis of collected data
- Post-processing of stored spectra
- Research and development
- Any offline analysis where all data is available upfront
Common questions
Can I apply moving average multiple times?
You can, but it's not recommended. Each pass blurs features more. If one pass with w = 7 isn't enough, use a single pass with w = 11 instead - you'll get better results.
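To see why repeated passes blur more, note that two passes with the same window are equivalent to a single pass with a wider, triangular kernel:

import numpy as np

k = np.ones(5) / 5
print(np.round(np.convolve(k, k), 3))
# [0.04 0.08 0.12 0.16 0.2 0.16 0.12 0.08 0.04]
# Two passes with w = 5 equal one pass with this 9-point triangular kernel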
Should I use the same window size for all samples?
Yes, absolutely! Consistency is critical. If you're building a calibration model or comparing spectra, use the same window size for all data (calibration, validation, and test sets). Inconsistent smoothing creates inconsistent features.
How do I know if I'm removing signal instead of just noise?
Check the residuals:
- Smooth your data
- Subtract: residuals = original - smoothed
- Plot the residuals
Good smoothing: Residuals look like random noise (no patterns, symmetric around zero)
Over-smoothing: Residuals show structured patterns (like half of a peak) - you're removing signal. Use a smaller window.
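In code, the check takes a few lines (assuming a noisy_spectrum array and the moving_average function from the Code implementation section below):

import matplotlib.pyplot as plt

residuals = noisy_spectrum - moving_average(noisy_spectrum, window_size=9)

plt.plot(residuals, 'o', markersize=3)
plt.axhline(0, color='red', linestyle='--')
plt.title('Residuals: structure here means signal is being removed')
plt.show()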
Why does my peak look shorter after smoothing?
Averaging a sharp peak with its neighboring baseline values pulls the peak value down. The peak spreads out to become shorter and broader, though the area stays roughly the same.
This is why moving average isn't ideal for quantitative work on sharp peaks. Savitzky-Golay smoothing preserves peak heights much better.
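You can quantify this with a synthetic sharp peak; as the window grows, the height drops while the area barely changes:

import numpy as np

x = np.arange(100)
peak = np.exp(-((x - 50) ** 2) / (2 * 3.0 ** 2))  # sharp Gaussian, sigma = 3 points

for w in (5, 9, 15):
    smoothed = np.convolve(peak, np.ones(w) / w, mode='same')
    print(f"w = {w:2d}: height {smoothed.max():.2f} (true 1.00), "
          f"area {smoothed.sum():.2f} (true {peak.sum():.2f})")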
Code implementation
Here's how to implement moving average in Python, MATLAB, and R:

Python:
import numpy as np
import matplotlib.pyplot as plt
def moving_average(y, window_size):
    """
    Apply moving average smoothing to a 1D signal.

    Parameters
    ----------
    y : array-like
        Input signal to be smoothed (e.g., spectrum intensities)
    window_size : int
        Number of points in the averaging window (use odd numbers!)

    Returns
    -------
    y_smooth : array
        Smoothed signal (same length as input)
    """
    # Create the averaging kernel (array of 1/w values)
    window = np.ones(window_size) / window_size
    # Apply convolution (mode='same' keeps the length but zero-pads beyond
    # the edges, so the first and last few values are pulled toward zero)
    y_smooth = np.convolve(y, window, mode='same')
    return y_smooth
# Example: noisy fluorescence emission spectrum
np.random.seed(42)
# Wavelength range typical for fluorescence (nm)
wavelength = np.linspace(400, 600, 100)
# Simulate a realistic fluorescence spectrum with vibronic structure
true_signal = (0.85 * np.exp(-((wavelength - 480)**2) / 400) +
0.35 * np.exp(-((wavelength - 510)**2) / 300) +
0.15 * np.exp(-((wavelength - 545)**2) / 250))
# Add realistic shot noise (proportional to sqrt of signal)
noise = np.random.normal(0, 0.08 * np.sqrt(true_signal))
noisy_spectrum = true_signal + noise
# Try different window sizes
smooth_w5 = moving_average(noisy_spectrum, window_size=5)
smooth_w9 = moving_average(noisy_spectrum, window_size=9)
smooth_w15 = moving_average(noisy_spectrum, window_size=15)
# Plot and compare
plt.figure(figsize=(12, 6))
plt.plot(wavelength, noisy_spectrum, 'o-', alpha=0.4,
label='Noisy data', color='gray', markersize=3)
plt.plot(wavelength, smooth_w5, linewidth=2.5,
label='Window = 5')
plt.plot(wavelength, smooth_w9, linewidth=2.5,
label='Window = 9')
plt.plot(wavelength, smooth_w15, linewidth=2.5,
label='Window = 15')
plt.xlabel('Wavelength (nm)', fontsize=12)
plt.ylabel('Fluorescence Intensity (a.u.)', fontsize=12)
plt.title('Moving Average Smoothing: Window Size Comparison', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Check residuals
residuals = noisy_spectrum - smooth_w9
plt.figure(figsize=(12, 4))
plt.plot(wavelength, residuals, 'o-', markersize=3)
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Wavelength (nm)')
plt.ylabel('Residual (Original - Smoothed)')
plt.title('Residuals: Should look like random noise with no patterns')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

MATLAB:

function y_smooth = moving_average(y, window_size)
% Apply moving average smoothing to a 1D signal
%
% Parameters:
% y - Input signal to smooth (column or row vector)
% window_size - Number of points in averaging window (use odd!)
%
% Returns:
% y_smooth - Smoothed signal (same size as input)
% Create the averaging kernel
window = ones(1, window_size) / window_size;
% Apply convolution ('same' keeps the length but zero-pads at the edges)
y_smooth = conv(y, window, 'same');
end
% Example: fluorescence emission spectrum
rng(42);
% Typical fluorescence wavelength range (nm)
wavelength = linspace(400, 600, 100);
% Simulate realistic fluorescence with vibronic bands
true_signal = 0.85 * exp(-((wavelength - 480).^2) / 400) + ...
0.35 * exp(-((wavelength - 510).^2) / 300) + ...
0.15 * exp(-((wavelength - 545).^2) / 250);
% Add realistic shot noise
noise = 0.08 * sqrt(true_signal) .* randn(size(wavelength));
noisy_spectrum = true_signal + noise;
% Try different window sizes
smooth_w5 = moving_average(noisy_spectrum, 5);
smooth_w9 = moving_average(noisy_spectrum, 9);
smooth_w15 = moving_average(noisy_spectrum, 15);
% Visualize the comparison
figure('Position', [100 100 1000 500]);
plot(wavelength, noisy_spectrum, 'o-', 'Color', [0.7 0.7 0.7], ...
'MarkerSize', 3, 'DisplayName', 'Noisy data');
hold on;
plot(wavelength, smooth_w5, 'LineWidth', 2.5, 'DisplayName', 'Window = 5');
plot(wavelength, smooth_w9, 'LineWidth', 2.5, 'DisplayName', 'Window = 9');
plot(wavelength, smooth_w15, 'LineWidth', 2.5, 'DisplayName', 'Window = 15');
xlabel('Wavelength (nm)', 'FontSize', 12);
ylabel('Fluorescence Intensity (a.u.)', 'FontSize', 12);
title('Moving Average: Window Size Comparison', 'FontSize', 14);
legend('Location', 'northeast', 'FontSize', 10);
grid on;
% Check residuals
figure('Position', [100 100 1000 300]);
residuals = noisy_spectrum - smooth_w9;
plot(wavelength, residuals, 'o-', 'MarkerSize', 3);
yline(0, '--r', 'LineWidth', 1.5, 'Alpha', 0.5);
xlabel('Wavelength (nm)');
ylabel('Residual');
title('Residuals: Should be random noise, no structure');
grid on;

R:

moving_average <- function(y, window_size) {
# Apply moving average smoothing to a 1D signal
#
# Parameters:
# y - Input signal to smooth (numeric vector)
# window_size - Number of points in window (use odd numbers!)
#
# Returns:
# y_smooth - Smoothed signal
# Create averaging kernel
window <- rep(1 / window_size, window_size)
# Apply convolution with centered window
y_smooth <- stats::filter(y, window, sides = 2)
# Handle edge NAs by using original edge values
# (R's filter leaves NAs at edges by default)
y_smooth[is.na(y_smooth)] <- y[is.na(y_smooth)]
return(as.vector(y_smooth))
}
# Example: fluorescence emission spectrum
set.seed(42)
# Wavelength range for fluorescence (nm)
wavelength <- seq(400, 600, length.out = 100)
# Simulate realistic fluorescence with vibronic structure
true_signal <- 0.85 * exp(-((wavelength - 480)^2) / 400) +
0.35 * exp(-((wavelength - 510)^2) / 300) +
0.15 * exp(-((wavelength - 545)^2) / 250)
# Add realistic shot noise
noise <- 0.08 * sqrt(true_signal) * rnorm(length(wavelength))
noisy_spectrum <- true_signal + noise
# Compare different window sizes
smooth_w5 <- moving_average(noisy_spectrum, 5)
smooth_w9 <- moving_average(noisy_spectrum, 9)
smooth_w15 <- moving_average(noisy_spectrum, 15)
# Visualize
par(mfrow = c(1, 1), mar = c(4, 4, 3, 1))
plot(wavelength, noisy_spectrum, type = 'o', col = 'gray60',
pch = 16, cex = 0.6,
xlab = 'Wavelength (nm)',
ylab = 'Fluorescence Intensity (a.u.)',
main = 'Moving Average: Window Size Comparison')
lines(wavelength, smooth_w5, col = '#2563eb', lwd = 2.5)
lines(wavelength, smooth_w9, col = '#059669', lwd = 2.5)
lines(wavelength, smooth_w15, col = '#dc2626', lwd = 2.5)
legend('topright',
legend = c('Noisy', 'w=5', 'w=9', 'w=15'),
col = c('gray60', '#2563eb', '#059669', '#dc2626'),
lwd = c(1, 2.5, 2.5, 2.5),
pch = c(16, NA, NA, NA),
pt.cex = 0.6,
cex = 0.9)
grid()
# Check residuals
par(mar = c(4, 4, 3, 1))
residuals <- noisy_spectrum - smooth_w9
plot(wavelength, residuals, type = 'o', pch = 16, cex = 0.6,
xlab = 'Wavelength (nm)', ylab = 'Residual',
main = 'Residuals: No patterns = good smoothing')
abline(h = 0, col = 'red', lty = 2, lwd = 2)
grid()

When to use moving average
✅ Works well for
- Quick exploratory analysis: fast and simple, perfect for initial data inspection
- Broad, gentle features: wide, smooth peaks (like fluorescence emission) are handled well
- Teaching and communication: an intuitive concept ("averaging with neighbors") that's easy to explain
- Baseline estimation: use a very large window (50+ points) to capture slowly-varying baseline drift
- Speed-critical applications: extremely fast computation, ideal for batch processing thousands of spectra
❌ Use something better for
- Sharp Raman peaks: moving average significantly reduces peak heights and increases widths; use Savitzky-Golay instead
- Quantitative analysis: peak broadening and height reduction introduce bias in calibration curves
- High-resolution work: when resolution matters (vibronic structure, closely-spaced peaks), moving average's blurring is unacceptable
- Publication-quality figures: moving average can leave visible artifacts (slight stair-stepping effects)
Advantages and limitations
Advantages
- ✅ Extremely simple: easy to understand and implement
- ✅ Very fast: minimal computational cost
- ✅ No parameters beyond window size: just choose w and you're done
- ✅ Predictable behavior: always smooths, never does anything surprising
Limitations
- ❌ Reduces peak heights: averaging pulls sharp peaks down toward their neighbors
- ❌ Broadens features: peaks get wider, and closely-spaced features may merge
- ❌ Equal weighting: all neighbors are treated equally, even those far from the center point
- ❌ Not optimal for most applications: other methods (Savitzky-Golay, Whittaker) preserve features better
Practical tips
Window size selection:
- No single "best" window - it depends on noise level, feature width, and data resolution
- Start small (w = 5) and increase until noise is acceptable while features remain sharp
- More data points generally allows larger windows, but always check visually
- Always use odd numbers for symmetry
- Consistency is key: same window for all samples in a study
Quality control:
- Always plot before and after smoothing
- Check residuals (original - smoothed) for structured patterns
- If residuals show peak shapes, you're over-smoothing
Computational efficiency:
- NumPy's convolve is highly optimized
- For many spectra, vectorize the operation instead of looping (see the sketch after this list)
- Moving average is fast enough that optimization usually isn't needed
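When you do need the speed, one option is to smooth a whole batch at once. This sketch assumes spectra stored one per row in a 2D array and uses scipy.ndimage.uniform_filter1d, which applies a boxcar filter along one axis:

import numpy as np
from scipy.ndimage import uniform_filter1d

# 1000 spectra, 500 wavelength channels each (one spectrum per row)
spectra = np.random.default_rng(0).normal(size=(1000, 500))

# Smooth every spectrum along the wavelength axis in a single call;
# mode='nearest' repeats the edge values (edge padding)
smoothed = uniform_filter1d(spectra, size=9, axis=1, mode='nearest')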
Next steps
Moving average teaches the fundamental concepts of smoothing: noise reduction vs. feature preservation, window size tradeoffs, and edge effects. These principles apply to all smoothing methods.
For most serious spectroscopic work, consider these more advanced methods:
- Savitzky-Golay smoothing: fits local polynomials instead of averaging. Much better peak preservation. The most common choice for Raman and IR spectroscopy.
- Whittaker smoothing: an optimization-based approach that balances smoothness against data fidelity. An excellent general-purpose method.
- Gaussian smoothing: weighted averaging where close neighbors matter more than distant ones. Smoother than moving average with similar simplicity.
Moving average is the simplest smoothing method, making it perfect for quick exploratory analysis and teaching. But for quantitative work or publication-quality results, the more sophisticated methods above are usually worth the extra complexity.