The idea of smoothing data by averaging neighboring values is older than the name we give it today. In 1901, the English statistician R. H. Hooker published a study correlating marriage rates with trade prosperity, where he computed what he called “instantaneous averages” — replacing each year’s figure with the mean of itself and its neighbors to reveal long-term trends hidden under year-to-year fluctuations. Eight years later, G. U. Yule described Hooker’s technique as “moving-averages” in the Journal of the Royal Statistical Society (1909), and the term entered general use after W. I. King adopted it in his textbook Elements of Statistical Method (1912).
But the practice predates the name by at least a century. Actuaries smoothing mortality tables in the 18th and 19th centuries routinely averaged adjacent age-group death rates to iron out irregularities caused by small sample sizes — a running mean in all but name. Astronomers did the same when reducing series of observations, averaging consecutive readings to suppress random measurement error long before anyone formalized the procedure.
What makes the moving average so enduring is its simplicity. When digital signal processing emerged in the mid-20th century and engineers needed filters they could implement on the earliest computers, the moving average — or “boxcar filter,” as it became known in that context — was the obvious first choice. It requires only addition and division, no trigonometric functions, no matrix algebra, no iterative fitting. That simplicity is why it remains the first smoothing method most scientists learn, and a natural starting point before moving to more sophisticated techniques like Savitzky-Golay or Whittaker smoothing.
The noise problem in spectroscopy
Real spectroscopic data is noisy. When you measure a fluorescence emission spectrum, an IR absorption scan, or a Raman spectrum, the signal doesn’t come out as a smooth curve. Instead, it wiggles and jumps (sometimes going higher than expected, sometimes lower) even when measuring the exact same sample under identical conditions.
This isn’t a flaw in your instrument. It’s fundamental physics. Spectroscopic measurements involve counting photons (or electrons, or other discrete events), and these arrivals are inherently random. Even the best instruments produce noisy data. (We discussed the nature of measurement uncertainty and where noise comes from in detail in Least Squares: your first step into chemometrics.)
Your brain can easily see past the noise to identify the underlying smooth pattern: a peak here, a baseline there. But computers just see numbers: 0.421, 0.398, 0.447, 0.389… And this creates real problems for analysis. Random fluctuations can create false peaks or hide real ones, making accurate peak identification difficult. Noise adds uncertainty to measurements of peak heights and areas, which are critical for quantitative work. When building calibration models, regression algorithms may fit noise instead of signal, reducing prediction accuracy. Noise also obscures real differences between samples during spectral comparison and can completely hide subtle spectral features.
To solve this, we need smoothing - methods that remove random fluctuations while preserving the underlying chemical information. This makes downstream analysis more reliable and reproducible.
The core idea: averaging with neighbors
The simplest smoothing method is based on one elegant observation:
Noise is random. Signal, most of the time, is continuous.
Many spectroscopic signals arise from physical phenomena (molecular vibrations, electronic transitions, fluorescence emission) that tend to change smoothly with wavelength or frequency. A molecule’s absorption doesn’t jump wildly from one wavelength to the next - it follows continuous, smooth shapes. If the intensity at 500 nm is 0.40, the intensity at 501 nm will be something close to 0.40, not 0.95 or 0.02. This is continuity: neighboring points in a spectrum carry similar information.
Noise, on the other hand, has no such behavior. It’s random at every point, independent of its neighbors. If noise pushes the value at 500 nm upward, it’s just as likely to push the value at 501 nm downward.
This difference is what we exploit. Imagine the true signal at wavelengths 499, 500, and 501 nm is 0.39, 0.40, and 0.41 - nearly the same, because the signal is continuous. But noise makes your measurements come out as 0.35, 0.46, 0.38 - scattered around those true values. If you average these three measured points, you get 0.397, which is very close to the true value at 500 nm (0.40). The noise, being random (some positive, some negative), tends to cancel itself out. The signal, being nearly the same across neighbors, survives the averaging almost unchanged.
Not all signals are continuous, though. Some spectroscopic techniques (like mass spectrometry or atomic emission) produce sharp, discrete features where neighboring points are not similar. Smoothing methods that assume continuity can distort or destroy these features, so keep your signal’s nature in mind before applying any smoothing.
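The three-point averaging example is easy to verify in a couple of lines (a minimal sketch, using the same numbers as above):

```python
# Measured values at 499, 500, and 501 nm (true values: 0.39, 0.40, 0.41)
measured = [0.35, 0.46, 0.38]

# Averaging the three neighbors recovers the centre point almost exactly
estimate = sum(measured) / len(measured)
print(round(estimate, 3))  # close to the true value 0.40
```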
Moving average smoothing (also called running mean or boxcar filter) applies this principle by replacing each point with the average of itself and its neighbors:
ŷᵢ = (1/w) · Σ yⱼ,  with the sum running from j = i − (w−1)/2 to j = i + (w−1)/2

Where:
ŷᵢ = smoothed value at point i
w = window size (number of points to average)
The sum includes the point itself plus (w−1)/2 neighbors on each side
That’s it! Since noise pushes values randomly higher and lower, averaging nearby points tends to cancel out the randomness, leaving the smooth underlying trend.
See it in action
Before diving into the math, let’s see how moving average actually works:
Try this:
Click “Start Animation” to watch the window slide across the spectrum
Use the slider to manually step through point by point
Try different window sizes (3, 7, 11, 15) and observe the smoothing effect
Notice how larger windows create smoother curves but blur the peaks
The purple star shows the smoothed value being calculated, while the window (shaded blue) highlights which neighbors are being averaged.
How it works in practice
Now that you’ve seen the process, let’s break down the algorithm step by step.
For a fluorescence spectrum with intensity values y1,y2,y3,…,yn at different wavelengths:
Choose a window size (w)
This determines how many neighbors each point consults for averaging.
For each point i, gather neighboring values
Collect values from i − (w−1)/2 to i + (w−1)/2 (the point itself plus neighbors on both sides).
Calculate the average
Sum all values in the window and divide by the window size.
Replace the original value
The smoothed value at point i becomes this average.
Notice how the wild jump from 12.3 to 19.8 becomes a gentler transition. The noise is being averaged away.
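The four steps above can be sketched in plain Python (a minimal version using truncation at the edges; w is assumed odd):

```python
def moving_average(y, w):
    """Replace each interior point with the mean of its w-point window.
    Edge points without a full window are left unsmoothed (truncation)."""
    half = (w - 1) // 2
    smoothed = list(y)                     # copy; edges stay as measured
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]  # point i plus half neighbors per side
        smoothed[i] = sum(window) / w      # average, then replace
    return smoothed

print(moving_average([1.0, 2.0, 9.0, 2.0, 1.0], 3))
```

With w = 3, the spike at 9.0 is pulled down toward its neighbors while the two edge points are untouched.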
The fundamental tradeoff: smoothness vs. detail
The window size determines how aggressively you smooth. Larger windows give you smoother data, but they also blur your features. There’s always a tradeoff:
Smaller windows (relative to your peak width):
✅ Preserve fine details and sharp peaks
✅ Maintain peak heights and widths
✅ Can resolve closely-spaced features
❌ Leave more noise in the data
Moderate windows (balance point):
✅ Good noise reduction
✅ Features are preserved reasonably well
⚠️ Peaks are slightly broadened
Larger windows (approaching peak width):
✅ Very smooth, aesthetically pleasing curves
✅ Noise is almost completely gone
❌ Peaks are significantly broadened and shortened
❌ Close peaks may merge together
❌ Small features may disappear
So how do you choose? The key is to think about your data characteristics:
Example: scaling with data size
For a spectrum with 100 points and peaks spanning ~10-20 points:
Start with w = 3 or 5 (a small fraction of the peak width)
High noise? Try w = 7 or 9
Very noisy? Up to w = 11 or 15
For a high-resolution spectrum with 1000 points and peaks spanning ~100 points:
Start with w = 11 or 15 (~10% of peak width)
High noise? Try w = 21 or 31
Very noisy? Up to w = 51 or even 71
For ultra-high-resolution data with 50,000 points:
Scale proportionally - windows of 101, 201, etc. may be appropriate
The absolute number matters less than the ratio to your feature widths
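These rules of thumb can be wrapped in a small helper. Note that suggest_window is a hypothetical convenience function (not from any library), hard-coding the ~10%-of-peak-width starting point:

```python
def suggest_window(peak_width_pts, fraction=0.1):
    """Suggest an odd starting window as a fraction of the typical
    peak width (in points). Hypothetical helper, for illustration."""
    w = max(3, round(peak_width_pts * fraction))
    return w if w % 2 == 1 else w + 1  # force an odd window for symmetry

print(suggest_window(100))  # ~10% of a 100-point peak, rounded up to odd
```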
Why use odd numbers?
Window sizes are typically odd (3, 5, 7, 9, 11) for symmetry. With an odd window, you get the same number of neighbors on both sides of the central point:
w = 5: 2 left + center + 2 right
w = 7: 3 left + center + 3 right
With an even window (like 4 or 8), you’d have to offset the window left or right, which subtly shifts your data. Odd windows keep everything centered and symmetric.
Edge effects
At the start and end of your spectrum, there aren’t enough neighbors on one side to fill the window. Different approaches handle this differently:
Truncation (most common):
Only smooth interior points where you have full windows
Leave the first and last (w−1)/2 points unsmoothed
Simple and honest
Edge padding:
Repeat the first value to create “virtual” neighbors on the left
Repeat the last value to create “virtual” neighbors on the right
NumPy’s convolve with mode='same' does not do this - it implicitly zero-pads, pulling edge values toward zero; to repeat edge values, pad with np.pad(y, pad_width, mode='edge') before convolving
Reflection:
Mirror the spectrum at the edges
Works well for some applications
Reduced window:
Use a smaller window at the edges
More complex but creates smooth transitions
For most spectroscopic applications, the interesting features are in the middle of the spectrum, so edge effects aren’t critical. But if you need accurate edge values (e.g., for integration), pay attention to how your software handles edges.
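The difference between zero padding and edge padding is easy to see on a toy spectrum (a sketch using NumPy; recall that np.convolve with mode='same' zero-pads implicitly):

```python
import numpy as np

y = np.array([1.0, 1.0, 5.0, 1.0, 1.0])
w = 3
kernel = np.ones(w) / w

# Zero padding (implicit in mode='same'): edges are pulled toward zero
zero_padded = np.convolve(y, kernel, mode='same')

# Edge padding: repeat the first/last value, then do a 'valid' convolution
edge_padded = np.convolve(np.pad(y, w // 2, mode='edge'), kernel, mode='valid')

print(zero_padded[0])  # (0 + 1 + 1) / 3
print(edge_padded[0])  # (1 + 1 + 1) / 3
```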
Real-time applications: causal (one-sided) moving average
Everything we’ve discussed so far assumes you have your complete dataset available. But what if you’re collecting data in real time, point by point?
In offline spectroscopy, you collect the entire spectrum first, then apply preprocessing. The centered moving average we’ve described works perfectly: you can use points both before and after the current position.
But in real-time applications, you can’t use future data that hasn’t been measured yet. This is critical in:
PAT (Process Analytical Technology): Real-time NIR or Raman monitoring during pharmaceutical manufacturing
Online quality control: Continuous monitoring of production streams
Process monitoring: Chemical reactors, fermentation, polymerization
In these scenarios, you need a causal (or one-sided) moving average that only uses the current point and past measurements:
ŷᵢ = (1/w) · Σ yⱼ,  with the sum running from j = i − w + 1 to j = i
Instead of centering the window around point i, the window extends backward in time from the current point.
Notice the difference from the centered version above: the window (red dots) only appears to the left of the current point. There are no “future” measurements to use. Try increasing the window size and watch how the smoothed signal lags further behind the true signal - this is the fundamental cost of real-time smoothing.
The tradeoff: lag vs. smoothing
Causal moving average introduces a time lag (or phase shift). The smoothed signal is delayed compared to the true signal by approximately (w−1)/2 time points.
Example: With w = 9 in a PAT application measuring NIR spectra every 30 seconds:
Time lag ≈ (9-1)/2 = 4 measurements = 2 minutes delay
When a process change occurs, the smoothed signal shows it 2 minutes later
Practical implications:
Larger windows give better noise reduction but longer delays
In process control, delays can affect feedback loop stability
For alarm systems, delayed detection could be critical
Choosing window size for real-time:
The optimal window size depends on your process dynamics, noise level, and acceptable lag time. Consider:
Fast processes (reactions, mixing): Small windows (e.g., w = 3 to 5) minimize lag but provide less noise reduction
Slow processes (fermentation, curing): Larger windows (e.g., w = 7 to 11) acceptable if process changes slowly
Critical alarms: Prioritize fast detection over smoothing - use minimal window sizes
Trending/reporting: Lag is less critical, so stronger smoothing is acceptable
Balance noise reduction against the acceptable delay for your specific application.
Implementation for real-time
```python
import numpy as np

def causal_moving_average(y, window_size):
    """One-sided moving average for real-time applications."""
    y = np.asarray(y, dtype=float)
    return np.array([y[max(0, i - window_size + 1):i + 1].mean()
                     for i in range(len(y))])
```

Frequently asked questions
Can I apply moving average several times for extra smoothing?
You can, but it’s not recommended. Each pass blurs features more. If one pass with w = 7 isn’t enough, use a single pass with w = 11 instead - you’ll get better results.
Should I use the same window size for all samples?
Yes, absolutely! Consistency is critical. If you’re building a calibration model or comparing spectra, use the same window size for all data (calibration, validation, and test sets). Inconsistent smoothing creates inconsistent features.
How do I know if I’m removing signal instead of just noise?
Check the residuals:
Smooth your data
Subtract: residual = y_original − y_smoothed
Plot the residuals
Good smoothing: Residuals look like random noise (no patterns, symmetric around zero)
Over-smoothing: Residuals show structured patterns (like half of a peak) - you’re removing signal. Use a smaller window.
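The residual check can be sketched as follows (a synthetic Gaussian peak with a known noise level; any smoother could be substituted for the moving average here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
signal = np.exp(-((x - 0.5) / 0.1) ** 2)        # one smooth peak
y = signal + rng.normal(0.0, 0.05, x.size)      # known noise level

w = 5
smoothed = np.convolve(np.pad(y, w // 2, mode='edge'),
                       np.ones(w) / w, mode='valid')
residual = y - smoothed

# Good smoothing: residuals hover around zero with no peak-shaped structure
print(residual.mean(), residual.std())
```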
Why does my peak look shorter after smoothing?
Averaging a sharp peak with its neighboring baseline values pulls the peak value down. The peak spreads out to become shorter and broader, though the area stays roughly the same.
This is why moving average isn’t ideal for quantitative work on sharp peaks. Savitzky-Golay smoothing preserves peak heights much better.
Code implementation
Here’s how to implement moving average in Python, MATLAB, and R:
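A minimal NumPy version follows (MATLAB’s movmean and R’s stats::filter perform the same operation; only the Python sketch is shown here):

```python
import numpy as np

def moving_average(y, w):
    """Centered moving average with edge padding; w must be odd."""
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    padded = np.pad(y, w // 2, mode='edge')   # repeat edge values
    return np.convolve(padded, np.ones(w) / w, mode='valid')

# Example: smooth a noisy Gaussian peak
x = np.linspace(0.0, 1.0, 200)
y = np.exp(-((x - 0.5) / 0.1) ** 2) \
    + np.random.default_rng(1).normal(0.0, 0.05, x.size)
y_smooth = moving_average(y, 7)
```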
✅ Use moving average for
Quick exploratory analysis
Fast and simple, perfect for initial data inspection
Broad, gentle features
Wide, smooth peaks (like fluorescence emission) are handled well
Teaching and communication
Intuitive concept (“averaging with neighbors”) that’s easy to explain
Baseline estimation
Use a very large window (50+ points) to capture slowly-varying baseline drift
Speed-critical applications
Extremely fast computation, ideal for batch processing thousands of spectra
❌ Use something better for
Sharp Raman peaks
Moving average significantly reduces peak heights and increases widths. Use Savitzky-Golay instead.
Quantitative analysis
Peak broadening and height reduction introduce bias in calibration curves
High-resolution work
When resolution matters (vibronic structure, closely-spaced peaks), moving average’s blurring is unacceptable
Publication-quality figures
Moving average can leave visible artifacts (slight stair-stepping effects)
Advantages and limitations
Advantages
✅ Extremely simple
Easy to understand and implement
✅ Very fast
Minimal computational cost
✅ No parameters beyond window size
Just choose w and you’re done
✅ Predictable behavior
Always smooths, never does anything surprising
Limitations
❌ Reduces peak heights
Averaging pulls sharp peaks down toward their neighbors
❌ Broadens features
Peaks get wider, closely-spaced features may merge
❌ Equal weighting
All neighbors are treated equally, even those far from the center point
❌ Not optimal for most applications
Other methods (Savitzky-Golay, Whittaker) preserve features better
Practical tips
Window size selection:
No single “best” window - it depends on noise level, feature width, and data resolution
Start small (w = 5) and increase until noise is acceptable while features remain sharp
More data points generally allow larger windows, but always check visually
Always use odd numbers for symmetry
Consistency is key: same window for all samples in a study
Quality control:
Always plot before and after smoothing
Check residuals (original - smoothed) for structured patterns
If residuals show peak shapes, you’re over-smoothing
Computational efficiency:
NumPy’s convolve is highly optimized
For many spectra, vectorize the operation instead of looping
Moving average is fast enough that optimization usually isn’t needed
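For batches of spectra, SciPy’s uniform_filter1d applies a moving average along one axis of a whole matrix in a single call (a sketch; requires SciPy, and mode='nearest' repeats edge values):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

# 1000 synthetic noisy spectra, 500 wavelength points each
spectra = np.random.default_rng(2).normal(size=(1000, 500))

# Smooth every spectrum at once along the wavelength axis (no Python loop)
smoothed = uniform_filter1d(spectra, size=5, axis=1, mode='nearest')
```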
Next steps
Moving average teaches the fundamental concepts of smoothing: noise reduction vs. feature preservation, window size tradeoffs, and edge effects. These principles apply to all smoothing methods.
For most serious spectroscopic work, consider these more advanced methods:
Savitzky-Golay smoothing - Fits local polynomials instead of averaging. Much better peak preservation. The most common choice for Raman and IR spectroscopy.
Whittaker smoothing - Optimization-based approach that balances smoothness against data fidelity. Excellent general-purpose method.
Gaussian smoothing - Weighted averaging where close neighbors matter more than distant ones. Smoother than moving average with similar simplicity.
Moving average is the simplest smoothing method, making it perfect for quick exploratory analysis and teaching. But for quantitative work or publication-quality results, the more sophisticated methods above are usually worth the extra complexity.
References
[1] Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
[2] Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
[3] Martens, H., & Næs, T. (1989). Multivariate Calibration. Wiley.
[4] Brereton, R. G. (2003). Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley.
[5] Mark, H., & Workman, J. (2007). Chemometrics in Spectroscopy. Academic Press.
[6] Ingle, J. D., & Crouch, S. R. (1988). Spectrochemical Analysis. Prentice Hall.