Moving Average Smoothing

The idea of smoothing data by averaging neighboring values is older than the name we give it today. In 1901, the English statistician R. H. Hooker published a study correlating marriage rates with trade prosperity, where he computed what he called “instantaneous averages” — replacing each year’s figure with the mean of itself and its neighbors to reveal long-term trends hidden under year-to-year fluctuations. Eight years later, G. U. Yule described Hooker’s technique as “moving-averages” in the Journal of the Royal Statistical Society (1909), and the term entered general use after W. I. King adopted it in his textbook Elements of Statistical Method (1912).

But the practice predates the name by at least a century. Actuaries smoothing mortality tables in the 18th and 19th centuries routinely averaged adjacent age-group death rates to iron out irregularities caused by small sample sizes — a running mean in all but name. Astronomers did the same when reducing series of observations, averaging consecutive readings to suppress random measurement error long before anyone formalized the procedure.

What makes the moving average so enduring is its simplicity. When digital signal processing emerged in the mid-20th century and engineers needed filters they could implement on the earliest computers, the moving average — or “boxcar filter,” as it became known in that context — was the obvious first choice. It requires only addition and division, no trigonometric functions, no matrix algebra, no iterative fitting. That simplicity is why it remains the first smoothing method most scientists learn, and a natural starting point before moving to more sophisticated techniques like Savitzky-Golay or Whittaker smoothing.

The noise problem in spectroscopy

Real spectroscopic data is noisy. When you measure a fluorescence emission spectrum, an IR absorption scan, or a Raman spectrum, the signal doesn’t come out as a smooth curve. Instead, it wiggles and jumps (sometimes going higher than expected, sometimes lower) even when measuring the exact same sample under identical conditions.

This isn’t a flaw in your instrument. It’s fundamental physics. Spectroscopic measurements involve counting photons (or electrons, or other discrete events), and these arrivals are inherently random. Even the best instruments produce noisy data. (We discussed the nature of measurement uncertainty and where noise comes from in detail in Least Squares: your first step into chemometrics.)

Your brain can easily see past the noise to identify the underlying smooth pattern: a peak here, a baseline there. But computers just see numbers: 0.421, 0.398, 0.447, 0.389… And this creates real problems for analysis. Random fluctuations can create false peaks or hide real ones, making accurate peak identification difficult. Noise adds uncertainty to measurements of peak heights and areas, which are critical for quantitative work. When building calibration models, regression algorithms may fit noise instead of signal, reducing prediction accuracy. Noise also obscures real differences between samples during spectral comparison and can completely hide subtle spectral features.

To solve this, we need smoothing - methods that remove random fluctuations while preserving the underlying chemical information. This makes downstream analysis more reliable and reproducible.

The core idea: averaging with neighbors

The simplest smoothing method is based on one elegant observation:

Noise is random. Signal, usually, is continuous.

Many spectroscopic signals arise from physical phenomena (molecular vibrations, electronic transitions, fluorescence emission) that tend to change smoothly with wavelength or frequency. A molecule’s absorption doesn’t jump wildly from one wavelength to the next - it follows continuous, smooth shapes. If the intensity at 500 nm is 0.40, the intensity at 501 nm will be something close to 0.40, not 0.95 or 0.02. This is continuity: neighboring points in a spectrum carry similar information.

Noise, on the other hand, has no such behavior. It’s random at every point, independent of its neighbors. If noise pushes the value at 500 nm upward, it’s just as likely to push the value at 501 nm downward.

This difference is what we exploit. Imagine the true signal at wavelengths 499, 500, and 501 nm is 0.39, 0.40, and 0.41 - nearly the same, because the signal is continuous. But noise makes your measurements come out as 0.35, 0.46, 0.38 - scattered around those true values. If you average these three measured points, you get 0.397, which is very close to the true value at 500 nm (0.40). The noise, being random (some positive, some negative), tends to cancel itself out. The signal, being nearly the same across neighbors, survives the averaging almost unchanged.

Not all signals are continuous, though. Some spectroscopic techniques (like mass spectrometry or atomic emission) produce sharp, discrete features where neighboring points are not similar. Smoothing methods that assume continuity can distort or destroy these features, so keep your signal’s nature in mind before applying any smoothing.

Moving average smoothing (also called running mean or boxcar filter) applies this principle by replacing each point with the average of itself and its neighbors:

ŷᵢ = (1/w) · Σ yᵢ₊ⱼ, summing j from −k to +k, with k = (w − 1)/2

Where:

  • ŷᵢ = smoothed value at point i
  • w = window size (number of points to average)
  • The sum includes the point itself plus k neighbors on both sides

That’s it! Since noise pushes values randomly higher and lower, averaging nearby points tends to cancel out the randomness, leaving the smooth underlying trend.
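The formula translates almost line-for-line into code. Here is a minimal sketch in plain Python (centered window, edge points without a full window left unchanged):

```python
def moving_average_centered(y, w):
    """Centered moving average with odd window size w.
    Edge points without a full window are left unchanged."""
    k = (w - 1) // 2          # neighbors on each side
    smoothed = list(y)        # copy; edges stay as-is
    for i in range(k, len(y) - k):
        window = y[i - k:i + k + 1]   # y[i-k] .. y[i+k]
        smoothed[i] = sum(window) / w
    return smoothed

print(moving_average_centered([1, 2, 9, 2, 1], 3))
```

The noisy spike of 9 gets pulled down toward its neighbors, while the endpoints pass through untouched.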

See it in action

Before diving into the math, let’s see how moving average actually works:

Try this:

  1. Click “Start Animation” to watch the window slide across the spectrum
  2. Use the slider to manually step through point by point
  3. Try different window sizes (3, 7, 11, 15) and observe the smoothing effect
  4. Notice how larger windows create smoother curves but blur the peaks

The purple star shows the smoothed value being calculated, while the window (shaded blue) highlights which neighbors are being averaged.

How it works in practice

Now that you’ve seen the process, let’s break down the algorithm step by step.

For a fluorescence spectrum with intensity values at different wavelengths:

  1. Choose a window size

    This determines how many neighbors each point consults for averaging.

  2. For each point i, gather neighboring values

    Collect values from i − k to i + k, where k = (w − 1)/2 (the point itself plus k neighbors on both sides).

  3. Calculate the average

    Sum all values in the window and divide by the window size.

  4. Replace the original value

    The smoothed value at point i becomes this average.

Example calculation

Suppose you have these intensity values:

Original: [10.2, 15.1, 12.3, 19.8, 18.2, 15.9, 14.1, 18.7, 17.2, 15.3]

With a window size of 3 (each point averaged with its two immediate neighbors):

  • Point at index 1: (10.2 + 15.1 + 12.3) / 3 = 12.5
  • Point at index 2: (15.1 + 12.3 + 19.8) / 3 = 15.7
  • Point at index 3: (12.3 + 19.8 + 18.2) / 3 = 16.8
  • Point at index 4: (19.8 + 18.2 + 15.9) / 3 = 18.0

Smoothed: [12.5, 15.7, 16.8, 18.0, 16.1, 16.2, 16.7, 17.1]

(The first and last points have no full window, so only 8 of the 10 points are smoothed.)

Notice how the wild jump from 12.3 to 19.8 becomes a gentler transition. The noise is being averaged away.
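The same numbers can be reproduced with NumPy’s convolution; in this sketch, mode='valid' computes only the interior points that have a full window:

```python
import numpy as np

y = np.array([10.2, 15.1, 12.3, 19.8, 18.2, 15.9, 14.1, 18.7, 17.2, 15.3])
w = 3
kernel = np.ones(w) / w     # averaging kernel: [1/3, 1/3, 1/3]

# 'valid' returns only positions with a full window: len(y) - w + 1 points
smoothed = np.convolve(y, kernel, mode='valid')
print(np.round(smoothed, 1))
# [12.5 15.7 16.8 18.  16.1 16.2 16.7 17.1]
```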

The fundamental tradeoff: smoothness vs. detail

The window size determines how aggressively you smooth. Larger windows give you smoother data, but they also blur your features. There’s always a tradeoff:

Smaller windows (relative to your peak width):

  • ✅ Preserve fine details and sharp peaks
  • ✅ Maintain peak heights and widths
  • ✅ Can resolve closely-spaced features
  • ❌ Leave more noise in the data

Moderate windows (balance point):

  • ✅ Good noise reduction
  • ✅ Features are preserved reasonably well
  • ⚠️ Peaks are slightly broadened

Larger windows (approaching peak width):

  • ✅ Very smooth, aesthetically pleasing curves
  • ✅ Noise is almost completely gone
  • ❌ Peaks are significantly broadened and shortened
  • ❌ Close peaks may merge together
  • ❌ Small features may disappear

So how do you choose? The key is to think about your data characteristics:

Example: scaling with data size

For a spectrum with 100 points and peaks spanning ~10-20 points:

  • Start with w = 3 or 5 (5-10% of peak width)
  • High noise? Try w = 7 or 9
  • Very noisy? Up to w = 11 or 15

For a high-resolution spectrum with 1000 points and peaks spanning ~100 points:

  • Start with w = 11 or 15 (~10% of peak width)
  • High noise? Try w = 21 or 31
  • Very noisy? Up to w = 51 or even 71

For ultra-high-resolution data with 50,000 points:

  • Scale proportionally - windows of 101, 201, etc. may be appropriate
  • The absolute number matters less than the ratio to your feature widths
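The ~10%-of-peak-width rule of thumb above can be wrapped in a small helper. This is just a sketch of the heuristic from this section (the 0.1 fraction, the floor of 3, and the odd-rounding are the rules of thumb stated here, not a standard function):

```python
def suggest_window(peak_width_points, fraction=0.1):
    """Suggest an odd moving-average window as a fraction of peak width."""
    w = max(3, round(peak_width_points * fraction))
    if w % 2 == 0:      # force odd for a symmetric, centered window
        w += 1
    return w

print(suggest_window(15))    # peaks ~15 points wide  -> 3
print(suggest_window(100))   # peaks ~100 points wide -> 11
```

Treat the result as a starting point, then adjust by eye and by checking residuals.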

Why use odd numbers?

Window sizes are typically odd (3, 5, 7, 9, 11) for symmetry. With an odd window, you get the same number of neighbors on both sides of the central point:

  • w = 5: 2 left + center + 2 right
  • w = 7: 3 left + center + 3 right

With an even window (like 4 or 8), you’d have to offset the window left or right, which subtly shifts your data. Odd windows keep everything centered and symmetric.

Edge effects

At the start and end of your spectrum, there aren’t enough neighbors on one side to fill the window. Different approaches handle this differently:

Truncation (most common):

  • Only smooth interior points where you have full windows
  • Leave the first and last points unsmoothed
  • Simple and honest

Edge padding:

  • Repeat the first value to create “virtual” neighbors on the left
  • Repeat the last value to create “virtual” neighbors on the right
  • NumPy’s np.pad with mode='edge' does this; note that np.convolve with mode='same' actually pads with zeros, which pulls edge values toward zero

Reflection:

  • Mirror the spectrum at the edges
  • Works well for some applications

Reduced window:

  • Use a smaller window at the edges
  • More complex but creates smooth transitions

For most spectroscopic applications, the interesting features are in the middle of the spectrum, so edge effects aren’t critical. But if you need accurate edge values (e.g., for integration), pay attention to how your software handles edges.
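The edge strategies are easy to compare on a short constant signal, where an ideal smoother would change nothing. A sketch using NumPy (note how zero padding drags the edges down):

```python
import numpy as np

y = np.ones(8)                # constant signal: smoothing should not change it
w = 3
kernel = np.ones(w) / w
k = (w - 1) // 2

# Zero padding: what np.convolve mode='same' effectively does
zero_padded = np.convolve(y, kernel, mode='same')

# Edge padding: repeat first/last value, then take the 'valid' interior
edge_padded = np.convolve(np.pad(y, k, mode='edge'), kernel, mode='valid')

# Reflection: mirror the signal at the boundaries
reflected = np.convolve(np.pad(y, k, mode='reflect'), kernel, mode='valid')

print(zero_padded)   # edges drop to ~0.667 - an artifact of the zeros
print(edge_padded)   # all ones - edges preserved
print(reflected)     # all ones - edges preserved
```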

Real-time applications: causal (one-sided) moving average

Everything we’ve discussed so far assumes you have your complete dataset available. But what if you’re collecting data in real time, point by point?

In offline spectroscopy, you collect the entire spectrum first, then apply preprocessing. The centered moving average we’ve described works perfectly: you can use points both before and after the current position.

But in real-time applications, you can’t use future data that hasn’t been measured yet. This is critical in:

  • PAT (Process Analytical Technology): Real-time NIR or Raman monitoring during pharmaceutical manufacturing
  • Online quality control: Continuous monitoring of production streams
  • Process monitoring: Chemical reactors, fermentation, polymerization
  • Time-series spectroscopy: Kinetic measurements, reaction monitoring

In these scenarios, you need a causal (or one-sided) moving average that only uses the current point and past measurements:

Instead of centering the window around point i, the window extends backward in time from the current point.

Notice the difference from the centered version above: the window (red dots) only appears to the left of the current point. There are no “future” measurements to use. Try increasing the window size and watch how the smoothed signal lags further behind the true signal - this is the fundamental cost of real-time smoothing.

The tradeoff: lag vs. smoothing

Causal moving average introduces a time lag (or phase shift). The smoothed signal is delayed compared to the true signal by approximately (w − 1)/2 time points.

Example: With w = 9 in a PAT application measuring NIR spectra every 30 seconds:

  • Time lag ≈ (9-1)/2 = 4 measurements = 2 minutes delay
  • When a process change occurs, the smoothed signal shows it 2 minutes later

Practical implications:

  • Larger windows give better noise reduction but longer delays
  • In process control, delays can affect feedback loop stability
  • For alarm systems, delayed detection could be critical

Choosing window size for real-time:

The optimal window size depends on your process dynamics, noise level, and acceptable lag time. Consider:

  • Fast processes (reactions, mixing): Small windows (e.g., w = 3 to 5) minimize lag but provide less noise reduction
  • Slow processes (fermentation, curing): Larger windows (e.g., w = 7 to 11) acceptable if process changes slowly
  • Critical alarms: Prioritize fast detection over smoothing - use minimal window sizes
  • Trending/reporting: Lag is less critical, so stronger smoothing is acceptable

Balance noise reduction against the acceptable delay for your specific application.

Implementation for real-time

def causal_moving_average(y, window_size):
    """
    One-sided moving average for real-time applications.
    Only uses current and past data points.
    """
    smoothed = []
    for i in range(len(y)):
        # Window extends backward from current point
        start = max(0, i - window_size + 1)
        window_values = y[start:i+1]
        smoothed.append(sum(window_values) / len(window_values))
    return smoothed

# Real-time PAT example: NIR spectrum acquired every 30 seconds
spectrum = acquire_nir_spectrum()  # latest measurement
smoothed_spectrum = causal_moving_average(spectrum, window_size=5)
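The lag is easy to see on a simulated step change. In this sketch the process variable jumps at index 10, but the causal average (reimplemented compactly here, same logic as above) only reaches the new level several samples later:

```python
def causal_moving_average(y, window_size):
    # Same one-sided average as defined above, written compactly
    return [sum(y[max(0, i - window_size + 1):i + 1]) /
            len(y[max(0, i - window_size + 1):i + 1])
            for i in range(len(y))]

signal = [0.0] * 10 + [1.0] * 10   # step change at index 10
smoothed = causal_moving_average(signal, window_size=5)

print(smoothed[9])    # 0.0 - still on the old level
print(smoothed[10])   # 0.2 - step only partially visible
print(smoothed[12])   # 0.6 - halfway point reached ~2 samples late
print(smoothed[14])   # 1.0 - full level only after w - 1 = 4 samples
```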

Common questions

Can I apply moving average multiple times?

You can, but it’s not recommended. Each pass blurs features more. If one pass with w = 7 isn’t enough, use a single pass with w = 11 instead - you’ll get better results.

Should I use the same window size for all samples?

Yes, absolutely! Consistency is critical. If you’re building a calibration model or comparing spectra, use the same window size for all data (calibration, validation, and test sets). Inconsistent smoothing creates inconsistent features.

How do I know if I’m removing signal instead of just noise?

Check the residuals:

  1. Smooth your data
  2. Subtract: residuals = original − smoothed
  3. Plot the residuals

Good smoothing: Residuals look like random noise (no patterns, symmetric around zero)

Over-smoothing: Residuals show structured patterns (like half of a peak) - you’re removing signal. Use a smaller window.
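A simple quantitative complement to the visual check is the lag-1 autocorrelation of the residuals. In this sketch (synthetic data; the specific windows and signal are illustrative), mild smoothing leaves residuals that are essentially noise, while over-smoothing leaves peak structure behind, making consecutive residuals strongly positively correlated:

```python
import numpy as np

def lag1_autocorr(residuals):
    """Lag-1 autocorrelation: do consecutive residuals move together?"""
    r = np.asarray(residuals) - np.mean(residuals)
    return float(np.sum(r[:-1] * r[1:]) / np.sum(r * r))

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
true_signal = np.exp(-(x - 5) ** 2)              # one smooth peak
y = true_signal + rng.normal(0, 0.05, x.size)

mild = y - np.convolve(y, np.ones(5) / 5, mode='same')    # window << peak width
over = y - np.convolve(y, np.ones(51) / 51, mode='same')  # window ~ peak width

# Mild smoothing: value near (or slightly below) zero - residuals are noise.
# Over-smoothing: strongly positive - leftover signal in the residuals.
print(round(lag1_autocorr(mild), 2), round(lag1_autocorr(over), 2))
```

(Averaging itself induces a weak negative autocorrelation in noise-only residuals, so "near or below zero" is the healthy regime; a large positive value is the warning sign.)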

Why does my peak look shorter after smoothing?

Averaging a sharp peak with its neighboring baseline values pulls the peak value down. The peak spreads out to become shorter and broader, though the area stays roughly the same.

This is why moving average isn’t ideal for quantitative work on sharp peaks. Savitzky-Golay smoothing preserves peak heights much better.
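You can quantify this height loss directly by smoothing a noise-free synthetic Gaussian peak (a sketch; the exact numbers depend on the peak width and windows chosen here):

```python
import numpy as np

x = np.arange(100)
peak = np.exp(-((x - 50) ** 2) / (2 * 3.0 ** 2))   # sharp Gaussian, sigma = 3 points

for w in (5, 11, 21):
    kernel = np.ones(w) / w
    smoothed = np.convolve(peak, kernel, mode='same')
    # Height drops as the window grows; the area (sum) barely changes
    print(w, round(smoothed.max(), 3))
# 5 0.899
# 11 0.639
# 21 0.358
```

With a window comparable to the peak width (w = 21 vs. a FWHM of ~7 points), nearly two-thirds of the height is gone even though the area is conserved.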

Code implementation

Here’s how to implement moving average in Python:

import numpy as np
import matplotlib.pyplot as plt

def moving_average(y, window_size):
    """
    Apply moving average smoothing to a 1D signal.

    Parameters
    ----------
    y : array-like
        Input signal to be smoothed (e.g., spectrum intensities)
    window_size : int
        Number of points in the averaging window (use odd numbers!)

    Returns
    -------
    y_smooth : array
        Smoothed signal (same length as input)
    """
    # Create the averaging kernel (array of 1/w values)
    window = np.ones(window_size) / window_size
    # Apply convolution (mode='same' zero-pads, so edge values are pulled down)
    y_smooth = np.convolve(y, window, mode='same')
    return y_smooth

# Example: noisy fluorescence emission spectrum
np.random.seed(42)

# Wavelength range typical for fluorescence (nm)
wavelength = np.linspace(400, 600, 100)

# Simulate a realistic fluorescence spectrum with vibronic structure
true_signal = (0.85 * np.exp(-((wavelength - 480)**2) / 400) +
               0.35 * np.exp(-((wavelength - 510)**2) / 300) +
               0.15 * np.exp(-((wavelength - 545)**2) / 250))

# Add realistic shot noise (proportional to sqrt of signal)
noise = np.random.normal(0, 0.08 * np.sqrt(true_signal))
noisy_spectrum = true_signal + noise

# Try different window sizes
smooth_w5 = moving_average(noisy_spectrum, window_size=5)
smooth_w9 = moving_average(noisy_spectrum, window_size=9)
smooth_w15 = moving_average(noisy_spectrum, window_size=15)

# Plot and compare
plt.figure(figsize=(12, 6))
plt.plot(wavelength, noisy_spectrum, 'o-', alpha=0.4,
         label='Noisy data', color='gray', markersize=3)
plt.plot(wavelength, smooth_w5, linewidth=2.5, label='Window = 5')
plt.plot(wavelength, smooth_w9, linewidth=2.5, label='Window = 9')
plt.plot(wavelength, smooth_w15, linewidth=2.5, label='Window = 15')
plt.xlabel('Wavelength (nm)', fontsize=12)
plt.ylabel('Fluorescence Intensity (a.u.)', fontsize=12)
plt.title('Moving Average Smoothing: Window Size Comparison', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Check residuals
residuals = noisy_spectrum - smooth_w9
plt.figure(figsize=(12, 4))
plt.plot(wavelength, residuals, 'o-', markersize=3)
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Wavelength (nm)')
plt.ylabel('Residual (Original - Smoothed)')
plt.title('Residuals: Should look like random noise with no patterns')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

When to use moving average

✅ Works well for

Quick exploratory analysis: Fast and simple, perfect for initial data inspection

Broad, gentle features: Wide, smooth peaks (like fluorescence emission) are handled well

Teaching and communication: Intuitive concept (“averaging with neighbors”) that’s easy to explain

Baseline estimation: Use a very large window (50+ points) to capture slowly-varying baseline drift

Speed-critical applications: Extremely fast computation, ideal for batch processing thousands of spectra

❌ Use something better for

Sharp Raman peaks: Moving average significantly reduces peak heights and increases widths. Use Savitzky-Golay instead.

Quantitative analysis: Peak broadening and height reduction introduce bias in calibration curves

High-resolution work: When resolution matters (vibronic structure, closely-spaced peaks), moving average’s blurring is unacceptable

Publication-quality figures: Moving average can leave visible artifacts (slight stair-stepping effects)

Advantages and limitations

Advantages

Extremely simple: Easy to understand and implement

Very fast: Minimal computational cost

No parameters beyond window size: Just choose w and you’re done

Predictable behavior: Always smooths, never does anything surprising

Limitations

Reduces peak heights: Averaging pulls sharp peaks down toward their neighbors

Broadens features: Peaks get wider, closely-spaced features may merge

Equal weighting: All neighbors are treated equally, even those far from the center point

Not optimal for most applications: Other methods (Savitzky-Golay, Whittaker) preserve features better

Practical tips

Window size selection:

  • No single “best” window - it depends on noise level, feature width, and data resolution
  • Start small (w = 5) and increase until noise is acceptable while features remain sharp
  • More data points generally allows larger windows, but always check visually
  • Always use odd numbers for symmetry
  • Consistency is key: same window for all samples in a study

Quality control:

  • Always plot before and after smoothing
  • Check residuals (original - smoothed) for structured patterns
  • If residuals show peak shapes, you’re over-smoothing

Computational efficiency:

  • NumPy’s convolve is highly optimized
  • For many spectra, vectorize the operation instead of looping
  • Moving average is fast enough that optimization usually isn’t needed
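For the many-spectra case, the loop can be avoided entirely. One option (a sketch, assuming SciPy is available) is scipy.ndimage.uniform_filter1d, which applies a centered moving average along one axis of a 2D array in a single call:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(0)
spectra = rng.normal(0.5, 0.05, size=(1000, 256))   # 1000 spectra, 256 points each

# Smooth every spectrum at once along the wavelength axis (axis=1).
# mode='nearest' repeats the edge value, i.e., edge padding.
smoothed = uniform_filter1d(spectra, size=9, axis=1, mode='nearest')

print(smoothed.shape)   # (1000, 256) - same shape, every row smoothed
```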

Next steps

Moving average teaches the fundamental concepts of smoothing: noise reduction vs. feature preservation, window size tradeoffs, and edge effects. These principles apply to all smoothing methods.

For most serious spectroscopic work, consider these more advanced methods:

  • Savitzky-Golay smoothing - Fits local polynomials instead of averaging. Much better peak preservation. The most common choice for Raman and IR spectroscopy.

  • Whittaker smoothing - Optimization-based approach that balances smoothness against data fidelity. Excellent general-purpose method.

  • Gaussian smoothing - Weighted averaging where close neighbors matter more than distant ones. Smoother than moving average with similar simplicity.

Moving average is the simplest smoothing method, making it perfect for quick exploratory analysis and teaching. But for quantitative work or publication-quality results, the more sophisticated methods above are usually worth the extra complexity.

References

[1] Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
[2] Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
[3] Martens, H., & Næs, T. (1989). Multivariate Calibration. Wiley.
[4] Brereton, R. G. (2003). Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley.
[5] Mark, H., & Workman, J. (2007). Chemometrics in Spectroscopy. Academic Press.
[6] Ingle, J. D., & Crouch, S. R. (1988). Spectrochemical Analysis. Prentice Hall.