The concept of signal-to-noise ratio (SNR) migrated into analytical chemistry from electrical engineering, where it was formalized during the 1940s and 1950s as radar, telecommunications, and early computing demanded rigorous frameworks for separating meaningful signals from background interference. Claude Shannon’s landmark 1948 paper “A Mathematical Theory of Communication” established the theoretical limits of information transmission through noisy channels, and Norbert Wiener’s wartime work on optimal filtering (published as Extrapolation, Interpolation, and Smoothing of Stationary Time Series in 1949) provided the mathematical tools for extracting signals buried in noise. These ideas did not stay confined to engineering for long. By the 1950s and 1960s, spectroscopists were adopting the same language and mathematics to quantify the performance of their instruments and the quality of their measurements.
The digital revolution of the 1960s and 1970s transformed noise reduction from an analog art into a computational science. Before digitization, smoothing meant physical tricks: slowing the scan speed of a spectrometer, increasing the slit width, or using analog RC filters on the detector output. Each of these improved the SNR at the cost of spectral resolution or measurement time. When laboratory instruments began producing digital output — discrete numerical values at evenly spaced wavelength or frequency intervals — the entire toolkit of digital signal processing (DSP) became available. Suddenly, smoothing could be applied after acquisition, without any compromise to how the data was collected. Savitzky and Golay’s 1964 paper on polynomial smoothing filters, published in Analytical Chemistry, was one of the first to exploit this new paradigm directly for spectroscopic data, and it remains one of the most cited papers in the history of the field.
In modern spectroscopy, noise is not a single phenomenon but a family of effects with distinct physical origins. Shot noise arises from the quantum nature of photon detection — photons arrive at the detector as discrete events following Poisson statistics, creating statistical fluctuations that scale with the square root of the signal. Thermal noise (Johnson-Nyquist noise) originates from random electron motion in the detector and electronics, present even in the absence of any optical signal. Dark current generates a signal in photodetectors even when no light is incident, due to thermally generated charge carriers. Digitization noise appears when the analog detector signal is converted to digital numbers, introducing rounding errors proportional to the resolution of the analog-to-digital converter. Understanding which noise source dominates your measurement is the first step toward choosing an effective reduction strategy.
What is noise?
In the context of spectroscopy and chemometrics, noise is any unwanted random fluctuation superimposed on the true signal. If you measure the same sample ten times under identical conditions, you will get ten slightly different spectra. The variation between those repeated measurements is noise.
The quality of a measurement is quantified by the signal-to-noise ratio (SNR):
\mathrm{SNR} = \frac{\bar{x}}{s}
where xˉ is the mean signal intensity and s is the standard deviation of repeated measurements at that point. A higher SNR means a cleaner measurement.
In practice, SNR is often expressed in decibels:
\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10}\left(\frac{\bar{x}}{s}\right)
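These two formulas can be computed directly from a stack of repeated measurements. The sketch below assumes numpy and a synthetic example (a constant signal of 100 with Gaussian noise of standard deviation 2, so the expected SNR is about 50); the function names are illustrative, not from any particular library.

```python
import numpy as np

def snr(repeats):
    """Signal-to-noise ratio at each spectral point.

    repeats: array of shape (n_scans, n_points) -- the same sample
    measured n_scans times under identical conditions.
    """
    mean = repeats.mean(axis=0)         # x-bar: mean signal at each point
    std = repeats.std(axis=0, ddof=1)   # s: scatter between the repeats
    return mean / std

def snr_db(repeats):
    """The same ratio expressed in decibels: 20 * log10(x-bar / s)."""
    return 20 * np.log10(snr(repeats))

# Example: constant signal of 100 with Gaussian noise of sigma = 2
rng = np.random.default_rng(0)
repeats = 100 + rng.normal(0, 2, size=(50, 200))
# median SNR across the 200 points should be near 100 / 2 = 50
```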
Why noise matters for chemometrics
Noise is not just an aesthetic problem. It has concrete consequences for multivariate analysis:
Calibration models fit noise as if it were signal (overfitting), reducing prediction accuracy on new samples
Classification models may find spurious differences between groups that are actually just noise fluctuations
Peak detection algorithms produce false positives (noise spikes mistaken for peaks) and false negatives (real peaks buried in noise)
Derivative spectra amplify noise dramatically — each stage of differentiation boosts high-frequency noise relative to the signal, so second derivatives are substantially noisier than first derivatives
Variable selection methods may select noisy, uninformative variables
The goal of noise reduction is to suppress these random fluctuations while preserving the chemical information encoded in the spectral features (peak positions, heights, widths, and shapes).
Types of noise in spectroscopy
Not all noise is the same. Different physical mechanisms produce different noise characteristics, and the distinction matters for choosing the right reduction strategy.
Random noise (additive, homoscedastic)
The simplest and most common noise model: a random value, drawn from a Gaussian distribution with zero mean, is added independently to each data point.
y_i = x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
Here x_i is the true signal at point i, ε_i is the noise, and σ is constant across the spectrum. This is called homoscedastic noise because its magnitude does not depend on the signal level.
Thermal noise in electronic circuits is the prototypical example. It is present at all wavelengths with roughly equal intensity, independent of the signal. Most smoothing methods are designed and analyzed assuming this noise model, and they work well when it holds.
Shot noise (signal-dependent)
Shot noise arises from the discrete nature of photon counting. Because photons arrive randomly, the number detected in a given time interval follows a Poisson distribution, whose variance equals the mean count. For a signal of intensity I:
\sigma_{\mathrm{shot}} = \sqrt{I}
This means stronger signals have more absolute noise, but their relative noise (as a fraction of the signal) is lower. In UV-Vis or fluorescence spectroscopy where photon counts are moderate, shot noise is often the dominant noise source. Its signal-dependent character means that smoothing methods, which assume constant noise, may over-smooth low-intensity regions and under-smooth high-intensity regions.
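The square-root scaling is easy to verify numerically. This sketch (assuming numpy; the variable names are illustrative) draws Poisson counts at a low and a high mean intensity and compares the absolute and relative noise:

```python
import numpy as np

# Photon counting follows Poisson statistics: noise sigma grows as sqrt(I)
rng = np.random.default_rng(1)
weak = rng.poisson(100, size=100_000)       # low-intensity region
strong = rng.poisson(10_000, size=100_000)  # high-intensity region

# Absolute noise: about 10 for the weak signal, about 100 for the strong one
print(weak.std(), strong.std())
# Relative noise falls with intensity: roughly 10% versus 1%
print(weak.std() / weak.mean(), strong.std() / strong.mean())
```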
Systematic noise
Not all unwanted variation is random. Systematic noise includes:
Baseline drift: A slow, smooth variation of the entire spectrum due to temperature changes, lamp aging, or detector drift. This is not random noise — it is a deterministic trend that changes slowly over time. Baseline drift is better addressed by baseline correction than by smoothing.
Interference fringes: Periodic oscillations caused by thin-film interference in sample cells or optical elements. These appear as a sinusoidal modulation of the spectrum and cannot be removed by simple smoothing. Fourier filtering or dedicated fringe-removal algorithms are needed.
Spikes and cosmic rays: In Raman spectroscopy and CCD-based detectors, cosmic rays occasionally strike the detector, producing extremely sharp, intense spikes that span one or two pixels. These are not Gaussian noise and should be removed by spike detection algorithms (e.g., median filtering or derivative-based detection) before smoothing.
Multiplicative noise
In diffuse reflectance spectroscopy (NIR, DRIFTS), scattering effects cause the entire spectrum to be multiplied by a random factor that varies from sample to sample. This is not additive noise at all — it is a multiplicative distortion:
y_i = a \, x_i
where the factor a changes from one sample to the next. Smoothing cannot correct it; scatter-correction preprocessing such as multiplicative scatter correction (MSC) or standard normal variate (SNV) is the appropriate remedy.
Heteroscedastic noise
When the noise level changes across the spectrum — for example, higher noise in regions of low detector sensitivity or near the edges of the spectral range — the noise is called heteroscedastic. This is common in real instruments. Standard smoothing methods apply the same degree of smoothing everywhere, which may be too much in low-noise regions and too little in high-noise regions.
The smoothing approach
Smoothing methods exploit a simple asymmetry between signal and noise: spectroscopic signals tend to change smoothly from one wavelength to the next (because they arise from continuous physical phenomena like molecular vibrations or electronic transitions), while random noise fluctuates independently at each point. By averaging or fitting over a local window of data points, the random fluctuations tend to cancel out while the smooth signal survives.
The four main smoothing methods used in chemometrics, from simplest to most sophisticated, are summarized below. Each has a dedicated article with full mathematical derivations, interactive visualizations, and code examples.
Moving average
The simplest smoother: replace each point with the unweighted average of itself and its neighbors.
\hat{y}_i = \frac{1}{w} \sum_{j=i-m}^{i+m} y_j, \qquad m = \frac{w-1}{2}
Strengths: Extremely simple, fast, easy to understand. One parameter (window size w).
Weaknesses: All neighbors weighted equally (even distant ones). Broadens and flattens peaks. Not ideal for quantitative work.
Best for: Quick exploratory analysis, very broad features, teaching the concept of smoothing.
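A minimal moving-average implementation, assuming numpy; the reflect-padding edge handling is one common choice among several, and the function name is illustrative:

```python
import numpy as np

def moving_average(y, w):
    """Unweighted moving average with an odd window w. Edges are
    handled by reflecting the signal so the output length matches."""
    if w % 2 == 0:
        raise ValueError("window size must be odd")
    m = (w - 1) // 2
    padded = np.pad(y, m, mode="reflect")
    return np.convolve(padded, np.ones(w) / w, mode="valid")

# Noisy sine as a stand-in spectrum
rng = np.random.default_rng(2)
x = np.sin(np.linspace(0, 4 * np.pi, 500))   # true signal
y = x + rng.normal(0, 0.3, x.size)           # noisy measurement
smoothed = moving_average(y, 9)
```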
Gaussian smoothing
Weighted averaging where closer neighbors get more weight, following the Gaussian bell curve. One parameter: σ controls the width of the weighting function.
\hat{y}_i = \frac{\sum_j w(j-i)\, y_j}{\sum_j w(j-i)}, \qquad w(x) = e^{-x^2 / (2\sigma^2)}
Strengths: More natural weighting than moving average. Smooth results without blocky artifacts. Extends naturally to 2D data (images).
Weaknesses: Still broadens peaks (though less than moving average). Cannot compute derivatives.
Best for: Natural-looking smooth curves, 2D spectral imaging data, when simplicity is valued.
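In practice Gaussian smoothing is rarely hand-rolled; scipy provides it directly via `scipy.ndimage.gaussian_filter1d`. A short sketch on a synthetic peak (the data are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(3)
grid = np.linspace(-5, 5, 400)
x = np.exp(-grid ** 2)                 # a single broad peak
y = x + rng.normal(0, 0.05, x.size)    # noisy measurement

# sigma is the kernel width in data points; larger sigma = smoother
smoothed = gaussian_filter1d(y, sigma=3)
```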
Savitzky-Golay
Fits a low-order polynomial to the points in each local window by least squares and takes the value of the fitted polynomial at the center point. The fit reduces to a convolution with precomputed coefficients:
\hat{y}_i = \sum_{j=-m}^{m} c_j \, y_{i+j}
Two parameters: window size and polynomial order. Because the fitted polynomial can be differentiated analytically, the same filter produces smoothed derivatives directly.
Strengths: Excellent peak preservation. Computes smoothed first and second derivatives in one pass.
Weaknesses: Two interacting parameters to tune. Edge points need special handling.
Best for: Spectroscopy in general, and derivative preprocessing in particular.
Whittaker smoother
Frames smoothing as a penalized least squares optimization: find the smoothest curve that still fits the data. One parameter: λ balances fidelity to data against smoothness.
\hat{y} = \arg\min_{\hat{y}} \left\{ \sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_i \left(\Delta^d \hat{y}_i\right)^2 \right\}
Strengths: Global optimization (considers the entire spectrum at once). Single intuitive parameter. Excellent peak preservation. Natural extension to weighted smoothing and baseline correction.
Weaknesses: Requires matrix operations (though fast with sparse matrices). Parameter λ spans orders of magnitude.
Best for: General-purpose smoothing. Baseline correction (via asymmetric variants). When a single-parameter method with excellent quality is desired.
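The penalized least squares problem above has a closed-form solution: solve (I + λDᵀD)ẑ = y, where D is the d-th order difference matrix. A sketch in the style of Eilers' "perfect smoother", assuming numpy and scipy's sparse solvers (the function name is illustrative):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def whittaker_smooth(y, lam=1e4, d=2):
    """Whittaker smoother: solve (I + lam * D'D) z = y, where D is
    the d-th order difference matrix (penalized least squares)."""
    n = len(y)
    D = sparse.eye(n, format="csr")
    for _ in range(d):
        D = D[1:] - D[:-1]             # build the d-th difference matrix
    A = sparse.eye(n, format="csr") + lam * (D.T @ D)
    return spsolve(A.tocsc(), y)

rng = np.random.default_rng(4)
x = np.sin(np.linspace(0, 4 * np.pi, 500))   # true signal
y = x + rng.normal(0, 0.3, x.size)           # noisy measurement
smoothed = whittaker_smooth(y, lam=1e4)
```

Note how λ is the only tuning parameter once d is fixed; doubling the data length does not require retuning it as drastically as a window size would.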
The choice of smoothing method depends on your data, your analytical goal, and the nature of your spectral features.
Comparison table
| Feature | Moving Average | Gaussian | Savitzky-Golay | Whittaker |
|---|---|---|---|---|
| Peak preservation | Poor | Moderate | Excellent | Excellent |
| Parameters | 1 (window) | 1 (sigma) | 2 (window, order) | 1-2 (lambda, d) |
| Computation | Very fast | Fast | Fast | Fast (sparse) |
| Derivatives | No | No | Yes | Not directly |
| Ease of use | Easiest | Easy | Moderate | Moderate |
| Best for | Quick exploration | Natural smoothing | Spectroscopy | General purpose |
Beyond smoothing
Smoothing is the most common approach to noise reduction, but it is not the only one. Several other strategies are used in spectroscopy, either as alternatives to smoothing or in combination with it.
Ensemble averaging (signal averaging)
The most fundamental noise reduction technique: measure the same sample multiple times and average the spectra. If the noise is random and independent between measurements, averaging n spectra reduces the noise standard deviation by a factor of \sqrt{n}:
\sigma_{\mathrm{averaged}} = \frac{\sigma_{\mathrm{single}}}{\sqrt{n}}
This means 4 scans halve the noise, 16 scans reduce it by a factor of 4, and 100 scans reduce it by a factor of 10. The improvement follows the law of diminishing returns — each additional factor-of-two improvement requires four times as many scans.
Ensemble averaging is often the best first line of defense against noise, because it introduces no distortion at all (unlike smoothing, which always involves some tradeoff with resolution). Most FTIR instruments default to 16 or 32 co-added scans for exactly this reason. The limitation is measurement time: in process monitoring or kinetic studies, you may not have time for multiple scans.
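The square-root law is simple to confirm in simulation. This sketch (numpy assumed; the scan count of 16 mirrors a typical FTIR default) averages 16 synthetic noise-only scans:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma_single = 1.0
scans = rng.normal(0, sigma_single, size=(16, 100_000))  # 16 repeated scans

averaged = scans.mean(axis=0)
# noise std drops from 1.0 to 1.0 / sqrt(16) = 0.25
print(averaged.std())
```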
Fourier filtering
The signal and noise often occupy different regions of the frequency domain. Spectroscopic features (peaks, baselines) correspond to low-frequency components, while random noise is spread across all frequencies. By transforming the spectrum to the frequency domain (via the Fast Fourier Transform, FFT), suppressing high-frequency components, and transforming back, you can reduce noise while preserving spectral features.
The practical challenge is choosing the frequency cutoff. Set it too low and you remove real spectral features along with the noise. Set it too high and too much noise remains. Interference fringes, which appear as sharp peaks in the frequency domain, can be selectively removed by this approach — a task that smoothing methods cannot accomplish.
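A hard-cutoff low-pass filter is a crude but instructive instance of this idea. The sketch below (numpy assumed; `fft_lowpass` and `keep_frac` are illustrative names, and real workflows would usually taper the cutoff rather than zero it abruptly) removes everything above a fraction of the Nyquist frequency:

```python
import numpy as np

def fft_lowpass(y, keep_frac=0.05):
    """Crude low-pass Fourier filter: zero every component above
    keep_frac of the Nyquist frequency, then invert the transform."""
    Y = np.fft.rfft(y)
    cutoff = max(1, int(len(Y) * keep_frac))
    Y[cutoff:] = 0.0
    return np.fft.irfft(Y, n=len(y))

rng = np.random.default_rng(6)
grid = np.linspace(-1, 1, 512)
x = np.exp(-(grid / 0.3) ** 2)         # broad, low-frequency peak
y = x + rng.normal(0, 0.05, x.size)
filtered = fft_lowpass(y, keep_frac=0.05)
```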
Wavelet denoising
Wavelets decompose a signal into components at different scales (both frequency and position), offering a more flexible decomposition than Fourier analysis. Noise, which tends to produce small coefficients at fine scales, can be suppressed by thresholding: set small wavelet coefficients to zero and reconstruct the signal from the remaining coefficients.
Wavelet denoising can achieve excellent results, particularly for signals with sharp localized features (like Raman peaks) superimposed on smooth backgrounds. However, it requires choosing a wavelet family, a decomposition level, and a thresholding strategy, making it more complex to tune than standard smoothing methods. It is less commonly used in routine chemometric workflows but sees application in specialized contexts.
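To make the thresholding idea concrete without the full machinery, here is a minimal one-level Haar transform with soft thresholding, written in plain numpy (a sketch only; real applications would use a multi-level transform, e.g. via the PyWavelets package). The threshold follows Donoho's "universal" rule σ√(2 ln n) [9]:

```python
import numpy as np

def haar_denoise(y, threshold):
    """One-level Haar wavelet denoising with soft thresholding.
    Assumes len(y) is even."""
    s2 = np.sqrt(2.0)
    approx = (y[0::2] + y[1::2]) / s2      # coarse-scale coefficients
    detail = (y[0::2] - y[1::2]) / s2      # fine-scale coefficients
    # Soft threshold: small (noise-dominated) coefficients go to zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(y)
    out[0::2] = (approx + detail) / s2     # inverse Haar transform
    out[1::2] = (approx - detail) / s2
    return out

rng = np.random.default_rng(7)
grid = np.linspace(-1, 1, 512)
x = np.exp(-(grid / 0.2) ** 2)             # smooth peak
sigma = 0.05
y = x + rng.normal(0, sigma, x.size)
denoised = haar_denoise(y, sigma * np.sqrt(2 * np.log(y.size)))
```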
Median filtering
Median filtering replaces each point with the median (not the mean) of itself and its neighbors. Unlike mean-based smoothing, the median is robust to outliers. A single cosmic ray spike surrounded by normal values is completely eliminated by the median filter, whereas a moving average would only reduce it.
Median filtering is typically used as a preprocessing step to remove spikes before applying a standard smoothing method. It is not a general-purpose smoother because it can flatten broad features and create a staircase effect on smooth curves.
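The contrast between median and mean behavior on a spike is easy to demonstrate with `scipy.signal.medfilt` (the spike height of 50 is an arbitrary illustrative value):

```python
import numpy as np
from scipy.signal import medfilt

rng = np.random.default_rng(8)
y = np.sin(np.linspace(0, np.pi, 200)) + rng.normal(0, 0.02, 200)
y[80] += 50.0                            # simulated cosmic-ray spike

despiked = medfilt(y, kernel_size=5)     # median: the spike vanishes
diluted = np.convolve(y, np.ones(5) / 5, mode="same")  # mean: spike spreads
```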
Measuring noise reduction
After applying any noise reduction method, you should verify that it worked as intended: noise was reduced without distorting the signal. Three approaches are standard.
Signal-to-noise ratio improvement
Measure the SNR before and after preprocessing. For a region of the spectrum where you know the signal should be approximately constant (e.g., a flat baseline region), compute:
\text{SNR improvement} = \frac{\mathrm{SNR}_{\mathrm{after}}}{\mathrm{SNR}_{\mathrm{before}}}
For random noise, the theoretical SNR improvement from a moving average with window w is \sqrt{w}. A window of 9 should improve SNR by a factor of 3. If you see much less improvement, the noise may not be random, or the window may be too small.
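The √w prediction can be checked on a simulated flat baseline (numpy assumed; pure noise stands in for a signal-free region):

```python
import numpy as np

rng = np.random.default_rng(9)
baseline = rng.normal(0, 1.0, 100_000)   # flat region: pure random noise

w = 9
smoothed = np.convolve(baseline, np.ones(w) / w, mode="valid")
improvement = baseline.std() / smoothed.std()
print(improvement)   # close to sqrt(9) = 3 for independent noise
```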
Residual analysis
Subtract the smoothed spectrum from the original:
\text{residual}_i = y_i - \hat{y}_i
Plot the residuals and inspect them:
Good smoothing: Residuals look like random noise — no visible patterns, approximately symmetric around zero, constant amplitude
Over-smoothing: Residuals show structured patterns (bumps that look like half a peak). You are removing signal.
Under-smoothing: Residuals still contain visible noise with higher amplitude than expected
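Beyond visual inspection, structure in the residuals can be quantified: white-noise residuals have near-zero lag-1 autocorrelation, while leftover peak shapes push it toward 1. A sketch (numpy and scipy assumed; the σ values and `lag1_autocorr` helper are illustrative) contrasting mild and excessive Gaussian smoothing:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def lag1_autocorr(r):
    """Lag-1 autocorrelation: near 0 for noise-like residuals,
    near 1 when the residuals contain leftover signal structure."""
    r = r - r.mean()
    return np.dot(r[:-1], r[1:]) / np.dot(r, r)

rng = np.random.default_rng(10)
grid = np.linspace(-1, 1, 500)
x = np.exp(-(grid / 0.2) ** 2)                 # true peak
y = x + rng.normal(0, 0.05, grid.size)

mild = y - gaussian_filter1d(y, sigma=2)       # residuals, mild smoothing
over = y - gaussian_filter1d(y, sigma=50)      # residuals, over-smoothing
print(lag1_autocorr(mild), lag1_autocorr(over))
```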
Checking for signal distortion
Compare key spectral features before and after smoothing:
Peak positions: Did peaks shift? (They should not.)
Peak heights: Did peaks shrink? (Some reduction is expected with moving average and Gaussian; minimal with Savitzky-Golay and Whittaker.)
Peak widths: Did peaks broaden? (FWHM should remain approximately constant.)
Peak areas: Integration under peaks should be approximately preserved, even if heights change.
Resolution: Can you still resolve closely spaced peaks? Over-smoothing merges them.
Practical tips
Before smoothing
Diagnose your noise. Is it random (Gaussian), signal-dependent (shot noise), or systematic (baseline drift, fringes, spikes)? Smoothing is designed for random noise only.
Consider ensemble averaging first. If you can measure multiple scans, averaging is distortion-free and should be your first line of defense.
Remove spikes before smoothing. Cosmic ray spikes and outlier points should be detected and removed (e.g., by median filtering) before applying any smoothing method. A single spike can corrupt a large window of averaged values.
During smoothing
Start with mild smoothing and increase gradually. It is easier to add more smoothing than to undo over-smoothing.
Use the same preprocessing for all samples. Calibration, validation, and test sets must be smoothed with identical parameters. Inconsistent smoothing is a subtle but serious source of error.
Always compare before and after. Plot the original and smoothed spectra on the same axes. If you cannot see a difference, the smoothing is too mild. If the features look different, it may be too strong.
After smoothing
Check the residuals. Plot (original - smoothed). Residuals should look like random noise with no structure. If you see peak-shaped features in the residuals, you are removing signal.
Verify peak metrics. Compare peak positions, heights, and widths before and after. Significant changes indicate over-smoothing or an inappropriate method.
Document your parameters. Record the method, window size, polynomial order, or any other parameters used. Future users (including yourself six months from now) need to know exactly what was done.
Code implementation
The following example demonstrates how to calculate SNR and compare smoothing methods on a synthetic noisy spectrum. For detailed code for each individual method, see the dedicated articles linked above.
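A minimal sketch along those lines, assuming numpy and scipy (the synthetic two-peak spectrum and all parameter choices are illustrative; RMSE against the known truth stands in for SNR here, since the true signal is available in simulation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import savgol_filter

# Synthetic spectrum: two Gaussian peaks on a flat baseline, plus noise
rng = np.random.default_rng(42)
wavelength = np.linspace(400, 800, 1000)
true = (1.0 * np.exp(-((wavelength - 550) / 15) ** 2)
        + 0.6 * np.exp(-((wavelength - 650) / 10) ** 2))
noisy = true + rng.normal(0, 0.05, wavelength.size)

def rmse(a, b):
    """Root-mean-square error between two spectra."""
    return np.sqrt(np.mean((a - b) ** 2))

smoothers = {
    "moving average": np.convolve(noisy, np.ones(9) / 9, mode="same"),
    "gaussian": gaussian_filter1d(noisy, sigma=2),
    "savitzky-golay": savgol_filter(noisy, window_length=11, polyorder=3),
}

print(f"noisy spectrum: RMSE = {rmse(noisy, true):.4f}")
for name, smoothed in smoothers.items():
    print(f"{name}: RMSE = {rmse(smoothed, true):.4f}")
```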
References
[1] Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
[2] Eilers, P. H. C. (2003). A perfect smoother. Analytical Chemistry, 75(14), 3631-3636.
[3] Rinnan, A., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
[4] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
[5] Wiener, N. (1949). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. MIT Press.
[6] Mark, H., & Workman, J. (2007). Chemometrics in Spectroscopy. Academic Press.
[7] Brereton, R. G. (2003). Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley.
[8] Ingle, J. D., & Crouch, S. R. (1988). Spectrochemical Analysis. Prentice Hall.
[9] Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3), 613-627.
[10] Martens, H., & Naes, T. (1989). Multivariate Calibration. Wiley.