Multiplicative Scatter Correction (MSC)
In the 1980s, near-infrared (NIR) spectroscopy was rapidly gaining ground as a fast, non-destructive tool for analyzing agricultural and food products. Researchers could point an NIR instrument at a sample of grain, flour, or powdered milk and, in seconds, obtain a spectrum related to its protein, moisture, or fat content. But there was a stubborn problem: the spectra were dominated by light scattering. When NIR radiation hits a powder or granular sample, much of it bounces around between particles before reaching the detector. The amount of scattering depends on physical properties (particle size, surface texture, how tightly the sample is packed) that have nothing to do with the chemistry. Two samples with identical chemical composition but different grind sizes would produce visibly different spectra, and this physical noise made it difficult to build reliable calibration models.
Harald Martens and Tormod Næs, both working in Norway at the time, addressed this challenge directly. Their insight was that the scatter effect on each spectrum could be approximated as a simple linear transformation: a multiplicative scaling (slope change) and an additive offset relative to some ideal, scatter-free reference spectrum. If you could estimate those two parameters for each spectrum by regressing it against the reference, you could reverse the distortion and recover the underlying chemical signal. They called this Multiplicative Scatter Correction (MSC) and presented it as part of their broader framework for multivariate calibration.
The method was published in detail in their 1989 book Multivariate Calibration (Wiley), which became one of the foundational textbooks of chemometrics. The book covered not only MSC but also PLS regression, cross-validation, and many other methods that would define standard practice in the field. MSC itself became a default preprocessing step for NIR diffuse reflectance spectroscopy, particularly in the food, agricultural, and pharmaceutical industries where powdered and granular samples are routine. Its simplicity (just a linear regression per spectrum) and its clear physical interpretation (correcting for scatter) made it one of those rare methods that practitioners adopted almost universally.
The scatter problem in spectroscopy
When you measure diffuse reflectance spectra—especially in NIR spectroscopy—you’re dealing with a fundamental challenge: light scattering. The light doesn’t just interact with your sample’s chemistry; it also scatters based on physical properties like particle size, surface roughness, and sample packing.
This scattering creates two main effects:
- Multiplicative scatter: Scales the whole spectrum by a sample-dependent factor, so peak heights and slopes differ between samples
- Additive scatter: Shifts the baseline up or down
The problem? These physical effects obscure the chemical information you actually care about. Two identical samples with different particle sizes will give you different spectra, even though their chemistry is the same.
MSC (Multiplicative Scatter Correction) removes these scatter effects by treating them as linear distortions of a reference spectrum.
The core idea: linear regression to a reference
MSC assumes each spectrum is approximately a linear transformation of an ideal “scatter-free” reference spectrum:

$$y_i \approx a_i + b_i \, y_{\mathrm{ref}}$$

Where:
- $y_i$ = your measured spectrum (with scatter)
- $y_{\mathrm{ref}}$ = reference spectrum (usually the mean of all spectra)
- $a_i$ = additive scatter offset
- $b_i$ = multiplicative scatter scaling
The correction simply reverses this transformation:

$$y_{i,\mathrm{corr}} = \frac{y_i - a_i}{b_i}$$
By fitting each spectrum to the reference and removing the scatter-induced slope and offset, you recover the underlying chemical signal.
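To make the model concrete, here is a minimal sketch with made-up numbers. The five-point reference and the scatter parameters (0.05 and 1.5) are purely illustrative assumptions, and `np.polyfit` is just one convenient way to estimate the slope and offset.

```python
import numpy as np

# Hypothetical 5-point reference spectrum (illustrative values only)
y_ref = np.array([0.10, 0.30, 0.60, 0.40, 0.20])

# Assumed scatter parameters for one measured spectrum
a_true, b_true = 0.05, 1.5
y_measured = a_true + b_true * y_ref

# Regress the measurement on the reference.
# np.polyfit returns coefficients highest degree first: [slope, intercept].
b_hat, a_hat = np.polyfit(y_ref, y_measured, deg=1)

# Invert the linear model to undo the scatter
y_corrected = (y_measured - a_hat) / b_hat
```

Because this toy spectrum is an exact linear function of the reference, the fit recovers the assumed offset and slope, and the corrected spectrum coincides with the reference. Real spectra also carry chemical variation and noise, so the match is only approximate.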
The algorithm
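Step 1: Calculate the reference spectrum
Usually the mean spectrum across all $n$ samples:

$$y_{\mathrm{ref}} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Step 2: For each spectrum, fit a linear regression
Regress your spectrum against the reference by least squares:

$$y_i \approx a_i + b_i \, y_{\mathrm{ref}}$$

This gives you the scatter coefficients $a_i$ (offset) and $b_i$ (slope).

Step 3: Correct the spectrum
Remove the scatter effects:

$$y_{i,\mathrm{corr}} = \frac{y_i - a_i}{b_i}$$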
That’s it! The corrected spectra all have the same scatter characteristics as the reference.
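The three steps can also be written without a per-spectrum loop, because the least-squares slope and offset of a simple regression have a closed form (slope = covariance / variance, offset = mean of the spectrum minus slope times mean of the reference). The sketch below is one possible way to do this; the function name `msc_vectorized` and the toy data are assumptions made for illustration.

```python
import numpy as np

def msc_vectorized(spectra, reference=None):
    """Loop-free MSC sketch using the closed-form simple-regression solution."""
    spectra = np.asarray(spectra, dtype=float)

    # Step 1: reference spectrum (mean of all spectra unless one is supplied)
    if reference is None:
        reference = spectra.mean(axis=0)
    reference = np.asarray(reference, dtype=float)

    # Step 2: slope b_i = cov(y_i, ref) / var(ref), offset a_i = mean(y_i) - b_i * mean(ref)
    ref_centered = reference - reference.mean()
    spec_centered = spectra - spectra.mean(axis=1, keepdims=True)
    slopes = spec_centered @ ref_centered / (ref_centered @ ref_centered)
    offsets = spectra.mean(axis=1) - slopes * reference.mean()

    # Step 3: undo the fitted offset and slope for every spectrum at once
    corrected = (spectra - offsets[:, None]) / slopes[:, None]
    return corrected, offsets, slopes

# Toy check: three spectra that are exact linear distortions of a known reference
ref = np.linspace(0.0, 1.0, 50) ** 2
a_true = np.array([0.1, -0.2, 0.0])
b_true = np.array([0.5, 1.0, 2.0])
toy_spectra = a_true[:, None] + b_true[:, None] * ref

corrected, offsets, slopes = msc_vectorized(toy_spectra, reference=ref)
```

Returning the fitted offsets and slopes alongside the corrected spectra is a design choice of this sketch: inspecting their spread is a quick way to see how much scatter variation was present.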
When to use MSC
MSC is ideal for:
- NIR diffuse reflectance spectroscopy
- Solid samples where particle size varies
- Powdered samples with different packing densities
- Tablets with surface roughness differences
- Agricultural samples (grain, flour, etc.)
MSC works best when:
- Scatter effects are the dominant source of variation
- Your samples have similar chemistry but different physical properties
- You’re building calibration models for quantitative analysis
Avoid MSC when:
- Your samples have very different chemical compositions
- You’re looking at neat liquids (no scatter effects)
- Sample variation is primarily chemical, not physical
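One rough way to check whether the linear model is plausible for your data is to look at how well each spectrum is explained by a straight-line fit against the reference. The sketch below computes a per-spectrum R² for that fit. Treat it as a heuristic: the function name `msc_fit_quality` and the synthetic spectra are assumptions for illustration, and no particular R² cutoff is implied by the method itself.

```python
import numpy as np

def msc_fit_quality(spectra, reference=None):
    """Return the R^2 of each spectrum's linear fit against the reference."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)

    r2 = np.empty(spectra.shape[0])
    for i, y in enumerate(spectra):
        slope, intercept = np.polyfit(reference, y, deg=1)
        residual = y - (intercept + slope * reference)
        r2[i] = 1.0 - residual.var() / y.var()
    return r2

# Synthetic data: three scatter-distorted copies of one band, plus one
# spectrum whose band sits somewhere else entirely (different "chemistry")
x = np.linspace(0.0, 1.0, 200)
base = np.exp(-((x - 0.3) ** 2) / 0.01)
other = np.exp(-((x - 0.8) ** 2) / 0.01)
spectra = np.array([
    0.10 + 1.2 * base,
    -0.05 + 0.9 * base,
    0.02 + 1.0 * base,
    other,
])

r2 = msc_fit_quality(spectra)
odd_one_out = int(np.argmin(r2))  # index of the worst-fitting spectrum
```

A spectrum with a much lower R² than the rest is poorly described as "reference times slope plus offset", which is a hint that its variation is chemical rather than physical and that MSC may distort it.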
Code examples
Python

```python
import numpy as np
import matplotlib.pyplot as plt


def msc(spectra, reference=None):
    """
    Multiplicative Scatter Correction.

    Parameters:
    -----------
    spectra : array-like, shape (n_samples, n_wavelengths)
        Input spectra to be corrected
    reference : array-like, shape (n_wavelengths,), optional
        Reference spectrum. If None, uses mean of all spectra.

    Returns:
    --------
    corrected : array
        MSC-corrected spectra, same shape as input
    """
    # Cast to float so integer input is not truncated in the output array
    spectra = np.asarray(spectra, dtype=float)

    # Calculate reference spectrum (mean if not provided)
    if reference is None:
        reference = np.mean(spectra, axis=0)

    # Design matrix X = [ones, reference]; it is the same for every spectrum
    X = np.vstack([np.ones(len(reference)), reference]).T

    # Initialize corrected spectra
    corrected = np.zeros_like(spectra)

    # Correct each spectrum
    for i in range(spectra.shape[0]):
        # Fit linear regression: spectrum = a + b * reference
        # Using least squares: [a, b] = (X^T X)^-1 X^T y
        fit = np.linalg.lstsq(X, spectra[i], rcond=None)[0]

        a = fit[0]  # Offset
        b = fit[1]  # Slope

        # Correct: remove scatter effects
        corrected[i] = (spectra[i] - a) / b

    return corrected


# Example: NIR spectra with scatter
np.random.seed(42)
n_samples = 20
n_wavelengths = 100
wavelengths = np.linspace(1000, 2500, n_wavelengths)

# Create "ideal" spectrum (chemical signal)
ideal_spectrum = (0.5 * np.exp(-((wavelengths - 1500)**2) / 50000) +
                  0.3 * np.exp(-((wavelengths - 2000)**2) / 30000))

# Simulate scatter effects (different for each sample)
raw_spectra = []
for i in range(n_samples):
    # Random scatter: offset + slope
    a = np.random.uniform(-0.1, 0.1)  # Additive offset
    b = np.random.uniform(0.8, 1.2)   # Multiplicative slope

    # Add small chemical variation
    chemical_var = ideal_spectrum + np.random.normal(0, 0.02, n_wavelengths)

    # Apply scatter + noise
    spectrum = a + b * chemical_var + np.random.normal(0, 0.01, n_wavelengths)
    raw_spectra.append(spectrum)

raw_spectra = np.array(raw_spectra)

# Apply MSC correction
corrected_spectra = msc(raw_spectra)

# Plot results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Before MSC
for i in range(n_samples):
    ax1.plot(wavelengths, raw_spectra[i], alpha=0.6, linewidth=1)
ax1.set_xlabel('Wavelength (nm)')
ax1.set_ylabel('Absorbance')
ax1.set_title('Before MSC (scatter effects visible)')
ax1.grid(True, alpha=0.3)

# After MSC
for i in range(n_samples):
    ax2.plot(wavelengths, corrected_spectra[i], alpha=0.6, linewidth=1)
ax2.set_xlabel('Wavelength (nm)')
ax2.set_ylabel('Absorbance')
ax2.set_title('After MSC (scatter corrected)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare standard deviations
print(f"Std before MSC: {np.std(raw_spectra, axis=0).mean():.4f}")
print(f"Std after MSC: {np.std(corrected_spectra, axis=0).mean():.4f}")
```

MATLAB

```matlab
% Save this function as msc.m, or place it at the end of the example
% script below as a local function (R2016b or later).
function corrected = msc(spectra, reference)
    % Multiplicative Scatter Correction
    %
    % Parameters:
    %   spectra   - Matrix of spectra (rows = samples, cols = wavelengths)
    %   reference - Optional reference spectrum (default: mean)
    %
    % Returns:
    %   corrected - MSC-corrected spectra

    [n_samples, n_wavelengths] = size(spectra);

    % Calculate reference (mean if not provided)
    if nargin < 2 || isempty(reference)
        reference = mean(spectra, 1);
    end

    % Initialize output
    corrected = zeros(size(spectra));

    % Design matrix for linear regression (same for every spectrum)
    X = [ones(n_wavelengths, 1), reference(:)];

    % Correct each spectrum
    for i = 1:n_samples
        % Fit: spectrum = a + b * reference
        fit = X \ spectra(i, :)';

        a = fit(1);  % Offset
        b = fit(2);  % Slope

        % Correct
        corrected(i, :) = (spectra(i, :) - a) / b;
    end
end
```

```matlab
% Example usage
rng(42);
n_samples = 20;
n_wavelengths = 100;
wavelengths = linspace(1000, 2500, n_wavelengths);

% Ideal spectrum
ideal_spectrum = 0.5 * exp(-((wavelengths - 1500).^2) / 50000) + ...
                 0.3 * exp(-((wavelengths - 2000).^2) / 30000);

% Simulate scatter
raw_spectra = zeros(n_samples, n_wavelengths);
for i = 1:n_samples
    a = rand() * 0.2 - 0.1;  % Offset
    b = rand() * 0.4 + 0.8;  % Slope

    chemical_var = ideal_spectrum + randn(1, n_wavelengths) * 0.02;
    raw_spectra(i, :) = a + b * chemical_var + randn(1, n_wavelengths) * 0.01;
end

% Apply MSC
corrected_spectra = msc(raw_spectra);

% Plot
figure;
subplot(1, 2, 1);
plot(wavelengths, raw_spectra', 'LineWidth', 1);
xlabel('Wavelength (nm)');
ylabel('Absorbance');
title('Before MSC');
grid on;

subplot(1, 2, 2);
plot(wavelengths, corrected_spectra', 'LineWidth', 1);
xlabel('Wavelength (nm)');
ylabel('Absorbance');
title('After MSC');
grid on;

fprintf('Std before MSC: %.4f\n', mean(std(raw_spectra, 0, 1)));
fprintf('Std after MSC: %.4f\n', mean(std(corrected_spectra, 0, 1)));
```

R

```r
msc <- function(spectra, reference = NULL) {
  # Multiplicative Scatter Correction
  #
  # Parameters:
  #   spectra   - Matrix of spectra (rows = samples, cols = wavelengths)
  #   reference - Optional reference spectrum (default: mean)
  #
  # Returns:
  #   corrected - MSC-corrected spectra

  # Calculate reference (mean if not provided)
  if (is.null(reference)) {
    reference <- colMeans(spectra)
  }

  n_samples <- nrow(spectra)
  n_wavelengths <- ncol(spectra)

  # Initialize output
  corrected <- matrix(0, nrow = n_samples, ncol = n_wavelengths)

  # Design matrix for linear regression (same for every spectrum)
  X <- cbind(1, reference)

  # Correct each spectrum
  for (i in 1:n_samples) {
    # Fit: spectrum = a + b * reference
    fit <- lm.fit(X, spectra[i, ])$coefficients

    a <- fit[1]  # Offset
    b <- fit[2]  # Slope

    # Correct
    corrected[i, ] <- (spectra[i, ] - a) / b
  }

  return(corrected)
}

# Example usage
set.seed(42)
n_samples <- 20
n_wavelengths <- 100
wavelengths <- seq(1000, 2500, length.out = n_wavelengths)

# Ideal spectrum
ideal_spectrum <- 0.5 * exp(-((wavelengths - 1500)^2) / 50000) +
                  0.3 * exp(-((wavelengths - 2000)^2) / 30000)

# Simulate scatter
raw_spectra <- matrix(0, nrow = n_samples, ncol = n_wavelengths)
for (i in 1:n_samples) {
  a <- runif(1, -0.1, 0.1)  # Offset
  b <- runif(1, 0.8, 1.2)   # Slope

  chemical_var <- ideal_spectrum + rnorm(n_wavelengths, 0, 0.02)
  raw_spectra[i, ] <- a + b * chemical_var + rnorm(n_wavelengths, 0, 0.01)
}

# Apply MSC
corrected_spectra <- msc(raw_spectra)

# Plot
par(mfrow = c(1, 2))

matplot(wavelengths, t(raw_spectra), type = 'l', lty = 1,
        xlab = 'Wavelength (nm)', ylab = 'Absorbance', main = 'Before MSC')
grid()

matplot(wavelengths, t(corrected_spectra), type = 'l', lty = 1,
        xlab = 'Wavelength (nm)', ylab = 'Absorbance', main = 'After MSC')
grid()

cat(sprintf("Std before MSC: %.4f\n", mean(apply(raw_spectra, 2, sd))))
cat(sprintf("Std after MSC: %.4f\n", mean(apply(corrected_spectra, 2, sd))))
```

Advantages and limitations
Advantages
✅ Simple and fast — Just linear regression, very efficient
✅ Effective for scatter — Removes multiplicative and additive scatter well
✅ Improves model performance — Reduces physical variation, highlights chemical variation
✅ Widely used — Standard preprocessing for NIR spectroscopy
✅ Easy to interpret — Clear physical meaning (scatter correction)
Limitations
❌ Requires similar samples — Assumes all spectra are variations of the same underlying signal
❌ Reference spectrum matters — Poor choice of reference gives poor correction
❌ Can overcorrect — May remove chemical information if scatter and chemistry are correlated
❌ Doesn’t handle complex scatter — Linear model may be too simple for some samples
❌ Edge effects — Can distort spectra at wavelength extremes
Practical tips
Choosing a reference spectrum:
- Default: Use mean of all spectra (most common)
- Alternative: Use median (more robust to outliers)
- Custom: Use a specific “ideal” sample if you have one
- Never: Use a single random sample (too noisy)
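The difference between a mean and a median reference is easiest to see with an outlier in the set. In the sketch below, the five toy spectra (four scaled copies of one band plus one spectrum with an artificial step) are assumptions chosen so the effect is obvious. Either reference could then be passed as the `reference` argument of an MSC function.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
signal = np.exp(-((x - 0.5) ** 2) / 0.02)

# Four well-behaved spectra plus one outlier with a large step above x = 0.8
spectra = np.array([
    0.90 * signal,
    0.95 * signal,
    1.00 * signal,
    1.05 * signal,
    1.00 * signal + 5.0 * (x > 0.8),  # hypothetical artifact
])

mean_ref = spectra.mean(axis=0)           # pulled upward by the outlier
median_ref = np.median(spectra, axis=0)   # ignores the single extreme value

# Largest deviation of each candidate reference from the clean band shape
mean_error = np.abs(mean_ref - signal).max()
median_error = np.abs(median_ref - signal).max()
```

Here the median reference stays on the clean band shape while the mean inherits a fifth of the artifact. With several outliers, or outliers that make up a large share of the set, the median can be distorted too, so inspecting the spectra first remains worthwhile.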
When MSC helps:
- You see “fan-shaped” variation in your spectra (multiplicative scatter)
- Spectra have different baseline offsets
- Principal Component 1 captures scatter, not chemistry
- Your PLS model has too many components (scatter dominates)
When MSC doesn’t help:
- Samples are chemically very different
- Scatter is already minimal (liquid samples)
- You need to preserve absolute intensities
Combining with other methods:
- MSC → derivatives: First correct scatter, then enhance features
- MSC → mean centering: Standard workflow for PLS models
- MSC → SNV: Usually pick one or the other, not both
Common mistakes to avoid
❌ Applying MSC to test data with wrong reference → Always use the same reference as training data
❌ MSC before data splitting → Split first, compute the reference from the training set only, then apply that same reference to the test set to avoid data leakage
❌ Using MSC on single spectra without a stored reference → Estimating a reference needs multiple spectra; a single new spectrum should be corrected against the saved training reference
❌ Combining MSC and SNV → They do similar things; pick one
❌ Ignoring outliers in reference → Outliers skew the mean reference; remove them first
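The first three points above come down to separating "learn the reference" from "apply the correction". A minimal sketch of that split follows, assuming noise-free toy data. The names `msc_fit` and `msc_apply` are hypothetical helpers, not functions from any particular library.

```python
import numpy as np

def msc_fit(train_spectra):
    """Learn the reference (here: the mean spectrum) from training data only."""
    return np.asarray(train_spectra, dtype=float).mean(axis=0)

def msc_apply(spectra, reference):
    """Correct one or more spectra against a previously stored reference."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    X = np.column_stack([np.ones_like(reference), reference])
    # One least-squares solve for all spectra; coef has shape (2, n_samples)
    coef, *_ = np.linalg.lstsq(X, spectra.T, rcond=None)
    offsets, slopes = coef
    return (spectra - offsets[:, None]) / slopes[:, None]

# Toy data: every spectrum is a linear distortion of the same band
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 80)
band = np.exp(-((x - 0.4) ** 2) / 0.02)

def simulate(n):
    a = rng.uniform(-0.1, 0.1, size=(n, 1))  # additive offsets
    b = rng.uniform(0.8, 1.2, size=(n, 1))   # multiplicative slopes
    return a + b * band

train, test = simulate(20), simulate(5)

reference = msc_fit(train)                       # computed from training data only
train_corrected = msc_apply(train, reference)
test_corrected = msc_apply(test, reference)      # same reference, no refitting
single_corrected = msc_apply(test[0], reference) # a single new spectrum also works
```

Storing `reference` alongside the calibration model means new measurements can be corrected one at a time, long after the training set is gone.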
Further reading
- Barnes, R. J., Dhanoa, M. S., & Lister, S. J. (1989). “Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra”. Applied Spectroscopy, 43(5), 772-777.
  - Classic paper on scatter correction methods
- Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). “Review of the most common pre-processing techniques for near-infrared spectra”. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
  - Comprehensive review including MSC
- Martens, H., & Næs, T. (1989). Multivariate Calibration. Wiley.
  - Foundational textbook covering MSC theory
MSC is one of the most important preprocessing methods in NIR spectroscopy. Master it, and you’ll be equipped to handle the majority of scatter correction needs in chemometric applications.