Extended Multiplicative Scatter Correction (EMSC)
In 1985, Paul Geladi, Douglas MacDougall, and Harald Martens introduced Multiplicative Scatter Correction (MSC) to separate chemical light absorption from physical light scatter in near-infrared reflectance spectra. The method was elegant and effective for many routine applications, but it assumed a simple linear relationship between each spectrum and a reference. By the early 1990s, Martens recognized that real-world spectroscopic data often contained more complex artifacts than basic MSC could handle. In 1991, he and Edward Stark published the first formal extension in the Journal of Pharmaceutical and Biomedical Analysis, adding polynomial baseline terms and spectral interference subtraction to the original MSC framework. This paper laid the theoretical groundwork for what would become EMSC.
The method found its most compelling applications through the work of Achim Kohler at the Norwegian University of Life Sciences (NMBU) in the early 2000s. Kohler was using Fourier transform infrared (FTIR) microspectroscopy to study biological samples, including tissue cross-sections and individual cells. These samples presented spectroscopic challenges that went far beyond simple scatter: complex baseline curvature from varying sample thickness, interference fringes, and strong scattering artifacts caused by the size and shape of cells interacting with mid-infrared light. Basic MSC was not enough. Kohler and Martens collaborated to refine and extend the EMSC framework for these demanding applications, publishing key work on separating physical and chemical information in FTIR microscopy images of biological tissue (2005) and on physics-based scatter correction approaches (2006).
Their collaboration continued to push the boundaries of the method. By the late 2000s, Kohler and colleagues had developed the resonant Mie scattering EMSC (RMieS-EMSC), which incorporated full Mie scattering theory to model the wavelength-dependent distortions caused by spherical and near-spherical biological cells. This variant, later accelerated with GPU computing, became a standard preprocessing tool in biomedical infrared spectroscopy. Nils Kristian Afseth and Kohler published an influential tutorial in Chemometrics and Intelligent Laboratory Systems in 2012 that made the method accessible to a broader audience. Today, EMSC and its variants are among the most widely used preprocessing techniques in vibrational spectroscopy, particularly for biological and biomedical applications where scatter effects are complex and physically meaningful.
Beyond basic MSC: handling complex scatter
Multiplicative Scatter Correction (MSC) works great when scatter is your main problem. But real spectroscopic data often throws additional challenges at you:
- Polynomial baseline drift due to instrumental artifacts
- Known interfering compounds that vary across samples
- Temperature effects causing systematic baseline shifts
- Physical effects that can’t be captured by simple linear scaling
EMSC (Extended Multiplicative Scatter Correction) extends the basic MSC model to handle these complications. Instead of just fitting a line to a reference spectrum, EMSC fits a more complex model that includes polynomial baselines and optional interference terms.
The EMSC model: MSC plus extra terms
Basic MSC assumes:
EMSC extends this with additional terms:
Where:
- ai, bi = same scatter coefficients as MSC (offset and slope)
- zk = known interference spectra (e.g., water, CO₂)
- ci,k = coefficients for each interference
- λj = polynomial baseline terms (wavelength to power j)
- di,j = polynomial coefficients
The correction removes all these effects:
This gives you a spectrum with scatter, interferences, and polynomial drift all removed.
When EMSC helps more than MSC
Use EMSC instead of basic MSC when you have:
Polynomial baseline drift:
- Instrument baseline wander (common in FTIR)
- Temperature-induced baseline shifts
- Strong curvature that MSC can’t capture
- Diffuse reflectance with non-linear scatter effects
Known interferences:
- Water vapor absorption in NIR
- CO₂ bands in FTIR
- Packaging material signals in food analysis
- Solvent peaks in solution-state measurements
Complex sample matrices:
- Biological tissues with varying water content
- Pharmaceutical tablets with excipients
- Foods with fat/moisture variations
- Environmental samples with matrix effects
Stick with basic MSC when:
- Scatter is linear and dominant
- No systematic baseline drift
- No known interfering compounds
- Simpler is better for your application
The algorithm
EMSC uses multiple linear regression with the reference spectrum and additional terms as predictors.
Step 1: Calculate reference spectrum
Same as MSC—usually the mean:
Step 2: Build the design matrix
For polynomial order J and K interferences:
Each column is a predictor:
- Column 1: ones (for intercept ai)
- Column 2: reference spectrum (for slope bi)
- Columns 3 to K+2: interference spectra
- Remaining columns: wavelength polynomials
Step 3: Fit regression for each spectrum
Solve the least-squares problem:
This gives you all coefficients: [ai, bi, ci,1, …, ci,K, di,1, …, di,J]
Step 4: Correct the spectrum
Remove everything except the reference-scaled signal:
Code examples
import numpy as npimport matplotlib.pyplot as plt
def emsc(spectra, wavelengths, reference=None, poly_order=2, interferences=None): """ Extended Multiplicative Scatter Correction.
Parameters: ----------- spectra : array-like, shape (n_samples, n_wavelengths) Input spectra to be corrected wavelengths : array-like, shape (n_wavelengths,) Wavelength values reference : array-like, shape (n_wavelengths,), optional Reference spectrum. If None, uses mean of all spectra. poly_order : int, default=2 Order of polynomial baseline (0 = none, 1 = linear, 2 = quadratic, etc.) interferences : list of arrays, optional Known interference spectra to be removed
Returns: -------- corrected : array EMSC-corrected spectra, same shape as input """ spectra = np.asarray(spectra) wavelengths = np.asarray(wavelengths)
# Calculate reference spectrum if reference is None: reference = np.mean(spectra, axis=0)
n_samples, n_wavelengths = spectra.shape
# Normalize wavelengths to [-1, 1] for numerical stability wl_norm = 2 * (wavelengths - wavelengths.min()) / (wavelengths.max() - wavelengths.min()) - 1
# Build design matrix # Start with intercept and reference X = [np.ones(n_wavelengths), reference]
# Add interference spectra if interferences is not None: for interference in interferences: X.append(np.asarray(interference))
# Add polynomial terms for j in range(1, poly_order + 1): X.append(wl_norm ** j)
X = np.column_stack(X)
# Correct each spectrum corrected = np.zeros_like(spectra)
for i in range(n_samples): # Fit: spectrum = X @ beta beta = np.linalg.lstsq(X, spectra[i], rcond=None)[0]
a = beta[0] # Intercept b = beta[1] # Reference scaling
# Reconstruct interference and polynomial contributions n_interference = len(interferences) if interferences is not None else 0 interference_contrib = np.sum([beta[2 + k] * X[:, 2 + k] for k in range(n_interference)], axis=0) if n_interference > 0 else 0 poly_contrib = np.sum([beta[2 + n_interference + j - 1] * X[:, 2 + n_interference + j - 1] for j in range(1, poly_order + 1)], axis=0) if poly_order > 0 else 0
# Correct: remove offset, interferences, and polynomial baseline corrected[i] = (spectra[i] - a - interference_contrib - poly_contrib) / b
return corrected
# Example: NIR spectra with scatter + polynomial baselinenp.random.seed(42)n_samples = 20n_wavelengths = 100wavelengths = np.linspace(1000, 2500, n_wavelengths)
# Ideal spectrum (chemical signal)ideal_spectrum = (0.5 * np.exp(-((wavelengths - 1500)**2) / 50000) + 0.3 * np.exp(-((wavelengths - 2000)**2) / 30000))
# Simulate complex effectsraw_spectra = []for i in range(n_samples): # Random scatter (MSC part) a = np.random.uniform(-0.1, 0.1) b = np.random.uniform(0.8, 1.2)
# Random polynomial baseline drift p0 = np.random.uniform(-0.05, 0.05) p1 = np.random.uniform(-0.00002, 0.00002) p2 = np.random.uniform(-5e-9, 5e-9) poly_baseline = p0 + p1 * wavelengths + p2 * wavelengths**2
# Chemical variation chemical_var = ideal_spectrum + np.random.normal(0, 0.01, n_wavelengths)
# Combine: scatter + baseline drift + noise spectrum = a + b * chemical_var + poly_baseline + np.random.normal(0, 0.01, n_wavelengths) raw_spectra.append(spectrum)
raw_spectra = np.array(raw_spectra)
# Apply EMSC with 2nd-order polynomialcorrected_spectra = emsc(raw_spectra, wavelengths, poly_order=2)
# Plot comparisonfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Before EMSCfor i in range(n_samples): ax1.plot(wavelengths, raw_spectra[i], alpha=0.6, linewidth=1)ax1.set_xlabel('Wavelength (nm)')ax1.set_ylabel('Absorbance')ax1.set_title('Before EMSC (scatter + baseline drift)')ax1.grid(True, alpha=0.3)
# After EMSCfor i in range(n_samples): ax2.plot(wavelengths, corrected_spectra[i], alpha=0.6, linewidth=1)ax2.set_xlabel('Wavelength (nm)')ax2.set_ylabel('Absorbance')ax2.set_title('After EMSC (corrected)')ax2.grid(True, alpha=0.3)
plt.tight_layout()plt.show()
# Compare variabilityprint(f"Std before EMSC: {np.std(raw_spectra, axis=0).mean():.4f}")print(f"Std after EMSC: {np.std(corrected_spectra, axis=0).mean():.4f}")function corrected = emsc(spectra, wavelengths, varargin) % Extended Multiplicative Scatter Correction % % Parameters: % spectra - Matrix of spectra (rows = samples, cols = wavelengths) % wavelengths - Wavelength vector % 'Reference' - Optional reference spectrum (default: mean) % 'PolyOrder' - Polynomial order (default: 2) % 'Interferences' - Cell array of interference spectra (default: {}) % % Returns: % corrected - EMSC-corrected spectra
% Parse inputs p = inputParser; addParameter(p, 'Reference', []); addParameter(p, 'PolyOrder', 2); addParameter(p, 'Interferences', {}); parse(p, varargin{:});
reference = p.Results.Reference; poly_order = p.Results.PolyOrder; interferences = p.Results.Interferences;
[n_samples, n_wavelengths] = size(spectra);
% Calculate reference if isempty(reference) reference = mean(spectra, 1); end
% Normalize wavelengths to [-1, 1] wl_norm = 2 * (wavelengths - min(wavelengths)) / (max(wavelengths) - min(wavelengths)) - 1;
% Build design matrix X = [ones(n_wavelengths, 1), reference(:)];
% Add interference spectra for k = 1:length(interferences) X = [X, interferences{k}(:)]; end
% Add polynomial terms for j = 1:poly_order X = [X, wl_norm(:).^j]; end
% Correct each spectrum corrected = zeros(size(spectra));
for i = 1:n_samples % Fit: spectrum = X * beta beta = X \ spectra(i, :)';
a = beta(1); % Intercept b = beta(2); % Reference scaling
% Reconstruct unwanted contributions n_interference = length(interferences); unwanted = a;
for k = 1:n_interference unwanted = unwanted + beta(2 + k) * X(:, 2 + k)'; end
for j = 1:poly_order unwanted = unwanted + beta(2 + n_interference + j) * X(:, 2 + n_interference + j)'; end
% Correct corrected(i, :) = (spectra(i, :) - unwanted) / b; endend
% Example usagerng(42);n_samples = 20;n_wavelengths = 100;wavelengths = linspace(1000, 2500, n_wavelengths);
% Ideal spectrumideal_spectrum = 0.5 * exp(-((wavelengths - 1500).^2) / 50000) + ... 0.3 * exp(-((wavelengths - 2000).^2) / 30000);
% Simulate scatter + polynomial baselineraw_spectra = zeros(n_samples, n_wavelengths);for i = 1:n_samples a = rand() * 0.2 - 0.1; b = rand() * 0.4 + 0.8;
% Polynomial baseline p0 = rand() * 0.1 - 0.05; p1 = rand() * 0.00004 - 0.00002; p2 = rand() * 1e-8 - 5e-9; poly_baseline = p0 + p1 * wavelengths + p2 * wavelengths.^2;
chemical_var = ideal_spectrum + randn(1, n_wavelengths) * 0.01; raw_spectra(i, :) = a + b * chemical_var + poly_baseline + randn(1, n_wavelengths) * 0.01;end
% Apply EMSCcorrected_spectra = emsc(raw_spectra, wavelengths, 'PolyOrder', 2);
% Plotfigure;subplot(1, 2, 1);plot(wavelengths, raw_spectra', 'LineWidth', 1);xlabel('Wavelength (nm)');ylabel('Absorbance');title('Before EMSC');grid on;
subplot(1, 2, 2);plot(wavelengths, corrected_spectra', 'LineWidth', 1);xlabel('Wavelength (nm)');ylabel('Absorbance');title('After EMSC');grid on;
fprintf('Std before EMSC: %.4f\n', mean(std(raw_spectra, 0, 1)));fprintf('Std after EMSC: %.4f\n', mean(std(corrected_spectra, 0, 1)));emsc <- function(spectra, wavelengths, reference = NULL, poly_order = 2, interferences = NULL) { # Extended Multiplicative Scatter Correction # # Parameters: # spectra - Matrix of spectra (rows = samples, cols = wavelengths) # wavelengths - Wavelength vector # reference - Optional reference spectrum (default: mean) # poly_order - Polynomial order (default: 2) # interferences - List of interference spectra (default: NULL) # # Returns: # corrected - EMSC-corrected spectra
# Calculate reference if (is.null(reference)) { reference <- colMeans(spectra) }
n_samples <- nrow(spectra) n_wavelengths <- ncol(spectra)
# Normalize wavelengths to [-1, 1] wl_norm <- 2 * (wavelengths - min(wavelengths)) / (max(wavelengths) - min(wavelengths)) - 1
# Build design matrix X <- cbind(1, reference)
# Add interference spectra if (!is.null(interferences)) { for (interference in interferences) { X <- cbind(X, interference) } }
# Add polynomial terms if (poly_order > 0) { for (j in 1:poly_order) { X <- cbind(X, wl_norm^j) } }
# Correct each spectrum corrected <- matrix(0, nrow = n_samples, ncol = n_wavelengths)
for (i in 1:n_samples) { # Fit: spectrum = X * beta fit <- lm.fit(X, spectra[i, ]) beta <- fit$coefficients
a <- beta[1] # Intercept b <- beta[2] # Reference scaling
# Reconstruct unwanted contributions n_interference <- if (!is.null(interferences)) length(interferences) else 0 unwanted <- a
if (n_interference > 0) { for (k in 1:n_interference) { unwanted <- unwanted + beta[2 + k] * X[, 2 + k] } }
if (poly_order > 0) { for (j in 1:poly_order) { unwanted <- unwanted + beta[2 + n_interference + j] * X[, 2 + n_interference + j] } }
# Correct corrected[i, ] <- (spectra[i, ] - unwanted) / b }
return(corrected)}
# Example usageset.seed(42)n_samples <- 20n_wavelengths <- 100wavelengths <- seq(1000, 2500, length.out = n_wavelengths)
# Ideal spectrumideal_spectrum <- 0.5 * exp(-((wavelengths - 1500)^2) / 50000) + 0.3 * exp(-((wavelengths - 2000)^2) / 30000)
# Simulate scatter + polynomial baselineraw_spectra <- matrix(0, nrow = n_samples, ncol = n_wavelengths)for (i in 1:n_samples) { a <- runif(1, -0.1, 0.1) b <- runif(1, 0.8, 1.2)
# Polynomial baseline p0 <- runif(1, -0.05, 0.05) p1 <- runif(1, -0.00002, 0.00002) p2 <- runif(1, -5e-9, 5e-9) poly_baseline <- p0 + p1 * wavelengths + p2 * wavelengths^2
chemical_var <- ideal_spectrum + rnorm(n_wavelengths, 0, 0.01) raw_spectra[i, ] <- a + b * chemical_var + poly_baseline + rnorm(n_wavelengths, 0, 0.01)}
# Apply EMSCcorrected_spectra <- emsc(raw_spectra, wavelengths, poly_order = 2)
# Plotpar(mfrow = c(1, 2))
matplot(wavelengths, t(raw_spectra), type = 'l', lty = 1, xlab = 'Wavelength (nm)', ylab = 'Absorbance', main = 'Before EMSC')grid()
matplot(wavelengths, t(corrected_spectra), type = 'l', lty = 1, xlab = 'Wavelength (nm)', ylab = 'Absorbance', main = 'After EMSC')grid()
cat(sprintf("Std before EMSC: %.4f\n", mean(apply(raw_spectra, 2, sd))))cat(sprintf("Std after EMSC: %.4f\n", mean(apply(corrected_spectra, 2, sd))))Choosing polynomial order
The polynomial order controls how complex the baseline correction can be:
Order 0 (none): Standard MSC—just offset and slope
- Use when baseline is already flat
Order 1 (linear): MSC + linear baseline drift
- Use for gradual baseline tilt
- Common in transmission spectroscopy
Order 2 (quadratic): MSC + curved baseline
- Use for parabolic baseline curvature
- Most common choice for NIR diffuse reflectance
Order 3 (cubic): MSC + more complex curvature
- Use for strong non-linear baselines
- Common in biological samples
Higher orders (>3): Rarely needed
- Risk overfitting and removing chemical information
- Only use if you have strong evidence of complex drift
General advice: Start with order 2. Increase only if you see clear residual baseline structure after correction.
Adding interference spectra
If you know certain compounds interfere across all samples (but with varying amounts), include their pure spectra in the EMSC model.
Example: Water vapor in NIR
# You have a reference water spectrumwater_spectrum = load_pure_water_spectrum()
# Apply EMSC with water as interferencecorrected = emsc(spectra, wavelengths, interferences=[water_spectrum])Common interferences to model:
- Water vapor (NIR, FTIR)
- CO₂ absorption (FTIR)
- Packaging material (transmission through containers)
- Solvent peaks (solution-state NMR, IR)
- Excipients (pharmaceutical tablets)
When to include interferences:
- You have a pure reference spectrum of the interference
- The interference varies in intensity across samples
- You want to remove the interference, not just correct for scatter
When not to include:
- The interference is constant across all samples (it’s already in the reference)
- You don’t have a pure reference spectrum
- The interference is part of what you want to measure
Advantages and limitations
Advantages
✅ Handles complex scatter — Polynomial baselines capture non-linear effects
✅ Removes known interferences — Explicitly models unwanted spectral contributions
✅ More flexible than MSC — Can adapt to more challenging datasets
✅ Still interpretable — Each term has clear physical meaning
✅ Improves model performance — Especially when baseline drift dominates
Limitations
❌ More parameters = more risk — Can overfit if polynomial order is too high
❌ Requires all spectra together — Can’t apply to single spectra (same as MSC)
❌ Reference spectrum still matters — Poor reference gives poor correction
❌ Polynomial edges can be unstable — Correction may distort spectral extremes
❌ Slower than MSC — More complex regression, especially with interferences
❌ Need pure interference spectra — May not be available for complex mixtures
Practical tips
Choosing polynomial order:
- Default: Start with order 2
- Check residuals: If baseline remains after correction, increase order
- Don’t go above 3-4 unless you have a very good reason
- Cross-validate to avoid overfitting
Using interference spectra:
- Ensure interference spectra have same wavelength range as your data
- Normalize interference spectra to similar intensity scale
- Don’t include too many interferences (risk overfitting)
- Verify interferences actually improve cross-validation performance
EMSC vs MSC decision:
- Try MSC first—it’s simpler and often sufficient
- Use EMSC if MSC leaves visible baseline curvature
- Use EMSC if you have known, varying interferences
- Compare cross-validation performance to justify the complexity
Computational efficiency:
- EMSC is slower than MSC due to larger regression problem
- For large datasets, consider parallelizing the loop over samples
- Pre-compute the normal equations (XTX)-1XT if reference doesn’t change
Test set correction:
- Use the same reference and same interference spectra as training data
- Apply EMSC to test data with these fixed references
- Never recompute the reference from test data (causes data leakage)
Common mistakes to avoid
❌ Using too high polynomial order → Overfitting removes chemical information
❌ Including interferences that aren’t actually present → Introduces artifacts
❌ Recomputing reference for test data → Data leakage breaks validation
❌ Combining EMSC with SNV → They correct similar effects; pick one
❌ Using unnormalized wavelengths in polynomials → Numerical instability in regression
❌ Applying EMSC before data splitting → Apply separately to train/test
❌ Not checking if EMSC helps → Compare cross-validation performance vs simpler methods
EMSC variations
Standard EMSC: Polynomial baseline + optional interferences (what we’ve described)
Constituent EMSC: Models specific target analytes explicitly to preserve their signal
Adaptive EMSC: Automatically selects polynomial order for each spectrum
Orthogonal EMSC: Uses orthogonal polynomials (Legendre, Chebyshev) for numerical stability
Weighted EMSC: Downweights spectral regions with low signal-to-noise
Most of these are research extensions. Standard EMSC is what you’ll use in practice.
Further reading
-
Martens, H., & Stark, E. (1991). “Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy”. Journal of Pharmaceutical and Biomedical Analysis, 9(8), 625-635.
- Original EMSC paper
-
Afseth, N. K., & Kohler, A. (2012). “Extended multiplicative signal correction in vibrational spectroscopy, a tutorial”. Chemometrics and Intelligent Laboratory Systems, 117, 92-99.
- Excellent tutorial with practical examples
-
Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). “Review of the most common pre-processing techniques for near-infrared spectra”. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
- Comprehensive review comparing EMSC with other methods
EMSC is a powerful tool when basic MSC isn’t enough. Use it when your spectra have complex baseline drift or known interferences, but always validate that the added complexity actually improves your results.