Extended Multiplicative Scatter Correction (EMSC)

In 1985, Paul Geladi, Douglas MacDougall, and Harald Martens introduced Multiplicative Scatter Correction (MSC) to separate chemical light absorption from physical light scatter in near-infrared reflectance spectra. The method was elegant and effective for many routine applications, but it assumed a simple linear relationship between each spectrum and a reference. By the early 1990s, Martens recognized that real-world spectroscopic data often contained more complex artifacts than basic MSC could handle. In 1991, he and Edward Stark published the first formal extension in the Journal of Pharmaceutical and Biomedical Analysis, adding polynomial baseline terms and spectral interference subtraction to the original MSC framework. This paper laid the theoretical groundwork for what would become EMSC.

The method found its most compelling applications through the work of Achim Kohler at the Norwegian University of Life Sciences (NMBU) in the early 2000s. Kohler was using Fourier transform infrared (FTIR) microspectroscopy to study biological samples, including tissue cross-sections and individual cells. These samples presented spectroscopic challenges that went far beyond simple scatter: complex baseline curvature from varying sample thickness, interference fringes, and strong scattering artifacts caused by the size and shape of cells interacting with mid-infrared light. Basic MSC was not enough. Kohler and Martens collaborated to refine and extend the EMSC framework for these demanding applications, publishing key work on separating physical and chemical information in FTIR microscopy images of biological tissue (2005) and on physics-based scatter correction approaches (2006).

Their collaboration continued to push the boundaries of the method. By the late 2000s, Kohler and colleagues had developed the resonant Mie scattering EMSC (RMieS-EMSC), which incorporated full Mie scattering theory to model the wavelength-dependent distortions caused by spherical and near-spherical biological cells. This variant, later accelerated with GPU computing, became a standard preprocessing tool in biomedical infrared spectroscopy. Nils Kristian Afseth and Kohler published an influential tutorial in Chemometrics and Intelligent Laboratory Systems in 2012 that made the method accessible to a broader audience. Today, EMSC and its variants are among the most widely used preprocessing techniques in vibrational spectroscopy, particularly for biological and biomedical applications where scatter effects are complex and physically meaningful.

Beyond basic MSC: handling complex scatter

Multiplicative Scatter Correction (MSC) works great when scatter is your main problem. But real spectroscopic data often throws additional challenges at you:

  • Polynomial baseline drift due to instrumental artifacts
  • Known interfering compounds that vary across samples
  • Temperature effects causing systematic baseline shifts
  • Physical effects that can’t be captured by simple linear scaling

EMSC (Extended Multiplicative Scatter Correction) extends the basic MSC model to handle these complications. Instead of just fitting a line to a reference spectrum, EMSC fits a more complex model that includes polynomial baselines and optional interference terms.

The EMSC model: MSC plus extra terms

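Basic MSC assumes:

$$x_i = a_i + b_i\,x_{\mathrm{ref}} + e_i$$

EMSC extends this with additional terms:

$$x_i = a_i + b_i\,x_{\mathrm{ref}} + \sum_{k=1}^{K} c_{i,k}\,z_k + \sum_{j=1}^{J} d_{i,j}\,\lambda^j + e_i$$

Where:

  • $x_i$ = measured spectrum of sample i, $x_{\mathrm{ref}}$ = reference spectrum, $e_i$ = residual
  • $a_i$, $b_i$ = same scatter coefficients as MSC (offset and slope)
  • $z_k$ = known interference spectra (e.g., water, CO₂)
  • $c_{i,k}$ = coefficients for each interference
  • $\lambda^j$ = polynomial baseline terms (wavelength to power j)
  • $d_{i,j}$ = polynomial coefficients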

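The correction removes all these effects:

$$x_{i,\mathrm{corr}} = \frac{x_i - a_i - \sum_{k=1}^{K} c_{i,k}\,z_k - \sum_{j=1}^{J} d_{i,j}\,\lambda^j}{b_i}$$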

This gives you a spectrum with scatter, interferences, and polynomial drift all removed.
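As a quick sanity check of the model, the sketch below builds one noise-free spectrum from known coefficients and confirms that the correction recovers the reference. The reference shape, the coefficient values, and the variable names are illustrative assumptions, not values from any real dataset.

```python
import numpy as np

# Normalized wavelength axis and an assumed reference spectrum (one Gaussian band)
lam = np.linspace(-1.0, 1.0, 200)
x_ref = np.exp(-(lam - 0.2) ** 2 / 0.02)

# Build a distorted spectrum from known (made-up) EMSC coefficients
a_true, b_true, d1_true, d2_true = 0.05, 1.3, 0.02, -0.04
x = a_true + b_true * x_ref + d1_true * lam + d2_true * lam ** 2

# Fit the EMSC model: columns are [1, x_ref, lam, lam^2]
X = np.column_stack([np.ones_like(lam), x_ref, lam, lam ** 2])
a, b, d1, d2 = np.linalg.lstsq(X, x, rcond=None)[0]

# Apply the correction: subtract offset and baseline, divide by the slope
x_corr = (x - a - d1 * lam - d2 * lam ** 2) / b

print("coefficients recovered:", np.allclose([a, b, d1, d2], [a_true, b_true, d1_true, d2_true]))
print("reference recovered:   ", np.allclose(x_corr, x_ref))
```

With real, noisy spectra the recovery is only approximate, and what remains after correction is the reference plus each sample's scaled residual.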

When EMSC helps more than MSC

Use EMSC instead of basic MSC when you have:

Polynomial baseline drift:

  • Instrument baseline wander (common in FTIR)
  • Temperature-induced baseline shifts
  • Strong curvature that MSC can’t capture
  • Diffuse reflectance with non-linear scatter effects

Known interferences:

  • Water vapor absorption in NIR
  • CO₂ bands in FTIR
  • Packaging material signals in food analysis
  • Solvent peaks in solution-state measurements

Complex sample matrices:

  • Biological tissues with varying water content
  • Pharmaceutical tablets with excipients
  • Foods with fat/moisture variations
  • Environmental samples with matrix effects

Stick with basic MSC when:

  • Scatter is linear and dominant
  • No systematic baseline drift
  • No known interfering compounds
  • Simpler is better for your application
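One rough way to make this choice less subjective is to compare how well each model fits a spectrum. The sketch below computes the ratio of the MSC residual RMS to the EMSC residual RMS on two synthetic spectra. The helper names (`fit_rms`, `msc_to_emsc_rms_ratio`) and the simulated data are assumptions for illustration, and where to draw the line between "near 1" and "clearly above 1" is a judgment call.

```python
import numpy as np

def fit_rms(spectrum, columns):
    """RMS residual after least-squares fitting `spectrum` with the given columns."""
    X = np.column_stack(columns)
    beta = np.linalg.lstsq(X, spectrum, rcond=None)[0]
    return np.sqrt(np.mean((spectrum - X @ beta) ** 2))

def msc_to_emsc_rms_ratio(spectrum, reference, lam_norm, poly_order=2):
    """Ratio of MSC residual RMS to EMSC residual RMS (values near 1 favor plain MSC)."""
    msc_cols = [np.ones_like(lam_norm), reference]
    emsc_cols = msc_cols + [lam_norm ** j for j in range(1, poly_order + 1)]
    return fit_rms(spectrum, msc_cols) / fit_rms(spectrum, emsc_cols)

rng = np.random.default_rng(0)
lam = np.linspace(-1.0, 1.0, 200)          # wavelengths already scaled to [-1, 1]
ref = np.exp(-lam ** 2 / 0.02)             # assumed reference band
noise = rng.normal(0.0, 0.001, lam.size)

flat = 0.02 + 1.1 * ref + noise                       # scatter only
curved = 0.02 + 1.1 * ref + 0.05 * lam ** 2 + noise   # scatter + quadratic drift

ratio_flat = msc_to_emsc_rms_ratio(flat, ref, lam)
ratio_curved = msc_to_emsc_rms_ratio(curved, ref, lam)
print(f"scatter only:       ratio = {ratio_flat:.2f}")
print(f"scatter + baseline: ratio = {ratio_curved:.2f}")
```

Because the EMSC model contains the MSC model, the ratio is never below 1. It only becomes informative when it is well above 1, and it should be backed up by cross-validation.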

The algorithm

EMSC uses multiple linear regression with the reference spectrum and additional terms as predictors.

Step 1: Calculate reference spectrum

Same as MSC, usually the mean of the n spectra:

$$x_{\mathrm{ref}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Step 2: Build the design matrix

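For polynomial order J and K interferences:

$$X = \left[\,\mathbf{1},\; x_{\mathrm{ref}},\; z_1, \dots, z_K,\; \lambda, \lambda^2, \dots, \lambda^J\,\right]$$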

Each column is a predictor:

  • Column 1: ones (for intercept ai)
  • Column 2: reference spectrum (for slope bi)
  • Columns 3 to K+2: interference spectra
  • Remaining columns: wavelength polynomials
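The column layout above can be sketched as a small standalone helper. The function name `build_design_matrix` and the toy Gaussian inputs are assumptions made for illustration. Wavelengths are rescaled to [-1, 1] before the polynomial columns are formed.

```python
import numpy as np

def build_design_matrix(reference, wavelengths, poly_order=2, interferences=None):
    """Stack EMSC predictors column-wise: [1, reference, z_1..z_K, lam^1..lam^J]."""
    reference = np.asarray(reference, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    # Rescale wavelengths to [-1, 1] so the polynomial columns stay well scaled
    span = wavelengths.max() - wavelengths.min()
    lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0

    columns = [np.ones_like(lam), reference]
    for z in ([] if interferences is None else interferences):
        columns.append(np.asarray(z, dtype=float))
    for j in range(1, poly_order + 1):
        columns.append(lam ** j)
    return np.column_stack(columns)

# Toy inputs (assumed shapes only; not real spectra)
wavelengths = np.linspace(1000, 2500, 100)
reference = np.exp(-((wavelengths - 1500) ** 2) / 50000)
water = np.exp(-((wavelengths - 1940) ** 2) / 2000)

X = build_design_matrix(reference, wavelengths, poly_order=2, interferences=[water])
print(X.shape)                      # (n_wavelengths, 2 + K + J)
print(np.linalg.matrix_rank(X))     # should equal the number of columns
```

Checking the rank is a cheap guard: if an interference spectrum is nearly a combination of the reference and the polynomial terms, the matrix becomes rank-deficient and the coefficients are no longer uniquely determined.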

Step 3: Fit regression for each spectrum

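Solve the least-squares problem:

$$\hat{\beta}_i = \arg\min_{\beta}\, \lVert x_i - X\beta \rVert^2$$

This gives you all coefficients: $\hat{\beta}_i = [a_i, b_i, c_{i,1}, \dots, c_{i,K}, d_{i,1}, \dots, d_{i,J}]$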

Step 4: Correct the spectrum

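Remove everything except the reference-scaled signal:

$$x_{i,\mathrm{corr}} = \frac{x_i - a_i - \sum_{k=1}^{K} c_{i,k}\,z_k - \sum_{j=1}^{J} d_{i,j}\,\lambda^j}{b_i}$$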

Code examples

import numpy as np
import matplotlib.pyplot as plt


def emsc(spectra, wavelengths, reference=None, poly_order=2, interferences=None):
    """
    Extended Multiplicative Scatter Correction.

    Parameters:
    -----------
    spectra : array-like, shape (n_samples, n_wavelengths)
        Input spectra to be corrected
    wavelengths : array-like, shape (n_wavelengths,)
        Wavelength values
    reference : array-like, shape (n_wavelengths,), optional
        Reference spectrum. If None, uses mean of all spectra.
    poly_order : int, default=2
        Order of polynomial baseline (0 = none, 1 = linear, 2 = quadratic, etc.)
    interferences : list of arrays, optional
        Known interference spectra to be removed

    Returns:
    --------
    corrected : array
        EMSC-corrected spectra, same shape as input
    """
    spectra = np.asarray(spectra)
    wavelengths = np.asarray(wavelengths)

    # Calculate reference spectrum
    if reference is None:
        reference = np.mean(spectra, axis=0)

    n_samples, n_wavelengths = spectra.shape

    # Normalize wavelengths to [-1, 1] for numerical stability
    wl_norm = 2 * (wavelengths - wavelengths.min()) / (wavelengths.max() - wavelengths.min()) - 1

    # Build design matrix
    # Start with intercept and reference
    X = [np.ones(n_wavelengths), reference]

    # Add interference spectra
    if interferences is not None:
        for interference in interferences:
            X.append(np.asarray(interference))

    # Add polynomial terms
    for j in range(1, poly_order + 1):
        X.append(wl_norm ** j)

    X = np.column_stack(X)

    # Correct each spectrum
    corrected = np.zeros_like(spectra)
    for i in range(n_samples):
        # Fit: spectrum = X @ beta
        beta = np.linalg.lstsq(X, spectra[i], rcond=None)[0]
        a = beta[0]  # Intercept
        b = beta[1]  # Reference scaling

        # Reconstruct interference and polynomial contributions
        n_interference = len(interferences) if interferences is not None else 0
        interference_contrib = np.sum([beta[2 + k] * X[:, 2 + k]
                                       for k in range(n_interference)], axis=0) if n_interference > 0 else 0
        poly_contrib = np.sum([beta[2 + n_interference + j - 1] * X[:, 2 + n_interference + j - 1]
                               for j in range(1, poly_order + 1)], axis=0) if poly_order > 0 else 0

        # Correct: remove offset, interferences, and polynomial baseline
        corrected[i] = (spectra[i] - a - interference_contrib - poly_contrib) / b

    return corrected


# Example: NIR spectra with scatter + polynomial baseline
np.random.seed(42)
n_samples = 20
n_wavelengths = 100
wavelengths = np.linspace(1000, 2500, n_wavelengths)

# Ideal spectrum (chemical signal)
ideal_spectrum = (0.5 * np.exp(-((wavelengths - 1500)**2) / 50000) +
                  0.3 * np.exp(-((wavelengths - 2000)**2) / 30000))

# Simulate complex effects
raw_spectra = []
for i in range(n_samples):
    # Random scatter (MSC part)
    a = np.random.uniform(-0.1, 0.1)
    b = np.random.uniform(0.8, 1.2)

    # Random polynomial baseline drift
    p0 = np.random.uniform(-0.05, 0.05)
    p1 = np.random.uniform(-0.00002, 0.00002)
    p2 = np.random.uniform(-5e-9, 5e-9)
    poly_baseline = p0 + p1 * wavelengths + p2 * wavelengths**2

    # Chemical variation
    chemical_var = ideal_spectrum + np.random.normal(0, 0.01, n_wavelengths)

    # Combine: scatter + baseline drift + noise
    spectrum = a + b * chemical_var + poly_baseline + np.random.normal(0, 0.01, n_wavelengths)
    raw_spectra.append(spectrum)

raw_spectra = np.array(raw_spectra)

# Apply EMSC with 2nd-order polynomial
corrected_spectra = emsc(raw_spectra, wavelengths, poly_order=2)

# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Before EMSC
for i in range(n_samples):
    ax1.plot(wavelengths, raw_spectra[i], alpha=0.6, linewidth=1)
ax1.set_xlabel('Wavelength (nm)')
ax1.set_ylabel('Absorbance')
ax1.set_title('Before EMSC (scatter + baseline drift)')
ax1.grid(True, alpha=0.3)

# After EMSC
for i in range(n_samples):
    ax2.plot(wavelengths, corrected_spectra[i], alpha=0.6, linewidth=1)
ax2.set_xlabel('Wavelength (nm)')
ax2.set_ylabel('Absorbance')
ax2.set_title('After EMSC (corrected)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare variability
print(f"Std before EMSC: {np.std(raw_spectra, axis=0).mean():.4f}")
print(f"Std after EMSC: {np.std(corrected_spectra, axis=0).mean():.4f}")

Choosing polynomial order

The polynomial order controls how complex the baseline correction can be:

Order 0 (none): Standard MSC—just offset and slope

  • Use when baseline is already flat

Order 1 (linear): MSC + linear baseline drift

  • Use for gradual baseline tilt
  • Common in transmission spectroscopy

Order 2 (quadratic): MSC + curved baseline

  • Use for parabolic baseline curvature
  • Most common choice for NIR diffuse reflectance

Order 3 (cubic): MSC + more complex curvature

  • Use for strong non-linear baselines
  • Common in biological samples

Higher orders (>3): Rarely needed

  • Risk overfitting and removing chemical information
  • Only use if you have strong evidence of complex drift

General advice: Start with order 2. Increase only if you see clear residual baseline structure after correction.
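To see what "residual baseline structure" looks like numerically, the sketch below fits the EMSC model at several polynomial orders and reports the mean residual RMS for each. The function name `residual_rms_by_order` and the simulated spectra (which contain a quadratic baseline by construction) are assumptions for illustration. On real data the drop is rarely this clean.

```python
import numpy as np

def residual_rms_by_order(spectra, wavelengths, reference=None, max_order=4):
    """Mean RMS residual of the EMSC fit for polynomial orders 0..max_order."""
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    span = wavelengths.max() - wavelengths.min()
    lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0

    rms = []
    for order in range(max_order + 1):
        cols = [np.ones_like(lam), reference] + [lam ** j for j in range(1, order + 1)]
        X = np.column_stack(cols)
        # Fit all spectra at once: one column of coefficients per spectrum
        beta = np.linalg.lstsq(X, spectra.T, rcond=None)[0]
        resid = spectra.T - X @ beta
        rms.append(float(np.sqrt(np.mean(resid ** 2))))
    return rms

# Synthetic spectra with a quadratic baseline (all values are made up)
rng = np.random.default_rng(1)
wavelengths = np.linspace(1000, 2500, 200)
lam = np.linspace(-1.0, 1.0, 200)
pure = np.exp(-(lam + 0.3) ** 2 / 0.01) + 0.6 * np.exp(-(lam - 0.4) ** 2 / 0.02)
spectra = np.array([
    rng.uniform(-0.1, 0.1) + rng.uniform(0.8, 1.2) * pure
    + rng.uniform(-0.05, 0.05) * lam + rng.uniform(0.02, 0.06) * lam ** 2
    + rng.normal(0.0, 0.002, lam.size)
    for _ in range(20)
])

rms_values = residual_rms_by_order(spectra, wavelengths, reference=pure)
for order, value in enumerate(rms_values):
    print(f"order {order}: RMS residual = {value:.4f}")
```

The residual can only shrink as the order increases, so the useful signal is where it stops shrinking meaningfully. A small residual is not proof of a good correction, since a high-order polynomial can also absorb chemical signal.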

Adding interference spectra

If you know certain compounds interfere across all samples (but with varying amounts), include their pure spectra in the EMSC model.

Example: Water vapor in NIR

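# You have a reference water spectrum on the same wavelength grid as your data
# (load_pure_water_spectrum is a placeholder for your own loading code)
water_spectrum = load_pure_water_spectrum()
# Apply EMSC with water as interference
corrected = emsc(spectra, wavelengths, interferences=[water_spectrum])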

Common interferences to model:

  • Water vapor (NIR, FTIR)
  • CO₂ absorption (FTIR)
  • Packaging material (transmission through containers)
  • Solvent peaks (solution-state NMR, IR)
  • Excipients (pharmaceutical tablets)

When to include interferences:

  • You have a pure reference spectrum of the interference
  • The interference varies in intensity across samples
  • You want to remove the interference, not just correct for scatter

When not to include:

  • The interference is constant across all samples (it’s already in the reference)
  • You don’t have a pure reference spectrum
  • The interference is part of what you want to measure
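The effect of an interference term can be illustrated on synthetic data, where the true answer is known. In the sketch below, the analyte band, the interferent band, and the helper `correct` are all assumptions for illustration. The polynomial order is 0 to keep the focus on the interference column.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.linspace(-1.0, 1.0, 200)

# Assumed shapes: one analyte band and one interferent band (stand-ins for measured spectra)
reference = np.exp(-(lam + 0.3) ** 2 / 0.02)
interferent = np.exp(-(lam - 0.5) ** 2 / 0.005)

# Each spectrum gets its own scatter (a, b) and its own interferent amount c
n_samples = 10
a = rng.uniform(-0.1, 0.1, n_samples)
b = rng.uniform(0.8, 1.2, n_samples)
c = rng.uniform(0.0, 0.5, n_samples)
spectra = a[:, None] + b[:, None] * reference + c[:, None] * interferent

def correct(spectra, columns):
    """Fit all spectra to `columns`, subtract every term except the reference
    (column index 1), then divide by the reference slope."""
    X = np.column_stack(columns)
    beta = np.linalg.lstsq(X, spectra.T, rcond=None)[0]          # (n_terms, n_samples)
    nuisance = np.delete(X, 1, axis=1) @ np.delete(beta, 1, axis=0)
    return (spectra.T - nuisance).T / beta[1][:, None]

ones = np.ones_like(lam)
without_z = correct(spectra, [ones, reference])                 # plain MSC
with_z = correct(spectra, [ones, reference, interferent])       # EMSC with interferent

err_without = np.abs(without_z - reference).max()
err_with = np.abs(with_z - reference).max()
print(f"max error without interferent term: {err_without:.3f}")
print(f"max error with interferent term:    {err_with:.2e}")
```

This data is noise-free and the interferent spectrum is known exactly, which is the best case. With a measured interferent spectrum that only approximates the true contribution, some of the interference will remain.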

Advantages and limitations

Advantages

Handles complex scatter — Polynomial baselines capture non-linear effects

Removes known interferences — Explicitly models unwanted spectral contributions

More flexible than MSC — Can adapt to more challenging datasets

Still interpretable — Each term has clear physical meaning

Improves model performance — Especially when baseline drift dominates

Limitations

More parameters = more risk — Can overfit if polynomial order is too high

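Requires a reference spectrum: the default mean reference needs the whole training set, although a single new spectrum can be corrected once that reference is fixed (same as MSC)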

Reference spectrum still matters — Poor reference gives poor correction

Polynomial edges can be unstable — Correction may distort spectral extremes

Slower than MSC — More complex regression, especially with interferences

Need pure interference spectra — May not be available for complex mixtures

Practical tips

Choosing polynomial order:

  • Default: Start with order 2
  • Check residuals: If baseline remains after correction, increase order
  • Don’t go above 3-4 unless you have a very good reason
  • Cross-validate to avoid overfitting

Using interference spectra:

  • Ensure interference spectra have same wavelength range as your data
  • Normalize interference spectra to similar intensity scale
  • Don’t include too many interferences (risk overfitting)
  • Verify interferences actually improve cross-validation performance

EMSC vs MSC decision:

  • Try MSC first—it’s simpler and often sufficient
  • Use EMSC if MSC leaves visible baseline curvature
  • Use EMSC if you have known, varying interferences
  • Compare cross-validation performance to justify the complexity

Computational efficiency:

  • EMSC is slower than MSC due to larger regression problem
  • For large datasets, consider parallelizing the loop over samples
  • Pre-compute the pseudoinverse $(X^T X)^{-1} X^T$ once and reuse it for every spectrum, since X does not change as long as the reference doesn’t change
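A minimal sketch of the pre-computation idea: the design matrix is the same for every spectrum, so its pseudoinverse can be computed once and applied to the whole dataset with matrix products. The function name `emsc_batch` and the synthetic data are assumptions, and the code uses `np.linalg.pinv` rather than forming $(X^T X)^{-1}$ explicitly, which is generally the numerically safer route.

```python
import numpy as np

def emsc_batch(spectra, X):
    """Correct all spectra at once, reusing one pseudoinverse of the design matrix X.

    X columns are assumed to be ordered [1, reference, other nuisance terms...].
    """
    spectra = np.asarray(spectra, dtype=float)
    X_pinv = np.linalg.pinv(X)               # computed once, shape (n_terms, n_wavelengths)
    beta = spectra @ X_pinv.T                # (n_samples, n_terms)
    b = beta[:, 1:2]                         # reference slope per spectrum
    # Everything except the reference term is treated as nuisance and subtracted
    nuisance_cols = [0] + list(range(2, X.shape[1]))
    nuisance = beta[:, nuisance_cols] @ X[:, nuisance_cols].T
    return (spectra - nuisance) / b

rng = np.random.default_rng(3)
lam = np.linspace(-1.0, 1.0, 150)
reference = np.exp(-lam ** 2 / 0.05)
X = np.column_stack([np.ones_like(lam), reference, lam, lam ** 2])
spectra = (rng.uniform(-0.1, 0.1, (50, 1)) + rng.uniform(0.8, 1.2, (50, 1)) * reference
           + rng.uniform(-0.05, 0.05, (50, 1)) * lam + rng.normal(0, 0.005, (50, lam.size)))

batch = emsc_batch(spectra, X)

# Cross-check against one lstsq call per spectrum
loop = np.empty_like(spectra)
for i, s in enumerate(spectra):
    coef = np.linalg.lstsq(X, s, rcond=None)[0]
    loop[i] = (s - coef[0] - coef[2] * lam - coef[3] * lam ** 2) / coef[1]

print("batch matches loop:", np.allclose(batch, loop))
```

For datasets that fit in memory, this removes the Python loop entirely, so parallelizing over samples is only worth considering beyond that point.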

Test set correction:

  • Use the same reference and same interference spectra as training data
  • Apply EMSC to test data with these fixed references
  • Never recompute the reference from test data (causes data leakage)
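One way to enforce these rules is a small fit/transform wrapper that stores the training reference and reuses it for any later data. The `EMSC` class below is a hypothetical helper written for this illustration, not an API from any library. It assumes train and test spectra share the same wavelength grid and omits interference terms for brevity.

```python
import numpy as np

class EMSC:
    """Minimal fit/transform wrapper (hypothetical helper, not a library API)."""

    def __init__(self, poly_order=2):
        self.poly_order = poly_order

    def fit(self, spectra, wavelengths):
        spectra = np.asarray(spectra, dtype=float)
        wavelengths = np.asarray(wavelengths, dtype=float)
        # The reference is derived from the training data only
        self.reference_ = spectra.mean(axis=0)
        span = wavelengths.max() - wavelengths.min()
        lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0
        cols = [np.ones_like(lam), self.reference_]
        cols += [lam ** j for j in range(1, self.poly_order + 1)]
        self.X_ = np.column_stack(cols)
        return self

    def transform(self, spectra):
        spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
        beta = np.linalg.lstsq(self.X_, spectra.T, rcond=None)[0]    # (n_terms, n_samples)
        # Subtract every fitted term except the reference (column 1), then rescale
        nuisance = np.delete(self.X_, 1, axis=1) @ np.delete(beta, 1, axis=0)
        return (spectra.T - nuisance).T / beta[1][:, None]

# Synthetic, noise-free train/test data (made-up values)
rng = np.random.default_rng(4)
wavelengths = np.linspace(1000, 2500, 120)
pure = np.exp(-((wavelengths - 1700) ** 2) / 20000)

def simulate(n):
    lam = np.linspace(-1.0, 1.0, wavelengths.size)
    return (rng.uniform(-0.1, 0.1, (n, 1)) + rng.uniform(0.8, 1.2, (n, 1)) * pure
            + rng.uniform(-0.03, 0.03, (n, 1)) * lam ** 2)

train, test = simulate(30), simulate(5)
model = EMSC(poly_order=2).fit(train, wavelengths)
test_corrected = model.transform(test)     # uses the stored training reference
single = model.transform(test[0])          # a single new spectrum works as well

print(test_corrected.shape, single.shape)
```

Because the reference is fixed at fit time, correcting one new spectrum gives the same result as correcting it as part of a batch.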

Common mistakes to avoid

Using too high polynomial order → Overfitting removes chemical information

Including interferences that aren’t actually present → Introduces artifacts

Recomputing reference for test data → Data leakage breaks validation

Combining EMSC with SNV → They correct similar effects; pick one

Using unnormalized wavelengths in polynomials → Numerical instability in regression

Computing the reference from the full dataset before splitting → Split first, compute the reference from the training set only, then reuse it for the test set

Not checking if EMSC helps → Compare cross-validation performance vs simpler methods
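The point about unnormalized wavelengths can be checked directly by comparing the condition number of the polynomial columns with raw and rescaled wavelengths. The sketch below assumes a 1000 to 2500 nm axis, like the simulated NIR example earlier. The helper `poly_columns` is an illustrative name.

```python
import numpy as np

wavelengths = np.linspace(1000, 2500, 100)     # raw wavelengths in nm
lam = 2 * (wavelengths - wavelengths.min()) / (wavelengths.max() - wavelengths.min()) - 1

def poly_columns(x, order):
    """Columns [1, x, x^2, ..., x^order]."""
    return np.column_stack([x ** j for j in range(order + 1)])

for order in (1, 2, 3):
    cond_raw = np.linalg.cond(poly_columns(wavelengths, order))
    cond_norm = np.linalg.cond(poly_columns(lam, order))
    print(f"order {order}: raw = {cond_raw:.1e}, normalized = {cond_norm:.1e}")
```

A large condition number means small perturbations in the data can produce large changes in the fitted coefficients, and the problem grows quickly with polynomial order.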

EMSC variations

Standard EMSC: Polynomial baseline + optional interferences (what we’ve described)

Constituent EMSC: Models specific target analytes explicitly to preserve their signal

Adaptive EMSC: Automatically selects polynomial order for each spectrum

Orthogonal EMSC: Uses orthogonal polynomials (Legendre, Chebyshev) for numerical stability

Weighted EMSC: Downweights spectral regions with low signal-to-noise

Most of these are research extensions. Standard EMSC is what you’ll use in practice.
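As a small illustration of the orthogonal-polynomial idea, the sketch below swaps the monomial baseline columns for Legendre polynomials using NumPy's `numpy.polynomial.legendre.legvander`. It shows only the basis change on a made-up signal and is not a full implementation of any published Orthogonal EMSC variant.

```python
import numpy as np
from numpy.polynomial import legendre

lam = np.linspace(-1.0, 1.0, 200)    # wavelengths already scaled to [-1, 1]
order = 6

# Monomial basis [1, lam, ..., lam^order] versus Legendre basis [P_0, ..., P_order]
monomial = np.vander(lam, order + 1, increasing=True)
orthogonal = legendre.legvander(lam, order)

cond_mono = np.linalg.cond(monomial)
cond_leg = np.linalg.cond(orthogonal)
print(f"cond (monomial): {cond_mono:.1f}")
print(f"cond (Legendre): {cond_leg:.1f}")

# Both bases span the same polynomials, so the fitted baseline is the same curve
rng = np.random.default_rng(5)
signal = 0.3 * lam ** 3 - 0.1 * lam + rng.normal(0.0, 0.01, lam.size)
fit_mono = monomial @ np.linalg.lstsq(monomial, signal, rcond=None)[0]
fit_leg = orthogonal @ np.linalg.lstsq(orthogonal, signal, rcond=None)[0]
print("same fitted baseline:", np.allclose(fit_mono, fit_leg))
```

The fitted baseline is unchanged, and only the conditioning of the regression improves. For the low orders recommended here, normalized monomials are usually adequate.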

Further reading

  • Martens, H., & Stark, E. (1991). “Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy”. Journal of Pharmaceutical and Biomedical Analysis, 9(8), 625-635.
    • Original EMSC paper

  • Afseth, N. K., & Kohler, A. (2012). “Extended multiplicative signal correction in vibrational spectroscopy, a tutorial”. Chemometrics and Intelligent Laboratory Systems, 117, 92-99.
    • Excellent tutorial with practical examples

  • Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). “Review of the most common pre-processing techniques for near-infrared spectra”. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
    • Comprehensive review comparing EMSC with other methods

EMSC is a powerful tool when basic MSC isn’t enough. Use it when your spectra have complex baseline drift or known interferences, but always validate that the added complexity actually improves your results.