The mathematics of finite differences — the discrete analogue of differentiation — traces back to Isaac Newton, who developed his forward difference formula in the late 17th century as a tool for interpolating tabulated values. Leonhard Euler extended and formalized these ideas throughout the 18th century, establishing the calculus of finite differences as a branch of mathematics in its own right. But for two centuries, these techniques remained in the domain of pure mathematics and numerical tables. It was not until spectroscopic instruments began producing digital output in the 1960s that practitioners realized finite differences could be applied directly to measured spectra — and that the resulting “derivative spectra” revealed information hidden in the original data.
The pivotal moment came in 1964, when Abraham Savitzky and Marcel Golay published their famous paper on smoothing and differentiation by simplified least squares procedures. Although the paper is most often cited for its smoothing algorithm, its deeper contribution was showing that smooth derivatives could be computed as a natural byproduct of local polynomial fitting. By evaluating the derivative of the fitted polynomial rather than the polynomial itself, one obtains a smoothed derivative in a single convolution step. This solved a critical practical problem: raw finite differences amplify noise so severely that they are often unusable on real spectroscopic data, but Savitzky-Golay derivatives combine differentiation and smoothing into one operation.
Two decades later, Karl Norris and Phil Williams demonstrated the power of derivative spectroscopy specifically for near-infrared (NIR) applications. In their influential 1984 work, they showed that first derivatives remove additive baseline offsets (a constant shift disappears when you compute the slope) and that second derivatives remove linear baseline slopes (a linear trend has zero curvature). Since NIR spectra are notoriously plagued by scattering-induced baseline variations, derivatives became a standard preprocessing step in the NIR community. Norris also popularized the “gap derivative” — computing the difference between points separated by a gap rather than adjacent points — as a practical alternative that provides some inherent smoothing. Today, spectral derivatives remain one of the most widely used preprocessing techniques in chemometrics, applied routinely in NIR, mid-IR, Raman, and UV-Vis spectroscopy.
Why take derivatives of spectra?
Spectral derivatives transform your data in ways that solve several persistent problems in spectroscopy.
Baseline removal. Many spectroscopic techniques produce spectra with unwanted baseline offsets or slopes caused by scattering, instrument drift, or sample positioning. Derivatives handle this elegantly:
A constant baseline offset has zero slope, so the first derivative eliminates it entirely
A linear baseline slope has zero curvature, so the second derivative removes it
More generally, the n-th derivative removes any polynomial baseline of degree n − 1 or lower
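The baseline rules above can be checked numerically. The following is a minimal sketch, assuming a synthetic Gaussian band and made-up baseline coefficients (none of these values come from real data). It uses NumPy's `np.diff` as the discrete stand-in for differentiation.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.8) ** 2)  # hypothetical absorption band

offset = np.full_like(x, 0.3)   # degree-0 baseline (constant shift)
slope = 0.3 + 0.05 * x          # degree-1 baseline (offset plus linear trend)


def nth_difference(y, n):
    """n-th forward difference, the discrete counterpart of the n-th derivative."""
    return np.diff(y, n=n)


# First difference: the constant offset drops out entirely.
first_removes_offset = np.allclose(
    nth_difference(peak + offset, 1), nth_difference(peak, 1)
)
# First difference: a linear trend survives as a constant shift.
first_removes_slope = np.allclose(
    nth_difference(peak + slope, 1), nth_difference(peak, 1)
)
# Second difference: the linear trend drops out as well.
second_removes_slope = np.allclose(
    nth_difference(peak + slope, 2), nth_difference(peak, 2)
)

print(first_removes_offset, first_removes_slope, second_removes_slope)
```

The middle check is expected to fail on purpose: a first derivative turns a linear baseline into a constant offset rather than removing it.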
Resolving overlapping peaks. Broad, overlapping absorption bands are common in spectroscopy, particularly in NIR where overtone and combination bands are inherently wide. The second derivative sharpens these bands by emphasizing curvature, making it easier to identify the position and number of underlying components.
Enhancing subtle features. Small shoulders or inflection points in a spectrum can be invisible to the eye but become prominent in derivative spectra. This makes derivatives a powerful tool for detecting minor components or subtle spectral differences between similar samples.
First derivative
The first derivative of a spectrum represents the rate of change, that is, the slope at each point. Mathematically, for a discrete spectrum sampled at equal intervals Δx, the simplest finite difference approximation is:
$$\left.\frac{dy}{dx}\right|_i \approx \frac{y_{i+1} - y_{i-1}}{2\,\Delta x}$$
This central difference formula uses the points on either side of position i to estimate the slope at i.
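As an illustration, the central difference can be written in a few lines of NumPy. This is a sketch under simple assumptions: equally spaced samples, and endpoints left as NaN because they lack a neighbor on one side. The function name `central_difference` is hypothetical.

```python
import numpy as np


def central_difference(y, dx):
    """First derivative by central differences; the two endpoints are left as NaN."""
    y = np.asarray(y, dtype=float)
    dy = np.full(y.shape, np.nan)
    # slope at i from the neighbors on either side: (y[i+1] - y[i-1]) / (2 * dx)
    dy[1:-1] = (y[2:] - y[:-2]) / (2.0 * dx)
    return dy


# Check against a function whose derivative is known: d/dx sin(x) = cos(x)
dx = 0.01
x = np.arange(0.0, 2.0 * np.pi, dx)
dy = central_difference(np.sin(x), dx)
max_error = np.nanmax(np.abs(dy - np.cos(x)))
print(max_error)
```

NumPy's `np.gradient` uses the same central difference at interior points and falls back to one-sided differences at the two ends.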
What happens to the spectrum?
When you differentiate a spectrum:
Peaks become zero crossings. At the maximum of a peak, the slope is zero (the function transitions from rising to falling). So the first derivative crosses zero at every peak position.
Rising slopes become positive values. Where the original spectrum is increasing, the derivative is positive.
Falling slopes become negative values. Where the spectrum decreases, the derivative is negative.
Constant baselines vanish. A flat offset has zero slope everywhere, so the first derivative removes it completely.
This last property is why the first derivative is so useful for baseline correction: any additive constant offset in the spectrum disappears after differentiation.
Locating band positions
In the original spectrum, a peak maximum is where dy/dx=0 and the derivative changes from positive to negative. This means you can locate band positions by finding the positive-to-negative zero crossings of the first derivative. With a suitably smoothed derivative, this is often more precise than picking maxima directly from a noisy spectrum.
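A minimal sketch of zero-crossing peak location follows. It assumes a noise-free synthetic spectrum with two made-up bands. On measured data the derivative would need smoothing first, otherwise noise produces spurious crossings. The helper name `peak_positions` is hypothetical.

```python
import numpy as np


def peak_positions(x, y):
    """x positions where the first derivative crosses zero from positive to negative."""
    dy = np.gradient(y, x)
    # index of the last sample with positive slope before each crossing
    idx = np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0]
    # linear interpolation between the two samples that bracket the crossing
    frac = dy[idx] / (dy[idx] - dy[idx + 1])
    return x[idx] + frac * (x[idx + 1] - x[idx])


x = np.linspace(0.0, 20.0, 401)
y = (np.exp(-0.5 * ((x - 6.2) / 1.0) ** 2)
     + 0.6 * np.exp(-0.5 * ((x - 13.7) / 1.5) ** 2))

positions = peak_positions(x, y)
print(positions)
```

Negative-to-positive crossings mark valleys between bands and are deliberately ignored here.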
Second derivative
The second derivative represents the curvature at each point. The finite difference approximation is:
$$\left.\frac{d^2 y}{dx^2}\right|_i \approx \frac{y_{i+1} - 2y_i + y_{i-1}}{(\Delta x)^2}$$
What happens to the spectrum?
The second derivative transforms spectra in a characteristic way:
Peaks become inverted (negative) peaks. At a maximum, the curvature is negative (concave down), producing a negative peak in the second derivative. This inversion is a hallmark of second-derivative spectroscopy.
Peak sharpening. The second derivative emphasizes the curvature of narrow features. Broad, gentle features produce small second derivatives, while sharp peaks produce large (negative) ones. The result is that the second derivative spectrum has narrower bands than the original.
Linear baselines vanish. A linear function y=a+bx has zero second derivative everywhere. This means the second derivative removes both constant offsets and linear slopes simultaneously.
Overlapping bands become resolved. Two broad, overlapping peaks that appear as a single merged feature in the original spectrum may produce two distinct negative peaks in the second derivative.
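These properties can be seen in a small noise-free experiment. The sketch below assumes two equal Gaussian bands separated by 1.8 standard deviations (close enough to merge into a single maximum) sitting on a made-up linear baseline. All numbers are illustrative.

```python
import numpy as np


def second_difference(y, dx):
    """Second derivative from the three-point formula; endpoints are left as NaN."""
    y = np.asarray(y, dtype=float)
    d2 = np.full(y.shape, np.nan)
    d2[1:-1] = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx**2
    return d2


def count_local_extrema(v, kind):
    """Count strict interior local maxima ('max') or negative local minima ('min')."""
    mid, left, right = v[1:-1], v[:-2], v[2:]
    if kind == "max":
        return int(np.sum((mid > left) & (mid > right)))
    return int(np.sum((mid < left) & (mid < right) & (mid < 0)))


x = np.linspace(-6.0, 6.0, 241)
dx = x[1] - x[0]
bands = np.exp(-0.5 * (x + 0.9) ** 2) + np.exp(-0.5 * (x - 0.9) ** 2)
spectrum = bands + 0.2 + 0.03 * x  # add a linear baseline

d2 = second_difference(spectrum, dx)
n_maxima = count_local_extrema(spectrum, "max")       # merged band in the raw spectrum
n_minima = count_local_extrema(d2[1:-1], "min")       # negative peaks in the 2nd derivative
baseline_removed = np.allclose(d2[1:-1], second_difference(bands, dx)[1:-1])
print(n_maxima, n_minima, baseline_removed)
```

With these settings the raw spectrum shows one maximum while the second derivative shows two negative peaks, and the linear baseline has no effect on the result. Without noise the picture is idealized. On measured spectra the same calculation needs smoothing.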
The inversion convention
Because peaks appear as negative features in the second derivative, many practitioners plot second-derivative spectra inverted (multiplied by -1) so that peaks point upward as in the original spectrum. Be aware of this convention when reading the literature; always check the axis labels.
Finite differences vs. Savitzky-Golay derivatives
There are three common approaches to computing spectral derivatives, and understanding their differences is essential.
Simple finite differences
The formulas shown above compute derivatives directly from neighboring points. This is the most straightforward approach but has a critical flaw: noise amplification is severe. Each differentiation step acts as a high-pass filter, so high-frequency noise is amplified far more than the slowly varying signal you are trying to differentiate. On real spectroscopic data, simple finite differences often produce unusable results.
Savitzky-Golay derivatives
The Savitzky-Golay method fits a local polynomial of order p through a window of w points, then evaluates the derivative of that polynomial. Because the polynomial fitting inherently smooths the data, the resulting derivative is much cleaner than a simple finite difference.
The SG derivative of order d uses precomputed convolution coefficients that depend on the window size w, polynomial order p, and derivative order d. The operation is still a simple convolution (weighted sum), just with different coefficients than for smoothing:
$$\left.\frac{d^d y}{dx^d}\right|_i = \frac{1}{(\Delta x)^d} \sum_{j=-m}^{m} c_j^{(d)} \, y_{i+j}$$
where $m = (w - 1)/2$ and $c_j^{(d)}$ are the SG derivative coefficients.
Key advantage: Smoothing and differentiation happen simultaneously. You do not need a separate smoothing step.
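To make the origin of the coefficients concrete, here is a sketch that derives them from the local least-squares fit using NumPy only. The function names are hypothetical and the approach (pseudo-inverse of the polynomial design matrix) is one straightforward way to obtain the weights, not necessarily how any particular library computes them.

```python
import math

import numpy as np


def sg_coefficients(window, polyorder, deriv):
    """Savitzky-Golay weights c_j^(d) for the center point of a symmetric window."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    if deriv > polyorder:
        raise ValueError("deriv must not exceed polyorder")
    m = (window - 1) // 2
    j = np.arange(-m, m + 1)
    # Columns are j**0, j**1, ..., j**p: the design matrix of the local polynomial fit
    design = np.vander(j, polyorder + 1, increasing=True).astype(float)
    # Row k of the pseudo-inverse maps the window's y-values to the fitted coefficient a_k
    fit = np.linalg.pinv(design)
    # The d-th derivative of the fitted polynomial at j = 0 equals d! * a_d
    return math.factorial(deriv) * fit[deriv]


def sg_derivative(y, dx, window, polyorder, deriv):
    """Apply the weights as a sliding weighted sum; m points are dropped at each end."""
    c = sg_coefficients(window, polyorder, deriv)
    # np.correlate forms sum_j c_j * y[i + j] without flipping the kernel
    return np.correlate(np.asarray(y, dtype=float), c, mode="valid") / dx**deriv


print(sg_coefficients(5, 2, 1))  # first-derivative weights, 5-point quadratic
print(sg_coefficients(5, 2, 2))  # second-derivative weights, 5-point quadratic
```

For a 5-point quadratic window this reproduces the familiar tabulated weights (−2, −1, 0, 1, 2)/10 for the first derivative and (2, −1, −2, −1, 2)/7 for the second.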
Parameter requirements:
The polynomial order p must be at least as large as the derivative order d (you cannot compute a second derivative from a first-degree polynomial)
Larger windows produce smoother derivatives but may broaden features
Typical settings: p = 2 or 3, w = 7 to 25 (w odd, so that the window is centered on the point being evaluated)
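In practice the coefficients rarely need to be computed by hand. The sketch below assumes SciPy is available and uses `scipy.signal.savgol_filter`, with a thin hypothetical wrapper that enforces the parameter requirements listed above. The synthetic band, the noise level, and the wavelength axis are illustrative assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_spectral_derivative(y, dx, window=11, polyorder=2, deriv=1):
    """Smoothed derivative of an equally spaced spectrum via Savitzky-Golay."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it is centered on each point")
    if deriv > polyorder:
        raise ValueError("polyorder must be at least as large as deriv")
    # delta is the sampling interval; it scales the result to true derivative units
    return savgol_filter(y, window_length=window, polyorder=polyorder,
                         deriv=deriv, delta=dx)


rng = np.random.default_rng(0)
x = np.linspace(1100.0, 2500.0, 701)      # hypothetical NIR axis in nm, 2 nm step
dx = x[1] - x[0]
clean = np.exp(-0.5 * ((x - 1900.0) / 40.0) ** 2)
noisy = clean + rng.normal(0.0, 0.002, x.size)

d1 = sg_spectral_derivative(noisy, dx, window=15, polyorder=2, deriv=1)
d2 = sg_spectral_derivative(noisy, dx, window=21, polyorder=3, deriv=2)
band_center = x[np.argmin(d2)]            # most negative second-derivative point
print(band_center)
```

Passing `delta=dx` matters when derivative magnitudes are compared across instruments with different sampling intervals. Without it the result is expressed per sample rather than per wavelength unit.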
Gap derivatives (Norris derivative)
The gap derivative, popularized by Karl Norris, computes the difference between points separated by a gap of g data points:
$$\left.\frac{dy}{dx}\right|_i \approx \frac{y_{i+g/2} - y_{i-g/2}}{g \cdot \Delta x}$$
The gap serves a dual purpose: it both defines the scale of the derivative and provides some inherent smoothing (since the difference is divided by a larger interval, point-to-point noise fluctuations contribute less to the slope estimate). In practice, the gap derivative is often combined with a segment average: points within a segment of s points around each endpoint are averaged before the difference is computed. This is sometimes called the “Norris-Williams” or “segment gap” derivative.
When to use gap derivatives:
When you want a simple, fast derivative with modest smoothing
In NIR spectroscopy, where gap derivatives have a long tradition
When you want to control the spectral scale of the derivative independently from the smoothing
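A minimal sketch of a segment-gap first derivative is shown below. It assumes an even gap (so that g/2 is a whole number of points, as in the formula above) and an odd segment length. Conventions for counting the gap and segment differ between software packages, so the parameter names and definitions here are assumptions for illustration only.

```python
import numpy as np


def gap_derivative(y, dx, gap=4, segment=1):
    """Segment-gap first derivative; points without full support are left as NaN."""
    if gap < 2 or gap % 2 != 0:
        raise ValueError("gap must be an even number of points >= 2")
    if segment < 1 or segment % 2 == 0:
        raise ValueError("segment must be an odd number of points >= 1")
    y = np.asarray(y, dtype=float)
    h = gap // 2          # half gap, in points
    k = segment // 2      # half segment, in points
    if y.size <= gap + 2 * k:
        raise ValueError("spectrum is too short for this gap and segment")
    # Segment averaging: smooth[t] is the mean of the segment centered at index t + k
    smooth = np.convolve(y, np.ones(segment) / segment, mode="valid")
    # Difference of the two segment means that sit g points apart
    diff = (smooth[gap:] - smooth[:-gap]) / (gap * dx)
    out = np.full(y.shape, np.nan)
    out[h + k:h + k + diff.size] = diff   # diff[t] belongs to center index t + k + h
    return out


x = 0.5 * np.arange(50)
y = 2.0 * x + 1.0                          # straight line with slope 2
d = gap_derivative(y, dx=0.5, gap=6, segment=3)
print(np.nanmin(d), np.nanmax(d), int(np.isnan(d).sum()))
```

With `segment=1` this reduces to the plain gap difference. Increasing `segment` adds smoothing without changing the distance over which the slope is measured.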
The noise amplification problem
This is the single most important practical issue in derivative spectroscopy. Each differentiation step amplifies noise, and the amplification increases with frequency. A quick way to understand this: if your signal is a smooth sine wave sin(ωx), its first derivative is ωcos(ωx), so the amplitude is scaled by the frequency ω. High-frequency noise oscillates rapidly (high ω), so it gets multiplied by a large factor. Low-frequency signal oscillates slowly (low ω), so it is amplified far less.
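The frequency scaling can be confirmed numerically. The sketch below differentiates two sine waves whose frequencies differ by a factor of 20 and compares the amplitudes of their derivatives. The frequencies and the grid spacing are arbitrary illustrative choices.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 2.0 * np.pi, dx)


def derivative_amplitude(omega):
    """Largest absolute value of the numerical derivative of sin(omega * x)."""
    return np.max(np.abs(np.gradient(np.sin(omega * x), dx)))


slow = derivative_amplitude(1.0)    # slowly varying, signal-like component
fast = derivative_amplitude(20.0)   # rapidly varying, noise-like component
gain_ratio = fast / slow
print(slow, fast, gain_ratio)
```

Both sine waves start with amplitude 1, yet after one differentiation the fast component is about 20 times larger than the slow one. A second differentiation would multiply the ratio by 20 again.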
The practical consequences:
First derivative amplifies noise substantially. A spectrum with acceptable noise may produce a first derivative that looks rough.
Second derivative amplifies noise even more. Without adequate smoothing, second derivatives of real data are dominated by noise.
Third and higher derivatives are rarely used precisely because noise amplification becomes unmanageable.
Quantifying the amplification
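For white noise with standard deviation σ0, the noise in a first derivative computed from adjacent points, (y_{i+1} − y_i)/Δx, is approximately:
$$\sigma_1 \approx \frac{\sqrt{2}}{\Delta x} \cdot \sigma_0$$
The central difference formula given earlier divides by 2Δx instead of Δx, so its noise is half of this, σ0/(√2·Δx).
For the second derivative:
$$\sigma_2 \approx \frac{\sqrt{6}}{(\Delta x)^2} \cdot \sigma_0$$
The factors √2 and √6 are the root sum of squares of the difference weights (1, −1) and (1, −2, 1), because independent noise contributions add in quadrature. This shows that noise grows rapidly with derivative order. It also shows that a finer sampling interval (smaller Δx) makes the problem worse, because you are dividing by smaller intervals.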
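These noise factors follow from adding independent errors in quadrature, and a short simulation can confirm them. The sketch below assumes uncorrelated Gaussian noise with made-up values for σ0 and Δx. It also includes the central difference, whose factor is 1/√2 because it divides by 2Δx.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma0, dx = 0.01, 0.5
noise = rng.normal(0.0, sigma0, 200_000)   # pure white noise, no signal

adjacent = (noise[1:] - noise[:-1]) / dx                       # weights (1, -1)
central = (noise[2:] - noise[:-2]) / (2.0 * dx)                # weights (1, 0, -1) / 2
second = (noise[2:] - 2.0 * noise[1:-1] + noise[:-2]) / dx**2  # weights (1, -2, 1)

measured = {
    "adjacent": adjacent.std(),
    "central": central.std(),
    "second": second.std(),
}
predicted = {
    "adjacent": np.sqrt(2.0) * sigma0 / dx,
    "central": sigma0 / (np.sqrt(2.0) * dx),
    "second": np.sqrt(6.0) * sigma0 / dx**2,
}
for name in measured:
    print(name, measured[name], predicted[name])
```

Real detector noise is often correlated between neighboring channels, in which case these factors are only a rough guide.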
The solution: smooth before or during differentiation
There are two strategies:
Smooth first, then differentiate. Apply a smoothing filter (moving average, Gaussian, Whittaker) to your spectrum, then compute the derivative of the smoothed data. This works but requires choosing separate parameters for each step.
Use Savitzky-Golay derivatives. This is the standard approach because it handles both operations in a single step. The window size and polynomial order jointly control the smoothing-differentiation tradeoff.
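As a rough illustration of the first strategy, the sketch below smooths a noisy synthetic band with a moving average and then differentiates it, comparing the error against differentiating the raw data. The band shape, noise level, and 11-point window are assumptions chosen for the example, not recommended settings.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]
clean = np.exp(-0.5 * ((x - 50.0) / 5.0) ** 2)
true_d1 = -(x - 50.0) / 25.0 * clean          # analytic derivative of the Gaussian
noisy = clean + rng.normal(0.0, 0.01, x.size)


def moving_average(y, width):
    """Boxcar smoothing; values near the ends are distorted by zero padding."""
    return np.convolve(y, np.ones(width) / width, mode="same")


def rmse(estimate, reference, margin=20):
    """Root-mean-square error, ignoring `margin` points at each end."""
    err = estimate[margin:-margin] - reference[margin:-margin]
    return float(np.sqrt(np.mean(err**2)))


raw_d1 = np.gradient(noisy, dx)                          # differentiate directly
smooth_d1 = np.gradient(moving_average(noisy, 11), dx)   # smooth first, then differentiate

raw_rmse = rmse(raw_d1, true_d1)
smooth_rmse = rmse(smooth_d1, true_d1)
print(raw_rmse, smooth_rmse)
```

The smoothing width here is a separate parameter from anything in the differentiation step, which is the bookkeeping cost of this strategy mentioned above.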
Practical considerations
Choosing derivative order
First derivative is the right choice when:
You need to remove a constant baseline offset
You want to enhance slopes and inflection points
You are building PLS or PCR models and want to remove additive baseline effects
Noise amplification must be kept manageable
Second derivative is preferred when:
You need to remove both offset and linear slope
You want to resolve overlapping peaks
You want sharper bands for qualitative analysis
You have sufficient signal-to-noise to tolerate the extra noise amplification
Third derivative and above are rarely used in practice. The noise amplification becomes extreme, and the spectral interpretation becomes non-intuitive. If your second derivative is not adequate, consider other preprocessing approaches rather than going to higher orders.
Choosing SG derivative parameters
Window size (w). Larger windows produce smoother derivatives but may broaden features. A good starting point:
For first derivatives: w = 9 to 15
For second derivatives: w = 11 to 21 (you need more smoothing because noise is worse)
Adjust based on your spectral resolution and noise level
Polynomial order (p). Must be at least as large as the derivative order. Typical choices:
First derivative: p = 2 (quadratic) or p = 3 (cubic)
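Second derivative: p = 2 (quadratic) or p = 3 (cubic); on a symmetric window these two give identical second-derivative coefficients, so p = 4 is the next genuinely different choice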
Higher polynomial orders offer more flexibility but can overfit
Rule of thumb: Use the minimum polynomial order that captures the shape of your features, with a window large enough to suppress noise adequately. Then validate by visual inspection.
Gap size for gap derivatives
For Norris gap derivatives, the gap g plays a similar role to the window size in SG derivatives:
Smaller gap: more detail, more noise
Larger gap: smoother, but may miss narrow features
Common values: g = 5 to 21 for NIR data with 1-2 nm resolution
Validating your derivative
Always check your results:
Visual inspection. Do the derivative features correspond to real spectral features? Or is the derivative dominated by noise?
Zero crossings. For first derivatives, do the zero crossings align with known peak positions?
Peak count. For second derivatives, does the number of resolved bands match what you expect from the chemistry?
Model performance. If using derivatives as PLS preprocessing, does changing the derivative order or the amount of smoothing improve cross-validation results?
Derivative spectroscopy in chemistry
Near-infrared (NIR) spectroscopy
Derivatives are arguably most important in NIR spectroscopy, where they serve as the primary baseline correction method. NIR spectra are dominated by broad overtone and combination bands with significant overlap, and scattering from solid samples introduces baseline offsets and slopes that vary from sample to sample. First and second derivatives are standard preprocessing steps in nearly every NIR calibration workflow:
First derivative removes sample-to-sample baseline offsets caused by scattering
Second derivative removes both offsets and linear baseline slopes, and sharpens the broad NIR bands
Many published NIR calibration models use SG derivatives as part of the preprocessing pipeline
UV-Vis spectroscopy
In UV-Vis, derivative spectroscopy is used to resolve overlapping electronic absorption bands. Because electronic transitions produce broad bands that often merge, the second derivative (or even fourth derivative) can reveal the number and positions of underlying components. This technique is particularly useful in:
Pharmaceutical analysis (resolving drug components with overlapping UV spectra)
Environmental monitoring (identifying pollutants in complex mixtures)
Protein structural analysis (second-derivative UV absorbance of the aromatic residues as a probe of their local environment)
Raman spectroscopy
In Raman spectroscopy, derivatives can help remove fluorescence backgrounds. Fluorescence produces a broad, slowly varying baseline under the sharp Raman peaks. Since the fluorescence background is smooth (low curvature) while Raman peaks are sharp (high curvature), the second derivative suppresses the fluorescence contribution while preserving the Raman features. However, dedicated baseline correction methods, such as asymmetric least squares (AsLS), are often preferred for this purpose.
Mid-infrared (mid-IR) spectroscopy
Mid-IR spectra generally have sharper peaks and less baseline variation than NIR, so derivatives are used less routinely. However, second derivatives can be useful for:
Resolving overlapping bands in the fingerprint region
Detecting minor components in mixtures
Qualitative analysis of protein secondary structure (amide I band analysis)
When to use derivatives
NIR calibration models
First or second derivatives are standard preprocessing for PLS models on NIR data. They remove scatter-induced baselines and improve model robustness.
Resolving overlapping bands
Second derivatives sharpen peaks and can reveal hidden shoulders or closely spaced absorption features.
Qualitative fingerprinting
Derivative spectra emphasize subtle differences between similar samples, useful for classification and authentication studies.
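Removing additive scatter effects
First derivatives remove constant offsets; second derivatives remove both offsets and linear trends. Derivatives do not remove multiplicative scaling on their own, which is why they are often used alongside MSC/SNV, and sometimes instead of them when additive effects dominate.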
Consider alternatives
Very noisy data
If your signal-to-noise ratio is already poor, derivatives will make it worse. Consider smoothing or scatter correction methods instead.
Sharp, well-resolved peaks
If your peaks are already well separated (e.g., high-resolution Raman), derivatives add noise without providing much benefit.
When absolute intensities matter
Derivatives destroy absolute intensity information. If you need absorbance values (not just peak shapes), use baseline correction methods that preserve the scale.
Complex baselines
For strongly curved or non-polynomial baselines, dedicated methods like AsLS may outperform derivatives.
References
[1] Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
[2] Norris, K. H., & Williams, P. C. (1984). Optimization of mathematical treatments of raw near-infrared signal in the measurement of protein in hard red spring wheat. I. Influence of particle size. Cereal Chemistry, 61(2), 158-165.
[3] Rinnan, A., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
[4] O’Haver, T. C. (1979). Derivative spectroscopy and its application to the analysis of unresolved spectral bands. Analytical Chemistry, 51(1), 91A-100A.
[5] Talsky, G. (1994). Derivative Spectrophotometry: Low and Higher Order. VCH Publishers.
[6] Steinier, J., Termonia, Y., & Deltour, J. (1972). Smoothing and differentiation of data by simplified least square procedure. Analytical Chemistry, 44(11), 1906-1909.
[7] Martens, H., & Naes, T. (1989). Multivariate Calibration. Wiley.