The idea of standardizing measurements to make them comparable has roots that reach back to the late 19th century. Karl Pearson, who introduced the term "standard deviation" in 1893, built on earlier ideas by Francis Galton about regression and deviation from the mean, and the standard score (later called the z-score) grew out of this line of work. The insight was straightforward: if you subtract the mean and divide by the standard deviation, any variable — regardless of its original units or scale — becomes a dimensionless quantity with zero mean and unit variance. This allowed statisticians to compare heights measured in inches with weights measured in pounds, or temperatures in Fahrenheit with pressures in atmospheres. The z-score became one of the most fundamental operations in all of statistics.
When chemometrics emerged as a discipline in the 1970s and 1980s, centering and scaling took on a new urgency. The multivariate methods at the heart of chemometrics — Principal Component Analysis (PCA), Partial Least Squares (PLS), and their many variants — are built on covariance and correlation structures. Svante Wold and Bruce Kowalski, two of the founding figures of chemometrics, emphasized from the beginning that mean centering was not optional for PCA: without it, the first principal component simply captures the mean spectrum rather than the variation between samples, which is the whole point of the analysis. Scaling, meanwhile, became necessary as analytical chemistry increasingly combined data from different instruments (NIR, Raman, GC, HPLC) or different types of measurements (concentrations, temperatures, pressures) into a single data matrix.
Over the decades, the chemometrics community developed several scaling approaches tailored to different data types and analytical goals. Autoscaling (unit variance scaling) became the standard for mixed-unit data. Pareto scaling, introduced by the metabolomics community in the early 2000s, offered a middle ground that reduced the dominance of large variables without amplifying noise as aggressively as autoscaling. Range scaling found its niche in process monitoring, where variables naturally have defined operating ranges. The choice of scaling method is not a minor technical detail — it fundamentally shapes what patterns a multivariate model can find.
Why preprocessing starts with centering and scaling
Before applying PCA, PLS, or any multivariate method, you need to ask a basic question: are your variables on comparable scales?
In analytical chemistry, the answer is almost always no. A typical data matrix might contain:
NIR absorbance values ranging from 0.01 to 2.5
Temperature readings from 20 to 180 degrees Celsius
Moisture content from 0.1% to 15%
Raman intensities from 100 to 50,000 counts
When these variables are combined into a single matrix, the ones with the largest absolute values and the largest variance dominate the analysis. PCA, for example, finds directions of maximum variance in the data. If Raman intensity varies over a range of 50,000 while moisture varies over a range of 15, the first principal component will almost entirely describe Raman intensity variations — not because Raman is more chemically important, but simply because its numbers are bigger.
This is the fundamental problem that centering and scaling solve. They bring variables to a common ground so that the multivariate analysis reflects chemical information rather than arbitrary measurement scales.
Mean centering
Mean centering is the most basic and universally applied preprocessing step. For each variable (column) in your data matrix, you subtract the mean of that variable across all samples:
$$x_{ij}^{\text{centered}} = x_{ij} - \bar{x}_j$$

where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ is the mean of variable $j$ across all $n$ samples.
In matrix notation, if X is your n×p data matrix (n samples, p variables):
$$X_{\text{centered}} = X - \mathbf{1}\,\bar{\mathbf{x}}^{T}$$

where $\mathbf{1}$ is a column vector of ones and $\bar{\mathbf{x}}$ is the vector of column means.
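These two formulas take only a few lines of numpy. The data matrix below is a small made-up example; the point is that subtracting the vector of column means leaves every column with mean zero:

```python
import numpy as np

# Small made-up data matrix: 4 samples (rows) x 3 variables (columns)
X = np.array([
    [1.0, 100.0, 0.10],
    [2.0, 110.0, 0.20],
    [3.0, 120.0, 0.30],
    [4.0, 130.0, 0.40],
])

column_means = X.mean(axis=0)   # the vector of column means, x-bar
X_centered = X - column_means   # broadcasting subtracts each column's mean from every row

print(X_centered.mean(axis=0))  # every column mean is now (numerically) zero
```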
What centering does to your data
After centering, each variable has a mean of zero. Geometrically, you have moved the cloud of data points so that it is centered at the origin. This might seem trivial, but it has a profound effect on PCA.
Without centering, the first principal component points from the origin toward the center of the data cloud. It captures where the data is rather than how it varies. In spectroscopic terms, the first component is essentially the mean spectrum — a perfectly uninteresting piece of information that tells you nothing about differences between samples.
With centering, the first principal component captures the direction of greatest variation between samples. This is what you actually want: the most important pattern of differences in your data. Every subsequent component captures the next most important source of variation, and so on.
Example: spectroscopic data
Consider a set of NIR spectra measured on 50 samples. Each spectrum has 1000 wavelength points. The raw spectra all share a similar overall shape (the mean spectrum) with relatively small differences between samples.
Before centering, PCA finds that component 1 explains 99.5% of the variance — but it is just the mean spectrum. The interesting differences between samples are buried in the remaining 0.5%.
After centering, those differences become the entire focus of the analysis. Component 1 now captures the most important chemical variation (perhaps protein content), component 2 captures the next most important (perhaps moisture), and so on.
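This behavior is easy to reproduce with synthetic data. The sketch below builds made-up "spectra" as a shared mean spectrum plus small random variation, then runs PCA (via SVD) on the uncentered matrix; the first loading vector comes out almost perfectly correlated with the mean spectrum. All numbers are illustrative, not real NIR data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 50, 200

# Synthetic spectra: one shared shape plus small sample-to-sample differences
mean_spectrum = 1.0 + 0.5 * np.sin(np.linspace(0, 3, n_wavelengths))
X = mean_spectrum + 0.01 * rng.standard_normal((n_samples, n_wavelengths))

# PCA on the UNcentered matrix: the first loading is essentially the mean spectrum
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1_loading = Vt[0]

corr = np.corrcoef(pc1_loading, mean_spectrum)[0, 1]
print(abs(corr))  # very close to 1: PC1 just reproduces the mean spectrum
```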
Autoscaling (unit variance scaling)
Autoscaling combines mean centering with division by the standard deviation. It is Pearson’s z-score applied column-wise:
$$x_{ij}^{\text{auto}} = \frac{x_{ij} - \bar{x}_j}{s_j}$$

where $s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}}$ is the standard deviation of variable $j$.
After autoscaling, every variable has zero mean and unit variance. Each variable contributes equally to the analysis regardless of its original scale or units.
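A minimal autoscaling helper in numpy (the function name `autoscale` is ours, not a library function) might look like the sketch below. Returning the means and standard deviations matters later, when new samples must be scaled with the calibration parameters:

```python
import numpy as np

def autoscale(X):
    """Mean-center each column, then divide by its sample standard deviation."""
    means = X.mean(axis=0)
    stds = X.std(axis=0, ddof=1)  # ddof=1 gives the n-1 denominator used above
    return (X - means) / stds, means, stds

# Made-up mixed-unit data: absorbance, temperature (deg C), moisture (%)
X = np.array([
    [0.12, 150.0,  2.0],
    [0.45, 165.0,  5.5],
    [0.80, 172.0,  9.0],
    [1.10, 180.0, 14.0],
])

X_scaled, means, stds = autoscale(X)
print(X_scaled.mean(axis=0))          # ~0 for every column
print(X_scaled.std(axis=0, ddof=1))   # 1 for every column
```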
When to use autoscaling
Autoscaling is the right choice when:
Variables have different units — combining spectral data with temperature, pressure, pH, etc.
Variables have very different magnitudes — one variable ranges 0-1 while another ranges 0-10,000
All variables are potentially important — you do not want any single variable to dominate simply because of its scale
You are doing exploratory analysis — autoscaling is a safe default that ensures every variable gets a fair hearing
The danger of autoscaling
Autoscaling has a well-known pitfall: it amplifies noise in low-variance variables. If a variable has very little real variation (perhaps it is nearly constant across all samples), its standard deviation will be small, and dividing by that small number magnifies whatever noise is present. A variable that was practically irrelevant in the raw data can become a major source of apparent variation after autoscaling.
This is particularly problematic in spectroscopy, where baseline regions with little chemical information can have low variance. After autoscaling, these noisy baseline regions get amplified to the same importance as the information-rich peak regions.
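A quick synthetic demonstration of this pitfall: one made-up variable carries a clear signal, the other is pure low-variance noise, yet after autoscaling the two have identical magnitude and the analysis can no longer tell them apart.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

signal = 5.0 * np.sin(np.linspace(0, 6, n)) + 0.05 * rng.standard_normal(n)  # informative variable
baseline = 0.001 * rng.standard_normal(n)                                    # pure noise, tiny variance

X = np.column_stack([signal, baseline])
raw_stds = X.std(axis=0, ddof=1)
X_auto = (X - X.mean(axis=0)) / raw_stds

print(raw_stds)                      # raw: the baseline's variation is negligible
print(X_auto.std(axis=0, ddof=1))    # autoscaled: both columns now have unit variance
```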
Pareto scaling
Pareto scaling offers a compromise between no scaling and autoscaling. Instead of dividing by the standard deviation, you divide by the square root of the standard deviation:
$$x_{ij}^{\text{Pareto}} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_j}}$$
This reduces the dominance of large-variance variables without completely equalizing all variables. Large signals are scaled down, but they still contribute more than small signals — which is often desirable when the signal magnitude carries real information.
Why metabolomics loves Pareto scaling
Pareto scaling became particularly popular in metabolomics after van den Berg et al. (2006) published a systematic comparison of scaling methods for metabolomic data. Their key observation was that in metabolomics, peak intensity often correlates with biological importance — major metabolites like glucose and lactate produce large peaks because they are present in high concentrations. Autoscaling would give equal weight to a glucose peak and a minor metabolite detected at the noise floor, which may not be desirable. Pareto scaling keeps the large metabolites prominent while still reducing the dominance gap, striking a practical balance.
The mathematical intuition is simple. If variable A has a standard deviation 100 times larger than variable B:
No scaling: A dominates by a factor of 100
Pareto scaling: A dominates by a factor of $\sqrt{100} = 10$
Autoscaling: Both have equal weight
Pareto scaling reduces the dominance ratio from 100:1 to 10:1, which often better reflects the underlying importance of the variables.
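The 100:1 to 10:1 reduction can be checked numerically. In the sketch below (made-up data, helper name ours), variable A is generated with roughly 100 times the standard deviation of variable B; after Pareto scaling, the ratio of the scaled standard deviations comes out near 10:

```python
import numpy as np

def pareto_scale(X):
    """Mean-center each column, then divide by the SQUARE ROOT of its std dev."""
    means = X.mean(axis=0)
    stds = X.std(axis=0, ddof=1)
    return (X - means) / np.sqrt(stds)

rng = np.random.default_rng(1)
A = 100.0 * rng.standard_normal(500)  # large-variance variable
B = 1.0 * rng.standard_normal(500)    # small-variance variable (~100:1 std ratio)
X = np.column_stack([A, B])

scaled_stds = pareto_scale(X).std(axis=0, ddof=1)
print(scaled_stds[0] / scaled_stds[1])  # dominance ratio shrinks from ~100 to ~10
```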
Range scaling
Range scaling divides each centered variable by its range (maximum minus minimum):
$$x_{ij}^{\text{range}} = \frac{x_{ij} - \bar{x}_j}{x_{j,\max} - x_{j,\min}}$$
This maps each variable to a comparable scale determined by its observed range. It is conceptually similar to min-max normalization (which maps to [0, 1]) but preserves the centering at zero.
When range scaling makes sense
Range scaling is most useful in process monitoring and industrial settings where variables have well-defined operating ranges. A reactor temperature that operates between 150 and 200 degrees Celsius has a meaningful range of 50 degrees. A pressure sensor operating between 1 and 5 bar has a meaningful range of 4 bar. Range scaling normalizes each variable by its operational span, which often corresponds to the practical significance of that variable.
The main limitation is sensitivity to outliers. A single extreme value in one variable can inflate its range, pulling down the scaled values for all other samples. This makes range scaling less robust than autoscaling for datasets with outliers or non-standard samples.
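A range-scaling sketch in the same style as the helpers above (the function name and process values are invented for illustration):

```python
import numpy as np

def range_scale(X):
    """Mean-center each column, then divide by its observed range (max - min)."""
    means = X.mean(axis=0)
    ranges = X.max(axis=0) - X.min(axis=0)
    return (X - means) / ranges

# Made-up process data: reactor temperature (150-200 C), pressure (1-5 bar)
X = np.array([
    [150.0, 1.0],
    [170.0, 2.5],
    [185.0, 4.0],
    [200.0, 5.0],
])

X_scaled = range_scale(X)
spans = X_scaled.max(axis=0) - X_scaled.min(axis=0)
print(spans)  # each column now spans exactly 1.0, centered around zero
```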
Other scaling methods
Several additional scaling approaches have been proposed for specific applications:
VAST scaling (Variable Stability scaling): Autoscales each variable and then divides by its coefficient of variation (the standard deviation divided by the mean). This gives more weight to variables with small relative variation, which can be useful for identifying stable biomarkers.
Level scaling: Divides each centered variable by its mean value, making each variable a measure of relative deviation from the average. Useful when proportional changes are more meaningful than absolute changes.
Log transformation: Not strictly a scaling method, but often used in conjunction with scaling to handle multiplicative noise or data spanning several orders of magnitude. Common in metabolomics and gene expression analysis. Note that log transformation requires all values to be positive, and it changes the distributional properties of the data.
Power transformations (Box-Cox): A family of transformations parameterized by $\lambda$ that includes the log transformation ($\lambda = 0$) and the square root ($\lambda = 0.5$) as special cases. Used to stabilize variance or improve normality.
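Level scaling and the log-plus-centering combination are short numpy functions. The sketch below (made-up concentrations, helper names ours) also shows the positivity check that the log transform requires:

```python
import numpy as np

def level_scale(X):
    """Mean-center each column, then divide by its mean: relative deviations."""
    means = X.mean(axis=0)
    return (X - means) / means

def log_then_center(X):
    """Log-transform (values must be strictly positive), then mean-center."""
    if np.any(X <= 0):
        raise ValueError("log transformation requires all values to be positive")
    L = np.log(X)
    return L - L.mean(axis=0)

# Made-up concentrations spanning orders of magnitude
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [4.0, 9000.0]])

print(level_scale(X))      # deviations as fractions of each variable's mean
print(log_then_center(X))  # multiplicative structure becomes additive, then centered
```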
Choosing the right preprocessing
The choice between centering alone, autoscaling, Pareto scaling, or range scaling depends on the nature of your data and the goal of your analysis. There is no universal answer, but the following guidelines cover most practical situations.
The effect on PCA
The impact of centering and scaling on PCA results can be dramatic. Here is what typically happens with different preprocessing choices applied to a dataset of NIR spectra combined with process variables (temperature, pressure, flow rate).
No preprocessing (raw data):
PC1 captures the variable with the largest absolute values (often temperature or a dominant spectral region)
Scores plot is dominated by one axis
Chemical patterns are hidden
Mean centering only:
PC1 captures the direction of largest variance among all variables
For pure spectroscopic data, this often works well — variance is naturally meaningful
For mixed-unit data, variables with large variance still dominate
Autoscaling (centering + unit variance):
Each variable contributes equally to the analysis
Chemical patterns emerge clearly in the scores plot
Risk: noise in low-variance variables may obscure real patterns
Pareto scaling:
Large-variance variables contribute more than small-variance variables, but not overwhelmingly so
Good compromise for heterogeneous data
The scores plot comparison typically reveals the following pattern: without centering, PC1 vs PC2 shows little structure. With centering alone on mixed-unit data, one or two variables dominate. With autoscaling, clusters and trends that reflect real chemistry become visible. Pareto scaling often gives results intermediate between centering-only and autoscaling.
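The centering-only versus autoscaling contrast for mixed-unit data can be reproduced with a small simulation. Below, two correlated "chemical" variables on a small scale are combined with one independent variable on a huge scale; with centering only, PC1 loads almost entirely on the large variable, while after autoscaling PC1 picks up the correlated chemical pair instead. All data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
chem = rng.standard_normal(n)  # shared "chemical" factor

X = np.column_stack([
    chem + 0.1 * rng.standard_normal(n),  # e.g. a moisture-like variable
    chem + 0.1 * rng.standard_normal(n),  # correlated chemical signal
    50_000.0 * rng.standard_normal(n),    # e.g. raw counts with huge variance
])

def pc1_loading(X):
    """Absolute PC1 loadings of a mean-centered matrix, via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.abs(Vt[0])

load_centered = pc1_loading(X)  # centering only: approximately [0, 0, 1]
X_auto = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
load_auto = pc1_loading(X_auto)  # autoscaled: the chemical pair dominates PC1

print(load_centered)
print(load_auto)
```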
Best practices
Mean-center your data before PCA, PLS, or any covariance-based method. This is not optional — it is a mathematical requirement for meaningful results.
Store the preprocessing parameters (means, standard deviations, ranges) from your calibration set. When you preprocess new samples for prediction, you must apply the same means and standard deviations, not recalculate them from the new data.
Document your choices. Record which preprocessing you applied and why. Reproducibility requires knowing exactly how the data was transformed.
Watch out for
Autoscaling noisy variables. If a variable has very low variance, dividing by its standard deviation amplifies noise. Consider removing such variables or using Pareto scaling instead.
Inconsistent preprocessing. The calibration set, validation set, and any new samples must all be preprocessed with the same parameters (the means and standard deviations from the calibration set).
Scaling spectroscopic data unnecessarily. If all variables are in the same units (e.g., absorbance at different wavelengths), mean centering alone often gives better results than autoscaling, because the natural variance structure carries chemical information.
Quick reference: which method to use
| Situation | Recommended approach |
| --- | --- |
| Pure spectroscopic data (same units) | Mean centering only |
| Mixed-unit data (spectra + process variables) | Autoscaling |
| Metabolomics / peak intensity matters | Pareto scaling |
| Process monitoring with known operating ranges | Range scaling |
| Data spanning orders of magnitude | Log transform + centering or autoscaling |
| Exploratory analysis, uncertain which to use | Try autoscaling first, compare with centering only |
Applying preprocessing to new data
When you build a calibration model, you compute means and standard deviations from your training data. New samples must be preprocessed using those same parameters, not recalculated from the new data. This is a common source of errors.
```python
# During calibration
X_train_scaled, means, stds = autoscale(X_train)
model = build_pls_model(X_train_scaled, y_train)

# During prediction -- use the calibration means and stds
X_new_scaled = (X_new - means) / stds  # NOT autoscale(X_new)
y_predicted = model.predict(X_new_scaled)
```
If you recalculate means and standard deviations from new data, each prediction batch gets a different preprocessing, and the model’s coefficients no longer apply correctly.
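One robust way to enforce this discipline is the fit/transform pattern, where the calibration parameters are stored on an object and reused for every later batch. The class below is a minimal sketch of that pattern (scikit-learn's StandardScaler implements the same idea):

```python
import numpy as np

class Autoscaler:
    """Minimal fit/transform autoscaler: parameters come from the calibration
    data only and are reused for every later batch."""

    def fit(self, X):
        self.means_ = X.mean(axis=0)
        self.stds_ = X.std(axis=0, ddof=1)
        return self

    def transform(self, X):
        # Always uses the stored calibration means/stds, never those of X itself
        return (X - self.means_) / self.stds_

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 25.0]])

scaler = Autoscaler().fit(X_train)
X_new_scaled = scaler.transform(X_new)
print(X_new_scaled)  # [[0.0, 0.5]] with calibration means [2, 20] and stds [1, 10]
```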
References
[1] van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7, 142.
[2] Bro, R., & Smilde, A. K. (2014). Principal component analysis. Analytical Methods, 6(9), 2812-2831.
[3] Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 559-572.
[4] Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3), 37-52.
[5] Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends in Analytical Chemistry, 28(10), 1201-1222.
[6] Eriksson, L., Byrne, T., Johansson, E., Trygg, J., & Vikstrom, C. (2013). Multi- and Megavariate Data Analysis: Basic Principles and Applications. Umetrics Academy.
[7] Brereton, R. G. (2003). Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley.
[8] Martens, H., & Naes, T. (1989). Multivariate Calibration. Wiley.