Factor Analysis

Factor analysis was born not in a chemistry laboratory, but in the psychology department of University College London. In 1904, the British psychologist Charles Spearman published a landmark paper titled “General Intelligence, Objectively Determined and Measured” in the American Journal of Psychology [1]. Spearman had noticed something striking: students who scored well on one type of cognitive test (say, arithmetic) also tended to score well on apparently unrelated tests (say, pitch discrimination). He proposed that a single hidden variable — which he called g, for “general intelligence” — could explain the correlations among diverse mental tests. The observed test scores were not the fundamental quantities; they were noisy manifestations of something deeper and unobservable. This was the first formal statement of the latent variable idea, and it required a new mathematical framework to make it precise.

That framework was expanded dramatically in the 1930s and 1940s by the American psychologist L.L. Thurstone. Where Spearman had argued for a single general factor, Thurstone believed that intelligence was composed of multiple independent abilities — verbal comprehension, spatial reasoning, numerical fluency, and others. He developed the theory of multiple factor analysis, introduced the concept of factor rotation to achieve what he called “simple structure” (a solution where each variable loads heavily on as few factors as possible), and laid the mathematical foundations that transformed factor analysis from a one-factor trick into a general multivariate method [2]. The tension between Spearman’s single g and Thurstone’s multiple factors drove decades of debate and refinement.

The migration of factor analysis into chemistry came through Edmund R. Malinowski, a physical chemist at Stevens Institute of Technology. His 1991 book Factor Analysis in Chemistry [3] showed how the same mathematical machinery that psychologists used to find latent mental abilities could be used to determine the number of independent chemical components in a spectroscopic mixture and to resolve their pure spectra. In chemistry, the “latent factors” are not abstract psychological constructs — they are real chemical species whose spectra and concentration profiles can, in principle, be measured independently. This concrete physical interpretation made factor analysis particularly powerful in chemometrics and led to an entire family of methods including evolving factor analysis, self-modeling curve resolution, and target factor analysis.

The latent variable concept

The central idea of factor analysis is that the variables you observe are not the variables that matter. Instead, a smaller number of hidden (latent) factors generate the observed data.

Consider a UV-Vis spectrum of a three-component mixture measured at 200 wavelengths. You have 200 observed variables (absorbances), but only three independent sources of variation (the three chemical species). The 200 absorbances are not independent — they are driven by three latent factors, plus measurement noise. Factor analysis formalizes this intuition.
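This intuition is easy to verify numerically. In the sketch below (shapes and random profiles are illustrative, not from any real instrument), the noiseless Beer's Law data matrix has rank equal to the number of species, however many wavelengths are measured:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_species = 50, 200, 3

C = rng.exponential(1.0, (n_samples, n_species))   # concentrations (non-negative)
S = rng.random((n_species, n_wavelengths))         # placeholder pure spectra

D = C @ S                                          # Beer's Law, no noise
print(np.linalg.matrix_rank(D))                    # 3: only three latent sources

D_noisy = D + 1e-6 * rng.standard_normal(D.shape)
print(np.linalg.matrix_rank(D_noisy))              # full rank once noise enters
```

With noise the matrix is technically full rank, which is why estimating the *chemical* rank requires the statistical tools discussed later.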

In the factor model, each observed variable is written as a linear combination of common factors plus a term unique to that variable:

$$x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \cdots + \lambda_{im} f_m + \varepsilon_i$$

The coefficients $\lambda_{ij}$ are called factor loadings: they tell you how strongly observed variable $x_i$ depends on factor $f_j$. The term $\varepsilon_i$ captures everything specific to variable $x_i$ that the common factors do not explain: measurement noise, detector artifacts, or any variation unique to that particular wavelength.

This is fundamentally different from simply reducing dimensions. Factor analysis posits a generative model: the factors come first, and the data are a consequence. The observed correlations among variables exist because those variables share common underlying causes.

Factor analysis vs PCA

Factor analysis and principal component analysis (PCA) are often confused, and for understandable reasons: both reduce a large number of variables to a smaller number of summary quantities, and both involve eigendecompositions of correlation or covariance matrices. But they rest on different assumptions and answer different questions.

  • Goal: factor analysis models the latent structure that generates the correlations; PCA summarizes total variance in fewer dimensions.
  • Model: factor analysis posits an explicit generative model, $x = \Lambda f + \varepsilon$; PCA has no generative model and is purely a decomposition.
  • Variance: factor analysis separates common variance (shared among variables) from unique variance (specific to each); PCA accounts for all variance (common + unique).
  • Communalities: factor analysis models the diagonal of the covariance matrix rather than fixing it; PCA uses the full variance (diagonal = 1 in the correlation matrix).
  • Rotation: factor solutions are routinely rotated for interpretability (varimax, promax); rotating PCA destroys its orthogonality and variance-ordering properties.
  • Number of components: in factor analysis the number of factors must be specified or estimated as part of the model; in PCA all components exist and you choose how many to retain.
  • Uniqueness of solution: the factor solution is not unique without rotation constraints; the PCA solution is unique (up to sign) for a given matrix.

The practical consequence: if you want to understand what latent structure drives your data, use factor analysis. If you just need a compact representation of your data for subsequent modeling (regression, classification), PCA is usually the simpler and more appropriate choice.
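The distinction shows up directly in code. In this sketch (synthetic data; all names and sizes are mine), scikit-learn's FactorAnalysis estimates a per-variable noise variance that PCA has no concept of:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis, PCA

rng = np.random.default_rng(0)
n, p, m = 2000, 6, 2
L = rng.normal(size=(p, m))                       # true loadings
noise_sd = 0.5
X = rng.normal(size=(n, m)) @ L.T + noise_sd * rng.normal(size=(n, p))

fa = FactorAnalysis(n_components=m, random_state=0).fit(X)
pca = PCA(n_components=m).fit(X)

# FA models unique variance explicitly; it should recover noise_sd**2 = 0.25
print(fa.noise_variance_.round(3))

# PCA just splits total variance; its m components absorb common AND unique variance
print(pca.explained_variance_ratio_.sum().round(3))
```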

The factor model mathematically

For a random vector $x = (x_1, \ldots, x_p)^\top$ of observed variables (centered to have zero mean), the factor model is:

$$x = \Lambda f + \varepsilon$$

where:

  • $\Lambda$ is the $p \times m$ factor loading matrix (links observed variables to latent factors)
  • $f$ is the $m \times 1$ vector of common factors (the latent variables, with $m \ll p$)
  • $\varepsilon$ is the $p \times 1$ vector of unique factors (variable-specific noise)

The model makes three key assumptions:

  1. The common factors are uncorrelated with each other and have unit variance: $\mathrm{E}[ff^\top] = I$
  2. The unique factors are uncorrelated with each other: $\mathrm{E}[\varepsilon\varepsilon^\top] = \Psi$, where $\Psi$ is diagonal
  3. The common factors and unique factors are uncorrelated: $\mathrm{E}[f\varepsilon^\top] = 0$

Under these assumptions, the covariance matrix $\Sigma$ of the observed variables decomposes as:

$$\Sigma = \Lambda\Lambda^\top + \Psi$$

This is the fundamental equation of factor analysis. The total covariance has two parts:

  • $\Lambda\Lambda^\top$ is the common variance, the part explained by the shared latent factors
  • $\Psi$ is the unique variance, the part specific to each variable
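A quick numerical check of the fundamental equation, with made-up $\Lambda$ and $\Psi$: the sample covariance of data generated from the model converges to $\Lambda\Lambda^\top + \Psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 5, 2, 200_000

Lam = np.array([[0.9, 0.0],
                [0.8, 0.1],
                [0.1, 0.7],
                [0.0, 0.8],
                [0.4, 0.4]])
psi = np.array([0.3, 0.2, 0.4, 0.3, 0.5])          # unique variances (diag of Psi)

F = rng.standard_normal((n, m))                    # common factors, E[ff'] = I
E = rng.standard_normal((n, p)) * np.sqrt(psi)     # unique factors, cov = diag(psi)
X = F @ Lam.T + E                                  # x = Lambda f + eps

Sigma_model = Lam @ Lam.T + np.diag(psi)
Sigma_sample = np.cov(X, rowvar=False)
print(np.abs(Sigma_sample - Sigma_model).max())    # small: the model reproduces the data
```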

Communalities and uniqueness

For each variable $x_i$, its total variance $\sigma_i^2$ splits into two pieces:

$$\sigma_i^2 = h_i^2 + \psi_i, \qquad h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2$$

The communality $h_i^2$ is the proportion of variable $x_i$'s variance explained by the common factors. A high communality means the variable is well represented by the factor model. The uniqueness $\psi_i$ is whatever is left over: noise, measurement error, or variation specific to that variable alone.

In spectroscopy, a wavelength with high communality is one that is strongly influenced by the chemical species in the mixture. A wavelength with high uniqueness might be dominated by detector noise or an instrumental artifact unrelated to the analytes.

Estimation methods

Given an observed sample covariance matrix $S$, the goal is to find $\Lambda$ and $\Psi$ such that $S \approx \Lambda\Lambda^\top + \Psi$. Two methods dominate practice.

Principal factor method (principal axis factoring)

This is the most intuitive approach:

  1. Estimate communalities

    Start with initial communality estimates. A common choice is the squared multiple correlation of each variable with all others, or simply the largest absolute correlation in each row of the correlation matrix.

  2. Replace the diagonal

    Substitute the communality estimates into the diagonal of the correlation matrix, replacing the ones. This modified matrix represents only the common variance.

  3. Eigendecompose

    Compute the eigenvalues and eigenvectors of the reduced correlation matrix.

  4. Extract factors

    Retain the $m$ largest eigenvalues and corresponding eigenvectors. The loading matrix is:

    $$\hat{\Lambda} = V_m D_m^{1/2}$$

    where $V_m$ contains the first $m$ eigenvectors and $D_m$ is the diagonal matrix of the first $m$ eigenvalues.

  5. Update and iterate

    Compute new communality estimates from the loadings ($h_i^2 = \sum_{j=1}^{m} \hat{\lambda}_{ij}^2$), replace the diagonal again, and repeat until convergence.

The principal factor method is straightforward and widely implemented, but it provides no formal statistical framework for testing hypotheses about the number of factors.
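The five steps above can be sketched directly in NumPy. This is a minimal illustration (function name and defaults are mine), using squared multiple correlations as initial communalities and a correlation matrix built from known loadings so the answer is recoverable:

```python
import numpy as np

def principal_axis(R, m, max_iter=500, tol=1e-8):
    """Iterative principal axis factoring of a correlation matrix R."""
    Rr = R.copy()
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))     # step 1: SMC starting communalities
    for _ in range(max_iter):
        np.fill_diagonal(Rr, h2)                   # step 2: replace the diagonal
        vals, vecs = np.linalg.eigh(Rr)            # step 3: eigendecompose
        idx = np.argsort(vals)[::-1][:m]           # step 4: keep the m largest
        L = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
        h2_new = (L ** 2).sum(axis=1)              # step 5: update communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            return L, h2_new
        h2 = h2_new
    return L, h2

# correlation matrix with known two-factor structure
L_true = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
R = L_true @ L_true.T + np.diag(1 - (L_true ** 2).sum(axis=1))

L_hat, h2 = principal_axis(R, m=2)
print(np.abs(L_hat @ L_hat.T - L_true @ L_true.T).max())  # near zero
```

The comparison is on $\hat{\Lambda}\hat{\Lambda}^\top$ rather than $\hat{\Lambda}$ itself, because loadings are only identified up to rotation.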

Maximum likelihood factor analysis

Introduced by Lawley in 1940 [4], maximum likelihood factor analysis (MLFA) assumes the data follow a multivariate normal distribution and finds the $\Lambda$ and $\Psi$ that maximize the likelihood of the observed data.

The log-likelihood to be maximized (up to a constant) is:

$$\ell(\Lambda, \Psi) = -\frac{n}{2}\left[\ln\lvert\Sigma\rvert + \operatorname{tr}\!\left(\Sigma^{-1} S\right)\right], \qquad \Sigma = \Lambda\Lambda^\top + \Psi$$

This is solved iteratively using numerical optimization. The key advantage is that MLFA provides a likelihood ratio test for the number of factors: you can formally test whether $m$ factors are sufficient to explain the observed covariance structure. The test statistic is approximately chi-squared, giving a p-value for the null hypothesis that the model fits.
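Given any fitted $\hat{\Lambda}$ and $\hat{\Psi}$, the likelihood ratio statistic follows from the fitted and sample covariances. A hedged sketch, using scikit-learn's FactorAnalysis for the fit and the common Bartlett correction with $df = ((p-m)^2 - (p+m))/2$ (these choices are mine, not prescribed by the text):

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n, p, m = 1000, 6, 2
L = rng.normal(size=(p, m))
psi = rng.uniform(0.2, 0.5, p)
X = rng.standard_normal((n, m)) @ L.T + rng.standard_normal((n, p)) * np.sqrt(psi)

S = np.cov(X, rowvar=False)
fa = FactorAnalysis(n_components=m, tol=1e-6, max_iter=5000, random_state=0).fit(X)
Lam = fa.components_.T
Sigma_hat = Lam @ Lam.T + np.diag(fa.noise_variance_)

# discrepancy u >= 0, zero only when the model covariance equals S exactly
_, logdet_hat = np.linalg.slogdet(Sigma_hat)
_, logdet_S = np.linalg.slogdet(S)
u = logdet_hat - logdet_S + np.trace(S @ np.linalg.inv(Sigma_hat)) - p

stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * u       # Bartlett-corrected statistic
df = ((p - m) ** 2 - (p + m)) // 2
p_value = stats.chi2.sf(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

Since the simulated data really do have two factors, the test should usually fail to reject the null.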

Rotation

The factor loading matrix is not unique. For any orthogonal $m \times m$ matrix $T$, the transformation $\Lambda^* = \Lambda T$ produces a different loading matrix that fits the data equally well, because:

$$\Lambda^* \Lambda^{*\top} = \Lambda T T^\top \Lambda^\top = \Lambda \Lambda^\top$$

This rotational indeterminacy means the initial factor solution is just one of infinitely many equivalent solutions. Rotation exploits this freedom to find the solution that is easiest to interpret.

Why rotate?

The unrotated solution from eigendecomposition tends to produce a first factor that loads on almost everything (a “general” factor) and subsequent factors that contrast groups of variables. This is mathematically natural but often hard to interpret. Rotation redistributes the variance among factors to achieve simple structure — Thurstone’s criterion that each variable should load highly on one factor and near zero on the others.

Varimax (orthogonal rotation)

Varimax, introduced by Kaiser in 1958 [5], is the most widely used orthogonal rotation. It maximizes the variance of the squared loadings within each factor:

$$V = \sum_{j=1}^{m}\left[\frac{1}{p}\sum_{i=1}^{p}\lambda_{ij}^4 - \left(\frac{1}{p}\sum_{i=1}^{p}\lambda_{ij}^2\right)^2\right]$$
The effect is to push loadings toward either large or near-zero values, making the pattern of variable-factor associations clearer. Varimax maintains orthogonality — the rotated factors remain uncorrelated.

Promax (oblique rotation)

In many real situations, the underlying factors are not truly independent. Promax starts from a varimax solution and then allows the factors to become correlated (oblique). It raises the varimax loadings to a power (typically 2 to 4) to sharpen the pattern, then finds the oblique rotation closest to this target. The result is often simpler structure than varimax, at the cost of introducing factor correlations.

Simple structure in chemistry

In spectroscopic mixture analysis, rotation has a direct physical meaning. The unrotated factors from an eigendecomposition of a spectral matrix are abstract mathematical constructs — linear combinations of pure spectra that do not correspond to any real chemical species. Rotation can bring the factors closer to the actual pure spectra, especially when the spectra have limited overlap. This is the basic idea behind target rotation and self-modeling curve resolution, where additional constraints (non-negativity of spectra and concentrations) are imposed.

Determining the number of factors

Choosing $m$, the number of factors to retain, is one of the most consequential decisions in factor analysis. Several criteria exist, and they do not always agree.

Eigenvalue greater than 1 (Kaiser’s rule)

Retain factors whose eigenvalues (from the correlation matrix) exceed 1. The reasoning: a factor that explains less variance than a single original variable is not worth keeping. This rule is simple and widely used, but it tends to overestimate the number of factors, especially with many variables [6].

Scree test

Plot the eigenvalues in descending order (the “scree plot”) and look for an “elbow” — a point where the eigenvalues transition from steep descent to a flat, rubble-like tail. Factors above the elbow are retained. The scree test, proposed by Cattell [10], is subjective (different analysts may see the elbow at different points), but it often works well in practice, particularly when the signal-to-noise ratio is high.

Parallel analysis

Generate random datasets with the same dimensions as your data (same $n$ and $p$, but no factor structure). Compute eigenvalues from these random matrices. Retain only those factors whose eigenvalues from your real data exceed the corresponding eigenvalues from the random data. This accounts for the fact that eigenvalues from random data are not zero — they have a sampling distribution. Parallel analysis is generally considered the most reliable automated criterion [7].
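Parallel analysis is a short loop in practice. A sketch of the recipe above (the 95th-percentile variant and all names are mine), tested on data with three built-in factors:

```python
import numpy as np

def parallel_analysis(X, n_iter=200, percentile=95, seed=0):
    """Retain factors whose eigenvalues beat those of same-sized random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.linalg.eigvalsh(np.corrcoef(X.T))[::-1]
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        Xr = rng.standard_normal((n, p))           # same n and p, no structure
        rand[i] = np.linalg.eigvalsh(np.corrcoef(Xr.T))[::-1]
    thresh = np.percentile(rand, percentile, axis=0)
    return int((obs > thresh).sum()), obs, thresh

# three-factor data: ten variables, each loading on exactly one factor
rng = np.random.default_rng(1)
n, m = 300, 3
F = rng.standard_normal((n, m))
load = np.zeros((10, m))
for i in range(10):
    load[i, i % m] = 1.0
X = F @ load.T + 0.3 * rng.standard_normal((n, 10))

k, obs, thresh = parallel_analysis(X)
print(k)  # 3
```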

Likelihood ratio test (for MLFA)

Fit models with $m = 1, 2, 3, \ldots$ factors and use the chi-squared test to determine the smallest $m$ that provides an adequate fit. Be cautious with large sample sizes, where the test becomes overly sensitive, and with small sample sizes, where the chi-squared approximation may be poor.

Cross-validation

Split the data into training and test sets. Fit the model with different numbers of factors on the training set and evaluate prediction of the held-out data. The number of factors that minimizes prediction error is retained. This is the most principled approach when the goal is prediction rather than interpretation.
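With scikit-learn this is one line per candidate, since FactorAnalysis.score returns the average log-likelihood of held-out data. A sketch on synthetic three-factor data (sizes and seeds are mine):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, m_true = 1000, 10, 3
L = rng.normal(size=(p, m_true))
X = rng.standard_normal((n, m_true)) @ L.T + 0.7 * rng.standard_normal((n, p))

scores = []
for k in range(1, 7):
    fa = FactorAnalysis(n_components=k, random_state=0)
    # mean held-out log-likelihood across 5 folds
    scores.append(cross_val_score(fa, X, cv=5).mean())

best = int(np.argmax(scores)) + 1
print(best)
```

Held-out likelihood penalizes underfitting sharply; overfitting by one extra factor costs less, so the curve tends to rise steeply to the true number and flatten after it.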

Factor analysis in chemistry

Factor analysis entered chemistry through the problem of mixture resolution: given a set of spectra from samples containing unknown combinations of chemical species, can you determine how many species are present and what their pure spectra look like?

Determining the number of components

The first and most fundamental application is estimating the chemical rank of a spectral data matrix $D$ (samples by wavelengths). If the mixture contains $c$ absorbing species and Beer's Law holds, then the rank of $D$ (in the absence of noise) is exactly $c$. Factor analysis provides a principled way to estimate this rank in the presence of noise.

Malinowski developed the indicator function (IND) and the imbedded error (IE) specifically for this purpose [3, 9]. These functions exploit the behavior of the eigenvalues: the first $c$ eigenvalues correspond to real chemical factors, while the remaining eigenvalues correspond to noise. The IND function reaches a minimum at the correct number of factors.
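Malinowski's indicator function takes only a few lines. Under his convention the matrix is $r \times c$ with $r \ge c$, and IND is computed from the residual eigenvalues of $D^\top D$; the sketch below (variable names mine) applies it to a simulated three-component mixture:

```python
import numpy as np

def malinowski_ind(D):
    """IND(n) for n = 1 .. c-1, from the eigenvalues of D'D (rows >= columns)."""
    r, c = D.shape
    assert r >= c
    ev = np.linalg.svd(D, compute_uv=False) ** 2    # eigenvalues of D.T @ D
    ind = np.empty(c - 1)
    for n in range(1, c):
        re = np.sqrt(ev[n:].sum() / (r * (c - n)))  # real error with n factors
        ind[n - 1] = re / (c - n) ** 2
    return ind

# simulated 3-component mixture: 40 samples x 15 wavelengths, small noise
rng = np.random.default_rng(0)
C = rng.exponential(1.0, (40, 3))
S = rng.random((3, 15))
D = C @ S + 0.005 * rng.standard_normal((40, 15))

ind = malinowski_ind(D)
print(int(np.argmin(ind)) + 1)  # minimum at the true number of components
```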

Evolving factor analysis (EFA)

In process monitoring or chromatography, spectra are collected sequentially over time. Evolving factor analysis applies factor analysis to expanding windows of the data matrix — first to the first 2 spectra, then the first 3, and so on. By tracking how the number of significant factors changes as new spectra are added, you can determine when new chemical species appear and disappear. This provides a concentration “window” for each species, which is valuable prior information for subsequent resolution.
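A forward-EFA sketch on a simulated two-component elution (all profiles and the noise threshold are illustrative): the singular values of expanding windows reveal when each species enters.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(30)
wl = np.linspace(0.0, 1.0, 40)

# two species eluting at different times, with distinct spectra
C = np.column_stack([np.exp(-0.5 * ((t - 8) / 3) ** 2),
                     np.exp(-0.5 * ((t - 20) / 3) ** 2)])
S = np.vstack([np.exp(-0.5 * ((wl - 0.3) / 0.1) ** 2),
               np.exp(-0.5 * ((wl - 0.7) / 0.1) ** 2)])
D = C @ S + 0.01 * rng.standard_normal((30, 40))

# forward EFA: singular values of the first k spectra, for growing k
for k in (6, 12, 24, 30):
    sv = np.linalg.svd(D[:k], compute_uv=False)
    n_sig = int((sv > 0.3).sum())                   # crude noise threshold
    print(f"first {k:2d} spectra -> {n_sig} significant factor(s)")
```

Early windows see only the first eluting species; once the second peak enters the window, a second significant singular value appears.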

Abstract vs. real factors

The factors extracted by standard factor analysis are abstract — they are linear combinations of the true pure spectra, mathematically valid but chemically meaningless. To obtain physically meaningful solutions, additional constraints are needed:

  • Target factor analysis (TFA): test whether a known (or suspected) pure spectrum is consistent with the factor space. If the target spectrum can be well-reproduced as a linear combination of the abstract factors, it is a plausible component of the mixture.
  • Self-modeling curve resolution (SMCR): constrain the factors to be non-negative (spectra and concentrations cannot be negative) and find the range of physically acceptable solutions.

These methods bridge the gap between the mathematical elegance of factor analysis and the physical reality of chemical mixtures.
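Target testing, at its core, is a projection onto the abstract factor subspace. A sketch (simulated mixture; the pass/fail threshold is illustrative): a genuine pure spectrum reconstructs almost perfectly from the abstract factors, while an arbitrary spectrum does not.

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.linspace(200.0, 800.0, 120)
centers = (350.0, 500.0, 650.0)
S = np.vstack([np.exp(-0.5 * ((wl - c) / 40.0) ** 2) for c in centers])
C = rng.exponential(1.0, (60, 3))
D = C @ S + 0.005 * rng.standard_normal((60, 120))

# abstract factors: the top-3 right singular vectors of the mixture data
U, s, Vt = np.linalg.svd(D, full_matrices=False)
basis = Vt[:3]

def target_residual(target):
    """Relative residual after projecting a candidate spectrum onto the factor space."""
    proj = basis.T @ (basis @ target)
    return np.linalg.norm(target - proj) / np.linalg.norm(target)

print(target_residual(S[0]))                        # small: a real component
print(target_residual(rng.standard_normal(120)))    # large: not in the mixture
```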

Code implementation

Here is an implementation of factor analysis in Python, applied to a simulated spectroscopic dataset.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
import matplotlib.pyplot as plt

# --- Simulate a 3-component spectroscopic mixture ---
np.random.seed(42)
n_samples = 50
n_wavelengths = 100
n_components = 3

# Pure spectra (Gaussian peaks at different positions)
wavelengths = np.linspace(200, 800, n_wavelengths)
pure_spectra = np.zeros((n_components, n_wavelengths))
centers = [350, 500, 650]
for i, center in enumerate(centers):
    pure_spectra[i] = np.exp(-0.5 * ((wavelengths - center) / 40) ** 2)

# Random concentrations (non-negative)
concentrations = np.random.exponential(1, (n_samples, n_components))

# Observed spectra = concentrations x pure_spectra + noise
noise = 0.02 * np.random.randn(n_samples, n_wavelengths)
spectra = concentrations @ pure_spectra + noise

# --- Factor analysis ---
fa = FactorAnalysis(n_components=n_components, random_state=42)
fa.fit(spectra)

# Factor loadings (components_ in sklearn are the loadings transposed)
loadings = fa.components_.T  # (n_wavelengths, n_components)

# Communalities
communalities = np.sum(loadings ** 2, axis=1)

# Uniquenesses (noise variance per variable)
uniquenesses = fa.noise_variance_

print(f"Number of factors: {n_components}")
print(f"Mean communality: {communalities.mean():.4f}")
print(f"Mean uniqueness: {uniquenesses.mean():.6f}")

# --- Scree plot using eigenvalues of correlation matrix ---
corr_matrix = np.corrcoef(spectra.T)
eigenvalues = np.linalg.eigvalsh(corr_matrix)[::-1]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scree plot
axes[0].plot(range(1, 16), eigenvalues[:15], 'bo-', markersize=6)
axes[0].axhline(y=1, color='r', linestyle='--', alpha=0.7, label='Kaiser criterion')
axes[0].set_xlabel('Factor number')
axes[0].set_ylabel('Eigenvalue')
axes[0].set_title('Scree Plot')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Factor loadings
for k in range(n_components):
    axes[1].plot(wavelengths, loadings[:, k], label=f'Factor {k+1}')
axes[1].set_xlabel('Wavelength (nm)')
axes[1].set_ylabel('Loading')
axes[1].set_title('Factor Loadings')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Communalities
axes[2].plot(wavelengths, communalities, 'g-', linewidth=2)
axes[2].set_xlabel('Wavelength (nm)')
axes[2].set_ylabel('Communality')
axes[2].set_title('Communalities across Wavelengths')
axes[2].set_ylim(0, 1.05)
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

When to use factor analysis

Good applications

  • Determining the number of components in a mixture: the fundamental question in spectroscopic mixture analysis. Factor analysis provides formal statistical tests.

  • Understanding latent structure: when you believe hidden variables drive the observed correlations, as is common in environmental monitoring, process data, and sensory analysis.

  • Spectroscopic mixture resolution: combined with constraints (non-negativity, target testing), factor analysis is the basis for curve resolution methods.

  • Exploratory multivariate analysis: when you want to identify groups of correlated variables and understand what drives the correlations.

Better alternatives exist for

  • Pure dimensionality reduction: if you just need fewer variables for regression or classification, PCA is simpler and makes fewer assumptions.

  • Prediction tasks: factor scores are not optimal predictors; use PLS or PCR instead.

  • Very small samples: factor analysis requires reasonable sample sizes (at least 5 to 10 observations per variable as a rough guideline). With few samples, results are unstable.

  • Non-linear relationships: the factor model is linear. For non-linear latent structure, consider autoencoders or non-linear PCA.

Next steps

Factor analysis connects to several important chemometric methods:

  • Principal Component Analysis (PCA) is the close relative that decomposes total variance without a generative model. PCA scores and loadings are often the starting point for factor analysis.

  • Multivariate Curve Resolution (MCR-ALS) extends factor analysis for chemical mixtures by imposing non-negativity, unimodality, and closure constraints to recover physically meaningful spectra and concentration profiles [8].

  • Target Factor Analysis (TFA) tests whether a known pure spectrum belongs to the factor space of a mixture, providing a powerful tool for component identification.

  • Evolving Factor Analysis (EFA) tracks the appearance and disappearance of chemical species over time or along a process variable, useful in chromatography and reaction monitoring.

  • Independent Component Analysis (ICA) goes beyond uncorrelated factors to seek statistically independent sources, useful when the central limit theorem does not apply.

References

[1] Spearman, C. (1904). “General intelligence,” objectively determined and measured. American Journal of Psychology, 15(2), 201–293.

[2] Thurstone, L. L. (1947). Multiple-Factor Analysis: A Development and Expansion of The Vectors of Mind. University of Chicago Press.

[3] Malinowski, E. R. (1991). Factor Analysis in Chemistry (2nd ed.). Wiley.

[4] Lawley, D. N., & Maxwell, A. E. (1971). Factor Analysis as a Statistical Method (2nd ed.). Butterworths.

[5] Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23(3), 187–200.

[6] Harman, H. H. (1976). Modern Factor Analysis (3rd ed.). University of Chicago Press.

[7] Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.

[8] Jaumot, J., de Juan, A., & Tauler, R. (2015). MCR-ALS GUI 2.0: New features and applications. Chemometrics and Intelligent Laboratory Systems, 140, 1–12.

[9] Malinowski, E. R. (1977). Determination of the number of factors and the experimental error in a data matrix. Analytical Chemistry, 49(4), 612–617.

[10] Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.