
Linear Regression

The word “regression” entered statistics through an unlikely door: the study of human height. In 1886, the English polymath Francis Galton published a paper titled Regression towards Mediocrity in Hereditary Stature, in which he measured the heights of 928 adult children and their parents. He observed that unusually tall parents tended to have children who were tall, but not quite as tall as their parents — and unusually short parents tended to have children who were short, but not quite as short. The children’s heights “regressed” toward the population average. Galton drew a straight line through a scatter of parent-child height pairs, and in doing so created the first regression analysis.

Galton’s colleague Karl Pearson took the idea further. Over the following decade, Pearson developed the correlation coefficient, formalized the method of fitting a straight line to bivariate data, and built the mathematical framework that turned Galton’s empirical observation into a general statistical tool. By 1903, Pearson and his students had applied the method to problems ranging from skull measurements to meteorological data.

But the mathematical engine underneath regression — the method of least squares — had been invented a full eighty years earlier, and in a completely different field. In 1805, the French mathematician Adrien-Marie Legendre published the first description of the least squares method in his work on determining the orbits of comets. He proposed minimizing the sum of squared residuals as the criterion for the “best” fit, without any probabilistic justification. Four years later, Carl Friedrich Gauss provided that justification: if measurement errors follow a normal (Gaussian) distribution, then the least squares solution is also the maximum likelihood estimate. Gauss claimed he had been using the method since 1795, though he published after Legendre. The priority dispute between them was bitter and never fully resolved.

What matters for us is how the method migrated from astronomy to chemistry. Throughout the 19th and 20th centuries, analytical chemists needed to convert instrument readings into concentrations. The standard procedure — still used in every analytical chemistry laboratory today — is the calibration curve: prepare a set of standards with known concentrations, measure each one on the instrument, plot signal versus concentration, and fit a straight line. That line is a linear regression model. When August Beer formalized his law of light absorption in 1852, the linear relationship between absorbance and concentration gave calibration curves a firm theoretical basis. Linear regression became the workhorse of quantitative chemical analysis.

The calibration problem in chemistry

Suppose you need to determine how much lead is in a set of drinking water samples. You have an atomic absorption spectrometer that measures absorbance, and you know from Beer’s Law that absorbance is proportional to concentration. But you don’t know the proportionality constant for your particular instrument, cuvette path length, and wavelength setting. You need a calibration model.

The procedure is straightforward:

  1. Prepare standards

    Make 5–8 solutions with known lead concentrations (say 0, 5, 10, 20, 50, 100 ppb).

  2. Measure each standard

    Run each solution through the spectrometer and record the absorbance.

  3. Fit a line

    Plot absorbance vs. concentration and find the straight line that best describes the relationship.

  4. Predict unknowns

    Measure the absorbance of your unknown samples and use the fitted line to read off their concentrations.

This is linear regression in its most natural chemical setting. The “line of best fit” is not drawn by eye (as was common before the 1970s) but computed by a precise mathematical criterion: least squares. The line that minimizes the total squared distance between the observed points and the line is, in a well-defined sense, the best one.
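The four-step procedure can be sketched in a few lines of NumPy. The concentrations and signals below are made-up illustrative numbers, not real measurements:

```python
import numpy as np

# Steps 1-2: standards with known lead concentrations (ppb) and their
# measured signals (the readings here are invented for illustration)
conc = np.array([0.0, 5.0, 10.0, 20.0, 50.0, 100.0])
signal = np.array([0.001, 0.051, 0.103, 0.198, 0.507, 1.012])

# Step 3: fit the straight line signal = b0 + b1 * conc
# (np.polyfit returns coefficients highest degree first)
b1, b0 = np.polyfit(conc, signal, deg=1)

# Step 4: invert the fitted line to predict an unknown sample
unknown_signal = 0.250
unknown_conc = (unknown_signal - b0) / b1
```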

The mathematical model

The simplest linear regression model relates a response variable $y$ to a single predictor $x$:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Each symbol has a concrete chemical meaning:

  • $y_i$ is the measured response for the $i$-th sample (e.g., absorbance at 283.3 nm)
  • $x_i$ is the known predictor value (e.g., lead concentration in ppb)
  • $\beta_0$ is the intercept — the expected response when $x = 0$ (ideally zero for a blank, but often slightly nonzero due to stray light, solvent absorption, or detector offset)
  • $\beta_1$ is the slope — how much the response changes per unit change in $x$ (the sensitivity of the method)
  • $\varepsilon_i$ is the error (or residual) — the difference between what we actually measured and what the model predicts, arising from random measurement noise

The model says that the data are generated by a deterministic linear relationship $\beta_0 + \beta_1 x$ plus random noise $\varepsilon$. We observe $x_i$ and $y_i$; we want to estimate $\beta_0$ and $\beta_1$.
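A quick way to internalize the generative model is to simulate it: pick "true" values for the intercept and slope (the numbers below are arbitrary), add Gaussian noise, and check that fitting approximately recovers them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters (arbitrary values chosen for the simulation)
beta_0, beta_1 = 0.002, 0.05

x = np.linspace(0, 50, 20)             # predictor values (e.g., concentrations)
noise = rng.normal(0.0, 0.01, x.size)  # random measurement error epsilon_i
y = beta_0 + beta_1 * x + noise        # observed responses y_i

# Fitting the simulated data should approximately recover beta_0 and beta_1
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
```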

Finding the best line

We have $n$ data points $(x_i, y_i)$ and we want the values of $\beta_0$ and $\beta_1$ that make the line fit the data as closely as possible. The standard criterion is ordinary least squares (OLS): minimize the sum of squared residuals.

Define the residual for the $i$-th observation as:

$$e_i = y_i - (\beta_0 + \beta_1 x_i)$$

The sum of squared residuals is:

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$

To minimize $S$, take partial derivatives with respect to $\beta_0$ and $\beta_1$, set them to zero, and solve:

$$\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

These are the normal equations. Solving them gives closed-form expressions for the optimal coefficients:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means.

The formula for $\hat{\beta}_1$ has an intuitive reading: the slope is the ratio of “how much $x$ and $y$ vary together” (the covariance) to “how much $x$ varies on its own” (the variance). If $x$ and $y$ increase together, the slope is positive. If one increases while the other decreases, the slope is negative.
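As a sanity check, the closed-form slope really is a covariance-to-variance ratio. Here it is computed directly for the copper calibration data used later in this article:

```python
import numpy as np

# Copper calibration data used later in this article
x = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = np.array([0.002, 0.098, 0.243, 0.491, 0.985, 2.461])

# Slope = covariance(x, y) / variance(x), using the same 1/n convention in both
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
var_x = np.mean((x - x.mean()) ** 2)
beta_1 = cov_xy / var_x

# Intercept from the second normal equation
beta_0 = y.mean() - beta_1 * x.mean()
```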

Measuring model quality

After fitting the line, you need to know: is it actually any good? Two complementary metrics answer this question.

Coefficient of determination (R-squared)

The coefficient of determination $R^2$ measures the fraction of the total variability in $y$ that is explained by the linear relationship with $x$:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$

where:

  • $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares — the variation the model fails to explain
  • $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares — the total variation in the data

$R^2$ ranges from 0 to 1. An $R^2$ of 0.998 (common for well-behaved Beer’s Law calibrations) means the linear model explains 99.8% of the variance in absorbance. An $R^2$ of 0.75 means 25% of the variance is unexplained — the model captures the general trend but misses something important.

Root mean square error (RMSE)

While $R^2$ is dimensionless, the RMSE tells you the model’s prediction error in the same units as the response:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

If you are calibrating a UV-Vis spectrometer for copper determination and your RMSE is 0.003 absorbance units, your predictions are typically within 0.003 AU of the true value. This is directly interpretable: you can convert it to concentration units using the slope, giving you the method’s practical detection capability.
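Both metrics take only a few lines to compute. This sketch uses the copper calibration data from the worked example later in this article:

```python
import numpy as np

# Copper calibration data from the worked example in this article
x = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = np.array([0.002, 0.098, 0.243, 0.491, 0.985, 2.461])

b1, b0 = np.polyfit(x, y, deg=1)
y_pred = b0 + b1 * x

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot       # dimensionless
rmse = np.sqrt(ss_res / len(y))       # same units as y (absorbance)
```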

Residual analysis

Numbers alone are not enough. The single most informative diagnostic is the residual plot: plot the residuals $e_i$ against the fitted values $\hat{y}_i$ (or against $x_i$).

A well-behaved regression produces residuals that look like random scatter around zero — no trends, no patterns, no funnels. If you see structure in the residuals, something is wrong:

  • A curved pattern indicates the true relationship is nonlinear. Consider a quadratic term or a transformation.
  • A funnel shape (variance increasing with the fitted value) indicates heteroscedasticity. Weighted regression or a variance-stabilizing transformation may help.
  • Clusters or runs of positive/negative residuals suggest autocorrelation or a missing variable.
  • One or two points far from zero flag potential outliers worth investigating.
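One simple, if informal, way to quantify a curved residual pattern is to correlate the residuals with a centered quadratic term; a strong correlation flags nonlinearity. A sketch on simulated, deliberately quadratic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with genuine curvature (a quadratic term plus noise)
x = np.linspace(0, 10, 30)
y = 0.5 * x + 0.05 * x**2 + rng.normal(0.0, 0.1, x.size)

# Fit a straight line anyway
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# A curved residual pattern correlates strongly with a centered quadratic term;
# values near 0 suggest no curvature, values near 1 a clear bend
curvature = np.corrcoef(residuals, (x - x.mean()) ** 2)[0, 1]
```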

From simple to matrix form

Simple linear regression handles one predictor. But in chemistry, we often have multiple predictors — for example, absorbances at several wavelengths, or concentrations of several analytes. The matrix formulation generalizes the model to any number of predictors.

Write the model for all $n$ observations at once:

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y}$ is the $n \times 1$ vector of responses
  • $\mathbf{X}$ is the $n \times (p+1)$ design matrix (first column is all ones for the intercept, remaining columns are predictors)
  • $\boldsymbol{\beta}$ is the $(p+1) \times 1$ vector of coefficients
  • $\boldsymbol{\varepsilon}$ is the $n \times 1$ vector of errors

For simple linear regression with one predictor, the design matrix looks like:

$$\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

The least squares solution in matrix form is the famous normal equation:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
This single expression replaces all the summation formulas from the simple case. It works for any number of predictors $p$, which is why it forms the basis for multiple linear regression, ridge regression, and ultimately the entire family of linear methods in chemometrics.
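The normal equation translates directly into NumPy. The sketch below checks it against `np.linalg.lstsq`, which solves the same problem via a more numerically stable route:

```python
import numpy as np

# Copper calibration data from the worked example in this article
x = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = np.array([0.002, 0.098, 0.243, 0.491, 0.985, 2.461])

# Design matrix: a column of ones (intercept) next to the predictor column
X = np.column_stack([np.ones_like(x), x])

# Normal equation: beta_hat = (X^T X)^{-1} X^T y
# (solve the linear system rather than forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```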

Real chemistry examples

Linear regression appears throughout analytical chemistry. Here is a common scenario.

Scenario: Determine copper concentration in water samples using UV-Vis spectroscopy at 810 nm.

You prepare six standards and measure their absorbances:

| Concentration (mg/L) | Absorbance |
|----------------------|------------|
| 0.0                  | 0.002      |
| 2.0                  | 0.098      |
| 5.0                  | 0.243      |
| 10.0                 | 0.491      |
| 20.0                 | 0.985      |
| 50.0                 | 2.461      |

Beer’s Law predicts a linear relationship: $A = \epsilon b c$, where $\epsilon$ is the molar absorptivity, $b$ is the path length, and $c$ is the concentration. Fitting gives:

  • $\hat{\beta}_0 \approx -0.0005$ (very close to zero, as expected for a good blank)
  • $\hat{\beta}_1 \approx 0.0492$ AU per mg/L (the method sensitivity)

To predict an unknown sample with absorbance 0.370: $c = (0.370 - \hat{\beta}_0) / \hat{\beta}_1 \approx 7.5$ mg/L.

Common pitfalls

Linear regression is robust and well-understood, but it can fail silently if its assumptions are violated. These are the problems you are most likely to encounter in chemical data.

Nonlinearity. The most common failure. Beer’s Law is linear only over a limited concentration range; at high concentrations, deviations appear due to molecular interactions, stray light, or detector saturation. If your residual plot shows a curve, the fix is usually to narrow the calibration range, add a quadratic term, or use a nonlinear model.

Outliers. A single outlier can dramatically shift the fitted line, especially with small sample sizes. Outliers in calibration data may come from preparation errors (wrong dilution), instrument glitches, or transcription mistakes. Identify them through the residual plot, investigate them (don’t just delete them blindly), and consider robust regression methods if outliers are frequent.

Heteroscedasticity. In many spectroscopic measurements, noise increases with signal intensity (think shot noise, which scales as the square root of the signal). This means the variance of $\varepsilon$ is not constant across the calibration range — a violation of the standard OLS assumption. The remedy is weighted least squares, where each point is weighted inversely to its variance, giving less noisy points more influence.
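A minimal weighted least squares sketch, assuming (purely for illustration) that the noise standard deviation grows linearly with the signal:

```python
import numpy as np

x = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = np.array([0.002, 0.098, 0.243, 0.491, 0.985, 2.461])

# Illustrative noise model: standard deviation grows with the signal
sigma = 0.001 + 0.01 * y
w = 1.0 / sigma**2                  # weight each point by its inverse variance

X = np.column_stack([np.ones_like(x), x])
W = np.diag(w)

# Weighted normal equations: beta = (X^T W X)^{-1} X^T W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

With well-behaved linear data like this, the weighted slope stays close to the OLS slope; the weights matter most when the high-signal points are genuinely much noisier.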

Extrapolation. A calibration curve is only valid within the range of your standards. Predicting beyond this range (extrapolation) is dangerous because you have no data to confirm the relationship holds. A calibration built from 0–50 mg/L standards should not be used to predict a sample at 200 mg/L.

Collinearity (in multiple regression). When two or more predictors are highly correlated — as they almost always are in spectroscopic data — the $\mathbf{X}^\top \mathbf{X}$ matrix becomes nearly singular. The estimated coefficients become unstable: large in magnitude, sensitive to small changes in the data, and difficult to interpret. This is the collinearity problem, and it is the principal motivation for regularized methods like ridge regression and latent-variable methods like PCR and PLS.
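A small simulation makes the instability concrete: with two nearly identical predictors, the OLS coefficients from the normal equations are erratic, while a ridge penalty $\lambda$ (an arbitrary 0.1 here) pulls them back to a stable, interpretable scale. This is a sketch, not a full ridge implementation — no centering, scaling, or penalty selection:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two nearly identical predictors, like absorbances at neighboring wavelengths
x1 = np.linspace(0.0, 1.0, 40)
x2 = x1 + rng.normal(0.0, 1e-4, x1.size)
y = 3.0 * x1 + rng.normal(0.0, 0.05, x1.size)

X = np.column_stack([x1, x2])       # intercept omitted to keep the sketch short

# OLS via the normal equations: unstable when X^T X is nearly singular
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add lambda * I to the diagonal before solving
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

The ridge coefficients settle near equal values whose sum is close to the true combined effect of 3, which is exactly the stabilizing behavior the text describes.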

When to use (and when not to)

Good applications

Calibration curves The primary use case in analytical chemistry. Beer’s Law, electrode calibrations, instrument validation.

Method comparison Comparing two measurement techniques (e.g., reference vs. rapid method).

Simple trend analysis Relating a response to a single continuous predictor with a roughly linear relationship.

Teaching The foundation for understanding all regression methods. Learn this well before moving to more complex techniques.

Better alternatives exist for

High-dimensional data (many wavelengths) Spectroscopic data typically has hundreds of correlated predictors. Use PCR or PLS instead.

Nonlinear relationships If the calibration curve bends, consider polynomial regression, splines, or nonlinear models.

Noisy data with outliers Robust regression (e.g., iteratively reweighted least squares) handles outliers better than OLS.

Small sample sizes with many predictors When the number of predictors exceeds the number of samples ($p > n$), OLS has no unique solution. Ridge, LASSO, or PLS are necessary.

Code implementation

Here is a complete, working implementation of simple linear regression for calibration in Python.

```python
import numpy as np
import matplotlib.pyplot as plt


def linear_regression(x, y):
    """
    Fit a simple linear regression model y = beta_0 + beta_1 * x.

    Parameters
    ----------
    x : array-like
        Predictor values (e.g., known concentrations)
    y : array-like
        Response values (e.g., measured absorbances)

    Returns
    -------
    beta_0 : float
        Intercept
    beta_1 : float
        Slope
    r_squared : float
        Coefficient of determination
    rmse : float
        Root mean square error
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Sample means
    x_mean = np.mean(x)
    y_mean = np.mean(y)

    # Slope and intercept via normal equations
    beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean)**2)
    beta_0 = y_mean - beta_1 * x_mean

    # Predictions and residuals
    y_pred = beta_0 + beta_1 * x
    residuals = y - y_pred

    # Model quality metrics
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y_mean)**2)
    r_squared = 1 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / n)

    return beta_0, beta_1, r_squared, rmse


# --- Example: Beer's Law calibration for copper ---
# Known concentrations (mg/L) and measured absorbances
conc = np.array([0.0, 2.0, 5.0, 10.0, 20.0, 50.0])
absorbance = np.array([0.002, 0.098, 0.243, 0.491, 0.985, 2.461])

# Fit the calibration model
b0, b1, r2, rmse = linear_regression(conc, absorbance)
print(f"Intercept: {b0:.4f}")
print(f"Slope: {b1:.4f} AU per mg/L")
print(f"R-squared: {r2:.6f}")
print(f"RMSE: {rmse:.4f} AU")

# Predict unknown sample
unknown_abs = 0.370
unknown_conc = (unknown_abs - b0) / b1
print(f"\nUnknown sample: A = {unknown_abs} -> c = {unknown_conc:.2f} mg/L")

# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Calibration plot
x_line = np.linspace(0, 55, 100)
y_line = b0 + b1 * x_line
axes[0].scatter(conc, absorbance, s=80, zorder=5, label='Standards')
axes[0].plot(x_line, y_line, 'r-', linewidth=2, label=f'Fit (R² = {r2:.4f})')
axes[0].scatter(unknown_conc, unknown_abs, s=100, marker='*',
                color='green', zorder=5, label=f'Unknown ({unknown_conc:.1f} mg/L)')
axes[0].set_xlabel('Concentration (mg/L)')
axes[0].set_ylabel('Absorbance (AU)')
axes[0].set_title("Beer's Law Calibration")
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Residual plot
y_pred = b0 + b1 * conc
residuals = absorbance - y_pred
axes[1].scatter(y_pred, residuals, s=80, zorder=5)
axes[1].axhline(y=0, color='red', linestyle='--', alpha=0.7)
axes[1].set_xlabel('Fitted values (AU)')
axes[1].set_ylabel('Residuals (AU)')
axes[1].set_title('Residual Plot')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

Next steps

Simple linear regression is the foundation upon which almost every chemometric method is built. The concepts introduced here — least squares fitting, the normal equations, residual diagnostics, the matrix formulation — reappear in increasingly sophisticated forms throughout the field.

The natural next steps from here:

  • Multiple Linear Regression extends the model to several predictors ($x_1, x_2, \ldots, x_p$). Essential when your response depends on more than one variable, but runs into trouble when predictors are correlated.

  • Ridge Regression adds a penalty term to the least squares criterion to stabilize coefficient estimates when predictors are collinear. The first regularization method most chemometricians encounter.

  • LASSO Regression uses a different penalty that can shrink some coefficients exactly to zero, performing variable selection. Useful when you suspect only a few wavelengths matter.

  • Principal Component Regression compresses the predictor matrix into a few orthogonal components before regression. A classic approach for spectroscopic calibration.

  • Partial Least Squares finds components that simultaneously explain variance in both $\mathbf{X}$ and $\mathbf{y}$. The most widely used regression method in chemometrics, and the one you will encounter most often in spectroscopic applications.

For a deeper treatment of the least squares criterion and its probabilistic foundations, see Least Squares: Your First Step into Chemometrics.

References

[1] Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.

[2] Legendre, A.-M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Firmin Didot, Paris.

[3] Gauss, C. F. (1809). Theoria motus corporum coelestium. Perthes et Besser, Hamburg.

[4] Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society A, 187, 253–318.

[5] Beer, A. (1852). Bestimmung der Absorption des rothen Lichts in farbigen Flüssigkeiten. Annalen der Physik und Chemie, 162(5), 78–88.

[6] Martens, H., & Naes, T. (1989). Multivariate Calibration. Wiley.

[7] Brereton, R. G. (2003). Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley.

[8] Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21.

[9] Mark, H., & Workman, J. (2007). Chemometrics in Spectroscopy. Academic Press.

[10] Draper, N. R., & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley.