Streaming Kernel PLS in bigPLSR: XX^T and Column-Chunked Variants

Frédéric Bertrand

Cedric, Cnam, Paris
frederic.bertrand@lecnam.net

2025-11-26

Overview

This vignette documents bigPLSR’s kernel PLS streaming backends for bigmemory::big.matrix inputs. We provide two complementary streaming strategies: one reads X in row chunks and the other reads it in column chunks.

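Both strategies produce the same model up to floating-point round-off. The variant is selected automatically (see ?pls_fit). You can force a variant with options(bigPLSR.kpls_gram = "rows") or options(bigPLSR.kpls_gram = "cols"), and options(bigPLSR.kpls_gram = "auto") restores automatic selection.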
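The following sketch makes the round-off claim concrete. It accumulates the Gram matrix K = X X^T in two ways. Over row blocks, block (i, j) of K is X_i X_j^T. Over column blocks, K is the sum of X_c X_c^T. The sketch assumes that the "rows" and "cols" variants differ in roughly this way. The helpers gram_by_rows and gram_by_cols and the block size are hypothetical and are not bigPLSR functions. An in-memory NumPy array stands in for a file-backed big.matrix.

```python
import numpy as np

def gram_by_rows(X, block=64):
    """Hypothetical helper: fill K = X X^T block by block, K[i, j] = X_i X_j^T."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(0, n, block):
        Xi = X[i:i + block]                      # one row block (read from disk in practice)
        for j in range(0, n, block):
            Xj = X[j:j + block]
            K[i:i + block, j:j + block] = Xi @ Xj.T
    return K

def gram_by_cols(X, block=64):
    """Hypothetical helper: accumulate K = sum_c X_c X_c^T over column blocks X_c."""
    n, p = X.shape
    K = np.zeros((n, n))
    for c in range(0, p, block):
        Xc = X[:, c:c + block]                   # one column block
        K += Xc @ Xc.T
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 150))
X -= X.mean(axis=0)                              # column-center, as in the math sketch
K_rows = gram_by_rows(X, block=64)
K_cols = gram_by_cols(X, block=64)
print(np.max(np.abs(K_rows - K_cols)))           # only summation-order round-off remains
```

The two routines differ only in the order of the floating-point additions. This is why the resulting models agree to round-off rather than bit for bit.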

Math sketch

Let X in R^{n x p} and Y in R^{n x m} be column-centered.

At component h, kernel PLS uses the following NIPALS-like fixed-point update:

  1. Start with u in R^n (e.g., a column of Y).
  2. Compute a = X^T u.
  3. Normalize w = a / ||a||_2.
  4. Scores: t = X w.
  5. Loadings:
    • p = (X^T t)/(t^T t),
    • q = (Y^T t)/(t^T t).
  6. Deflate: X <- X - t p^T, Y <- Y - t q^T, and set u <- Y q.
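The steps above can be sketched in memory with NumPy. This is a minimal illustration and not bigPLSR's implementation. It assumes the usual NIPALS convention: steps 2 to 5 are repeated, with the fixed-point update u <- Y q applied to the not-yet-deflated Y, until the scores t stop changing. The deflation is applied once after that. With a single response (m = 1), one pass is enough. The function name nipals_pls and its tol and max_iter arguments are hypothetical.

```python
import numpy as np

def nipals_pls(X, Y, n_comp, tol=1e-12, max_iter=500):
    """Hypothetical in-memory NIPALS PLS on column-centered X (n x p) and Y (n x m).

    Returns weights W (p x H), X-loadings P (p x H), Y-loadings Q (m x H)
    and scores T (n x H).
    """
    X, Y = np.array(X, dtype=float), np.array(Y, dtype=float)   # work on copies
    n, p = X.shape
    m = Y.shape[1]
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    Q, T = np.zeros((m, n_comp)), np.zeros((n, n_comp))
    for h in range(n_comp):
        u = Y[:, [0]]                        # 1. start from a column of Y
        t_old = np.zeros((n, 1))
        for _ in range(max_iter):
            a = X.T @ u                      # 2. a = X^T u
            w = a / np.linalg.norm(a)        # 3. w = a / ||a||_2
            t = X @ w                        # 4. t = X w
            tt = (t.T @ t).item()
            q = Y.T @ t / tt                 # 5. q = Y^T t / (t^T t)
            u = Y @ q                        # fixed-point update u <- Y q
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break                        # scores have converged
            t_old = t
        p_h = X.T @ t / tt                   # 5. p = X^T t / (t^T t)
        X -= t @ p_h.T                       # 6. deflate X
        Y -= t @ q.T                         # 6. deflate Y
        W[:, h], P[:, h], Q[:, h], T[:, h] = w[:, 0], p_h[:, 0], q[:, 0], t[:, 0]
    return W, P, Q, T

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6)); X -= X.mean(axis=0)
Y = rng.standard_normal((50, 2)); Y -= Y.mean(axis=0)
W, P, Q, T = nipals_pls(X, Y, n_comp=3)
print(np.round(T.T @ T, 6))                  # off-diagonal entries vanish: scores are orthogonal
```

Deflating X with p = X^T t / (t^T t) makes the deflated X orthogonal to t. As a result, every later score vector is orthogonal to the earlier ones, even if the inner loop stops before full convergence.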

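Stack the H weight vectors w, X-loadings p, and Y-loadings q as the columns of W (p x H), P (p x H), and Q (m x H). The coefficients after H components are then

beta = W (P^T W)^{-1} Q^T,

and the prediction for a new row x (1 x p) is

yhat = mu_Y + (x - mu_X) beta,

where mu_X (1 x p) and mu_Y (1 x m) are the column means removed during centering.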
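A small in-memory sketch can check the coefficient and prediction formulas. It is restricted to a single response (m = 1). In that case u = y already satisfies the fixed-point update, so no inner iteration is needed. The helper names pls1_fit and pls_predict are hypothetical and are not bigPLSR functions. When H equals the number of columns of a full-rank X, PLS1 reproduces ordinary least squares, which provides a convenient check.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Hypothetical PLS1 fit returning beta (p x 1), mu_X and mu_y."""
    mu_X, mu_y = X.mean(axis=0), y.mean()
    Xh, yh = X - mu_X, y - mu_y                  # center X and y
    p = X.shape[1]
    W, P, Q = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros((1, n_comp))
    for h in range(n_comp):
        a = Xh.T @ yh                            # m = 1: u = y is already the fixed point
        w = a / np.linalg.norm(a)
        t = Xh @ w
        tt = t @ t
        p_h, q = Xh.T @ t / tt, yh @ t / tt      # loadings p and q
        Xh = Xh - np.outer(t, p_h)               # deflate X
        yh = yh - q * t                          # deflate y
        W[:, h], P[:, h], Q[0, h] = w, p_h, q
    beta = W @ np.linalg.solve(P.T @ W, Q.T)     # beta = W (P^T W)^{-1} Q^T
    return beta, mu_X, mu_y

def pls_predict(x_new, beta, mu_X, mu_y):
    """yhat = mu_y + (x - mu_X) beta, applied to each row of x_new."""
    return mu_y + (x_new - mu_X) @ beta

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 3.0 + 0.1 * rng.standard_normal(40)
beta, mu_X, mu_y = pls1_fit(X, y, n_comp=4)
yhat = pls_predict(X[:3], beta, mu_X, mu_y)
print(beta.ravel(), yhat.ravel())
```

Solving with P^T W is cheap and well posed: in NIPALS, P^T W is upper triangular with a unit diagonal, so an explicit inverse is never needed.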

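The row-chunked implementation keeps X on disk and performs steps (2) and (4) with two passes over row blocks X_b, where u_b and t_b are the matching slices of u and t:

  • Pass A: accumulate a = X^T u as the sum of X_b^T u_b over all row blocks.
  • Pass B: compute the scores block by block as t_b = X_b w.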

The loadings p = (X^T t)/(t^T t) are accumulated exactly like Pass A, with t in place of u, and the result is then divided by t^T t.
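The sketch below shows the row-block passes and the loading accumulation. A NumPy array sliced into row blocks stands in for a file-backed big.matrix. In bigPLSR each block would be read from disk instead. The helpers row_blocks and streamed_component and the block size are illustrative assumptions, not package functions.

```python
import numpy as np

def row_blocks(n, block):
    """Hypothetical helper: yield (start, stop) bounds of consecutive row blocks."""
    for start in range(0, n, block):
        yield start, min(start + block, n)

def streamed_component(X, u, block=100):
    """X-side products for one component, touching X only one row block at a time."""
    n, p = X.shape
    a = np.zeros(p)
    for s, e in row_blocks(n, block):        # Pass A: a = sum_b X_b^T u_b
        a += X[s:e].T @ u[s:e]
    w = a / np.linalg.norm(a)                # step 3 needs only the p-vector a
    t = np.empty(n)
    for s, e in row_blocks(n, block):        # Pass B: t_b = X_b w
        t[s:e] = X[s:e] @ w
    xt = np.zeros(p)
    for s, e in row_blocks(n, block):        # loadings: Pass A again, with t in place of u
        xt += X[s:e].T @ t[s:e]
    p_load = xt / (t @ t)                    # p = X^T t / (t^T t)
    return w, t, p_load

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 20)); X -= X.mean(axis=0)
u = rng.standard_normal(1000)
w, t, p_load = streamed_component(X, u, block=128)
```

Only p-length accumulators and the n-length score vector stay in memory. X itself is never loaded as a whole, which is what lets the row-chunked variant scale with the number of observations.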

APIs

pls_fit() selects the variant from options(bigPLSR.kpls_gram). When the option is "auto" (the default), it chooses with internal heuristics.

When to prefer each variant
