L1-PCA*: A Pure L1 Principal Component Analysis

by

J.P. Brooks, J.H. Dulá, and E.L. Boone

Abstract

The L1 norm has been applied in numerous variations of principal component analysis (PCA). L1-norm PCA is an attractive alternative to traditional L2-based PCA because it can impart robustness in the presence of outliers and is indicated for models where standard Gaussian assumptions about the noise may not apply. Of all the previously proposed PCA schemes that recast PCA as an optimization problem involving the L1 norm, none provides globally optimal solutions in polynomial time. This paper proposes an L1-norm PCA procedure based on the efficient calculation of the optimal solution of the L1-norm best-fit hyperplane problem. We present a procedure called L1-PCA* based on the application of this idea that fits data to subspaces of successively smaller dimension, the orthogonal vectors of which are taken to be successive directions of minimum dispersion. The procedure is implemented and tested on a diverse problem suite. Our tests show that L1-PCA* is the indicated procedure in the presence of unbalanced outlier contamination.
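As a rough illustration of the idea in the abstract, the sketch below fits an L1-norm best-fit line through the origin to two-dimensional data. It is not taken from l1pcastar.r. It assumes, which this page does not state, that the best-fit hyperplane can be found by running one L1 regression per coordinate and keeping the fit with the smallest total absolute error. In two dimensions each regression reduces to a weighted median, so no LP solver is needed. The function names are hypothetical, and the data are assumed to be centered already.

```python
def l1_slope_through_origin(regressor, response):
    """Return b minimizing sum(|response - b * regressor|).

    The objective equals sum(|x| * |y/x - b|) over points with x != 0,
    so a weighted median of the ratios y/x (weights |x|) is a minimizer.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(regressor, response) if x != 0)
    if not pairs:
        return 0.0
    half = sum(weight for _, weight in pairs) / 2.0
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio
    return pairs[-1][0]


def l1_best_fit_line_2d(points):
    """Fit a line through the origin to 2-D points under the L1 norm.

    Assumption (not from this page): try each coordinate as the response
    of an L1 regression and keep the fit with the smaller total error.
    Returns (direction, normal, error). The direction spans the fitted
    1-D subspace; the normal is the direction of minimum dispersion.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # Candidate 1: y as response, line y = b * x, direction (1, b).
    b = l1_slope_through_origin(xs, ys)
    err_y = sum(abs(y - b * x) for x, y in zip(xs, ys))

    # Candidate 2: x as response, line x = c * y, direction (c, 1).
    c = l1_slope_through_origin(ys, xs)
    err_x = sum(abs(x - c * y) for x, y in zip(xs, ys))

    error, direction = min((err_y, (1.0, b)), (err_x, (c, 1.0)))
    normal = (-direction[1], direction[0])
    return direction, normal, error


# Four points on y = 2x plus one outlier at (5, 0).
direction, normal, error = l1_best_fit_line_2d(
    [(1, 2), (2, 4), (3, 6), (4, 8), (5, 0)]
)
print(direction, normal, error)  # (0.5, 1.0) (-1.0, 0.5) 5.0
```

In this example the fitted line still passes through the four collinear points, and the outlier contributes the whole error of 5. For data in more than two dimensions, each per-coordinate regression would be a linear program rather than a weighted median.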

Paper

The paper is published here:

click here

R Script

l1pcastar.r

Open the script in a text editor and follow instructions at the top of the file.

Data Sets

Toy example

Warning: The following two compressed files are at least 3.2 GB each when uncompressed.

Laplacian noise (.bz2)

Gaussian noise (.bz2)
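Because of the uncompressed size, it may be convenient to read a .bz2 file as a stream instead of unpacking it to disk first. The sketch below uses Python's standard bz2 module to count the lines of a compressed text file. It assumes the archive contains line-oriented text, which this page does not specify, and the file name in the usage comment is hypothetical.

```python
import bz2


def count_lines_bz2(path):
    """Count lines in a .bz2 text file, decompressing it as a stream."""
    count = 0
    # Mode "rt" decompresses on the fly and yields decoded text lines,
    # so the full uncompressed file is never held in memory or on disk.
    with bz2.open(path, "rt") as handle:
        for _ in handle:
            count += 1
    return count


# Hypothetical usage; replace the name with the downloaded file:
# print(count_lines_bz2("laplacian_noise.bz2"))
```

The same loop can parse each line instead of counting it, which keeps memory use roughly constant regardless of the file size.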

Date last modified: January 2, 2017.