PCA, t-SNE, and UMAP

Tags: Basic Concepts

PCA is a linear dimensionality reduction technique that projects the data onto the directions of maximum variance, which tends to preserve large pairwise distances at the expense of small ones. This can lead to poor visualizations, especially when the data lie on a non-linear manifold. Think of a manifold as a geometric shape embedded in a higher-dimensional space, such as a cylinder, the surface of a ball, or a curve.
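As a rough illustration of the variance-maximizing idea, here is a minimal NumPy sketch that computes PCA through the singular value decomposition. The helper name `pca` and the toy data are assumptions made for this example, not part of any library.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ Vt[:n_components].T, ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical toy data: a 3-D cloud stretched mostly along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

Z, ratio = pca(X, n_components=2)
print(Z.shape)  # shape of the 2-D projection
print(ratio)    # share of total variance captured by each kept component
```

Because the projection is a single linear map, it cannot "unroll" a curved manifold; it can only pick the flat subspace along which the data spread the most.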

t-SNE, by contrast, preserves only small pairwise distances (local similarities), whereas PCA preserves large pairwise distances in order to maximize variance.
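One way to see why t-SNE focuses on small distances is to look at the Gaussian neighbor probabilities it builds in the input space. The sketch below is a simplification: it uses a fixed bandwidth `sigma`, whereas t-SNE tunes a separate bandwidth per point from the perplexity setting. The function name `conditional_affinities` is hypothetical.

```python
import numpy as np

def conditional_affinities(X, i, sigma=1.0):
    """Gaussian neighbor probabilities p(j|i) around point i (fixed sigma, assumed)."""
    sq_dists = np.sum((X - X[i]) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * sigma**2))
    weights[i] = 0.0  # a point is not counted as its own neighbor
    return weights / weights.sum()

# Hypothetical 1-D points: two close neighbors and one far-away point.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
p = conditional_affinities(X, i=0)
print(p)
```

The far point receives a vanishingly small probability, so its exact distance has almost no influence on the embedding. Only the nearby points matter.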

t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) are both techniques used for dimensionality reduction and visualization of high-dimensional data, similar to PCA (Principal Component Analysis). Each has its unique characteristics and use cases:

PCA:

t-SNE:

UMAP:

Choosing Between PCA, t-SNE, and UMAP:

In summary, the choice between PCA, t-SNE, and UMAP depends on the specific goals of the analysis, the nature of the dataset, and the computational resources available. UMAP's balance of performance, scalability, and structure preservation makes it a versatile choice for many applications, though PCA's simplicity and efficiency or t-SNE's detailed local structure preservation might be preferred in certain contexts.
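In practice, the quickest way to choose is often to run the methods on the same data and compare the plots. The following is a minimal sketch using scikit-learn's `PCA` and `TSNE` on a subset of its bundled digits dataset. The subset size and the perplexity value are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Small subset of the 64-dimensional digits data to keep t-SNE fast.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# PCA: deterministic linear projection, cheap to compute.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: stochastic non-linear embedding; perplexity must be below n_samples.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)

# UMAP is not part of scikit-learn; it lives in the separate umap-learn package,
# whose umap.UMAP(n_components=2).fit_transform(X) follows the same pattern.
```

Both results can be passed to a scatter plot colored by `y` to compare how well each method separates the digit classes.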