Visualizing High-Dimensional Embeddings with PCA and t-SNE

When working with high-dimensional embeddings—such as 256-dimensional vectors that lie on a hypersphere after training—it's often useful to project them into 2D or 3D space to inspect cluster structure or class separation.
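To make "vectors that lie on a hypersphere" concrete, here is a minimal sketch that builds synthetic stand-in embeddings and L2-normalizes them so every row has unit length. The helper name `make_unit_embeddings`, the cluster layout, and the noise scale are assumptions for illustration only; real embeddings would come from a trained model.

```python
import numpy as np

def make_unit_embeddings(n_samples=300, n_features=256, n_groups=3, seed=0):
    """Hypothetical helper: synthetic clustered vectors projected onto the unit hypersphere."""
    rng = np.random.default_rng(seed)
    # One random center per group, plus Gaussian noise around it
    centers = rng.normal(size=(n_groups, n_features))
    group_ids = rng.integers(0, n_groups, size=n_samples)
    raw = centers[group_ids] + 0.3 * rng.normal(size=(n_samples, n_features))
    # L2-normalize each row so it lies on the unit hypersphere
    embeddings = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    return embeddings, group_ids

embeddings, group_ids = make_unit_embeddings()
print(embeddings.shape)                           # (300, 256)
print(np.linalg.norm(embeddings, axis=1)[:3])     # each norm is approximately 1.0
```

Data of this shape, with integer group labels, is convenient for trying out a projection pipeline before pointing it at real model outputs.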

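Two widely used techniques for this purpose are Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). PCA is a linear method: it projects the data onto the orthogonal directions of greatest variance, so it tends to preserve global structure. t-SNE is non-linear and better at revealing local clusters, because it models pairwise similarities between nearby points; the trade-off is that distances between well-separated clusters in a t-SNE plot are not reliable.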
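The "linear" part of PCA can be checked directly. The sketch below, on made-up synthetic data, shows that the scikit-learn projection is just the centered data multiplied by the component matrix, and prints how much variance each component captures. The data shape and scales are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 10 features, most variance along the first two axes
scales = np.array([5.0, 3.0] + [0.5] * 8)
X = rng.normal(size=(200, 10)) * scales

pca = PCA(n_components=3)
proj = pca.fit_transform(X)

# PCA is linear: the projection equals (X - mean) times the transposed components
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(proj, manual))  # True

# Fraction of total variance captured by each component, in descending order
ratios = pca.explained_variance_ratio_
print(ratios)
```

t-SNE has no equivalent closed-form mapping: it optimizes the low-dimensional coordinates directly, which is why scikit-learn's `TSNE` offers `fit_transform` but no `transform` for new points.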

Below is a Python implementation using scikit-learn and plotly to generate interactive 3D visualizations:

import os
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import plotly.graph_objects as go

def plot_embeddings_3d(embeddings, output_file, group_ids=None):
    """
    Project high-dimensional embeddings into 3D using PCA and t-SNE.
    
    Args:
        embeddings: numpy array of shape (n_samples, n_features)
        output_file: base path for saving HTML plots
        group_ids: optional numeric labels, one per sample, used to color points
    """
    base_name = os.path.splitext(output_file)[0]
    
    # PCA projection
    print("Running PCA...")
    reducer_pca = PCA(n_components=3)
    proj_pca = reducer_pca.fit_transform(embeddings)
    
    # t-SNE projection (perplexity must be smaller than the number of samples)
    print("Running t-SNE...")
    reducer_tsne = TSNE(n_components=3, perplexity=30, learning_rate=200, random_state=42)
    proj_tsne = reducer_tsne.fit_transform(embeddings)
    
    # Plot PCA
    fig1 = go.Figure(data=go.Scatter3d(
        x=proj_pca[:, 0],
        y=proj_pca[:, 1],
        z=proj_pca[:, 2],
        mode='markers',
        marker=dict(size=4, color=group_ids, opacity=0.7)
    ))
    fig1.update_layout(title="PCA Projection", scene=dict(
        xaxis_title="PC1",
        yaxis_title="PC2",
        zaxis_title="PC3"
    ))
    fig1.write_html(f"{base_name}_pca.html")
    
    # Plot t-SNE
    fig2 = go.Figure(data=go.Scatter3d(
        x=proj_tsne[:, 0],
        y=proj_tsne[:, 1],
        z=proj_tsne[:, 2],
        mode='markers',
        marker=dict(size=4, color=group_ids, opacity=0.7)
    ))
    fig2.update_layout(title="t-SNE Projection", scene=dict(
        xaxis_title="Dim 1",
        yaxis_title="Dim 2",
        zaxis_title="Dim 3"
    ))
    fig2.write_html(f"{base_name}_tsne.html")

This function accepts an (N, 256) embedding matrix and, optionally, one class or cluster label per sample. The labels should be numeric, since they are passed to Plotly as marker colors. The function writes two interactive HTML files, one per projection, which can be rotated and zoomed to explore spatial relationships. Note that t-SNE is stochastic, so results vary between runs; fixing random_state makes a run repeatable in the same environment.
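Two practical points from the paragraph above can be sketched in a few lines: turning string labels into integer codes that are usable as marker colors, and checking that a fixed random_state gives the same t-SNE layout twice. The label values, the small random dataset, and the choices of `method="exact"`, `init="random"`, and `perplexity=10` are assumptions made to keep this demo small and quick; they are not the settings used in the function above.

```python
import numpy as np
from sklearn.manifold import TSNE

# Map each distinct string label to an integer code (usable as a numeric color)
labels = np.array(["cat", "dog", "cat", "bird", "dog"])
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['bird' 'cat' 'dog']
print(codes)    # [1 2 1 0 2]

# Small synthetic dataset for the repeatability check
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))

def run_tsne(seed):
    """Run a small 3D t-SNE with a fixed seed (perplexity kept below n_samples)."""
    tsne = TSNE(n_components=3, perplexity=10, init="random",
                method="exact", random_state=seed)
    return tsne.fit_transform(X)

# Same seed, same data, same process: the two layouts should match
same = np.allclose(run_tsne(42), run_tsne(42))
print(same)
```

The `codes` array can be passed as `group_ids`, with `classes[codes]` recovering the original names when a legend or hover text is needed.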

Tags: Machine Learning dimensionality reduction Data Visualization PCA t-SNE

Posted on Sat, 20 Jun 2026 17:32:46 +0000 by kusal