New method lets you interpret protein AI models without exploding feature counts

Standard interpretability fails on protein architecture

Foundation models for structural biology now predict protein and ligand structures with high accuracy. The problem: no one knows which internal features drive those predictions. Sparse autoencoders (SAEs), the standard tool for interpreting transformer embeddings, do not carry over to pairformer-style architectures. A pairwise tensor holds one entry per pair of tokens, so applying a standard SAE to it produces a number of feature activations that grows quadratically with sequence length and obscures which patterns the model actually uses.
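To make the scaling concrete, here is a small arithmetic sketch. It assumes the usual pairformer layout, with one embedding per token and one per ordered token pair. The function name `activation_sites` is made up for illustration, and the token counts are arbitrary.

```python
def activation_sites(n_tokens: int) -> dict:
    """Count the positions an SAE would have to encode for one complex."""
    return {
        "single": n_tokens,             # one embedding per token
        "pair": n_tokens * n_tokens,    # one embedding per ordered token pair
    }

for n in (100, 500, 1000):
    sites = activation_sites(n)
    ratio = sites["pair"] // sites["single"]
    print(f"{n} tokens: {sites['single']} single sites, {sites['pair']} pair sites ({ratio}x)")
```

At 500 tokens that is 250,000 pair positions against 500 token positions, and every one of them would carry its own sparse feature activations.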

Researchers introduced PairSAE to solve this. The method summarizes pairwise tensors using N-mode singular value decomposition, collapsing them into token-wise interaction roles. Then a sparse autoencoder learns a shared set of token-level features that decode into both sequence and pair representations. The result: interpretable features without the computational blow-up.
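The two steps can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the tensor shapes, the rank, the top-k sparsity rule, and the names `mode_unfold`, `token_roles`, and `sae_encode` are all hypothetical, the weights are random rather than trained, and the decoder here only reconstructs the per-token summary, since this article does not say how the paper maps features back to the full pair tensor.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Unfold a 3-way tensor along `mode` into a (mode size x rest) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def token_roles(pair, rank):
    """Summarize an (N, N, C) pair tensor as per-token factors.

    The leading left singular vectors of the mode-0 and mode-1 unfoldings,
    scaled by their singular values, describe how each token behaves as the
    first and as the second member of a pair.
    """
    roles = []
    for mode in (0, 1):
        u, s, _ = np.linalg.svd(mode_unfold(pair, mode), full_matrices=False)
        roles.append(u[:, :rank] * s[:rank])
    return np.concatenate(roles, axis=1)  # shape (N, 2 * rank)

def sae_encode(x, w_enc, b_enc, k):
    """ReLU encoder that keeps only the k largest activations per token."""
    acts = np.maximum(x @ w_enc + b_enc, 0.0)
    drop = np.argsort(acts, axis=1)[:, :-k]   # indices of all but the top k
    np.put_along_axis(acts, drop, 0.0, axis=1)
    return acts

rng = np.random.default_rng(0)
n_tokens, c_pair, d_single, rank, n_features, k = 12, 8, 16, 4, 64, 5

# Random stand-ins for model activations (assumed shapes).
pair = rng.normal(size=(n_tokens, n_tokens, c_pair))
single = rng.normal(size=(n_tokens, d_single))

# Token-level input: sequence embedding plus the pair-derived roles.
x = np.concatenate([single, token_roles(pair, rank)], axis=1)

# Untrained SAE weights, for shape-checking only.
w_enc = rng.normal(size=(x.shape[1], n_features)) * 0.1
b_enc = np.zeros(n_features)
w_dec = rng.normal(size=(n_features, x.shape[1])) * 0.1

features = sae_encode(x, w_enc, b_enc, k)   # (N, n_features), at most k nonzero per row
recon = features @ w_dec                    # back to sequence + role summary
print(features.shape, recon.shape)
```

The point of the sketch is the shape bookkeeping: the feature matrix has one row per token, so the dictionary size does not depend on the number of token pairs.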

On Boltz-2 activations for PLINDER protein-ligand complexes, the learned features align with UniProt annotations, an independent check against a reference database, and they predict Boltz-2 affinity values. The work was accepted to the Machine Learning in Structural Biology workshop at a 2025 conference.

You need to know what your model learned before deploying it

Protein design is moving from academic exercise to biotech production pipeline. Models like Boltz-2 and AlphaFold3 now inform real candidate selection for wet-lab synthesis. If a model predicts a binding affinity but you don't know whether it learned actual biochemistry or just memorized training data, you risk wasting time and reagents on bad leads.

PairSAE bridges that gap by mapping model internals back to structural concepts (residue interactions, binding motifs, domain contacts). It lets you audit the model's reasoning before committing to experiments. Prior interpretability methods either scaled poorly on these architectures or required manual inspection of millions of features.

Verify your model's understanding before the lab

If you run Boltz-2 or similar pairformer-based models in a protein design workflow, use PairSAE on a validation set of known protein-ligand complexes. Check whether the features the model emphasizes match the known interaction sites in UniProt or your internal annotations. If the model is silent on a binding site you know matters, that's a red flag for deployment.
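The check described above can be sketched as a simple overlap score. Everything here is an assumption for illustration: the function `site_recall` is hypothetical, the activation values and site indices are made up, and residues are indexed from 0. Real UniProt annotations would first need mapping onto the model's token indexing.

```python
def site_recall(feature_acts, annotated_sites, top_n):
    """Return (recall, missed) for one feature on one complex.

    recall: fraction of annotated residues among the top_n most active residues.
    missed: annotated residues the feature did not highlight.
    """
    ranked = sorted(range(len(feature_acts)), key=lambda i: feature_acts[i], reverse=True)
    highlighted = set(ranked[:top_n])
    annotated = set(annotated_sites)
    missed = sorted(annotated - highlighted)
    return len(annotated & highlighted) / len(annotated), missed

# Made-up per-residue activations for one feature, and made-up binding-site residues.
acts = [0.0, 0.9, 0.1, 0.8, 0.0, 0.7, 0.0, 0.05]
binding_site = [1, 3, 6]

recall, missed = site_recall(acts, binding_site, top_n=3)
print(f"recall={recall:.2f}, missed residues={missed}")
```

In this toy case the feature highlights residues 1, 3, and 5, so it recovers two of the three annotated residues and misses residue 6. A binding site that stays missed across all features is the red flag worth investigating.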

Implementation details are in the full paper on arXiv. The approach assumes you have access to model activations, so it works best as an internal audit before you hand off predictions to the bench team. It does not guarantee the model is right, but it indicates whether the model's confidence is grounded in interpretable structure or in noise.
