Brain scans confirm AI-written stories target specific regions

Black-box brain models become testable theories

Researchers at Microsoft Research, UC Berkeley, UC San Francisco, and Columbia University developed generative causal testing (GCT), a method that converts uninterpretable neural network predictions into readable hypotheses and then validates them experimentally. The work appears in Nature Neuroscience (peer-reviewed).

GCT works in two stages. First, an LLM distills the phrases that most strongly drive a predictive model's output for a given brain region into a short verbal explanation, such as "food preparation," "location names," or "dialogue between people." Second, an LLM writes new stories engineered to match that explanation. Three subjects read these synthetic stories in an fMRI scanner. If the targeted region responded significantly above baseline, the explanation had passed a causal test rather than merely fitting a correlation.
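The "significantly above baseline" check can be illustrated with a one-sided permutation test. This is a minimal sketch, not the statistical procedure used in the study, which this article does not describe. The function name `permutation_p_value` and the response values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def permutation_p_value(targeted, baseline, n_permutations=2000, seed=0):
    """One-sided permutation test: is mean(targeted) greater than mean(baseline)?"""
    rng = random.Random(seed)
    observed = mean(targeted) - mean(baseline)
    pooled = list(targeted) + list(baseline)
    n_targeted = len(targeted)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_targeted]) - mean(pooled[n_targeted:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical normalized responses of one region (made-up numbers):
# one value per paragraph of a targeted story vs. a baseline story.
targeted_responses = [0.9, 1.2, 0.7, 1.1, 0.8, 1.0]
baseline_responses = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]

p = permutation_p_value(targeted_responses, baseline_responses)
print(f"p = {p:.4f}")
```

A small p-value here would mean that the gap between targeted and baseline responses is unlikely to arise from a random relabeling of the stories.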

The method was first checked against well-studied regions. When subjects were presented with GCT's generated stories for "locations," the place-processing regions RSC, OPA, and PPA responded as expected. A candidate explanation for "food preparation" activated ventral occipital cortex near the fusiform face area. Across all three subjects, synthetic stories reliably drove their target regions above baseline, supporting the method's core logic.

Beyond known selectivity, GCT teased apart three neighboring place-processing regions long treated as functionally equivalent. The retrosplenial cortex (RSC) responds more strongly to proper-noun location names (Tokyo, Connecticut) than to generic location language. The parahippocampal place area (PPA) and occipital place area (OPA) showed distinct patterns only when differential stimuli were designed to isolate them.

The method also surfaced previously unmapped prefrontal micro-regions. By scanning candidate locations and retaining only the most stable predictive models, GCT identified regions selective for dialogue (words like "said" or "told"), clock times ("one o'clock"), and numeric measurements ("50 feet"). These distinctions emerged because the framework could propose a hypothesis and test it within a single study.

Closing the gap between prediction and understanding

Over the past decade, LLMs have become the most accurate tools available for predicting human brain responses to language. Feed an LLM the same story a person hears in an fMRI scanner, and its internal representations predict activity in individual patches of cortex with high fidelity. But that success came at a cost: the predictive models contain millions of inscrutable parameters with no direct path to scientific interpretation.

A model that predicts brain activity tells you that a region responds to language. It does not tell you whether it encodes food, places, numbers, faces, or abstract concepts. This prediction-explanation gap has become a central problem in computational neuroscience as black-box models spread across the field.

GCT inverts the problem. Instead of accepting that accuracy must come at the cost of readability, the method uses the model to propose a theory, then closes the loop with a physical experiment. The result is a framework that translates uninterpretable models back into the currency of science: concise, testable hypotheses that can be confirmed or refuted.

The broader implication extends beyond neuroscience. Any domain where powerful predictive models have outrun human understanding faces the same dilemma. GCT demonstrates that data-driven models need not end inquiry; they can be distilled into readable theory and checked against reality by generating new experiments on demand.

How neuroscience teams should validate their models

If you work in language neuroscience and rely on LLM-based predictive models, audit the phrases driving activity in the regions you care about. Use an LLM to summarize those phrases into a candidate explanation. Then design a small targeted-stimulus study with those explanations as your hypotheses before scaling to larger experiments. This approach saves time by filtering out explanations that fail to replicate in new data.
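The audit step can be sketched as ranking candidate phrases by a model's predicted response and keeping the top few for an LLM to summarize. This is only an illustration: `top_driving_phrases` is a hypothetical helper, and `toy_predict_response` is a toy stand-in for a fitted predictive model, not part of the GCT code.

```python
from typing import Callable, List


def top_driving_phrases(
    phrases: List[str],
    predict_response: Callable[[str], float],
    k: int = 3,
) -> List[str]:
    """Return the k phrases with the highest predicted response."""
    scored = [(predict_response(phrase), phrase) for phrase in phrases]
    # Sort by score only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored[:k]]


def toy_predict_response(phrase: str) -> float:
    """Toy stand-in for a region's predictive model (assumption)."""
    food_words = {"chop", "simmer", "bake", "stir", "recipe"}
    words = phrase.lower().split()
    return sum(word in food_words for word in words) / len(words)


candidate_phrases = [
    "stir the sauce",
    "drove to Tokyo",
    "bake and simmer",
    "she said hello",
    "chop the onions",
]

top = top_driving_phrases(candidate_phrases, toy_predict_response, k=3)
print(top)
```

In practice the returned phrases would be passed to an LLM with a request for a short summary, which then serves as the candidate explanation to test.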

The code for GCT is available on GitHub. The method scales to grid-scanning for unmapped regions, making it a tool for discovery rather than just validation. If your predictive models are stable across subjects, GCT's explanations will be more reliable; use model stability as a filter for which hypotheses to test first.
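One way to use stability as a filter is to score each candidate explanation by how well its model's predictions agree across subjects, then test the highest-scoring ones first. The sketch below assumes stability means the mean pairwise Pearson correlation between subjects' predicted response profiles. That definition is an assumption for illustration, not one taken from the study, and all names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def stability(profiles):
    """Mean pairwise correlation across subjects (assumed stability score)."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predicted responses to four probe stimuli,
# one list per subject, for two candidate explanations.
candidates = {
    "dialogue": [
        [0.1, 0.9, 0.2, 0.8],
        [0.2, 0.8, 0.1, 0.9],
        [0.0, 1.0, 0.3, 0.7],
    ],
    "clock times": [
        [0.5, 0.1, 0.9, 0.3],
        [0.2, 0.8, 0.4, 0.6],
        [0.9, 0.3, 0.1, 0.7],
    ],
}

# Test the most stable explanations first.
ranked = sorted(candidates, key=lambda name: stability(candidates[name]), reverse=True)
print(ranked)
```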
