Our Take
The field has been optimizing the wrong problem: we've been chasing model performance while the pipeline breaks at data quality and biological grounding.
Why it matters
Drug discovery teams investing heavily in AI infrastructure are now realizing their ROI ceiling is set by biology, not compute. This shift redirects where biotech companies and AI vendors should spend engineering effort.
Do this week
Biotech leaders: audit your data lineage and experimental validation pipelines before buying another model license. Your bottleneck sits in the data that feeds the model and the validation that follows it, not in the model itself.
The model wasn't the problem
Researchers and practitioners working on AI-assisted drug discovery are reporting a consistent finding: improvements in language models, graph neural networks, and molecular prediction systems are not translating to proportional gains in actual drug candidate success rates. The constraint isn't algorithmic. It's biological.
The issue centers on data quality and experimental grounding. Models trained on incomplete or poorly annotated biological datasets, or on proxies that don't capture real-world protein behavior, can produce confident predictions that fail in the lab. Similarly, models lack access to negative results, failed experiments, and contextual factors (assay conditions, cellular context, off-target effects) that would ground their predictions in reality.
This bottleneck was always present, but it became visible only once model performance crossed a threshold where it stopped being the limiting factor. Teams that scaled compute and refined architectures found themselves waiting for biologists to validate what the models suggested, or discovering that validation rates didn't improve.
Infrastructure spending is misaligned
For the past two years, biotech and pharmaceutical companies have prioritized AI infrastructure: hiring machine learning engineers, licensing foundation models, building in-house prediction platforms. The assumption was that better models → better predictions → faster drug discovery. The assumption was incomplete.
If the constraint is data quality and experimental validation, then capital allocation should shift. High-quality experimental datasets (especially negative data, which models rarely see), automated assay platforms that feed real-time results back into training loops, and tighter collaboration between computational and wet-lab teams become the actual competitive advantages. A mature model with access to proprietary, high-fidelity biological data will usually outperform a cutting-edge model trained only on public data.
This also affects how companies evaluate AI vendors. Claims of "improved molecular property prediction" are only valuable if your team has the experimental infrastructure to validate and iterate. Without that, you're paying for performance gains you cannot use.
Where to focus next
If you lead a biotech discovery program, the immediate action is not to adopt the latest model. It's to map where predictions diverge from experimental outcomes and why. Are your models seeing assay artifacts? Are they trained on a biased subset of chemical space? Do they lack information about failure modes?
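One way to start that mapping is to group prediction misses by experimental context and see where they cluster. The sketch below is a minimal illustration, not a prescribed method: the record fields (`assay`, `predicted_prob`, `measured_active`), the 0.5 threshold, and the toy records are all assumptions made up for the example.

```python
from collections import defaultdict


def divergence_by_group(records, group_key, threshold=0.5):
    """Return {group: (n_records, miss_rate)} for a chosen grouping field.

    A "miss" is any record where the thresholded prediction disagrees
    with the measured outcome. Field names are hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [total, misses]
    for r in records:
        predicted_active = r["predicted_prob"] >= threshold
        miss = predicted_active != r["measured_active"]
        counts[r[group_key]][0] += 1
        counts[r[group_key]][1] += int(miss)
    return {g: (n, misses / n) for g, (n, misses) in counts.items()}


# Toy records invented purely for illustration.
records = [
    {"assay": "biochemical", "predicted_prob": 0.9, "measured_active": True},
    {"assay": "biochemical", "predicted_prob": 0.2, "measured_active": False},
    {"assay": "cell_based", "predicted_prob": 0.8, "measured_active": False},
    {"assay": "cell_based", "predicted_prob": 0.7, "measured_active": False},
    {"assay": "cell_based", "predicted_prob": 0.1, "measured_active": False},
]

result = divergence_by_group(records, "assay")
for group, (n, miss_rate) in sorted(result.items()):
    print(f"{group}: n={n}, miss_rate={miss_rate:.2f}")
```

Grouping by assay type is only one cut. The same function can be pointed at any field you record, such as cell line or chemical series, to check whether misses concentrate in one part of chemical space or one set of conditions.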
For AI teams embedded in drug discovery: build feedback loops with the experimental teams. The most valuable training data for your models isn't in published databases—it's in the negative results and edge cases your lab generates weekly. Systematize that capture and labeling.
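As a rough sketch of what systematized capture could look like, the record below stores negative and failed results in the same structure as positive ones, along with the context that would otherwise be lost. Every field name and outcome label here is a hypothetical choice for illustration, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

# Hypothetical outcome labels; negative and failed runs are first-class values.
ALLOWED_OUTCOMES = {"active", "inactive", "failed_run"}


@dataclass
class AssayResult:
    compound_id: str
    assay_id: str
    outcome: str                                     # one of ALLOWED_OUTCOMES
    cell_context: Optional[str] = None               # e.g. cell line used
    conditions: dict = field(default_factory=dict)   # assay conditions
    failure_note: Optional[str] = None               # why a run failed, if known

    def __post_init__(self):
        # Reject free-text outcomes so labels stay consistent across the lab.
        if self.outcome not in ALLOWED_OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def to_jsonl_line(result: AssayResult) -> str:
    """Serialize one result as a single JSON line for an append-only log."""
    return json.dumps(asdict(result), sort_keys=True)


# Invented example: an inactive compound recorded with its conditions.
example = AssayResult("cmpd-1", "assay-7", "inactive", conditions={"ph": 7.4})
line = to_jsonl_line(example)
```

The point of the fixed outcome set is that an inactive compound gets written down with the same rigor as a hit, so it can later be used as training signal rather than disappearing into a notebook.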
For AI vendors selling into biotech: stop leading with benchmark improvements on public datasets. Lead with case studies showing how your models perform on clients' proprietary data, and what experimental validation looks like. The buyers who understand the real bottleneck will recognize the difference immediately.