OncoAgent routes oncology cases through dual-tier LLMs locally

Researchers built a privacy-preserving oncology AI system

OncoAgent routes clinical queries through an additive complexity scorer to either a 9B-parameter model for simple cases or a 27B-parameter model for complex presentations. The system runs entirely on AMD Instinct MI300X hardware with 192GB of HBM3 memory, removing the cloud API dependency that blocks hospital deployment where patient data sovereignty requirements apply.

The complexity router adds weighted points for cancer type (rare cancers +0.40), staging (Stage IV +0.25), mutation count (multiple mutations +0.30), and prior treatments (+0.10). Cases scoring above 0.5 route to the 27B model (Tier 2); the rest go to the 9B model (Tier 1). A Stage IV pancreatic carcinoma case with KRAS and BRCA2 mutations scored 0.80, correctly routing to Tier 2 (per the research paper).
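A minimal sketch of how an additive router like this could work, using only the four weights and the 0.5 threshold quoted above. The `Case` fields, function names, tier labels, and the rule that "multiple" means two or more mutations are assumptions for illustration, not the paper's implementation. The article does not say which factors produced the 0.80 score, so the sketch does not try to reproduce that figure.

```python
from dataclasses import dataclass, field

# Weights and threshold as quoted in the article.
RARE_CANCER_WEIGHT = 0.40
STAGE_IV_WEIGHT = 0.25
MULTIPLE_MUTATIONS_WEIGHT = 0.30
PRIOR_TREATMENT_WEIGHT = 0.10
TIER2_THRESHOLD = 0.5


@dataclass
class Case:
    """Hypothetical case record; field names are illustrative."""
    rare_cancer: bool = False
    stage: str = "I"
    mutations: list = field(default_factory=list)
    prior_treatments: int = 0


def complexity_score(case: Case) -> float:
    """Sum the weights of every factor present in the case."""
    score = 0.0
    if case.rare_cancer:
        score += RARE_CANCER_WEIGHT
    if case.stage.upper() == "IV":
        score += STAGE_IV_WEIGHT
    if len(case.mutations) >= 2:  # assumption: "multiple" means two or more
        score += MULTIPLE_MUTATIONS_WEIGHT
    if case.prior_treatments > 0:
        score += PRIOR_TREATMENT_WEIGHT
    return round(score, 2)  # rounding avoids float noise near the threshold


def route(case: Case) -> str:
    """Return a tier label; scores strictly above the threshold go to Tier 2."""
    return "tier2_27b" if complexity_score(case) > TIER2_THRESHOLD else "tier1_9b"


if __name__ == "__main__":
    example = Case(stage="IV", mutations=["KRAS", "BRCA2"])
    print(complexity_score(example), route(example))
```

Because the score is a plain sum, a site could recalibrate the router by changing the constants without touching the routing logic.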

Training used QLoRA fine-tuning on 266,854 oncological cases from PMC patient reports, Asclepius medical QA data, and synthetic cases generated by Qwen 3.6-27B. The AMD MI300X hardware completed full-dataset fine-tuning in approximately 50 minutes, delivering 56× throughput acceleration over API-based generation (company-reported).

The retrieval pipeline grounds responses in 77 physician-grade NCCN and ESMO guidelines. Document relevance grading achieved a 100% success rate, with mean RAG confidence scores of 2.3+, after the team switched the grading component from Qwen 3.5 to Qwen 2.5 Instruct (per the technical preprint).

On-premises deployment removes the primary adoption barrier

Many clinical AI systems stall at hospital adoption because they send patient data to cloud APIs, which can conflict with HIPAA compliance policies and institutional data governance frameworks. OncoAgent's fully on-premises deployment addresses this directly while maintaining clinical safety through deterministic validation layers.

The three-layer safety cascade runs formatting checks, rule-based scans for prohibited patterns, and LLM entailment verification before any output reaches clinicians. The system enforces mandatory human-in-the-loop interrupts for complex cases and low-confidence outputs, with fallback nodes returning clinical refusals rather than hallucinated recommendations.
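The cascade described above can be sketched as three checks run in order, with a refusal returned as soon as any layer fails. Everything specific here is an assumption: the required "Recommendation:" marker, the prohibited patterns, the refusal wording, and the `entails` callable, which stands in for whatever LLM entailment verifier the system actually uses.

```python
import re
from typing import Callable

# Hypothetical refusal text returned by the fallback path.
REFUSAL = "No validated recommendation available; please escalate to a clinician."

# Hypothetical prohibited patterns; real rules would come from clinical governance.
PROHIBITED_PATTERNS = [
    re.compile(r"\bguaranteed cure\b", re.IGNORECASE),
    re.compile(r"\bstop all medication\b", re.IGNORECASE),
]


def format_ok(text: str) -> bool:
    """Layer 1: formatting check (assumed: non-empty with a recommendation section)."""
    return bool(text.strip()) and "Recommendation:" in text


def rules_ok(text: str) -> bool:
    """Layer 2: rule-based scan for prohibited patterns."""
    return not any(p.search(text) for p in PROHIBITED_PATTERNS)


def validate(text: str, evidence: str, entails: Callable[[str, str], bool]) -> str:
    """Run the layers in order; return the text only if all three pass."""
    if not format_ok(text):
        return REFUSAL
    if not rules_ok(text):
        return REFUSAL
    # Layer 3: entailment check, delegated to a caller-supplied verifier.
    if not entails(evidence, text):
        return REFUSAL
    return text
```

Putting the cheap deterministic checks first means the model-based entailment step only runs on outputs that are already well-formed and free of prohibited content.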

Per-patient memory isolation using unique thread IDs prevents cross-contamination between clinical sessions while enabling multi-turn consultations within individual cases.
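One simple way to picture this isolation is a conversation store keyed by thread ID, where each consultation can only read and write its own history. The sketch below is an in-memory illustration with hypothetical class and method names; the article does not describe how the system actually stores session state.

```python
import uuid
from collections import defaultdict


class SessionMemory:
    """Hypothetical in-memory store holding one message history per thread ID."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def new_thread(self) -> str:
        """Create a unique thread ID for a new patient consultation."""
        return uuid.uuid4().hex

    def append(self, thread_id: str, role: str, content: str) -> None:
        """Add one turn to the given thread only."""
        self._threads[thread_id].append({"role": role, "content": content})

    def history(self, thread_id: str) -> list:
        """Return a copy of the thread's turns; other threads are never visible."""
        return list(self._threads.get(thread_id, []))


if __name__ == "__main__":
    memory = SessionMemory()
    a, b = memory.new_thread(), memory.new_thread()
    memory.append(a, "user", "Follow-up question for case A")
    print(len(memory.history(a)), len(memory.history(b)))
```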

Evaluate hardware costs against API expenses

Hospital IT teams should compare AMD MI300X procurement and operating costs against projected cloud API spend at expected query volumes. The 192GB of HBM3 memory can hold both model tiers at once, but the hardware demands substantial upfront capital compared with pay-per-query cloud alternatives.
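The comparison comes down to a break-even calculation: upfront hardware cost divided by the monthly saving over API spend. The sketch below shows that arithmetic; the function name, parameters, and the figures in the usage example are placeholders chosen for illustration, not real hardware or API prices.

```python
from typing import Optional


def break_even_months(
    capex: float,
    monthly_opex: float,
    queries_per_month: float,
    api_cost_per_query: float,
) -> Optional[float]:
    """Months until on-premises hardware pays for itself versus a pay-per-query API.

    Returns None when monthly API spend does not exceed on-premises running
    costs, in which case the hardware never breaks even on cost alone.
    """
    monthly_api_spend = queries_per_month * api_cost_per_query
    monthly_saving = monthly_api_spend - monthly_opex
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving


if __name__ == "__main__":
    # Placeholder figures only; substitute quotes from your own vendors.
    months = break_even_months(
        capex=100_000, monthly_opex=2_000,
        queries_per_month=50_000, api_cost_per_query=0.10,
    )
    print(months)
```

A calculation like this covers cost only; data sovereignty requirements may justify on-premises deployment even when the break-even point is distant or absent.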

Clinical teams should audit existing oncology decision support workflows to identify integration points where dual-tier routing could reduce cognitive load without disrupting established protocols. The complexity scoring system may require local calibration based on institutional case mix and specialist availability.

Privacy officers should review the Zero-PHI policy implementation and on-premises deployment architecture against current data governance requirements before pilot deployment approval.
