OncoAgent is a clinical decision support system for oncology, designed to be 100% open source and deployable on-premises. Why does this matter today? Because it combines a set of state-of-the-art practices (multi-agent orchestration, corrective RAG, QLoRA fine-tuning and deployment on AMD Instinct MI300X) with a strict Zero-PHI policy, all built for hospitals and regulated environments.
What is OncoAgent
OncoAgent combines a dual-LLM architecture with a multi-agent topology implemented in LangGraph. Its goal: deliver clinical recommendations anchored in professional guidelines (NCCN, ESMO) without leaking sensitive data to the cloud.
In numbers: QLoRA training on 266,854 cases (real and synthetic), two inference tiers (9B for fast triage, 27B for deep reasoning), and a knowledge base of 70+ guidelines. The stack runs natively on AMD Instinct MI300X under ROCm.
Dual architecture and multi‑agent design
Logic is split across 8 auditable nodes in LangGraph: Router → Ingestion → Corrective RAG → Specialist ↔ Critic → HITL Gate → Formatter → END. There's a reflection loop (max 2 iterations) and a mandatory fallback if something fails.
Model selection uses an additive complexity score:
S = w_cancer + w_stage + w_mutations + w_treatment
Summary table of factors and weights:
| Factor | Condition | Weight |
|---|---|---|
| Cancer type | Rare | +0.40 |
| Cancer type | Unknown primary | +0.30 |
| Stage | Stage IV | +0.25 |
| Stage | Stage III | +0.15 |
| Mutations | ≥2 identified | +0.30 |
| Mutations | Single | +0.15 |
| Prior treatment | Any keyword match | +0.10 |
Decision: if S >= 0.5 → Tier 2 (Qwen 3.6‑27B), if S < 0.5 → Tier 1 (Qwen 3.5‑9B). Real validation example: a Stage IV pancreatic case with KRAS + BRCA2 gave S = 0.80 and was correctly routed to Tier 2.
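The routing rule can be sketched in a few lines. Only the weights and the 0.5 threshold come from the table above; the field names and the rare-cancer list are assumptions for illustration.

```python
# Illustrative sketch of the additive complexity router. Weights and the 0.5
# threshold follow the table above; field names and the rare-cancer list are
# assumptions, not OncoAgent's actual schema.

RARE_CANCERS = {"cholangiocarcinoma", "mesothelioma"}  # hypothetical subset

def complexity_score(case: dict) -> float:
    s = 0.0
    cancer = case.get("cancer_type", "").lower()
    if cancer in RARE_CANCERS:
        s += 0.40                      # rare cancer type
    elif cancer == "unknown primary":
        s += 0.30                      # unknown primary
    stage = case.get("stage", "")
    if stage == "IV":
        s += 0.25
    elif stage == "III":
        s += 0.15
    mutations = case.get("mutations", [])
    if len(mutations) >= 2:
        s += 0.30                      # two or more identified mutations
    elif len(mutations) == 1:
        s += 0.15
    if case.get("prior_treatment"):
        s += 0.10                      # any prior-treatment keyword match
    return s

def route(case: dict) -> str:
    # S >= 0.5 -> Tier 2 (27B deep reasoning); otherwise Tier 1 (9B triage).
    return "tier2-27b" if complexity_score(case) >= 0.5 else "tier1-9b"
```

Because the score is purely additive over deterministic rules, every routing decision is auditable after the fact, which fits the system's traceability goal.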
Corrective RAG and anti‑hallucination pipeline
OncoAgent doesn't use a simple RAG. It implements a four‑stage pipeline:
- Recall: a bi-encoder (PubMedBERT) performs broad sampling and keeps the top 15 candidates.
- Distance gate: a hard cosine-distance threshold of 0.10; if nothing passes, the safe return is "Information not conclusive in the provided guidelines."
- Re-ranking: a cross-encoder (MS-MARCO MiniLM) re-scores the candidates and keeps the top 5.
- Context trimming: the selected context is cut to fit the context budget (6,000 characters).
It also incorporates HyDE to generate hypothetical texts and resolve critical medical synonyms (for example, "neoplasia pulmonar" vs "lung carcinoma"). The CRAG (Corrective RAG) node scores each document before sending it to the Specialist; irrelevant documents trigger automatic reformulation (max 1 retry).
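The four retrieval stages can be sketched as follows. The `docs` embeddings and the `rerank_score` callable are stand-ins for the real PubMedBERT bi-encoder and MS-MARCO MiniLM cross-encoder; the thresholds and limits are the ones quoted above.

```python
# Minimal sketch of the four-stage corrective retrieval described above.
# Embeddings and the cross-encoder are replaced by injectable stand-ins;
# thresholds (0.10 gate, top 15 / top 5, 6,000-char budget) follow the text.

import math

SAFE_RETURN = "Information not conclusive in the provided guidelines."
DISTANCE_GATE = 0.10
RECALL_K, RERANK_K, CHAR_BUDGET = 15, 5, 6000

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def retrieve(query_vec, docs, rerank_score):
    """docs: list of (text, embedding); rerank_score(text) -> relevance."""
    # 1) Recall: broad bi-encoder sampling, top 15 by cosine distance.
    ranked = sorted(docs, key=lambda d: cosine_distance(query_vec, d[1]))[:RECALL_K]
    # 2) Distance gate: if even the best hit is too far, return the safe phrase.
    if not ranked or cosine_distance(query_vec, ranked[0][1]) > DISTANCE_GATE:
        return SAFE_RETURN
    # 3) Re-rank with the cross-encoder stand-in, keep top 5.
    top = sorted(ranked, key=lambda d: rerank_score(d[0]), reverse=True)[:RERANK_K]
    # 4) Trim concatenated context to the character budget.
    return "\n\n".join(text for text, _ in top)[:CHAR_BUDGET]
```

The gate is what makes the pipeline "corrective": rather than passing weak matches downstream, it fails closed with the safe phrase.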
Validation, Critic and HITL
Before reaching the clinician, the output passes through a deterministic Critic with three layers:
- Format: compliance with the OncoCoT schema.
- Safety: deterministic rules (no absolute doses without citation, interaction checks, etc.).
- Entailment: verification that the recommendation is supported by the RAG context.
If the Critic fails, its feedback is injected back into the Specialist for a retry (max 2). Any Tier 2 case, or a rag_confidence below 0.3, forces HITL (human-in-the-loop) review. When recovery fails, the fallback returns the safe phrase mentioned earlier.
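A toy sketch of the three deterministic Critic layers follows. The OncoCoT schema fields and the concrete rules below are invented for illustration; only the three-layer structure (format, safety, entailment) comes from the text.

```python
# Toy three-layer Critic: format, safety, entailment. The required fields and
# rules are illustrative assumptions, not the actual OncoCoT schema.

import re

REQUIRED_FIELDS = {"diagnosis", "recommendation", "citations"}  # assumed schema

def critic(output: dict, rag_context: str) -> list:
    issues = []
    # Layer 1 - format: schema compliance.
    missing = REQUIRED_FIELDS - output.keys()
    if missing:
        issues.append(f"format: missing fields {sorted(missing)}")
    # Layer 2 - safety: no absolute doses without at least one citation.
    text = output.get("recommendation", "")
    if re.search(r"\b\d+\s*mg\b", text) and not output.get("citations"):
        issues.append("safety: absolute dose stated without citation")
    # Layer 3 - entailment (toy proxy): every citation must appear verbatim
    # in the retrieved RAG context.
    for cite in output.get("citations", []):
        if cite not in rag_context:
            issues.append(f"entailment: citation not grounded: {cite}")
    return issues  # empty list -> pass; otherwise fed back to the Specialist
```

In the real system the entailment layer verifies semantic support, not verbatim inclusion; the point here is that every layer is deterministic and therefore auditable.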
Training and optimizations on AMD MI300X
OncoAgent did QLoRA over the combined corpus (266,854 cases) using Unsloth and MI300X‑optimized kernels (192 GB HBM3). Two key points:
- Sequence packing (2048 tokens) and Unsloth kernels reduced an estimated 5‑hour fine‑tuning to ~50 minutes. This cut steps and sped up synthetic generation.
- NF4 4-bit quantization via BitsAndBytes, with LoRA adapters applied to key projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
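The effect of sequence packing is easy to see in a greedy first-fit sketch (Unsloth's actual packing strategy may differ): short tokenized examples are concatenated into full-length sequences, so far fewer padded training steps are needed.

```python
# Toy greedy sequence packing at a 2048-token budget. Concatenating short
# examples into full sequences cuts the number of (mostly padded) steps.
# Unsloth's real packing differs; this only illustrates the idea.

MAX_LEN = 2048

def pack(examples, max_len=MAX_LEN):
    """examples: list of token-id lists; returns packed sequences <= max_len."""
    packed, current = [], []
    for ex in examples:
        if current and len(current) + len(ex) > max_len:
            packed.append(current)   # flush the full sequence
            current = []
        current.extend(ex[:max_len]) # truncate oversize examples
    if current:
        packed.append(current)
    return packed
```

Two 1,000-token cases pack into one 2,000-token sequence instead of two half-padded steps, which is the mechanism behind the step-count reduction described above.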
Essential configuration by tier:
| Parameter | Tier 1 (9B) | Tier 2 (27B) |
|---|---|---|
| Batch per device | 4 | 2 |
| Gradient accumulation | 4 | 8 |
| Effective batch | 16 | 16 |
| Learning rate | 2e-4 | 1e-4 |
| LoRA rank | 16 | 32 |
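Using the table's values, the per-tier quantization and adapter setup might be expressed with the transformers/peft APIs roughly as follows; any argument beyond the quoted values (alpha, dropout, compute dtype) is an assumption, not OncoAgent's confirmed configuration.

```python
# Hedged sketch of the NF4 + LoRA setup per tier. Only the ranks (16/32) and
# target projections come from the text; other arguments are assumptions.

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16, # assumed compute dtype
)

def lora_for_tier(tier: int) -> LoraConfig:
    rank = 16 if tier == 1 else 32         # LoRA rank from the table
    return LoraConfig(
        r=rank,
        lora_alpha=2 * rank,               # common 2x convention, assumed
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_dropout=0.05,                 # assumed
        task_type="CAUSAL_LM",
    )
```

Note how both tiers land on the same effective batch of 16 (4×4 and 2×8), trading per-device batch for gradient accumulation as the model grows.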
ROCm practices: use the model's own tokenizer, avoid injecting an incompatible EOS token, and use ROCm-specific bitsandbytes builds. Unsloth reduced VRAM use by ~60%, stabilizing consumption at ~64 GB.
Privacy and Zero‑PHI
The first step in Ingestion is a Zero‑PHI redaction module. Names, birth dates, MRNs, addresses and other identifiers are detected and replaced with clinically neutral markers. The redacted representation is stored in AgentState and the original is discarded. This prevents any LLM from seeing PHI.
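A minimal sketch of such a redaction pass is below; the patterns are illustrative, not OncoAgent's actual rule set, and a production detector would use far more robust PHI recognition.

```python
# Minimal Zero-PHI redaction sketch: detect identifiers and replace them with
# clinically neutral markers. Patterns are illustrative only; real PHI
# detection needs much broader coverage than these few regexes.

import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),               # birth dates
    (re.compile(r"\bMRN[:\s]*\d{6,10}\b"), "[MRN]"),                # record numbers
    (re.compile(r"\b\d+\s+\w+\s+(Street|Ave|Road)\b"), "[ADDRESS]"),
    (re.compile(r"\b(Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]

def redact_phi(note: str) -> str:
    for pattern, marker in PHI_PATTERNS:
        note = pattern.sub(marker, note)
    return note  # only this redacted form should persist in AgentState
```

Because redaction runs before any model call and the original text is discarded, no LLM in the graph ever sees PHI, which is what makes the policy "zero" rather than "minimized."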
The vector store is local (persistent ChromaDB) and embeddings use pritamdeka/S-PubMedBert-MS-MARCO, fine‑tuned for asymmetric medical search. The architecture has no mandatory API dependencies; a soft failover to Featherless.ai is allowed only if high availability is required.
Results and practical metrics
- Accelerated synthetic generation: ~6,800 cases/hr on MI300X vs ~120 cases/hr via API (56×). Synthetic rejection rate: 0.65%.
- Full corpus fine‑tuning time: ~50 minutes.
- CRAG post‑fix: grading success 100% and RAG confidence 2.3+ in uterine cancer tests.
- Stable training throughput: ~11.3 s/iteration; peak GPU utilization ~70%.
Limitations, risks and next validations
OncoAgent demonstrates technical and operational feasibility, but it's not a finished clinical product. Key limitations:
- ~36% of the corpus is synthetic; broad validation against judgments from certified oncologists is missing.
- Multilingual coverage and ESMO/non‑English guidelines support is partial for now.
- Tier 1 is at checkpoint-1000; evaluation on clinical benchmarks (MedQA and USMLE-style oncology subsets) is planned.
Does this mean AI replaces the doctor? No. It means you can build traceable, local assistance that reduces data‑leak and hallucination risks, but always with mandatory human review.
OncoAgent's contribution is more than numbers: it offers a reproducible blueprint to deploy responsible clinical AI in regulated settings, combining architectural decomposition, corrective RAG pipelines and deterministic safety layers. For hospital teams and R&D, it's a practical reference for moving lab research to infrastructure with data sovereignty.
Original source
https://huggingface.co/blog/lablab-ai-amd-developer-hackathon/oncoagent-official-paper
