4B cybersecurity model matches 8B competitor at half the size

CyberSecQwen-4B beats larger specialist on key benchmarks

A new 4-billion parameter cybersecurity model outperforms Cisco's 8B Foundation-Sec-Instruct model on cyber threat intelligence questions while matching its CVE-to-CWE classification accuracy. CyberSecQwen-4B scored 58.7% on CTI-MCQ (2,500 cybersecurity multiple choice questions) versus 50.0% for the Cisco model, and achieved 66.6% on CTI-RCM (1,000 CVE-to-CWE mapping tasks) compared to 68.5% (per independent evaluation using Cisco's published CTI-Bench protocol).

The model was trained on a single AMD MI300X GPU using Apache 2.0-licensed data: CVE-to-CWE mappings from MITRE/NVD records and synthetic defensive analyst Q&A. Training data was deduplicated against the evaluation set to prevent contamination. The team also trained a companion 2B model (Gemma4Defense-2B) using identical methods, which achieved similar results and confirms the approach works across model families.

Both models run on consumer hardware with 12GB+ VRAM and are released under Apache 2.0 license. The models use LoRA fine-tuning (r=64, alpha=64) on instruction-tuned base models rather than raw pre-trained checkpoints.

Local deployment solves three cybersecurity problems

Cybersecurity teams face unique constraints that make frontier model APIs unsuitable for many defensive workflows. Sensitive incident data, malware samples, and vulnerability disclosures cannot be sent to external APIs without creating breach risks. Mid-size SOCs process thousands of alerts daily, making per-call API costs prohibitive for routine tasks like CVE explanation or CWE classification.

Air-gapped environments in critical infrastructure, healthcare, and government require on-premises deployment. The 4B parameter count targets the sweet spot between capability and hardware requirements: meaningful performance improvement over general-purpose 4B models while fitting on widely available single-GPU systems.

The benchmarks focus on structured cybersecurity tasks rather than general reasoning. CTI-MCQ tests knowledge of attack patterns, controls, and threat actor behavior. CTI-RCM evaluates the practical skill of mapping vulnerability descriptions to MITRE's Common Weakness Enumeration categories, which drives patch prioritization decisions.

Narrow evaluation leaves deployment questions open

The 2,500-question evaluation covers core knowledge but does not test performance on messy real-world inputs: incomplete CVE descriptions, novel attack patterns, or adversarial prompts embedded in vulnerability reports. The authors acknowledge this gap and plan adversarial robustness testing.

Deployment options include direct inference via transformers (three lines of Python) or high-throughput serving via vLLM on AMD hardware. GGUF quantized versions are planned to enable mobile and edge deployment at approximately 2.5GB memory footprint.

The model is explicitly scoped for defensive tasks: CWE classification, threat intelligence Q&A, and triage assistance. It is not designed for exploit generation or autonomous security decisions. Teams should evaluate whether the CVE-focused training data matches their specific vulnerability management workflows before production deployment.

4B cybersecurity model matches 8B competitor at half the size

Our Take

Why it matters

Do this week

CyberSecQwen-4B beats larger specialist on key benchmarks

Local deployment solves three cybersecurity problems

Narrow evaluation leaves deployment questions open

Related stories

Wealth managers pivot to resilience as Q1 volatility rewards caution

Dutch pension advisers lack data access during WTP transition

Iress partners with Thoughtworks for wealth platform overhaul