
CyberSecQwen-4B Defeats Cisco 8B on CTI-MCQ Benchmark

Team athena19 fine-tuned a 4-billion-parameter model on a single AMD MI300X GPU; the result outperforms Cisco's 8B model on defensive cyber threat intelligence.

Team “athena19” has released CyberSecQwen-4B, a specialized 4-billion parameter language model fine-tuned specifically for defensive Cyber Threat Intelligence (CTI). Developed during the AMD Developer Hackathon hosted by lablab.ai, the release demonstrates how targeted fine-tuning on modern open-weights models can surpass the performance of larger enterprise-backed equivalents. If you build internal threat analysis tools, this introduces a highly capable option for processing sensitive intelligence workloads locally without relying on external APIs.

Benchmark Performance

The model establishes a new baseline for small-parameter cyber intelligence. On the CTI-MCQ (Multiple Choice Questions for Cyber Threat Intelligence) benchmark, CyberSecQwen-4B outscored Cisco's 8B Foundation-Sec-Instruct.

Model                               Parameter Count   CTI-MCQ Result
CyberSecQwen-4B                     4.0B              +8.7 pp over baseline
Cisco 8B Foundation-Sec-Instruct    8.0B              Baseline

Achieving an 8.7 percentage point improvement over a model twice its size highlights the efficiency of domain-specific fine-tuning. The base model, Alibaba Cloud's Qwen3-4B, natively supports a 32,768-token context length, which is critical for processing verbose security logs and lengthy incident reports. Context-extension techniques like YaRN allow developers to stretch this window further for more demanding analysis.
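To make the YaRN extension concrete, here is a minimal sketch of the `rope_scaling` entry used by the Hugging Face transformers convention for Qwen-family models. The scaling factor of 4.0 is an illustrative choice, not a value published by the athena19 team:

```python
# Sketch: extending Qwen3-4B's 32,768-token native context with YaRN.
# Field names follow the rope_scaling convention used by transformers
# for Qwen models; the factor here is an illustrative assumption.

NATIVE_CONTEXT = 32_768  # Qwen3-4B's native context length

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # stretch RoPE positions 4x
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

# Effective context after scaling: factor * native length
extended_context = int(rope_scaling["factor"] * NATIVE_CONTEXT)
print(extended_context)  # 131072
```

In practice this dictionary is added to the model's `config.json` (or passed as a config override at load time) rather than computed at runtime; note that YaRN scaling can slightly degrade short-context quality, so it is best enabled only when long inputs are expected.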

Local Deployment Constraints

A core objective of the project is ensuring defensive analysts can process classified or sensitive telemetry on edge devices. Within 15 hours of the model's release, community contributor mradermacher published quantized GGUF weights. These quantized variants can be deployed in environments constrained to 8–12GB of RAM.
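A back-of-the-envelope calculation shows why a quantized 4B model fits that budget. The bits-per-weight figures below are rough community estimates for common llama.cpp quantization formats, not official numbers, and KV-cache overhead is ignored:

```python
# Rough sanity check on the 8-12 GB RAM claim for a quantized 4B model.
# Bits-per-weight values are approximate community figures for llama.cpp
# quant formats; KV cache and runtime overhead are excluded.

PARAMS = 4.0e9  # 4 billion parameters

approx_bits_per_weight = {
    "Q4_K_M": 4.8,  # ~4-bit with per-block scales
    "Q8_0": 8.5,    # ~8-bit
}

def weight_gib(quant: str) -> float:
    """Approximate in-memory size of the weights in GiB."""
    return PARAMS * approx_bits_per_weight[quant] / 8 / 2**30

for quant in approx_bits_per_weight:
    print(f"{quant}: ~{weight_gib(quant):.1f} GiB")
```

Even the 8-bit variant lands under ~4 GiB of weights, leaving ample headroom for the KV cache and the operating system inside an 8–12GB envelope.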

Performance holds up well on consumer hardware. Early telemetry indicates the model achieves 15 tokens per second on modern mobile devices. This throughput makes it viable to integrate the model directly into local security tooling rather than depending on centralized inference services. If your infrastructure team is evaluating local LLM deployments, the memory footprint of a quantized 4B model removes the need for dedicated AI workstation hardware.
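To gauge whether 15 tokens per second is workable for triage, consider the wall-clock time for a single analysis. Decode speed comes from the reported figure; the prompt-processing (prefill) rate below is an assumption for illustration, since prefill is typically much faster than generation:

```python
# Viability estimate for local triage at the reported ~15 tok/s decode rate.
# The prefill rate is an illustrative assumption, not a measured figure.

DECODE_TPS = 15.0    # reported generation speed on mobile-class hardware
PREFILL_TPS = 200.0  # assumed prompt-processing speed

def triage_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated wall-clock time to analyze one log excerpt."""
    return prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS

# e.g. a 1,500-token log excerpt yielding a 300-token summary
print(round(triage_seconds(1500, 300), 1))  # 27.5
```

Under half a minute per alert is slow for streaming analysis but entirely workable for on-demand enrichment of flagged events.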

Training Infrastructure

The development process provides a clear template for fine-tuning small models on alternative hardware stacks. The athena19 team used a single AMD Instinct MI300X GPU provisioned through the AMD Developer Cloud, completing training on the open-source AMD ROCm software stack.

The project was submitted under Track 2 of the hackathon, which focused on fine-tuning using AMD hardware. Participants operated within a strict compute budget of $100 in developer cloud credits to access the MI300X instances. This constraint demonstrates that building a production-ready CTI specialist model no longer requires massive compute clusters or extensive financial resources.

For security operations centers, the availability of CyberSecQwen-4B alters the deployment strategy for automated threat analysis. Instead of routing sensitive logs to proprietary cloud models, evaluate integrating this 4B model directly into your local SIEM platforms to ensure data residency remains completely under your control.
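As a minimal sketch of what that integration looks like, the helper below formats a SIEM alert into an analysis prompt for a locally hosted model. The prompt template and field names are hypothetical, and the inference call itself (e.g. via llama.cpp) is out of scope here; adapt the schema to your SIEM:

```python
# Minimal sketch: formatting a SIEM event as a defensive-CTI prompt for a
# locally hosted model. Template and field names are illustrative only.

import json

def build_cti_prompt(event: dict) -> str:
    """Format one alert as a defensive-CTI analysis prompt."""
    return (
        "You are a defensive cyber threat intelligence analyst.\n"
        "Classify the following alert, map it to likely MITRE ATT&CK "
        "techniques, and suggest containment steps.\n\n"
        f"Alert:\n{json.dumps(event, indent=2)}"
    )

alert = {  # hypothetical SIEM record
    "rule": "Possible credential dumping",
    "process": "lsass.exe",
    "parent": "procdump64.exe",
    "host": "ws-0142",
}

prompt = build_cti_prompt(alert)
print(prompt.splitlines()[0])
```

Because the prompt never leaves the host, the raw alert data, including hostnames and process details, stays inside your data-residency boundary.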
