CyberSecQwen-4B Defeats Cisco 8B on CTI-MCQ Benchmark
On a single AMD MI300X GPU, team athena19 fine-tuned a 4-billion-parameter model that outperforms Cisco's 8B model at defensive cyber threat intelligence.
Team “athena19” has released CyberSecQwen-4B, a 4-billion-parameter language model fine-tuned specifically for defensive Cyber Threat Intelligence (CTI). Developed during the AMD Developer Hackathon hosted by lablab.ai, the release demonstrates how targeted fine-tuning of modern open-weights models can surpass larger, enterprise-backed alternatives. If you build internal threat analysis tools, this adds a capable option for processing sensitive intelligence workloads locally, without relying on external APIs.
Benchmark Performance
The model sets a new bar for small-parameter cyber threat intelligence. In testing on the CTI-MCQ (Multiple Choice Questions for Cyber Threat Intelligence) benchmark, CyberSecQwen-4B outscored Cisco's Foundation-Sec-8B-Instruct.
| Model | Parameters | CTI-MCQ Score vs. Baseline |
|---|---|---|
| CyberSecQwen-4B | 4.0B | +8.7 pp |
| Cisco Foundation-Sec-8B-Instruct | 8.0B | Baseline |
An 8.7 percentage point improvement over a model twice its size highlights the efficiency of domain-specific fine-tuning. The base model, Alibaba Cloud's Qwen3-4B, is a dense transformer that natively supports a 32,768-token context length, which is critical for processing verbose security logs and lengthy incident reports. Context-extension techniques such as YaRN let developers stretch this window further for more demanding analysis.
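To make the context-extension point concrete, here is a minimal sketch of loading Qwen3-4B with YaRN RoPE scaling through Hugging Face transformers. The override values follow the rope_scaling pattern Qwen documents for its models; treat the exact keys and the factor of 4.0 as assumptions to verify against your installed transformers version.

```python
# Minimal sketch: extend Qwen3-4B's context window with YaRN RoPE scaling.
# The rope_scaling values follow Qwen's published guidance; confirm the
# exact keys against your installed transformers version.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,                              # 32,768 * 4 = ~131K tokens
        "original_max_position_embeddings": 32768,  # native context length
    },
)
```

YaRN scaling is static, so only apply a large factor when you actually need the longer window; oversized factors can degrade quality on short inputs.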
Local Deployment Constraints
A core objective of the project is letting defensive analysts process classified or sensitive telemetry on edge devices. Within 15 hours of the model's release, community contributor mradermacher published quantized GGUF builds of the weights, which can run in environments constrained to 8–12GB of RAM.
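As a sketch of what that local deployment looks like, the snippet below pulls one of the community GGUF quantizations with llama-cpp-python. The repository and file names are illustrative placeholders, not verified paths; check mradermacher's Hugging Face page for the actual quant artifacts.

```python
# Sketch: run a quantized GGUF build locally with llama-cpp-python.
# Repo and filename are hypothetical placeholders; check the contributor's
# Hugging Face page for the real quant names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/CyberSecQwen-4B-GGUF",  # assumed repo name
    filename="*Q4_K_M.gguf",                      # ~4-bit quant of a 4B model
    n_ctx=8192,                                   # context window for this session
)
```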
Performance holds up well on consumer hardware: early telemetry indicates roughly 15 tokens per second on modern mobile devices. That throughput makes it viable to embed the model directly in local security tooling rather than depending on centralized inference endpoints. If your infrastructure team is evaluating local LLM deployments, the memory footprint of a quantized 4B model removes the need for dedicated AI workstation hardware.
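If you want to sanity-check throughput on your own hardware, a rough tokens-per-second measurement is straightforward. This sketch reuses the llm handle from the snippet above; the prompt is arbitrary.

```python
# Rough throughput check: time one completion and report tokens per second.
import time

prompt = "Summarize the tactics used in a credential-stuffing attack."

start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```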
Training Infrastructure
The development process offers a clear template for fine-tuning small models on alternative hardware stacks. The athena19 team used a single AMD Instinct MI300X GPU provisioned through the AMD Developer Cloud, completing training on the open-source AMD ROCm software stack.
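The team has not published its exact training recipe, so the following is a generic parameter-efficient fine-tuning sketch with illustrative hyperparameters, not athena19's method. On ROCm builds of PyTorch the GPU is still addressed through the familiar torch.cuda interface, so the standard Hugging Face stack runs unchanged.

```python
# Generic LoRA setup sketch for fine-tuning Qwen3-4B on an MI300X.
# Hyperparameters are illustrative; ROCm PyTorch needs no code changes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# From here, any standard trainer (transformers.Trainer, TRL's SFTTrainer)
# runs unmodified on the ROCm stack.
```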
The project was submitted under Track 2 of the hackathon, which focused on fine-tuning with AMD hardware. Participants operated within a strict compute budget of $100 in developer cloud credits for MI300X access. That constraint demonstrates that building a production-ready CTI specialist model no longer requires massive compute clusters or a large budget.
For security operations centers, CyberSecQwen-4B changes the deployment calculus for automated threat analysis. Instead of routing sensitive logs to proprietary cloud models, consider integrating the 4B model directly into local SIEM tooling (an illustrative sketch follows) so data residency stays entirely under your control.
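As an illustration of that integration pattern, the sketch below wraps the local model in a small triage helper that a SIEM pipeline could call per alert. The model path, prompt, and helper are hypothetical, not part of the released project.

```python
# Hypothetical triage helper: classify a raw log excerpt locally, so the
# telemetry never leaves the host running the SIEM pipeline.
from llama_cpp import Llama

llm = Llama(model_path="CyberSecQwen-4B.Q4_K_M.gguf", n_ctx=8192)  # assumed local path

def triage(log_excerpt: str) -> str:
    """Ask the local model for a short severity-and-technique assessment."""
    resp = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a defensive CTI analyst."},
            {"role": "user", "content": "Assess this log excerpt. Give a severity "
                                        f"and the likely MITRE ATT&CK technique:\n{log_excerpt}"},
        ],
        max_tokens=200,
    )
    return resp["choices"][0]["message"]["content"]

print(triage("4625 An account failed to log on ... 500 times from 203.0.113.7"))
```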