Cloudflare Reinvents Cache to Shield Sites From AI Bots
With AI bot traffic set to surpass human usage by 2027, Cloudflare is deploying a dual-layer cache architecture to protect performance and origin servers.
On April 2, 2026, Cloudflare detailed a fundamental architectural shift in its edge network to manage the exponential growth of agentic traffic. According to its "Rethinking Cache for the AI Era" research, the company now handles over 10 billion automated requests per week. For developers managing high-traffic domains or building applications that rely on scraping, the way edge networks serve content is structurally changing.
The Traffic Asymmetry
Automated requests now account for 32% of all traffic across Cloudflare’s infrastructure. Within that segment, AI crawlers generate 80% of self-identified bot traffic. Traditional caching relies on hit rates driven by human behavior, where many users request the same popular assets. Web browsers further reduce server round-trips through session management and client-side caching.
Machine clients bypass these mechanisms. They execute sequential, parallel scans of rarely visited pages. Because every request hits the CDN or the origin directly, these patterns actively evict popular human-facing content from edge nodes. The load disparity is massive. A human might visit five pages to complete a task, while autonomous AI agents request up to 5,000 sites to execute the same logic.
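The eviction problem described above can be demonstrated with a toy simulation. The sketch below uses a minimal LRU cache (illustrative only, not Cloudflare's implementation) to show how a crawler's scan of unique URLs collapses the hit rate for a small set of popular, human-facing pages:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache to illustrate eviction; not a CDN implementation."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)         # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict least recently used
            self.store[key] = True

# Humans repeatedly request a small set of popular pages: high hit rate.
cache = LRUCache(capacity=100)
for _ in range(10):
    for page in range(20):
        cache.get(f"/popular/{page}")
human_only = cache.hits / (cache.hits + cache.misses)

# Interleave a crawler's sequential scan of unique URLs: the scan fills the
# cache and evicts every popular page between human passes.
cache = LRUCache(capacity=100)
for i in range(10):
    for page in range(20):
        cache.get(f"/popular/{page}")
    for url in range(i * 100, (i + 1) * 100):
        cache.get(f"/archive/{url}")
mixed = cache.hits / (cache.hits + cache.misses)

print(f"human-only hit rate: {human_only:.2f}, with crawler: {mixed:.2f}")
```

Even though the human request pattern is unchanged, the hit rate for those requests drops to zero once the scan shares the same cache, which is exactly the asymmetry a separate bot tier is meant to absorb.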
This scaling factor has tangible infrastructure costs. Wikimedia recently recorded a 50% surge in multimedia bandwidth driven entirely by bulk scraping. Platforms like Fedora and Diaspora experienced severe performance degradation for human users due to these parallel loads. Cloudflare projects that total AI bot traffic will eclipse human web usage by 2027.
Dual-Layer Cache Architecture
To protect origin servers without breaking agentic workflows, Cloudflare partnered with ETH Zurich to design a multi-tiered cache system. The architecture uses real-time machine learning algorithms to identify automated requests and route them away from standard delivery nodes.
The human tier remains on standard CDN Points of Presence (PoPs). This layer is strictly optimized for responsiveness and high cache hit rates. The AI tier operates as a separate infrastructure layer built for raw capacity. These specific caches tolerate higher latency, which is acceptable for asynchronous training data collection or retrieval-augmented generation pipelines. The network categorizes requests dynamically, routing workloads based on their identified purpose.
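A routing decision of this shape can be sketched as follows. The field names, crawler signatures, and score threshold below are illustrative assumptions, not Cloudflare's actual classifier or API; the point is only that traffic is split by identified purpose before it can touch the latency-optimized tier:

```python
# Hypothetical dual-tier routing sketch; thresholds and fields are assumptions.
from dataclasses import dataclass

@dataclass
class Request:
    user_agent: str
    bot_score: float  # assumed: 0.0 = certainly automated, 1.0 = certainly human

# Real self-identifying AI crawler user agents, used here as example signatures.
KNOWN_AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

def route(req: Request) -> str:
    """Send likely-automated traffic to a capacity-optimized tier so it
    cannot evict hot content from the latency-optimized human tier."""
    if any(sig in req.user_agent for sig in KNOWN_AI_CRAWLERS):
        return "ai-tier"      # self-identified crawler
    if req.bot_score < 0.3:
        return "ai-tier"      # classifier flags probable automation
    return "human-tier"       # standard low-latency PoP

print(route(Request("Mozilla/5.0 (Windows NT 10.0)", 0.9)))  # human-tier
print(route(Request("GPTBot/1.1", 0.1)))                     # ai-tier
```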
Industry analysts at WWT recently noted that this shift toward specialized high-performance architectures is necessary to handle agentic data mobility. Competitors like Bifrost are already attempting to capture this traffic by offering low-latency alternative networks that avoid managed proxy overhead.
New Edge Controls
The architectural split introduces specific infrastructure controls for site operators. Cloudflare implemented a specialized toolkit to manage automated access directly at the edge. The system includes a Pay Per Crawl feature integrated with Stripe, allowing domains to charge AI companies directly for data scraping.
Content delivery is also adapting to machine reading. Operators can deploy Markdown for Agents, serving a stripped-down, reduced-bandwidth version of a site when an automated crawler is detected. Administrators manage these policies through AI Crawl Control, which provides analytics and one-click blocking capabilities. This integrates with the existing AI Gateway for unified monitoring of AI applications and LLM provider rate-limiting.
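The Markdown-for-Agents idea amounts to content negotiation keyed on crawler detection. The sketch below is an assumption-laden illustration (the detection logic and payloads are invented, not Cloudflare's implementation) of serving a stripped-down Markdown variant to known crawlers while humans receive full HTML:

```python
# Illustrative content negotiation for machine readers; not Cloudflare's
# Markdown for Agents implementation. Crawler UAs are real examples.
AI_USER_AGENTS = ("GPTBot", "ClaudeBot", "CCBot")

def respond(path: str, user_agent: str) -> tuple[str, str]:
    if any(bot in user_agent for bot in AI_USER_AGENTS):
        # Stripped-down Markdown: no scripts, styles, or navigation chrome.
        return ("text/markdown", f"# {path}\n\nPlain-text content only.")
    return ("text/html", f"<html><body><h1>{path}</h1>...</body></html>")

content_type, body = respond("/docs", "GPTBot/1.1")
print(content_type)  # text/markdown
```

Serving a lighter representation to crawlers cuts bandwidth on exactly the traffic that benefits least from rendered HTML.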
If you operate heavily scraped domains or maintain web-crawling infrastructure, traditional cache hit rates will no longer reflect your actual origin load. Audit your server metrics specifically for sequential scans and implement explicit machine-readable endpoints to avoid aggressive throttling at the edge.
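A first-pass audit of the kind suggested above can be done from access logs. This minimal sketch assumes logs are available as (client, path) tuples and flags clients that touch an unusually large set of distinct URLs, the signature of a bulk crawl rather than a human browsing session; the threshold is an arbitrary placeholder to tune against your own traffic:

```python
from collections import defaultdict

def find_scanners(log_entries, min_unique_paths=500):
    """Flag clients whose distinct-URL count suggests a bulk scan."""
    paths_by_client = defaultdict(set)
    for client, path in log_entries:
        paths_by_client[client].add(path)
    return {client for client, paths in paths_by_client.items()
            if len(paths) >= min_unique_paths}

# Synthetic log: one crawler walking an archive, one human browsing.
log = [("10.0.0.1", f"/archive/{i}") for i in range(1000)]
log += [("10.0.0.2", p) for p in ("/", "/about", "/pricing")]
print(find_scanners(log))  # {'10.0.0.1'}
```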