Cloudflare Reinvents Cache to Shield Sites From AI Bots
With AI bot traffic set to surpass human usage by 2027, Cloudflare is deploying a dual-layer cache architecture to protect performance and origin servers.
On April 2, 2026, Cloudflare detailed a fundamental architectural shift in its edge network to manage the exponential growth of agentic traffic. In its research on rethinking cache for the AI era, the company reports that it now handles over 10 billion automated requests per week. For developers who manage high-traffic domains or build applications that rely on scraping, the way edge networks serve content is changing structurally.
The Traffic Asymmetry
Automated requests now account for 32% of all traffic across Cloudflare’s infrastructure. Within that segment, AI crawlers generate 80% of self-identified bot traffic. Traditional caching depends on high hit rates, which human behavior naturally produces because many users request the same popular assets. Web browsers also use local session management and side-caching to reduce server round-trips.
Machine clients bypass these mechanisms. They run sequential scans of rarely visited pages, often many in parallel. Because none of these requests can be served from a browser cache, every one of them hits the CDN or the origin directly, and each long-tail page pulled into an edge node displaces popular human-facing content. The load disparity is stark: a human might visit five pages to complete a task, while autonomous AI agents may request up to 5,000 sites to execute the same logic.
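To see why long-tail scans hurt human-facing performance, consider a toy model: a single LRU cache with room for 100 objects, 50 popular pages requested by humans every round, and a crawler that requests never-repeated archive pages in between. The capacity, page counts, and LRU eviction policy are assumptions chosen for illustration; real edge caches are considerably more sophisticated than this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[str, None] = OrderedDict()

    def request(self, url: str) -> bool:
        """Return True on a cache hit; on a miss, admit the URL (simulated origin fetch)."""
        if url in self.entries:
            self.entries.move_to_end(url)  # mark as most recently used
            return True
        self.entries[url] = None  # fetched from origin, now cached
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return False


def human_hit_rate(crawler_pages_per_round: int, rounds: int = 10,
                   capacity: int = 100, popular_pages: int = 50) -> float:
    """Hit rate seen by human traffic while a crawler scans unique long-tail URLs."""
    cache = LRUCache(capacity)
    human_hits = human_requests = 0
    next_tail = 0
    for _ in range(rounds):
        # Humans repeatedly request the same small set of popular pages.
        for i in range(popular_pages):
            human_hits += cache.request(f"/popular/{i}")
            human_requests += 1
        # The crawler requests pages nobody else asks for, each exactly once.
        for _ in range(crawler_pages_per_round):
            cache.request(f"/archive/{next_tail}")
            next_tail += 1
    return human_hits / human_requests
```

With no crawler traffic, `human_hit_rate(0)` is 0.9 (only the first round misses), and it stays at 0.9 with 50 crawler pages per round because everything still fits. With 200 crawler pages per round, each scan flushes the whole cache and `human_hit_rate(200)` drops to 0.0, even though human behavior never changed.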
This scaling factor has tangible infrastructure costs. Wikimedia recently recorded a 50% surge in multimedia bandwidth driven entirely by bulk scraping. Platforms like Fedora and Diaspora experienced severe performance degradation for human users due to these parallel loads. Cloudflare projects that total AI bot traffic will eclipse human web usage by 2027.
Dual-Layer Cache Architecture
To protect origin servers without breaking agentic workflows, Cloudflare partnered with ETH Zurich to design a multi-tiered cache system. The architecture uses real-time machine learning algorithms to identify automated requests and route them away from standard delivery nodes.
The human tier remains on standard CDN Points of Presence (PoPs) and is optimized for responsiveness and high cache hit rates. The AI tier is a separate infrastructure layer built for raw capacity. Its caches tolerate higher latency, which is acceptable for asynchronous training-data collection or retrieval-augmented generation (RAG) pipelines. The network categorizes requests dynamically and routes each workload according to its identified purpose.
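The routing idea can be sketched with a toy model: each request carries a classification score, and the edge sends it to one of two independent caches. The `bot_score` field, the 0.5 threshold, and the tier capacities are hypothetical stand-ins for Cloudflare's real-time classifier and infrastructure, not its actual interface.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    bot_score: float  # hypothetical classifier output: 0.0 = human, 1.0 = automated


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[str, None] = OrderedDict()

    def request(self, url: str) -> bool:
        if url in self.entries:
            self.entries.move_to_end(url)  # hit: mark as most recently used
            return True
        self.entries[url] = None  # miss: admit after simulated origin fetch
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


class TieredEdge:
    """Route each request to a human-facing or an AI-facing cache tier."""

    def __init__(self, human_capacity: int, ai_capacity: int,
                 threshold: float = 0.5) -> None:
        self.tiers = {"human": LRUCache(human_capacity), "ai": LRUCache(ai_capacity)}
        self.threshold = threshold

    def route(self, req: Request) -> str:
        return "ai" if req.bot_score >= self.threshold else "human"

    def serve(self, req: Request) -> tuple[str, bool]:
        """Return (tier name, whether the request was a cache hit)."""
        tier = self.route(req)
        return tier, self.tiers[tier].request(req.url)


def demo() -> float:
    """Human hit rate with 50 popular human pages and 200 crawler pages per round."""
    edge = TieredEdge(human_capacity=100, ai_capacity=1000)
    hits = total = 0
    tail = 0
    for _ in range(10):
        for i in range(50):
            _, hit = edge.serve(Request(f"/popular/{i}", bot_score=0.1))
            hits += hit
            total += 1
        for _ in range(200):
            edge.serve(Request(f"/archive/{tail}", bot_score=0.9))
            tail += 1
    return hits / total
```

Because crawler misses now churn only the AI tier, `demo()` returns 0.9 for human traffic: only the first round of popular pages misses. The cost is that the AI tier needs capacity of its own, which fits the description of that tier as built for raw capacity rather than responsiveness.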
Industry analysts at WWT recently noted that this shift toward specialized high-performance architectures is necessary to handle agentic data mobility. Competitors like Bifrost are already attempting to capture this traffic by offering low-latency alternative networks that avoid managed proxy overhead.
New Edge Controls
The architectural split comes with new controls that let site operators manage automated access directly at the edge. They include a Pay Per Crawl feature integrated with Stripe, which allows domains to charge AI companies directly for scraping their data.
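This article does not describe the Pay Per Crawl protocol, so the following is only a conceptual sketch of the per-request decision involved: serve humans normally, serve AI crawlers that accept the asking price, and quote the price to the rest. It uses the standard HTTP 402 Payment Required status; the `x-max-crawl-price`, `x-crawl-price`, and `x-crawl-charged` header names and the `CrawlPolicy` type are hypothetical, and the Stripe billing step is omitted.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class CrawlPolicy:
    price_usd: float  # hypothetical per-request price chosen by the site operator


def decide(headers: dict[str, str], is_ai_crawler: bool,
           policy: CrawlPolicy) -> tuple[int, dict[str, str]]:
    """Return (status code, extra response headers) for one request.

    Assumes header names are already lowercased. The x-* headers are
    hypothetical placeholders, not Cloudflare's actual protocol.
    """
    if not is_ai_crawler:
        return int(HTTPStatus.OK), {}  # human visitors are never charged
    price = f"{policy.price_usd:.4f}"
    try:
        offered = float(headers.get("x-max-crawl-price", ""))
    except ValueError:
        offered = None  # missing or malformed offer
    if offered is not None and offered >= policy.price_usd:
        # The crawler agreed to pay at least the asking price: serve and record the charge.
        return int(HTTPStatus.OK), {"x-crawl-charged": price}
    # No acceptable offer: quote the price with 402 Payment Required.
    return int(HTTPStatus.PAYMENT_REQUIRED), {"x-crawl-price": price}
```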
Content delivery is also adapting to machine reading. Operators can deploy Markdown for Agents, serving a stripped-down, reduced-bandwidth version of a site when an automated crawler is detected. Administrators manage these policies through AI Crawl Control, which provides analytics and one-click blocking capabilities. This integrates with the existing AI Gateway for unified monitoring of AI applications and LLM provider rate-limiting.
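The article does not describe how Markdown for Agents converts pages. As a rough illustration of why a machine-oriented variant saves bandwidth, the sketch below uses Python's standard `html.parser` to keep headings, list items, and text while dropping `script`, `style`, `nav`, `header`, and `footer` content. The `is_crawler` flag stands in for the edge's bot detection, and the converter is a crude stand-in, not Cloudflare's implementation.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Crude HTML-to-Markdown pass: keeps readable text, drops page chrome and scripts."""

    SKIP = {"script", "style", "nav", "header", "footer"}
    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- "}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside an element we discard
        self.prefix = ""     # Markdown marker for the next text run

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.PREFIXES:
            self.prefix = self.PREFIXES[tag]

    def handle_endtag(self, tag: str) -> None:
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(self.prefix + text)
            self.prefix = ""


def to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.parts)


def render(html: str, is_crawler: bool) -> tuple[str, str]:
    """Return (content type, body): Markdown for detected crawlers, HTML otherwise."""
    if is_crawler:
        return "text/markdown", to_markdown(html)
    return "text/html", html
```

A production converter would also need to handle links, tables, and nested inline markup, which this sketch flattens into separate lines.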
If you operate heavily scraped domains or maintain web-crawling infrastructure, traditional cache hit rates will no longer reflect your actual origin load. Audit your server metrics specifically for sequential scans and implement explicit machine-readable endpoints to avoid aggressive throttling at the edge.
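One way to start that audit is to look for clients whose requests walk through numeric IDs in order and mostly miss the cache. The sketch below assumes access-log entries have already been parsed into a client identifier, a path, and a cache status of `HIT` or `MISS`, in chronological order; the `LogEntry` type, the ID-in-path heuristic, and all thresholds are hypothetical and should be tuned to your own traffic.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogEntry:
    client: str        # e.g. client IP or user agent (assumed log field)
    path: str
    cache_status: str  # "HIT" or "MISS" (assumed log field)


ID_RE = re.compile(r"(\d+)/?$")  # trailing numeric ID, e.g. /post/123


def scan_report(entries: list[LogEntry], min_requests: int = 20,
                min_sequential: float = 0.8,
                min_miss: float = 0.8) -> dict[str, tuple[float, float]]:
    """Flag clients that scan IDs in order and mostly miss the cache.

    Returns {client: (sequential_ratio, miss_ratio)} for flagged clients.
    """
    by_client: dict[str, list[LogEntry]] = defaultdict(list)
    for entry in entries:
        by_client[entry.client].append(entry)
    flagged = {}
    for client, reqs in by_client.items():
        if len(reqs) < min_requests:
            continue  # too little traffic to judge
        miss_ratio = sum(r.cache_status == "MISS" for r in reqs) / len(reqs)
        ids = [int(m.group(1)) for r in reqs if (m := ID_RE.search(r.path))]
        # Fraction of consecutive requests whose ID increments by exactly one.
        steps = sum(1 for a, b in zip(ids, ids[1:]) if b == a + 1)
        sequential = steps / (len(ids) - 1) if len(ids) > 1 else 0.0
        if miss_ratio >= min_miss and sequential >= min_sequential:
            flagged[client] = (sequential, miss_ratio)
    return flagged
```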