Ai Engineering 3 min read

Cloudflare Now Forces AI Bots to Only Scrape Canonical Content

The new Redirects for AI Training tool converts soft canonical tags into hard 301 redirects to stop AI crawlers from ingesting deprecated or duplicate data.

On April 17, 2026, Cloudflare released Redirects for AI Training to automatically route verified AI training crawlers away from deprecated web pages to current canonical versions. If you manage documentation or content platforms, this prevents large language models from scraping outdated technical instructions and generating hallucinated code for your users.

The Problem with Soft Directives

Standard web development relies on soft directives like HTML canonical tags and noindex rules to guide search engines. AI training bots frequently ignore these signals. Cloudflare analyzed traffic to its developer documentation in March 2026 and found that OpenAI crawled legacy Workers documentation 46,000 times. Anthropic and Meta bots recorded 3,600 and 1,700 hits on the same deprecated pages.

When training pipelines ingest legacy data, models output deprecated syntax. Cloudflare observed AI assistants recommending the outdated kv:key put syntax, which was replaced in Wrangler 3.60.0 by kv key put, directly because bots scraped archived documentation. This creates structural problems when trying to reduce AI hallucinations at the model level.

HTTP 301 Redirects for AI Training Bots

The new redirect feature stream-parses HTML response headers at the edge. When Cloudflare detects a non-self-referencing canonical tag, it interrupts the response for verified AI training crawlers like GPTBot, ClaudeBot, and Bytespider. The network returns an HTTP 301 redirect pointing to the canonical URL instead of rendering the page content.

Human visitors, standard search engines like Googlebot, and live AI search agents still receive the original page. This selective routing preserves historical archives for users who need legacy documentation while keeping deprecated code out of training datasets. The system also explicitly blocks cross-origin canonicals to prevent accidental routing to third-party domains.

Internal testing on Cloudflare documentation successfully redirected 100 percent of AI training crawler requests to canonical pages.

Configuration and Availability

The feature is available immediately on Cloudflare Pro, Business, and Enterprise plans at no additional cost. You can enable it through a single toggle in the Cloudflare dashboard under the AI Crawl Control section. For granular routing, developers can apply path-specific constraints using Configuration Rules and Cloudflare for SaaS.

Agent Infrastructure Updates

This release is part of Cloudflare’s 2026 Agents Week, which focuses on edge networking for multi-agent systems. Alongside the redirect tool, Cloudflare launched an Agent Readiness Score to validate site compatibility with standard AI authentication and formatting requirements. The company also updated Radar AI Insights to track HTTP status codes triggered specifically by agent traffic.

If you host software documentation, migrating from soft canonical tags to edge-level AI redirects ensures frontier models learn your current API syntax. Review your existing canonical architecture to ensure tags strictly reference current documentation, as automated 301s will now enforce these paths across major AI training pipelines.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading