Build AI Agent Search with Cloudflare AI Search
Learn how to use Cloudflare AI Search to simplify RAG pipelines with hybrid vector search, automated indexing, and native MCP support for AI agents.
Cloudflare’s new AI Search service simplifies retrieval-augmented generation (RAG) by combining vector indexing, document parsing, and keyword search into a single managed primitive. Released during Cloudflare’s Agents Week on April 16, 2026, the service lets developers provision dedicated search indexes programmatically at runtime. You can create separate instances for individual users or agents without redeploying code, within the per-account limits listed under Pricing and Limits. This tutorial covers how to set up the new namespace binding, ingest documents, and configure hybrid search for your workloads.
Hybrid Search Architecture
AI Search runs semantic vector search and BM25-based keyword search in parallel. The service fuses the results of both operations automatically to improve overall retrieval accuracy. This eliminates the need to manage dual databases or write custom rank-fusion algorithms.
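Cloudflare does not spell out its fusion algorithm here, so the sketch below uses reciprocal rank fusion (RRF), a widely used method, purely to illustrate what "fusing" two ranked lists means. The `k = 60` constant, the function name, and the document IDs are assumptions for illustration, not AI Search internals.

```typescript
// Reciprocal rank fusion (RRF): one common way to merge ranked result lists.
// ASSUMPTION: illustrative only; not Cloudflare's documented fusion method.
interface Ranked {
  id: string;
}

function reciprocalRankFusion(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((doc, index) => {
      // Ranks are 1-based; documents near the top of any list contribute more.
      const contribution = 1 / (k + index + 1);
      scores.set(doc.id, (scores.get(doc.id) ?? 0) + contribution);
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results from the vector and BM25 retrievers for the same query.
const vectorHits: Ranked[] = [{ id: "doc-a" }, { id: "doc-b" }, { id: "doc-c" }];
const keywordHits: Ranked[] = [{ id: "doc-b" }, { id: "doc-d" }, { id: "doc-a" }];

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map((r) => r.id)); // [ 'doc-b', 'doc-a', 'doc-d', 'doc-c' ]
```

Documents that rank well in both lists (`doc-b`, `doc-a`) rise to the top, while a strong hit from only one retriever (`doc-d`) still survives. Because RRF uses ranks rather than raw scores, it avoids having to normalize cosine similarities against BM25 scores.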
Every search instance includes automated storage and a built-in vector index. Files uploaded via the API are parsed and chunked automatically by the service. You do not need to configure chunk sizes or embedding models manually.
The service is designed for direct integration with AI agents. It also includes native MCP support through a built-in endpoint, so agents that use the Model Context Protocol can call search as a tool.
Setup and Namespace Configuration
AI Search integrates directly with Cloudflare Workers, the Wrangler CLI, and the Agents SDK. The service relies on a specific binding system to manage access and permissions.
You connect your application using the ai_search_namespaces binding. This dynamic binding allows your code to create, list, and delete search instances on the fly.
This approach replaces the previous env.AI.autorag() API used when the service was known as AutoRAG. If you are migrating an older project, the legacy binding remains supported through Workers compatibility dates. You can review the updated wrangler.toml requirements in the official documentation.
Managing Dynamic Search Instances
Traditional vector databases typically require manual index creation and static connection strings. AI Search instead lets you create instances programmatically, so you can spin up isolated search instances for specific tasks, individual users, or discrete tenants.
Because each instance keeps its own index, this design helps prevent data from leaking across user sessions. An agent can create a temporary search instance for a single task, upload the relevant files, and delete the instance when the job finishes.
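To make the create, upload, delete lifecycle concrete, here is a minimal sketch. The `SearchNamespace` and `SearchInstance` interfaces, their method names, and the `task-` naming scheme are hypothetical stand-ins, not the real `ai_search_namespaces` binding API; check the binding reference for the actual signatures. What the sketch illustrates is the `try`/`finally` structure, which deletes the temporary instance even if the agent's task fails.

```typescript
// HYPOTHETICAL interfaces: method names and signatures are assumptions,
// not the real ai_search_namespaces binding API.
interface SearchInstance {
  upload(fileName: string, content: string): Promise<void>;
  search(query: string): Promise<string[]>;
}

interface SearchNamespace {
  create(name: string): Promise<SearchInstance>;
  delete(name: string): Promise<void>;
}

// Run one agent task against a temporary, task-scoped instance.
// The finally block deletes the instance even if uploading or the task throws.
async function withTemporaryInstance<T>(
  ns: SearchNamespace,
  taskId: string,
  files: Record<string, string>,
  work: (instance: SearchInstance) => Promise<T>,
): Promise<T> {
  const name = `task-${taskId}`; // hypothetical naming scheme
  const instance = await ns.create(name);
  try {
    for (const [fileName, content] of Object.entries(files)) {
      await instance.upload(fileName, content);
    }
    return await work(instance);
  } finally {
    await ns.delete(name);
  }
}

// In-memory stand-in so the lifecycle can be exercised without Cloudflare.
class FakeNamespace implements SearchNamespace {
  readonly live = new Set<string>();

  async create(name: string): Promise<SearchInstance> {
    this.live.add(name);
    const docs = new Map<string, string>();
    const instance: SearchInstance = {
      upload: async (fileName, content) => {
        docs.set(fileName, content);
      },
      // Naive substring match; the real service runs hybrid search.
      search: async (query) =>
        Array.from(docs.entries())
          .filter(([, text]) => text.includes(query))
          .map(([fileName]) => fileName),
    };
    return instance;
  }

  async delete(name: string): Promise<void> {
    this.live.delete(name);
  }
}

const ns = new FakeNamespace();
withTemporaryInstance(ns, "42", { "notes.txt": "quarterly revenue grew" }, (i) => i.search("revenue"))
  .then((hits) => console.log(hits, ns.live.size)); // [ 'notes.txt' ] 0
```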
The service supports querying across multiple search instances in a single API call. This is useful for RAG applications that need both a global company knowledge base and a private, user-specific document store.
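When results come from several instances, knowing which instance each hit came from lets an agent say whether an answer is grounded in the shared knowledge base or in the user's private store. The sketch below shows such a merge on the client side. It assumes a hypothetical `Hit` shape and assumes that scores from different instances are on a comparable scale; the single-call multi-instance query does this work for you, so the code only illustrates what a merged, provenance-preserving result looks like.

```typescript
// HYPOTHETICAL result shape; the real AI Search response format may differ.
interface Hit {
  instance: string; // which search instance the result came from
  docId: string;
  score: number; // ASSUMPTION: comparable across instances
}

// Flatten per-instance results, tag each with its source instance,
// and keep the top `limit` hits by score.
function mergeAcrossInstances(
  resultsByInstance: Record<string, { docId: string; score: number }[]>,
  limit: number,
): Hit[] {
  const all: Hit[] = [];
  for (const [instance, hits] of Object.entries(resultsByInstance)) {
    for (const h of hits) {
      all.push({ instance, docId: h.docId, score: h.score });
    }
  }
  return all.sort((a, b) => b.score - a.score).slice(0, limit);
}

const merged = mergeAcrossInstances(
  {
    "company-kb": [
      { docId: "handbook", score: 0.82 },
      { docId: "pto-policy", score: 0.64 },
    ],
    "user-1234": [{ docId: "my-offer-letter", score: 0.77 }],
  },
  2,
);
console.log(merged.map((h) => `${h.instance}:${h.docId}`));
// [ 'company-kb:handbook', 'user-1234:my-offer-letter' ]
```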
Data Ingestion and Crawling
Instances created on or after April 16, 2026, manage their own storage internally. You upload files directly to the target instance via a standard REST API. The underlying infrastructure handles all text extraction and vectorization.
You can still connect search instances to external R2 buckets for larger existing datasets. The system maps the bucket contents to the search index automatically.
AI Search also features a built-in website crawler powered by the Browser Run service. You can pass a URL to the API, and the system will fetch the page, render any dynamic content, extract the text, and index it. This allows AI agents to read live documentation or external web resources directly.
Metadata and Relevance Boosting
Documents ingested into AI Search support custom metadata tagging. You can attach specific attributes to any uploaded file. Common metadata fields include version numbers, author tags, timestamps, or language identifiers.
You can apply hard filters or relevance boosts based on this metadata at query time. For example, a query can restrict results to documents tagged “v2.0”, or it can apply a numeric boost to specific tags so that newer documents rank higher in the fused search results. The query API documentation details the exact syntax for these query-time parameters.
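To show how a hard filter and a boost interact, here is a client-side sketch over already-fused results. It is not AI Search's query syntax (the query API documentation covers that); the `Candidate` and `Boost` shapes and the multiplicative boost model are assumptions chosen for illustration.

```typescript
// HYPOTHETICAL shapes; AI Search's real filter/boost syntax is in its query API docs.
interface Candidate {
  id: string;
  score: number; // fused relevance score
  metadata: Record<string, string>;
}

interface Boost {
  field: string;
  value: string;
  factor: number; // ASSUMPTION: multiplier applied when metadata[field] === value
}

function applyFiltersAndBoosts(
  candidates: Candidate[],
  filters: Record<string, string>,
  boosts: Boost[],
): Candidate[] {
  return (
    candidates
      // Hard filter: keep only documents matching every filter exactly.
      .filter((c) => Object.entries(filters).every(([field, value]) => c.metadata[field] === value))
      // Boost: multiply the score once for each matching boost rule.
      .map((c) => ({
        ...c,
        score: boosts.reduce((s, b) => (c.metadata[b.field] === b.value ? s * b.factor : s), c.score),
      }))
      .sort((a, b) => b.score - a.score)
  );
}

const docs: Candidate[] = [
  { id: "setup-v1", score: 0.9, metadata: { version: "v1.0", lang: "en" } },
  { id: "setup-v2", score: 0.8, metadata: { version: "v2.0", lang: "en" } },
  { id: "setup-v2-de", score: 0.85, metadata: { version: "v2.0", lang: "de" } },
];

// Keep English docs only, and boost v2.0 docs by 1.5x without excluding older versions.
const boosted = applyFiltersAndBoosts(docs, { lang: "en" }, [{ field: "version", value: "v2.0", factor: 1.5 }]);
console.log(boosted.map((d) => d.id)); // [ 'setup-v2', 'setup-v1' ]
```

Filtering runs before boosting so excluded documents never compete for rank. With the boost, `setup-v2` (0.8 × 1.5 = 1.2) overtakes `setup-v1` (0.9) while the older page remains available as a fallback.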
Pricing and Limits
AI Search is currently in open beta and available on all Cloudflare plans. Usage of the core AI Search primitive is free within the designated beta limits. This includes the built-in storage features and the Browser Run web crawling capabilities.
The compute operations powering the service are billed separately. The system relies on Workers AI for generating embeddings and running inference. It uses AI Gateway for request management and caching.
| Limit | Workers Free | Workers Paid |
|---|---|---|
| AI Search instances per account | 100 | 5,000 |
| Files per instance | 100,000 | 1M (500K for hybrid search) |
| Queries per month | 20,000 | Unlimited |
Cloudflare plans to introduce unified pricing for AI Search as a single service after the beta period concludes.
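Because each user or agent can get its own instance, the per-account instance cap is usually the first limit a multi-tenant design runs into. The helper below encodes the beta limits from the table as constants so an application can check capacity before creating instances or uploading files. The function names are hypothetical. The table lists no separate hybrid-search figure for the free plan, so the sketch assumes the 100,000-file limit applies there either way. The limits may change after the beta.

```typescript
// Beta limits from the table above. ASSUMPTION: free-plan hybrid limit equals its file limit.
type Plan = "free" | "paid";

const LIMITS = {
  free: { instances: 100, filesPerInstance: 100_000, hybridFilesPerInstance: 100_000, queriesPerMonth: 20_000 },
  paid: { instances: 5_000, filesPerInstance: 1_000_000, hybridFilesPerInstance: 500_000, queriesPerMonth: Infinity },
} as const;

// How many more files an instance can accept before hitting its cap.
function remainingFileCapacity(plan: Plan, currentFiles: number, hybrid: boolean): number {
  const limits = LIMITS[plan];
  const cap = hybrid ? limits.hybridFilesPerInstance : limits.filesPerInstance;
  return Math.max(0, cap - currentFiles);
}

// Whether the account can create one more instance.
function canCreateInstance(plan: Plan, existingInstances: number): boolean {
  return existingInstances < LIMITS[plan].instances;
}

console.log(remainingFileCapacity("paid", 450_000, true)); // 50000
console.log(canCreateInstance("free", 100)); // false
```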
Begin by configuring the ai_search_namespaces binding in your staging environment and verifying the built-in MCP endpoint connectivity. Test your metadata filtering and relevance boosting on a small dataset before deploying dynamic instances to production users.