Build AI Agent Search with Cloudflare AI Search
Learn how to use Cloudflare AI Search to simplify RAG pipelines with hybrid vector search, automated indexing, and native MCP support for AI agents.
Cloudflare’s new AI Search service simplifies retrieval-augmented generation by abstracting vector indexing, document parsing, and keyword search into a single managed primitive. Released during Cloudflare’s Agents Week on April 16, 2026, the tool allows developers to provision dedicated search indexes programmatically at runtime. You can create millions of separate dynamic instances for individual users or agents without redeploying code. This tutorial covers how to set up the new namespace binding, ingest documents, and configure hybrid search for your workloads.
Hybrid Search Architecture
AI Search runs semantic vector search and BM25-based keyword search in parallel. The service fuses the results of both operations automatically to improve overall retrieval accuracy. This eliminates the need to manage dual databases or write custom rank-fusion algorithms.
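Cloudflare has not documented the exact fusion algorithm, but reciprocal rank fusion (RRF) is a common way to merge a vector result list with a keyword result list, and it illustrates why the fused ranking beats either list alone. The sketch below is purely illustrative and is not AI Search's internal implementation:

```typescript
// Illustrative only: AI Search fuses rankings internally, and its exact
// algorithm is not documented. Reciprocal rank fusion (RRF) is a common
// approach -- each document earns 1/(k + rank) from every list it appears in.

type Ranked = { id: string; score: number };

function reciprocalRankFusion(
  lists: string[][], // each list is document IDs in rank order
  k = 60             // damping constant from the original RRF paper
): Ranked[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// A document ranked well by BOTH searches outranks one that tops only one list:
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const keywordHits = ["doc-b", "doc-d", "doc-a"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// fused[0].id === "doc-b" (present near the top of both lists)
```

Note how "doc-b" wins despite never being ranked first by the vector search: appearing near the top of both lists accumulates more reciprocal-rank score than topping a single list.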
Every search instance includes automated storage and a built-in vector index. Files uploaded via the API are parsed and chunked automatically by the service. You do not need to configure chunk sizes or embedding models manually.
The service includes built-in support for the Model Context Protocol. Every instance provides an MCP endpoint out of the box. This makes the search primitive natively compatible with standard AI coding agents like Claude Desktop and Cursor without additional middleware.
Setup and Namespace Configuration
AI Search integrates directly with Cloudflare Workers, the Wrangler CLI, and the new Agents SDK preview known as Project Think. The service relies on a specific binding system to manage access and permissions.
You connect your application using the ai_search_namespaces binding. This dynamic binding allows your code to create, list, and delete search instances on the fly.
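As a rough sketch of what the configuration might look like, the fragment below shows one plausible shape. Only the binding name `ai_search_namespaces` comes from the announcement; the table layout and `binding` key are assumptions modeled on other Workers bindings, so confirm the exact keys against the official documentation:

```toml
# Hypothetical wrangler.toml shape -- the ai_search_namespaces binding name
# comes from the AI Search announcement, but the keys below are assumptions
# modeled on other Workers bindings. Check the official docs before using.
[[ai_search_namespaces]]
binding = "SEARCH"  # exposed to your Worker code as env.SEARCH
```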
This approach replaces the previous env.AI.autorag() API used during the early development phase. If you are migrating an older project, the legacy AutoRAG binding remains supported through Workers compatibility dates. You can review the updated wrangler.toml requirements in the official documentation.
Managing Dynamic Search Instances
Traditional vector databases typically require manual index creation and static connection strings. AI Search introduces programmatic instance generation. You can spin up isolated namespaces for specific tasks, individual users, or discrete tenants.
This architecture prevents data leakage across user sessions. An agent can create a temporary search instance for a single task, upload relevant files, and delete the instance when the job finishes.
The service supports querying across multiple search instances in a single API call. This is highly effective when building RAG applications that require both a global company knowledge base and a private, user-specific document store.
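The lifecycle described above can be sketched with a tiny in-memory stand-in for the namespace binding. The method names (`create`, `upload`, `query`, `delete`) are assumptions for illustration, not the documented AI Search API; the point is the per-task isolation pattern:

```typescript
// Illustrative lifecycle only: method names (create/upload/query/delete)
// are assumptions, not the real AI Search API. An in-memory stand-in
// demonstrates the create -> upload -> query -> delete isolation pattern.

interface SearchInstance {
  upload(name: string, text: string): void;
  query(q: string): string[]; // names of matching documents
}

class InMemoryNamespace {
  private instances = new Map<string, Map<string, string>>();

  create(id: string): SearchInstance {
    const docs = new Map<string, string>();
    this.instances.set(id, docs);
    return {
      upload: (name, text) => {
        docs.set(name, text);
      },
      // Naive substring match stands in for real hybrid search.
      query: (q) =>
        Array.from(docs.entries())
          .filter(([, text]) => text.includes(q))
          .map(([name]) => name),
    };
  }

  delete(id: string): void {
    this.instances.delete(id);
  }

  has(id: string): boolean {
    return this.instances.has(id);
  }
}

// Per-task lifecycle: spin up, ingest, query, then tear down.
const ns = new InMemoryNamespace();
const task = ns.create("task-42");
task.upload("notes.md", "rotate the API key before Friday");
const taskHits = task.query("API key"); // ["notes.md"]
ns.delete("task-42"); // nothing left to leak into the next session
```

Deleting the instance at the end of the task is what gives the isolation guarantee: the next session starts from an empty namespace rather than inheriting another user's documents.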
Data Ingestion and Crawling
Instances created on or after April 16, 2026, manage their own storage internally. You upload files directly to the target instance via a standard REST API. The underlying infrastructure handles all text extraction and vectorization.
You can still connect search instances to external R2 buckets for larger existing datasets. The system maps the bucket contents to the search index automatically.
AI Search also features a built-in website crawler powered by the Browser Run service. You can pass a URL to the API, and the system will fetch the page, render any dynamic content, extract the text, and index it. This allows AI agents to read live documentation or external web resources directly.
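A crawl request might be constructed as below. The endpoint path and field names here are assumptions for illustration only; consult the API reference for the real shapes:

```typescript
// Hypothetical request shape -- the endpoint path and the "render" field
// are assumptions for illustration, not the documented AI Search API.
function buildCrawlRequest(instanceId: string, url: string) {
  return {
    endpoint: `/ai-search/instances/${instanceId}/crawl`, // assumed path
    method: "POST" as const,
    body: JSON.stringify({ url, render: true }), // render dynamic content first
  };
}

const req = buildCrawlRequest("docs-bot", "https://example.com/docs");
```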
Metadata and Relevance Boosting
Documents ingested into AI Search support custom metadata tagging. You can attach specific attributes to any uploaded file. Common metadata fields include version numbers, author tags, timestamps, or language identifiers.
You can apply hard filters or relevance boosts based on this metadata at query time. A query can restrict results exclusively to documents tagged "v2.0". Alternatively, you can apply a numeric boost to specific tags so that, for example, newer documents rank higher in the fused search results. The query API documentation details the exact syntax for these query-time parameters.
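The semantics of the two mechanisms can be sketched as follows. This is not the AI Search query syntax (see the query API documentation for that); it only illustrates the difference between a hard filter, which discards non-matching documents, and a boost, which rescores them:

```typescript
// Illustrative semantics only -- not the AI Search query syntax.
// A hard filter drops non-matching hits; a boost multiplies their score.

type Hit = { id: string; score: number; meta: Record<string, string> };

function applyMetadata(
  hits: Hit[],
  filter?: { key: string; value: string },                 // hard filter
  boost?: { key: string; value: string; factor: number }   // soft boost
): Hit[] {
  let out = hits;
  if (filter) out = out.filter((h) => h.meta[filter.key] === filter.value);
  if (boost)
    out = out.map((h) =>
      h.meta[boost.key] === boost.value
        ? { ...h, score: h.score * boost.factor }
        : h
    );
  return out.slice().sort((a, b) => b.score - a.score);
}

const metaHits: Hit[] = [
  { id: "old-guide", score: 0.9, meta: { version: "v1.0" } },
  { id: "new-guide", score: 0.8, meta: { version: "v2.0" } },
];

// Hard filter: only v2.0 documents survive.
const filtered = applyMetadata(metaHits, { key: "version", value: "v2.0" });

// Boost: v2.0 documents rise without discarding v1.0 results.
const boosted = applyMetadata(metaHits, undefined, {
  key: "version",
  value: "v2.0",
  factor: 1.5,
});
```

The design trade-off: filters guarantee exclusion (useful for tenancy or compliance), while boosts preserve recall, letting older documents still surface when nothing newer matches well.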
Pricing and Limits
AI Search is currently in open beta and available on all Cloudflare plans. Usage of the core AI Search primitive is free within the designated beta limits. This includes the built-in storage features and the Browser Run web crawling capabilities.
The compute operations powering the service are billed separately. The system relies on Workers AI for generating embeddings and running inference. It uses AI Gateway for request management and caching.
| Service Component | Billing Status | Rate |
|---|---|---|
| Storage & Indexing | Free during beta | $0.00 |
| Web Crawling | Free during beta | $0.00 |
| Embeddings / Inference | Standard billing | $0.011 per 1,000 Neurons |
Cloudflare has stated that unified pricing for AI Search as a single service will be introduced after the beta period concludes.
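Using the table's rate, estimating inference spend is simple arithmetic. The Neuron count in the example is purely illustrative; actual Neuron consumption per embedding or inference call depends on the model:

```typescript
// Cost estimate at the table's rate: $0.011 per 1,000 Neurons.
// The 500,000-Neuron figure below is illustrative, not a real workload size.
const USD_PER_1000_NEURONS = 0.011;

function inferenceCostUsd(neurons: number): number {
  return (neurons / 1000) * USD_PER_1000_NEURONS;
}

const cost = inferenceCostUsd(500_000); // ≈ $5.50
```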
Begin by configuring the ai_search_namespaces binding in your staging environment and verifying the built-in MCP endpoint connectivity. Test your metadata filtering and relevance boosting on a small dataset before deploying dynamic instances to production users.