Build AI Agent Search with Cloudflare AI Search

Cloudflare’s new AI Search service simplifies retrieval-augmented generation by abstracting vector indexing, document parsing, and keyword search into a single managed primitive. Released during Cloudflare’s Agents Week on April 16, 2026, the tool allows developers to provision dedicated search indexes programmatically at runtime. You can create millions of separate dynamic instances for individual users or agents without redeploying code. This tutorial covers how to set up the new namespace binding, ingest documents, and configure hybrid search for your workloads.

Hybrid Search Architecture

AI Search runs semantic vector search and BM25-based keyword search in parallel. The service fuses the results of both operations automatically to improve overall retrieval accuracy. This eliminates the need to manage dual databases or write custom rank-fusion algorithms.
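
The article does not say which fusion method the service applies. To illustrate what rank fusion involves, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used technique. The function name, the document IDs, and the constant k = 60 are assumptions made for this example, not details of AI Search.

```typescript
// Reciprocal rank fusion (RRF), shown only as an illustration.
// Assumption: this is NOT confirmed to be the method AI Search uses.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      // Each list contributes 1 / (k + rank), where rank starts at 1.
      scores[docId] = (scores[docId] || 0) + 1 / (k + index + 1);
    });
  }
  // Highest combined score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical result lists from the vector and keyword retrievers.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const keywordHits = ["doc-b", "doc-d", "doc-a"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused);
```

Documents that appear near the top of both lists accumulate the highest combined score, which is why doc-b and doc-a lead the fused list in this example.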

Every search instance includes automated storage and a built-in vector index. Files uploaded via the API are parsed and chunked automatically by the service. You do not need to configure chunk sizes or embedding models manually.
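
For a sense of the work being handled on your behalf, here is a rough sketch of fixed-size chunking with overlap, a common way to split text before embedding. The service's actual chunking strategy is not described in this article; the function name and the sizes used below are assumptions chosen only to keep the example small.

```typescript
// Fixed-size chunking with overlap, as an illustration of a step the
// service performs automatically. Sizes here are arbitrary assumptions.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Each chunk repeats the last character of the previous one.
const sampleChunks = chunkText("abcdefghij", 4, 1);
console.log(sampleChunks);
```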

The service is designed for direct integration with AI agents. It works with Cloudflare Workers, the Agents SDK, and the Wrangler CLI, making it straightforward to connect to agent architectures that need retrieval capabilities.

Setup and Namespace Configuration

AI Search uses a dedicated binding to manage access and permissions.

You connect your application using the ai_search_namespaces binding. This dynamic binding allows your code to create, list, and delete search instances on the fly.

This approach replaces the previous env.AI.autorag() API used when the service was known as AutoRAG. If you are migrating an older project, the legacy binding remains supported through Workers compatibility dates. You can review the updated wrangler.toml requirements in the official documentation.

Managing Dynamic Search Instances

Traditional vector databases typically require manual index creation and static connection strings. AI Search introduces programmatic instance generation. You can spin up isolated namespaces for specific tasks, individual users, or discrete tenants.

Isolating each user or task in its own instance helps prevent data leakage across user sessions. An agent can create a temporary search instance for a single task, upload relevant files, and delete the instance when the job finishes.
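
The create, use, delete lifecycle can be sketched as follows. The SearchNamespace interface is hypothetical: it mirrors the create, list, and delete operations named in this article, but the real binding's method names and signatures may differ, and its calls would presumably be asynchronous. An in-memory stand-in is used so the sketch runs without a Workers runtime.

```typescript
// Hypothetical shape of a namespace that can create, list, and delete
// instances. This is an assumption, not the documented binding API.
interface SearchNamespace {
  create(name: string): void;
  list(): string[];
  delete(name: string): void;
}

// In-memory stand-in so the sketch runs without a Workers runtime.
class InMemoryNamespace implements SearchNamespace {
  private names: string[] = [];
  create(name: string): void {
    if (this.names.indexOf(name) !== -1) {
      throw new Error("instance already exists: " + name);
    }
    this.names.push(name);
  }
  list(): string[] {
    return this.names.slice();
  }
  delete(name: string): void {
    this.names = this.names.filter((n) => n !== name);
  }
}

// Create an instance for one task and always remove it afterwards,
// even if the task throws.
function withTemporaryInstance<T>(
  ns: SearchNamespace,
  name: string,
  task: (instanceName: string) => T
): T {
  ns.create(name);
  try {
    return task(name);
  } finally {
    ns.delete(name);
  }
}

const demoNamespace = new InMemoryNamespace();
const seenDuringTask = withTemporaryInstance(demoNamespace, "task-123", () =>
  demoNamespace.list()
);
console.log(seenDuringTask, demoNamespace.list());
```

Putting the delete call in a finally block means a failed task does not leave a stray instance behind, which matters when instance counts are capped per account.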

The service supports querying across multiple search instances in a single API call. This is useful when building RAG applications that need both a global company knowledge base and a private, user-specific document store.
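
Conceptually, a multi-instance query returns one ranked list drawn from several indexes. The sketch below models that outcome on the client side purely for illustration; the service performs this work itself. The result shape, the instance names, and the assumption that scores are comparable across instances are all hypothetical.

```typescript
// Hypothetical result shape; the real response format may differ.
interface InstanceHit {
  id: string;
  score: number;
}
interface MergedHit extends InstanceHit {
  instance: string; // which search instance the hit came from
}

// Combine per-instance results into one list ordered by score.
// Assumes scores from different instances are directly comparable.
function mergeInstanceResults(
  resultsByInstance: Record<string, InstanceHit[]>,
  topK: number
): MergedHit[] {
  const merged: MergedHit[] = [];
  for (const instance of Object.keys(resultsByInstance)) {
    for (const hit of resultsByInstance[instance]) {
      merged.push({ id: hit.id, score: hit.score, instance: instance });
    }
  }
  merged.sort((a, b) => b.score - a.score);
  return merged.slice(0, topK);
}

// A shared knowledge base plus one user's private store.
const mergedHits = mergeInstanceResults(
  {
    "company-kb": [
      { id: "handbook", score: 0.82 },
      { id: "policy", score: 0.4 },
    ],
    "user-42": [{ id: "notes", score: 0.91 }],
  },
  2
);
console.log(mergedHits);
```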

Data Ingestion and Crawling

Instances created on or after April 16, 2026, manage their own storage internally. You upload files directly to the target instance via a standard REST API. The underlying infrastructure handles all text extraction and vectorization.

You can still connect search instances to external R2 buckets for larger existing datasets. The system maps the bucket contents to the search index automatically.

AI Search also features a built-in website crawler powered by the Browser Run service. You can pass a URL to the API, and the system will fetch the page, render any dynamic content, extract the text, and index it. This allows AI agents to read live documentation or external web resources directly.

Metadata and Relevance Boosting

Documents ingested into AI Search support custom metadata tagging. You can attach specific attributes to any uploaded file. Common metadata fields include version numbers, author tags, timestamps, or language identifiers.

You can apply hard filters or relevance boosts based on this metadata at query time. For example, a query can restrict results to documents tagged “v2.0”. Alternatively, you can boost the score of documents carrying specific tags so that, for instance, newer documents rank higher in the fused search results. The query API documentation details the exact syntax for defining these query-time parameters.
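
The difference between a hard filter and a boost can be sketched as below. This models the behavior locally and is not the query API syntax; the type names, the metadata keys, and the multiplicative boost factor are assumptions for illustration.

```typescript
// Hypothetical hit and boost shapes, not the service's query syntax.
interface TaggedHit {
  id: string;
  score: number;
  metadata: Record<string, string>;
}
interface Boost {
  key: string;
  value: string;
  factor: number; // multiplier applied to matching hits (an assumption)
}

function filterAndBoost(
  hits: TaggedHit[],
  required: Record<string, string>,
  boosts: Boost[]
): TaggedHit[] {
  return hits
    // Hard filter: drop any hit that misses a required tag.
    .filter((hit) =>
      Object.keys(required).every((key) => hit.metadata[key] === required[key])
    )
    // Boost: matching hits stay in the results but get a higher score.
    .map((hit) => {
      let score = hit.score;
      for (const boost of boosts) {
        if (hit.metadata[boost.key] === boost.value) {
          score *= boost.factor;
        }
      }
      return { id: hit.id, score: score, metadata: hit.metadata };
    })
    .sort((a, b) => b.score - a.score);
}

const rankedHits = filterAndBoost(
  [
    { id: "guide-old", score: 0.9, metadata: { version: "v1.0", year: "2024" } },
    { id: "guide-new", score: 0.6, metadata: { version: "v2.0", year: "2026" } },
    { id: "faq", score: 0.8, metadata: { version: "v2.0", year: "2025" } },
  ],
  { version: "v2.0" },
  [{ key: "year", value: "2026", factor: 2 }]
);
console.log(rankedHits.map((hit) => hit.id));
```

Here the filter removes guide-old outright despite its high score, while the boost lifts guide-new above faq without excluding either.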

Pricing and Limits

AI Search is currently in open beta and available on all Cloudflare plans. Usage of the core AI Search primitive is free within the designated beta limits. This includes the built-in storage features and the Browser Run web crawling capabilities.

The compute operations powering the service are billed separately. The system relies on Workers AI for generating embeddings and running inference. It uses AI Gateway for request management and caching.

Limit	Workers Free	Workers Paid
AI Search instances per account	100	5,000
Files per instance	100,000	1M (500K for hybrid search)
Queries per month	20,000	Unlimited
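
If your application creates instances or uploads files dynamically, it can help to check usage against these limits before making a call. The sketch below encodes the table above as constants. The names are hypothetical, and it assumes the Workers Free file cap is the same with or without hybrid search, since the table lists no separate figure.

```typescript
type WorkersPlan = "free" | "paid";

interface BetaLimits {
  instancesPerAccount: number;
  filesPerInstance: number;
  filesPerInstanceHybrid: number;
  queriesPerMonth: number;
}

// Values copied from the beta limits table in this article.
const BETA_LIMITS: Record<WorkersPlan, BetaLimits> = {
  free: {
    instancesPerAccount: 100,
    filesPerInstance: 100000,
    filesPerInstanceHybrid: 100000, // assumption: no separate hybrid cap listed
    queriesPerMonth: 20000,
  },
  paid: {
    instancesPerAccount: 5000,
    filesPerInstance: 1000000,
    filesPerInstanceHybrid: 500000,
    queriesPerMonth: Infinity, // "Unlimited" in the table
  },
};

// True if one more file fits under the plan's per-instance cap.
function canAddFile(
  plan: WorkersPlan,
  currentFileCount: number,
  hybridSearch: boolean
): boolean {
  const limits = BETA_LIMITS[plan];
  const cap = hybridSearch
    ? limits.filesPerInstanceHybrid
    : limits.filesPerInstance;
  return currentFileCount < cap;
}

// True if one more query fits under the plan's monthly cap.
function hasQueryBudget(plan: WorkersPlan, queriesThisMonth: number): boolean {
  return queriesThisMonth < BETA_LIMITS[plan].queriesPerMonth;
}

console.log(canAddFile("paid", 500000, true), hasQueryBudget("free", 19999));
```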

Cloudflare plans to introduce unified pricing for AI Search as a single service after the beta period concludes.

Begin by configuring the ai_search_namespaces binding in your staging environment and verifying the built-in MCP endpoint connectivity. Test your metadata filtering and relevance boosting on a small dataset before deploying dynamic instances to production users.
