
Chrome Brings Cross-Origin Model Caching to Transformers.js

Hugging Face and Google Chrome are testing a Cross-Origin Storage API in Transformers.js to cache large AI models globally across different web domains.

On June 23, 2026, Hugging Face detailed an experimental integration of the Cross-Origin Storage API into Transformers.js. Written with Google’s Chrome team, the update targets the redundant bandwidth and disk usage caused by standard browser cache isolation. With a global cache, web-based AI applications can skip multi-megabyte downloads of models and WebAssembly runtimes that another site has already fetched.

The Cache Isolation Bottleneck

Modern web browsers partition HTTP caches by origin to protect user privacy. If two different web applications both use Transformers.js to load the same model, such as Xenova/whisper-tiny.en, the browser is forced to download and store the model weights separately for each site.

This behavior is costly for developers who run AI models locally in the browser. A popular model is stored once per site that uses it, so it can occupy several times its actual size on the user’s disk. Users also repeat the download of multi-megabyte WebAssembly (Wasm) runtime files each time they visit a new AI-powered website.
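To make the duplication concrete, here is a minimal sketch of a cache keyed by origin plus URL. It is a toy model, not how any browser implements its cache: the PartitionedCache class, the example origins, and the CDN URL are all hypothetical. The 4,733 kB figure is the Wasm runtime size cited in this article.

```javascript
// Toy model of a cache partitioned by origin: the key includes the origin,
// so the same URL requested by two sites is downloaded and stored twice.
class PartitionedCache {
  constructor() {
    this.entries = new Map(); // "origin|url" -> size in kB
    this.downloads = 0;
  }

  load(origin, url, sizeKB) {
    const key = `${origin}|${url}`;
    if (!this.entries.has(key)) {
      this.downloads += 1; // cache miss: simulated network fetch
      this.entries.set(key, sizeKB);
    }
    return this.entries.get(key);
  }

  totalKB() {
    let total = 0;
    for (const size of this.entries.values()) total += size;
    return total;
  }
}

const cache = new PartitionedCache();
const wasmUrl = "https://cdn.example/ort-wasm-simd-threaded.asyncify.wasm"; // hypothetical URL
const wasmSizeKB = 4733;

cache.load("https://site-a.example", wasmUrl, wasmSizeKB);
cache.load("https://site-b.example", wasmUrl, wasmSizeKB);
cache.load("https://site-a.example", wasmUrl, wasmSizeKB); // repeat visit: cache hit

console.log(cache.downloads, cache.totalKB()); // 2 9466
```

Two sites sharing one file already double the stored bytes, and the cost grows linearly with every additional site that uses the same runtime or model.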

Global Storage via Content Hashing

The proposed Cross-Origin Storage (COS) API introduces a global storage area that operates independently of any specific domain. Files are stored and retrieved by their content hash rather than their URL.

Once a site caches a core dependency like ort-wasm-simd-threaded.asyncify.wasm (4,733 kB), any subsequent site requesting the same hash can pull it directly from the COS, which eliminates the network request entirely. Because each file is identified by a cryptographic hash such as SHA-256, its content is implicitly verified on write: bytes that do not match the declared hash are not stored under it, which prevents applications from loading poisoned or corrupted models.

| Feature | Standard HTTP Cache | Cross-Origin Storage API |
| --- | --- | --- |
| Scope | Partitioned by origin | Global across origins |
| Identification | URL | Content hash (SHA-256) |
| Verification | Certificate trust | Implicit hash verification |
| Redundancy | High (duplicate downloads) | Low (shared resources) |

Privacy Controls and Implementation

Global caching introduces the risk of cache probing, where a malicious site checks whether specific models are cached in order to infer a user’s browsing history. The COS API mitigates this with an origins field. Developers can restrict a resource to a specific allowlist of origins, or set the field to '*' for universally public models.
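The effect of the origins field can be illustrated with a small visibility check. This is a sketch of the rule as described above, not the browser's implementation: the isVisibleTo function, the entry objects, and the example origins are hypothetical, and the real API may represent the field differently.

```javascript
// Hypothetical visibility rule: an entry is visible to a requesting origin
// if its origins field is "*" or explicitly lists that origin.
function isVisibleTo(entry, requestingOrigin) {
  if (entry.origins === "*") return true;
  return Array.isArray(entry.origins) && entry.origins.includes(requestingOrigin);
}

const publicModel = { name: "public-model", origins: "*" };
const internalModel = {
  name: "internal-model",
  origins: ["https://app.example", "https://admin.example"],
};

console.log(isVisibleTo(publicModel, "https://random.example")); // true
console.log(isVisibleTo(internalModel, "https://app.example")); // true
console.log(isVisibleTo(internalModel, "https://random.example")); // false
```

A site outside the allowlist gets the same answer whether the restricted file is cached or not, so probing for it reveals nothing about where the user has been.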

Hugging Face has implemented this capability in Transformers.js v4.2.0. Developers can activate the global cache check by setting a single library flag:

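```javascript
import { env } from "@huggingface/transformers";

// Opt in to the experimental Cross-Origin Storage lookup.
env.experimental_useCrossOriginStorage = true;
```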

When this flag is enabled, the library checks the COS for existing model weights before falling back to the standard Cache API or initiating a network fetch.
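That lookup order is a tiered fallback: try the fastest shared source first and only reach the network when everything else misses. The sketch below shows the pattern with in-memory stand-ins. It does not reproduce the library's internals; loadWithFallback, the three source objects, and the file names are hypothetical.

```javascript
// Try each source in order and return the first hit.
// Each source is { name, get(key) -> Promise<value | undefined> }.
async function loadWithFallback(key, sources) {
  for (const source of sources) {
    const value = await source.get(key);
    if (value !== undefined) {
      return { from: source.name, value };
    }
  }
  throw new Error(`no source could provide ${key}`);
}

// Hypothetical stand-ins for the three tiers.
const crossOriginStorage = { name: "cos", get: async () => undefined }; // always misses here
const cacheApi = {
  name: "cache-api",
  get: async (key) => (key === "model.onnx" ? "cached-bytes" : undefined),
};
const network = { name: "network", get: async (key) => `downloaded:${key}` };

const tiers = [crossOriginStorage, cacheApi, network];

loadWithFallback("model.onnx", tiers).then((r) => console.log(r.from)); // cache-api
loadWithFallback("tokenizer.json", tiers).then((r) => console.log(r.from)); // network
```

Because the network tier sits last, enabling the extra COS tier can only remove downloads, never add them.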

Browser Ecosystem Context

This experimentation builds on the foundation of Transformers.js v4.0.0, released in April 2026. That major update introduced a WebGPU runtime rewritten in C++ and enabled support for models exceeding 8 billion parameters, such as GPT-OSS 20B. While Hugging Face recently brought Transformers.js v4 to Chrome extensions for isolated local processing, the COS API solves resource sharing across the open web.

If you build heavy web-based AI applications, you can test this architecture today. The API is currently available in Chrome via an experimental flag and requires a dedicated Cross-Origin Storage Chrome extension for local development environments.
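Since the API is only present behind a flag or an extension, an application may want to detect it before opting in. The sketch below is an assumption-heavy illustration: the crossOriginStorage property name follows the public draft proposal and may change, and hasCrossOriginStorage is a hypothetical helper, not part of Transformers.js.

```javascript
// Returns true if the given navigator-like object exposes a
// crossOriginStorage property (name assumed from the draft proposal).
function hasCrossOriginStorage(nav) {
  return typeof nav === "object" && nav !== null && "crossOriginStorage" in nav;
}

// In a browser, pass the real navigator object, for example:
//   if (hasCrossOriginStorage(navigator)) {
//     env.experimental_useCrossOriginStorage = true;
//   }
console.log(hasCrossOriginStorage({ crossOriginStorage: {} })); // true
console.log(hasCrossOriginStorage({})); // false
```

Taking the navigator object as a parameter keeps the check easy to exercise outside a browser.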
