How to Implement Saga Rollbacks in Cloudflare Workflows
Learn how to manage distributed transactions and write compensating actions using the saga rollback feature in Cloudflare Workflows.
Cloudflare’s new saga rollbacks for Cloudflare Workflows let you define compensating actions directly within each step of a multi-step application. Following the Workflows V2 upgrade in May 2026, which increased limits to 50,000 concurrent instances, this release provides a native solution for the distributed transaction problem. This tutorial covers how to implement the saga pattern, configure rollback retries, and access step context during failure states.
The Saga Pattern in Workflows
Long-running processes often require multiple external API calls or database updates. If a step fails late in the process, a standard JavaScript try/catch block is not enough, because it depends on in-memory state that exists only while the code is running. The Worker isolate may be evicted, or the failure might occur days after the first step completed.
Cloudflare Workflows addresses this using persisted step history. When a workflow fails, the system automatically executes compensating actions for all previously completed steps in reverse step-start order. This is particularly useful when orchestrating multi-agent coordination patterns, where intermediate state must be unwound across disparate systems.
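To make the ordering concrete, here is a minimal in-memory sketch of reverse-order compensation. It is not the Workflows runtime and does not persist anything; `SagaStep`, `runSaga`, and the step names are hypothetical and exist only to illustrate the unwinding order.

```typescript
// Simplified, synchronous model of the saga pattern.
// SagaStep and runSaga are hypothetical names, not part of any Cloudflare API.
interface SagaStep {
  name: string;
  run: () => void;        // primary action; throws on failure
  compensate: () => void; // undoes the primary action
}

function runSaga(steps: SagaStep[], log: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      // Unwind completed steps, most recently started first.
      for (const done of [...completed].reverse()) {
        done.compensate();
        log.push(`undone:${done.name}`);
      }
      return false;
    }
  }
  return true;
}

const sagaLog: string[] = [];
const sagaOk = runSaga(
  [
    { name: "reserve", run: () => {}, compensate: () => {} },
    { name: "charge", run: () => {}, compensate: () => {} },
    {
      name: "ship",
      run: () => {
        throw new Error("carrier unavailable");
      },
      compensate: () => {},
    },
  ],
  sagaLog
);
```

In this run the third step fails, so the log records `charge` being undone before `reserve`. A real workflow would get the same ordering from persisted step history rather than from a local array.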
Implementing Rollback Handlers
You define a rollback handler by passing a third argument to the step.do() method. As of the June 23 API update, rollback handlers receive the original step context via a ctx object, alongside the output and error parameters.
```typescript
await step.do(
  "provision resource",
  async () => {
    /* primary logic */
  },
  {
    rollback: async ({ ctx, output, error }) => {
      // Compensating logic (e.g., delete the provisioned resource)
    },
    rollbackConfig: {
      retries: { limit: 3, delay: "30 seconds", backoff: "linear" },
      timeout: "5 minutes",
    },
  }
);
```
The ctx object contains ctx.step.name, ctx.step.count, ctx.attempt, and the original step configuration. This enables dynamic rollback logic based on the specific attempt that caused the failure.
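As an illustration of using these fields, the sketch below decides what a compensating action should do. The interfaces are assumptions modeled on the field names mentioned above, not the official type definitions. `planRollback`, `ProvisionOutput`, and the idea that `output` is undefined when the step never produced a result are also hypothetical.

```typescript
// Assumed shapes based on the fields named in this article; not official types.
interface RollbackStepContext {
  step: { name: string; count: number };
  attempt: number;
}

interface RollbackArgs<T> {
  ctx: RollbackStepContext;
  output: T | undefined; // assumption: undefined if the step produced no result
  error: unknown;
}

// Hypothetical output of a "provision resource" step.
type ProvisionOutput = { resourceId: string };

// Returns a description of the compensating action to take.
function planRollback(args: RollbackArgs<ProvisionOutput>): string {
  const { ctx, output } = args;
  const label = `${ctx.step.name}#${ctx.step.count} (attempt ${ctx.attempt})`;
  if (output === undefined) {
    // Nothing was provisioned, so there is nothing to delete.
    return `skip: ${label} produced no output`;
  }
  return `delete ${output.resourceId}: ${label}`;
}

const rollbackPlan = planRollback({
  ctx: { step: { name: "provision resource", count: 1 }, attempt: 2 },
  output: { resourceId: "res_123" },
  error: new Error("downstream step failed"),
});
```

Keeping the decision in a pure function like this makes the rollback logic easy to unit test apart from the workflow itself.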
Configuration Options
Rollback handlers maintain their own execution settings independent of the primary step logic. You can configure these via the rollbackConfig object.
| Feature | Details |
|---|---|
| Retry Support | Rollbacks accept independent retry limits, backoff strategies, and timeout settings. |
| Failure Handling | A failed step.do() remains eligible for rollback execution if it successfully registered a handler before crashing. |
| Payload Support | Handlers support ReadableStream return values for processing large payloads up to R2 storage limits. |
| Status API | The Workers API returns status: "running" during an active rollback, exposing the final outcome once completed. |
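When choosing retry settings, it can help to estimate how long a rollback might wait between attempts. The sketch below is a rough model only. The formulas (for example, linear backoff as base delay multiplied by the attempt number) and the `constant` and `exponential` strategy names are assumptions, and `retryDelaysMs` is a hypothetical helper, so check the platform documentation for the actual schedule.

```typescript
type Backoff = "constant" | "linear" | "exponential";

// Assumed formulas; the schedule the platform actually uses may differ.
function retryDelaysMs(limit: number, baseMs: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= limit; attempt++) {
    if (backoff === "constant") {
      delays.push(baseMs);
    } else if (backoff === "linear") {
      delays.push(baseMs * attempt);
    } else {
      delays.push(baseMs * Math.pow(2, attempt - 1));
    }
  }
  return delays;
}

// Mirrors the example config: limit 3, delay "30 seconds", backoff "linear".
const linearDelays = retryDelaysMs(3, 30000, "linear");
const totalWaitMs = linearDelays.reduce((sum, d) => sum + d, 0);
```

Under these assumptions, the example configuration would wait 30, 60, and 90 seconds between attempts, for three minutes of delay in total, not counting the time each attempt spends running before it hits its timeout.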
Observability and Limitations
Monitoring failed rollbacks requires differentiating between primary execution failures and cleanup failures. Cloudflare Workflows emits specific rollback lifecycle events into the analytics dashboard. This makes it easier to monitor AI applications running complex asynchronous tasks.
There are tradeoffs to consider. Relying on persisted step history means developers no longer need to track rollback state in external databases like D1 or KV. However, you must design your compensating actions to be idempotent. If a rollback handler times out and retries, it might attempt to delete a resource that was already fully or partially removed, so the handler should treat an already-missing resource as success instead of as an error.
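A minimal sketch of an idempotent compensating action follows. It uses an in-memory `Map` as a stand-in for an external system. `compensateProvision` and the resource ID are hypothetical, and a real handler would call whatever delete API the provisioned resource exposes.

```typescript
// In-memory stand-in for an external resource store (hypothetical).
const resources = new Map<string, { owner: string }>([
  ["res_123", { owner: "demo" }],
]);

// Deleting is safe to repeat: a missing resource counts as success.
function compensateProvision<V>(
  store: Map<string, V>,
  id: string
): "deleted" | "already-gone" {
  // Map.delete returns false when the key is absent; treat that as done.
  return store.delete(id) ? "deleted" : "already-gone";
}

// Simulate a rollback that times out after deleting, then retries.
const firstAttempt = compensateProvision(resources, "res_123");
const retryAttempt = compensateProvision(resources, "res_123");
```

The retry reports that the resource is already gone instead of throwing, so the rollback can finish cleanly even when an earlier attempt did the work before timing out.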
Update your Cloudflare Workers API bindings to the latest release to access the ctx object in your rollback definitions, and begin adding compensating actions to your longest-running dynamic workflows.