How to Implement Saga Rollbacks in Cloudflare Workflows

Learn how to manage distributed transactions and write compensating actions using the saga rollback feature in Cloudflare Workflows.

Cloudflare’s new saga rollbacks for Cloudflare Workflows let you define compensating actions directly within each step of a multi-step application. Following the Workflows V2 upgrade in May 2026, which increased limits to 50,000 concurrent instances, this release provides a native solution for the distributed transaction problem. This tutorial covers how to implement the saga pattern, configure rollback retries, and access step context during failure states.

The Saga Pattern in Workflows

Long-running processes often require multiple external API calls or database updates. If a step fails late in the process, a standard JavaScript try/catch block is not enough, because it relies on in-memory state. The Worker isolate may be evicted before the catch block runs, or the failure might occur days after the first step completed.

Cloudflare Workflows addresses this using persisted step history. When a workflow fails, the system automatically executes compensating actions for all previously completed steps in reverse step-start order. This is particularly useful when orchestrating multi-agent coordination patterns, where intermediate state must be unwound across disparate systems.
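
The reverse-order unwinding described above can be illustrated with a small in-memory sketch. This is a generic picture of the saga pattern, not the Workflows runtime: `SagaStep`, `runSaga`, and `demo` are hypothetical names, and nothing here is persisted the way real step history would be.

```typescript
// Generic in-memory illustration of the saga pattern.
// Not the Workflows runtime: SagaStep, runSaga, and demo are hypothetical names.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order. If one throws, undoes the completed steps newest-first,
// then rethrows the original error so the caller still sees the failure.
async function runSaga(steps: SagaStep[], log: string[]): Promise<void> {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
      log.push(`done:${step.name}`);
    }
  } catch (err) {
    for (const step of [...completed].reverse()) {
      await step.compensate();
      log.push(`undone:${step.name}`);
    }
    throw err;
  }
}

// Example: the third step fails, so the first two are compensated in reverse.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const step = (name: string, fails = false): SagaStep => ({
    name,
    run: async () => {
      if (fails) throw new Error(`${name} failed`);
    },
    compensate: async () => {},
  });
  try {
    await runSaga([step("reserve"), step("charge"), step("ship", true)], log);
  } catch {
    log.push("failed");
  }
  return log;
}
```

In this sketch the list of completed steps lives in memory, which is exactly the weakness the previous section describes. The point of persisted step history is that the equivalent of `completed` survives isolate eviction.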

Implementing Rollback Handlers

You define a rollback handler in the options object passed as the third argument to step.do(). As of the June 23 API update, rollback handlers receive the original step context via a ctx object, alongside the output and error parameters.

```typescript
await step.do(
  "provision resource",
  async () => {
    /* primary logic */
  },
  {
    rollback: async ({ ctx, output, error }) => {
      // Compensating logic (e.g., delete the provisioned resource)
    },
    rollbackConfig: {
      retries: { limit: 3, delay: "30 seconds", backoff: "linear" },
      timeout: "5 minutes",
    },
  }
);
```

The ctx object contains ctx.step.name, ctx.step.count, ctx.attempt, and the original step configuration. This enables dynamic rollback logic based on the specific attempt that caused the failure.
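
As a rough sketch of what ctx-driven rollback logic might look like, the function below decides what a compensating action should do from the handler arguments alone. The interfaces are local stand-ins that mirror only the fields named above, so the real type definitions may differ. `Provisioned`, `RollbackPlan`, and `planRollback` are hypothetical, and the sketch assumes ctx.step.count distinguishes repeated uses of the same step name.

```typescript
// Local types mirroring the fields named in the text. The real type
// definitions may differ; treat these shapes as assumptions.
interface RollbackContext {
  step: { name: string; count: number };
  attempt: number;
}

interface RollbackArgs<T> {
  ctx: RollbackContext;
  output?: T;
  error?: unknown;
}

// Hypothetical output of the "provision resource" step.
interface Provisioned {
  resourceId: string;
}

type RollbackPlan =
  | { action: "skip"; reason: string }
  | { action: "delete"; resourceId: string; idempotencyKey: string };

// Decides what the compensating action should do, using only ctx and output.
function planRollback({ ctx, output }: RollbackArgs<Provisioned>): RollbackPlan {
  if (output === undefined) {
    // The step never returned a result, so no resource ID is known to delete.
    return {
      action: "skip",
      reason: `${ctx.step.name} produced no output on attempt ${ctx.attempt}`,
    };
  }
  return {
    action: "delete",
    resourceId: output.resourceId,
    // Built from step identity only, so it stays stable across rollback retries.
    idempotencyKey: `${ctx.step.name}:${ctx.step.count}`,
  };
}
```

Keeping the decision in a pure function like this makes it easy to unit test outside the Workflows runtime. The rollback handler itself would then only need to carry out the returned plan.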

Configuration Options

Rollback handlers maintain their own execution settings independent of the primary step logic. You can configure these via the rollbackConfig object.

| Feature | Details |
| --- | --- |
| Retry Support | Rollbacks accept independent retry limits, backoff strategies, and timeout settings. |
| Failure Handling | A failed step.do() remains eligible for rollback execution if it successfully registered a handler before crashing. |
| Payload Support | Handlers support ReadableStream return values for processing large payloads up to R2 storage limits. |
| Status API | The Workers API returns status: "running" during an active rollback, exposing the final outcome once completed. |
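
To make the retry settings more concrete, the sketch below computes the wait before each retry for a given limit, base delay, and backoff strategy. The formulas are assumptions based on the conventional meaning of each strategy name; the article only shows "linear", and the platform's exact timing may differ. `retryDelays` is a hypothetical helper, not part of any API.

```typescript
type Backoff = "constant" | "linear" | "exponential";

// Delay in seconds before each retry (attempt 1 is the first retry).
// Assumed formulas, chosen to match the conventional meaning of each strategy.
function retryDelays(limit: number, baseSeconds: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= limit; attempt++) {
    if (backoff === "constant") {
      delays.push(baseSeconds);
    } else if (backoff === "linear") {
      delays.push(baseSeconds * attempt);
    } else {
      delays.push(baseSeconds * 2 ** (attempt - 1));
    }
  }
  return delays;
}

// With limit 3, a 30-second delay, and linear backoff:
const linearDelays = retryDelays(3, 30, "linear"); // [30, 60, 90]
```

Under these assumptions, the retry waits in that configuration add up to three minutes, which is worth weighing against the separate rollback timeout when sizing both values.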

Observability and Limitations

Monitoring failed rollbacks requires differentiating between primary execution failures and cleanup failures. Cloudflare Workflows emits specific rollback lifecycle events into the analytics dashboard. This makes it easier to monitor AI applications running complex asynchronous tasks.

There are tradeoffs to consider. Relying on persisted step history means developers no longer need to track state in external databases like D1 or KV. However, you must design your compensating actions to be idempotent. If a rollback handler times out and retries, it might attempt to delete a resource that an earlier attempt already removed, in part or in full. The retry should treat a missing resource as success rather than as a new failure.
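
A minimal sketch of an idempotent compensating action, assuming the external system can be modeled as a keyed store. The `Map` here is a hypothetical in-memory stand-in for whatever system actually holds the resource, and `deleteIfExists` is an illustrative name.

```typescript
// Hypothetical in-memory stand-in for an external system that holds resources.
type ResourceStore = Map<string, { name: string }>;

type DeleteResult = "deleted" | "already-gone";

// Idempotent delete: a missing resource counts as success, so a retried
// rollback does not fail just because an earlier attempt got there first.
function deleteIfExists(store: ResourceStore, id: string): DeleteResult {
  return store.delete(id) ? "deleted" : "already-gone";
}

const store: ResourceStore = new Map([["r-1", { name: "vm" }]]);
const first = deleteIfExists(store, "r-1"); // "deleted"
const second = deleteIfExists(store, "r-1"); // "already-gone"
```

Against a real HTTP API the same idea usually means treating a "not found" response to a delete request as a successful outcome, though the exact status code depends on the service.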

Update your Cloudflare Workers API bindings to the latest release to access the ctx object in your rollback definitions, and begin adding compensating actions to your longest-running dynamic workflows.
