Event Versioning With Upcasters: Schema Evolution That Does Not Break Replay

Book: Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About
Me: xgabriel.com | GitHub

You have an event store with four years of OrderPlaced events in it. The schema has changed three times. Last week you wrote a new projection and replayed the stream to build it. The projection came out wrong for everything before 2024, because those old events still call the field amount and your new code reads amount_cents.

That is the moment event versioning stops being a compatibility checkbox and becomes a replay problem. In a plain message queue, old events age out. In an event-sourced system, the events are the source of truth and they never age out. You will read that 2022 event again, and again, every time you rebuild a projection. The schema you wrote it with is the schema you are stuck with on disk, forever.

Upcasters are the pattern that makes that survivable. The events stay frozen in their original shape. The transformation happens on the way out.

The two kinds of change, and why only one is safe to ignore

Before any tooling, sort every schema change into one of two buckets.

Additive changes add optional information. A new nullable field. A new event type. A new enum value that old readers can fall through on. Old events are still valid; they just lack the new thing. Old consumers ignore what they do not recognize. You do not need an upcaster for these. A default value covers it.

{
  "type": "record",
  "name": "OrderPlaced",
  "fields": [
    { "name": "order_id", "type": "string" },
    { "name": "amount_cents", "type": "long" },
    {
      "name": "promo_code",
      "type": ["null", "string"],
      "default": null
    }
  ]
}

A reader of an old event that predates promo_code gets null. Nothing breaks.
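The same default can be seen from the consumer side. This is a minimal sketch, assuming the events are stored as JSON (as the handler later in this post does); the OrderPlaced struct and the decodeOrder helper are names made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderPlaced mirrors the schema above. PromoCode is a pointer so an
// absent field decodes to nil instead of an empty string.
type OrderPlaced struct {
	OrderID     string  `json:"order_id"`
	AmountCents int64   `json:"amount_cents"`
	PromoCode   *string `json:"promo_code"`
}

// decodeOrder is a hypothetical helper for this sketch.
func decodeOrder(b []byte) (OrderPlaced, error) {
	var o OrderPlaced
	err := json.Unmarshal(b, &o)
	return o, err
}

func main() {
	// An old event written before promo_code existed.
	old := []byte(`{"order_id":"o-1","amount_cents":4200}`)
	o, err := decodeOrder(old)
	if err != nil {
		panic(err)
	}
	fmt.Println("promo code missing:", o.PromoCode == nil)
}
```

No upcaster is involved. The zero value of the pointer is the default.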

Breaking changes invalidate the old shape. You renamed a field. You changed units. You split one field into several. You moved a value into a nested object. An old event read by new code is now wrong, not just incomplete. These are the changes that need an upcaster, because there is no default value that turns amount: 42 (dollars) into a correct amount_cents.

The trap is that breaking changes are easy to disguise as additive ones. Renaming amount (dollars) to amount_cents and keeping the type as a number passes every structural compatibility check your registry runs. Nothing fails validation. Then replay understates every old charge by a factor of 100, because the value 42 meant forty-two dollars and now reads as forty-two cents. Structural checks see types. They do not see meaning. You have to catch semantic breaks yourself.
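Here is a small sketch of that disguise. Both function names are invented for the example. The first treats the rename as a rename and nothing more; the second applies the unit change the rename hides.

```go
package main

import (
	"fmt"
	"math"
)

// naiveAmountCents treats the change as structural: same type, new name.
// It compiles, it validates, and it is wrong for old events.
func naiveAmountCents(raw map[string]any) int64 {
	if v, ok := raw["amount_cents"].(float64); ok {
		return int64(v)
	}
	if v, ok := raw["amount"].(float64); ok {
		return int64(v) // BUG: dollars read as cents
	}
	return 0
}

// convertedAmountCents applies the dollars-to-cents conversion.
func convertedAmountCents(raw map[string]any) int64 {
	if v, ok := raw["amount_cents"].(float64); ok {
		return int64(v)
	}
	if v, ok := raw["amount"].(float64); ok {
		return int64(math.Round(v * 100))
	}
	return 0
}

func main() {
	old := map[string]any{"order_id": "o-1", "amount": 42.0}
	fmt.Println(naiveAmountCents(old), convertedAmountCents(old)) // 42 4200
}
```

Both functions have the same signature and return the same type. Only a test that pins the expected value for an old payload tells them apart.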

Why versioned topics do not solve this

The common instinct is to publish a new topic: orders.v1, orders.v2. That handles the live stream. It does nothing for the events already on disk.

When you replay, you replay orders.v1 too, because those events are real history. Now your consumer needs to understand both shapes anyway. You have moved the branching into the consumer instead of removing it. Every consumer grows an if v1 { ... } else { ... } block, and that block multiplies every time you add a version. By v4 the handler is a switch statement nobody wants to touch.

Upcasters invert this. Instead of every consumer knowing every version, one place knows how to walk an old version forward to the current one. Consumers only ever see current.

The upcaster chain

An upcaster is a small function: it takes an event at version N and returns the same event at version N+1. You never write a v1-to-v3 jump. You write v1-to-v2 and v2-to-v3, and the chain composes them.

package events

type Upcaster interface {
    SourceVersion() int
    Upcast(raw map[string]any) (map[string]any, error)
}

The first real transform: a rename plus a unit fix, the breaking change from earlier.

// V1 -> V2: "amount" in dollars became
// "amount_cents" in minor units.
type OrderV1toV2 struct{}

func (OrderV1toV2) SourceVersion() int { return 1 }

func (OrderV1toV2) Upcast(
    raw map[string]any,
) (map[string]any, error) {
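    if amt, ok := raw["amount"].(float64); ok {
        // Round rather than truncate: in float64,
        // 19.99 * 100 falls just below 1999.
        // Needs "math" in the package imports.
        raw["amount_cents"] = int64(math.Round(amt * 100))
        delete(raw, "amount")
    }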
    raw["schema_version"] = 2
    return raw, nil
}

The second: splitting a flat string into a nested object, with a branch for events that predate the field entirely.

// V2 -> V3: "shipping_address" string became a
// structured "shipping" object.
type OrderV2toV3 struct{}

func (OrderV2toV3) SourceVersion() int { return 2 }

func (OrderV2toV3) Upcast(
    raw map[string]any,
) (map[string]any, error) {
    addr, ok := raw["shipping_address"].(string)
    if !ok {
        raw["shipping"] = map[string]any{
            "line1": "", "city": "",
        }
        raw["schema_version"] = 3
        return raw, nil
    }
    parsed, err := parseAddress(addr)
    if err != nil {
        return nil, fmt.Errorf(
            "v2->v3 addr: %w", err,
        )
    }
    raw["shipping"] = parsed
    delete(raw, "shipping_address")
    raw["schema_version"] = 3
    return raw, nil
}
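The V2-to-V3 step leans on a parseAddress helper that is not shown. What it looks like depends entirely on how your old addresses were written. As a placeholder, here is a sketch that assumes the old string was "line1, city" and splits on the last comma. That format is an assumption for the example, not something the events above guarantee.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseAddress is a hypothetical stand-in. It assumes "line1, city"
// and treats everything after the last comma as the city.
func parseAddress(s string) (map[string]any, error) {
	i := strings.LastIndex(s, ",")
	if i < 0 {
		return nil, errors.New("address has no comma")
	}
	line1 := strings.TrimSpace(s[:i])
	city := strings.TrimSpace(s[i+1:])
	if line1 == "" || city == "" {
		return nil, errors.New("address is missing line1 or city")
	}
	return map[string]any{"line1": line1, "city": city}, nil
}

func main() {
	parsed, err := parseAddress("12 Main St, Springfield")
	fmt.Println(parsed, err)

	_, err = parseAddress("no comma here")
	fmt.Println(err)
}
```

Whatever the real parser does, it should stay a pure function of the string it receives, with no geocoding service behind it.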

Running the chain on read

The chain reads the stored version off the event and applies upcasters in order until it reaches the current target. An event already at the current version passes through untouched.

type Chain struct {
    steps  map[int]Upcaster // keyed by source version
    target int
}

func (c *Chain) Apply(
    raw map[string]any,
) (map[string]any, error) {
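    // encoding/json decodes every JSON number as
    // float64, so a bare .(int) assertion never
    // matches a freshly unmarshaled event.
    var v int
    switch n := raw["schema_version"].(type) {
    case float64:
        v = int(n)
    case int:
        v = n
    }
    if v == 0 {
        v = 1 // events from before we tracked it
    }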
    for v < c.target {
        step, ok := c.steps[v]
        if !ok {
            return nil, fmt.Errorf(
                "no upcaster for v%d", v,
            )
        }
        next, err := step.Upcast(raw)
        if err != nil {
            return nil, err
        }
        raw = next
        v++
    }
    return raw, nil
}
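Apply reports a missing step only when an old enough event shows up, which may be deep into a rebuild. One option is to check the chain when you build it. The sketch below assumes a NewChain constructor, which is not part of the code above; the bump type exists only to exercise it.

```go
package main

import "fmt"

// Upcaster and Chain repeat the definitions above so this compiles alone.
type Upcaster interface {
	SourceVersion() int
	Upcast(raw map[string]any) (map[string]any, error)
}

type Chain struct {
	steps  map[int]Upcaster // keyed by source version
	target int
}

// NewChain is a hypothetical constructor. It indexes upcasters by source
// version and rejects duplicates and gaps between v1 and target.
func NewChain(target int, ups ...Upcaster) (*Chain, error) {
	steps := make(map[int]Upcaster, len(ups))
	for _, u := range ups {
		v := u.SourceVersion()
		if _, dup := steps[v]; dup {
			return nil, fmt.Errorf("duplicate upcaster for v%d", v)
		}
		steps[v] = u
	}
	for v := 1; v < target; v++ {
		if _, ok := steps[v]; !ok {
			return nil, fmt.Errorf("missing upcaster v%d -> v%d", v, v+1)
		}
	}
	return &Chain{steps: steps, target: target}, nil
}

// bump is a placeholder upcaster that only advances the version.
type bump struct{ from int }

func (b bump) SourceVersion() int { return b.from }

func (b bump) Upcast(raw map[string]any) (map[string]any, error) {
	raw["schema_version"] = b.from + 1
	return raw, nil
}

func main() {
	_, err := NewChain(3, bump{1}) // the v2 -> v3 step is missing
	fmt.Println(err)

	c, err := NewChain(3, bump{1}, bump{2})
	fmt.Println(c != nil, err)
}
```

A gap then fails at startup instead of on event four million of a replay.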

The consumer collapses to one shape. It decodes, runs the chain, and works against the current struct. It never branches on version.

func (h *OrderHandler) Handle(b []byte) error {
    var raw map[string]any
    if err := json.Unmarshal(b, &raw); err != nil {
        return err
    }
    up, err := h.chain.Apply(raw)
    if err != nil {
        return err
    }
    var cur OrderPlacedV3
    return decode(up, &cur)
}
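Handle calls a decode helper that is not shown. One simple way to write it, assuming you are fine paying for a second JSON pass, is to marshal the upcast map and unmarshal it into the current struct. The fields on OrderPlacedV3 and Shipping below are guesses based on the upcasters above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shipping and OrderPlacedV3 are assumed shapes for the current version.
type Shipping struct {
	Line1 string `json:"line1"`
	City  string `json:"city"`
}

type OrderPlacedV3 struct {
	OrderID     string   `json:"order_id"`
	AmountCents int64    `json:"amount_cents"`
	Shipping    Shipping `json:"shipping"`
}

// decode is a sketch of the helper: a JSON round trip from the upcast
// map into a typed struct. Unknown keys such as schema_version are ignored.
func decode(raw map[string]any, out any) error {
	b, err := json.Marshal(raw)
	if err != nil {
		return err
	}
	return json.Unmarshal(b, out)
}

func main() {
	up := map[string]any{
		"order_id":       "o-1",
		"amount_cents":   int64(4200),
		"schema_version": 3,
		"shipping":       map[string]any{"line1": "1 Main St", "city": "Lisbon"},
	}
	var cur OrderPlacedV3
	if err := decode(up, &cur); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cur)
}
```

The round trip is the cheap version. If profiling says it matters, a hand-written mapping from the map to the struct avoids the extra encode.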

Keeping replay deterministic

Replay only gives you the same projection twice if the upcaster chain is a pure function. Two events with identical stored bytes must produce identical output every time you run them. That sounds obvious until you see the ways it quietly stops being true.

No clocks. An upcaster that fills a missing created_at with time.Now() produces a different result on every replay. Backfill from data already in the event, or from a fixed constant, never from the wall clock. If the information is genuinely absent, write a sentinel and move on. Inventing it from the current time poisons the determinism.
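A sketch of the clock-free version. It assumes the old events carry a placed_at timestamp that can stand in for created_at; that field name and the sentinel value are both made up for the example.

```go
package main

import "fmt"

// unknownCreatedAt is an assumed sentinel for "this event never recorded it".
const unknownCreatedAt = "1970-01-01T00:00:00Z"

// backfillCreatedAt fills created_at from data already in the event, or
// from a fixed constant. It never reads the wall clock.
func backfillCreatedAt(raw map[string]any) map[string]any {
	if _, ok := raw["created_at"]; ok {
		return raw
	}
	if ts, ok := raw["placed_at"].(string); ok {
		raw["created_at"] = ts // hypothetical source field
	} else {
		raw["created_at"] = unknownCreatedAt
	}
	return raw
}

func main() {
	withTS := backfillCreatedAt(map[string]any{"placed_at": "2022-03-01T10:00:00Z"})
	without := backfillCreatedAt(map[string]any{"order_id": "o-9"})
	fmt.Println(withTS["created_at"], without["created_at"])
}
```

Run it today or in five years and the output for the same stored event is the same.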

No external lookups. An upcaster that calls a service to enrich an old event ("look up the customer's current tier") is reading state that changes between replays. The same event becomes two different events depending on when you replayed. Upcasters transform what is in the event. They do not fetch.

No randomness, no map iteration order leaking out. If a transform generates an ID, derive it from the event's own fields so it is stable. Ranging over a Go map to build an ordered output will bite you; sort first.
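Both rules fit in a few lines. This is a sketch with invented names: stableLineID hashes fields the event already has instead of calling a random ID generator, and sortedTags sorts the keys before turning a map into a slice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// stableLineID derives an ID from the event's own fields, so the same
// stored event always yields the same ID.
func stableLineID(orderID string, index int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s/%d", orderID, index)))
	return hex.EncodeToString(sum[:8])
}

// sortedTags flattens a map into a slice in key order. Ranging over the
// map directly would produce a different order from run to run.
func sortedTags(tags map[string]string) []string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"="+tags[k])
	}
	return out
}

func main() {
	fmt.Println(stableLineID("o-1", 0) == stableLineID("o-1", 0)) // true
	tags := map[string]string{"b": "2", "a": "1", "c": "3"}
	fmt.Println(strings.Join(sortedTags(tags), ",")) // a=1,b=2,c=3
}
```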

Version the rules, not the data. When you fix a bug in the v1-to-v2 upcaster, you are changing how four years of history reads. Test the change against real historical payloads pulled from the event store, not synthetic ones you wrote to match your assumptions. The synthetic test passes; the 2022 event with the weird null in it does not.
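A golden check is one way to do that. The sketch below pairs a stored payload with the exact output you expect and fails on any difference. The two payloads are placeholders showing the shape; in practice you would paste in bytes pulled from the store. upcastV1toV2 is a stand-in for the V1-to-V2 step, written with rounding and with float64 values so it compares cleanly against decoded JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"reflect"
)

// upcastV1toV2 is a stand-in for the step under test.
func upcastV1toV2(raw map[string]any) map[string]any {
	if amt, ok := raw["amount"].(float64); ok {
		raw["amount_cents"] = math.Round(amt * 100)
		delete(raw, "amount")
	}
	raw["schema_version"] = float64(2)
	return raw
}

// golden pairs a stored payload with the output it must produce.
type golden struct{ name, stored, want string }

// checkGolden upcasts the stored payload and compares it to want.
func checkGolden(g golden) error {
	var in, want map[string]any
	if err := json.Unmarshal([]byte(g.stored), &in); err != nil {
		return err
	}
	if err := json.Unmarshal([]byte(g.want), &want); err != nil {
		return err
	}
	if got := upcastV1toV2(in); !reflect.DeepEqual(got, want) {
		return fmt.Errorf("%s: got %v, want %v", g.name, got, want)
	}
	return nil
}

func main() {
	// Placeholder fixtures; replace with real payloads from the store.
	cases := []golden{
		{"fractional amount", `{"order_id":"o-1","amount":19.99}`,
			`{"order_id":"o-1","amount_cents":1999,"schema_version":2}`},
		{"null amount", `{"order_id":"o-2","amount":null}`,
			`{"order_id":"o-2","amount":null,"schema_version":2}`},
	}
	for _, g := range cases {
		fmt.Println(g.name, checkGolden(g))
	}
}
```

The null case is the interesting one. It pins down what the upcaster does today, so a later "fix" that changes it shows up as a failing case instead of a silently different projection.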

There is one more failure mode worth a sentence. If you ever rewrite events in place to "upgrade them on disk," you have given up the property that makes upcasters work. The stored event is the contract. Touch it and you can no longer reproduce a past replay.

Performance, when the chain gets long

Every read now runs through N transforms. For a projection rebuild over millions of events with an eight-step chain, that adds up. The usual fix is to cache the upcast result: store the current-version form alongside the original and serve the cached one on read. When you add or change an upcaster, invalidate only the cached rows whose stored version passes through that step. A new step at the end of the chain touches every row; a fix to v1-to-v2 touches only events stored as v1. The original stays frozen; the cache is a derived view you can throw away and rebuild. You keep the determinism and skip the repeated work.
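A minimal in-memory sketch of that cache, with invented names throughout. It keys each entry by event ID and target version, so raising the target makes old entries miss without any explicit invalidation. It does not handle a bug fix inside an existing step; for that you would drop the affected entries yourself. A real one would sit next to the event store, not in a Go map.

```go
package main

import "fmt"

// cacheKey ties a cached result to the chain target it was built for.
type cacheKey struct {
	eventID string
	target  int
}

// UpcastCache is a hypothetical derived view over the frozen events.
type UpcastCache struct {
	entries map[cacheKey]map[string]any
}

func NewUpcastCache() *UpcastCache {
	return &UpcastCache{entries: make(map[cacheKey]map[string]any)}
}

// Get returns the cached current-version form, or runs upcast once and
// stores the result. Callers must treat the returned map as read-only.
func (c *UpcastCache) Get(
	eventID string,
	target int,
	raw map[string]any,
	upcast func(map[string]any) (map[string]any, error),
) (map[string]any, error) {
	k := cacheKey{eventID: eventID, target: target}
	if hit, ok := c.entries[k]; ok {
		return hit, nil
	}
	up, err := upcast(raw)
	if err != nil {
		return nil, err
	}
	c.entries[k] = up
	return up, nil
}

func main() {
	calls := 0
	upcast := func(raw map[string]any) (map[string]any, error) {
		calls++
		return map[string]any{"schema_version": 3}, nil
	}
	cache := NewUpcastCache()
	for i := 0; i < 3; i++ {
		if _, err := cache.Get("evt-1", 3, map[string]any{}, upcast); err != nil {
			panic(err)
		}
	}
	fmt.Println("upcast ran", calls, "time(s)") // upcast ran 1 time(s)
}
```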

For most systems you do not need this on day one. A chain of two or three transforms over a normal stream is cheap. Add the cache when a profile of an actual rebuild says the upcasters are the cost, not before.

Where this lands

Sort changes into additive and breaking, and watch for breaking changes wearing an additive disguise. Freeze events on disk. Put the version-walking logic in one chain instead of a switch statement in every consumer. Keep each upcaster pure: no clocks, no network, no randomness, so a replay in 2027 produces exactly what it produced in 2025.

The reward is that you can write a brand-new projection against today's schema, point it at four years of history, and trust the number that comes out the other end.

What is the oldest event version you still upcast on read, and how many steps does the chain take to reach current? Drop it in the comments.

If this was useful

Upcasters are one piece of a larger story: how event-sourced systems handle replay, projections, and the schema drift that builds up over years in production. The Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About works through the patterns you reach for once events become your source of truth, including the failure modes that only show up when you replay a stream you have not touched in years.