Versioning the Harness Itself

The harness is code. The team executes it the way the team executes code: in production, on real tasks, with stakes. Code gets versioning, change review, and migration discipline. The harness usually has none of that, and the absence costs us.

What versioning actually means

I do not mean a version number in the file header. That would be theater. The harness lives in git already; commit hashes are the version.

I mean the practice around changes: how a change is proposed, how it is reviewed, how it is communicated, how it is rolled out, and how it is rolled back. The discipline that surrounds any other change to shared code. The harness deserves the same discipline because it has the same blast radius. A rule change reaches every engineer the next time they start a session.

The harness is infrastructure. Treating it casually because it is a markdown file is the category error.

The change review nobody runs

Most teams I have talked to do not review harness changes the way they review code. The change lands in a PR, someone glances at the diff, the rule reads sensibly in isolation, the PR is approved. The conflict check, the scope check, the “who does this affect” check: none of it happens.

The minimum review I run now, on any non-trivial harness change:

Does this conflict with an existing rule? Run a grep against the harness for the same concepts, the same file paths, the same workflows. If the new rule overlaps with an old one, the change is to merge or replace, not to add.

Is the scope right? Project root, subdirectory, path-glob. A rule about the API code should not be in the project root if it never applies elsewhere.

What workflows is this change going to affect? Not “what tasks.” What recurring workflows. A rule that says “always do X in the import flow” affects every engineer who touches the import flow next week. The change is a coordination event.

Is this reversible? Some rule changes are: delete the rule, and the agent goes back to whatever it did before. Some are not. The rule taught the team a habit, and the habit will persist after the rule is gone, possibly carrying the rule’s mistake with it.

The review takes ten minutes. It catches most of the failures before they ship.
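The conflict check in that list can be partly mechanized. What follows is a minimal sketch, not a description of any existing tool: the function names `load_harness` and `find_overlaps` are hypothetical, the harness is assumed to be a set of `CLAUDE.md` files under one root, and "same concepts" is approximated by crude substring matching on longer words and path-like tokens. It surfaces candidates for a human to read; it does not decide whether two rules actually conflict.

```python
import re
from pathlib import Path


def load_harness(root: str) -> dict[str, str]:
    """Read every CLAUDE.md under root (assumed layout) into {path: text}."""
    return {
        str(p): p.read_text(encoding="utf-8")
        for p in Path(root).rglob("CLAUDE.md")
    }


def find_overlaps(
    new_rule: str, harness_files: dict[str, str], min_len: int = 5
) -> dict[str, list[str]]:
    """Return, per harness file, the terms from new_rule that already appear in it.

    Terms are words and path-like tokens of at least min_len characters.
    Matching is a case-insensitive substring test, so expect false positives.
    """
    tokens = (t.strip("./-").lower() for t in re.findall(r"[A-Za-z0-9_./-]+", new_rule))
    terms = {t for t in tokens if len(t) >= min_len}

    overlaps: dict[str, list[str]] = {}
    for name, text in harness_files.items():
        lowered = text.lower()
        hits = sorted(t for t in terms if t in lowered)
        if hits:
            overlaps[name] = hits
    return overlaps


# Usage with in-memory files; the rule text and file contents are made up.
example_harness = {
    "CLAUDE.md": "Import flow: never skip header validation.\n",
    "api/CLAUDE.md": "Return typed errors from handlers.\n",
}
example_rule = "Always validate CSV headers in the import flow under src/importer/."
example_overlaps = find_overlaps(example_rule, example_harness)
```

Any file that shows up in the result is a prompt to read the existing rule and decide between merging and replacing.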

When a rule change is actually a migration

The harness sometimes needs more than a rule change. It needs a migration. The structure shifts; a section moves; the conventions for how rules are written change. Those changes break every contributor’s mental model at once, the same way a directory restructure breaks every IDE’s open tabs at once.

The teams that handle this well treat it the same way they handle a code migration: announced ahead of time, deployed on a Monday, with a written migration note in the PR. The note says what changed, why, what to update if you have in-flight work, and who to ask if something looks broken.

The teams that handle this poorly merge the migration on a Friday afternoon and answer DMs about it for the next week.

The cost of the announcement is fifteen minutes of writing. The cost of not announcing is hours of fragmented confusion across the team, plus the slow erosion of trust in the harness that comes from being surprised by it.

Rolling out without breaking everyone

The rollout discipline matches the change’s blast radius.

Additive change. A new rule that does not conflict with anything, is scoped to a path that already has a CLAUDE.md, and applies to new work only. Merge it. The team notices the next time the agent acts in that path.

Change to an active workflow. Post in the team channel before merging, name the workflow, name the change, give people a chance to push back or flag in-flight work. Wait a day if anyone has a branch in that workflow. The cost is one day of delay; the benefit is that the rule lands without breaking active work.

Structural change. Write a migration note, schedule it for a Monday morning, batch other harness changes into the same window if they overlap. The team gets one event to update their mental model rather than five spread over the week.

A rule about commit messages does not have the same blast radius as a rule about how the API layer handles errors. Treating them with the same care is overkill for one and insufficient for the other.
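The three tiers above amount to a small lookup from change type to rollout steps. A sketch of that mapping, where `ChangeKind` and `rollout_steps` are illustrative names and the step strings paraphrase the text rather than describe any real tooling:

```python
from enum import Enum


class ChangeKind(Enum):
    ADDITIVE = "additive"
    ACTIVE_WORKFLOW = "active_workflow"
    STRUCTURAL = "structural"


def rollout_steps(kind: ChangeKind, branches_in_workflow: bool = False) -> list[str]:
    """Return the ordered rollout steps for a harness change of the given kind."""
    if kind is ChangeKind.ADDITIVE:
        # No conflicts, existing scope, new work only: just merge.
        return ["merge"]
    if kind is ChangeKind.ACTIVE_WORKFLOW:
        steps = ["post in team channel: name the workflow and the change"]
        if branches_in_workflow:
            # Someone has in-flight work, so give them a day to react.
            steps.append("wait one day")
        steps.append("merge")
        return steps
    # Structural change: one announced event instead of several surprises.
    return [
        "write migration note",
        "schedule for Monday morning",
        "batch overlapping harness changes into the same window",
        "merge",
    ]
```

The judgment call is still the classification itself; the function only makes the consequence of each classification explicit.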

The rollback that earns its place

The rollback is easy because the harness is in git. Revert the commit, the agent reads the old rule on the next session, the workflow recovers.

The rollback that is not easy is the one where the rule taught the team a behavior that survived the rule. Consider a rule that said “name files like X” and sat in the harness for two months. The team adopted the convention. The rule got rolled back when the convention turned out to be wrong for a different module. The agent now sees the convention everywhere in the codebase and treats it as fact, even though the harness no longer says to follow it.

Some rules have persistent effects, and rolling them back is not enough. The rollback has to be paired with a correction. A new rule that explicitly contradicts the old one. A test that catches the old behavior. A note in the team channel that the convention is gone. Otherwise the agent and the team both keep doing what the rule used to say.
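One form the "test that catches the old behavior" could take is a check over the paths touched by a change. This is a sketch under stated assumptions: the retired pattern `*_helper.py` is a made-up stand-in for the "name files like X" convention, and `residual_violations` is a hypothetical helper that would be wired into whatever review or CI step the team already has.

```python
from fnmatch import fnmatchcase

# Hypothetical retired conventions: filename pattern -> why it was retired.
RETIRED_PATTERNS = {
    "*_helper.py": "naming convention rolled back; see harness history",
}


def residual_violations(
    paths: list[str], retired: dict[str, str] = RETIRED_PATTERNS
) -> list[tuple[str, str]]:
    """Return (path, pattern) pairs for paths whose filename matches a retired pattern."""
    found = []
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        for pattern in retired:
            # fnmatchcase keeps matching case-sensitive on every platform.
            if fnmatchcase(filename, pattern):
                found.append((path, pattern))
    return found


# Usage: check only the files a change adds, so older files are not flagged.
new_files = ["src/billing/tax_helper.py", "src/billing/tax.py"]
violations = residual_violations(new_files)
```

Running it against newly added files only keeps the check about the habit, not about the existing code that predates the rollback.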

The check before merging any non-trivial rule: if this rule turned out to be wrong, what would I have to undo? If the answer is revert the commit, fine. If the answer is revert the commit plus untrain the team plus catch the residual cases in review, the rule needs more thought before it lands.

The one-paragraph summary

Every harness change merges with a one-paragraph summary in the team channel. Not a link to the PR. A summary the team can read in fifteen seconds.

“Added a rule about testing the migration path against a real database, not a mock. Affects anyone working in the migrations directory. Prompted by the incident two weeks ago. Push back in this thread if it conflicts with something I missed.”

The summary takes two minutes to write. It does three things: it tells the team something changed, it gives them the context to evaluate the change, and it gives them a place to push back if I got it wrong.

Most pushback comes within a day. Most of it is useful. Sometimes it reveals the rule should be scoped down, or that it conflicts with an in-flight piece of work, or that the team has a better way of expressing the same intent. The discussion happens in public, and the next maintainer reading the commit history can see the reasoning.
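The summary has a regular shape: what changed, who it affects, what prompted it, and where to push back. A small template function can hold that shape. The function name `harness_change_summary` and the 60-word cap are assumptions, the cap being a rough proxy for "readable in fifteen seconds".

```python
def harness_change_summary(
    what: str, affects: str, prompted_by: str, max_words: int = 60
) -> str:
    """Build a one-paragraph harness change summary for the team channel.

    `what` should be a full sentence; `affects` and `prompted_by` are phrases.
    Raises ValueError if the result is too long to read at a glance.
    """
    text = (
        f"{what} Affects {affects}. Prompted by {prompted_by}. "
        "Push back in this thread if it conflicts with something I missed."
    )
    word_count = len(text.split())
    if word_count > max_words:
        raise ValueError(f"summary is {word_count} words; keep it under {max_words}")
    return text


summary = harness_change_summary(
    what="Added a rule about testing the migration path against a real database, not a mock.",
    affects="anyone working in the migrations directory",
    prompted_by="the incident two weeks ago",
)
```

The length check is the useful part: a summary that needs more room is usually a migration note in disguise.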

A monthly cadence for big changes

Small changes go in whenever. Big changes, like ones that touch the harness’s structure or change a rule the team has built habits around, go in on a regular cadence. We do it monthly.

The cadence does two things. It batches the disruption: the team’s mental model updates once a month, not constantly. And it forces a backlog of harness changes to accumulate, which surfaces patterns. Three of the changes pending this month all point in the same direction; the actual change should be a single bigger move that subsumes them, not three small ones.

The discipline is to resist landing big changes between cadences. The cost of waiting two weeks is small. The cost of breaking everyone’s flow on a random Tuesday is large.
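To make the cadence concrete, here is a sketch that computes the next slot. The text only says the cadence is monthly; pinning it to the first Monday of each month is an assumption for illustration, as are the function names.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday, so this offset is 0 when the 1st is a Monday.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def next_slot(today: date) -> date:
    """Return the next structural-change slot on or after today."""
    slot = first_monday(today.year, today.month)
    if slot < today:
        # This month's slot has passed; roll over to next month.
        if today.month == 12:
            slot = first_monday(today.year + 1, 1)
        else:
            slot = first_monday(today.year, today.month + 1)
    return slot
```

A big change that shows up mid-month gets a date from `next_slot` rather than a merge.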

Ship your next harness change on a Monday

The harness has the same blast radius as a build configuration or a CI pipeline. The teams that treat it that way get a harness that improves. The teams that do not get one that drifts.

Three things to try this week:

Before merging the next non-trivial rule change, grep the harness for the same concepts and paths. If the new rule overlaps with an old one, merge or replace; do not add.
Write a one-paragraph summary and post it in the team channel before merging. Wait a day if the change affects an active workflow.
Put a monthly slot on the calendar for structural changes. Push big changes to that slot. Land small ones whenever.
