Why I Made Stale Forecasts Fail Instead of Falling Back to Do Nothing

The UI showed ready, do_nothing, and a blank reason field.

A facility manager reading that screen would assume the engine had looked at the peak window, checked the assets, and decided there was nothing worth doing. The interface looked calm. The audit trail looked complete.

That was not true.

The forecast behind the plan had expired. The planner should never have scored it. But the fallback path did exactly what I told it to do: when no action was selected, choose do_nothing so the operator gets a safe recommendation instead of an empty response.

Safe fallback became false confidence.

The product has a simple rule: physical feasibility comes before economics. A battery cannot discharge below its minimum state of charge. A building load cannot curtail past its comfort limit. A flat-rate tariff cannot justify peak curtailment because there is no peak signal to respond to. Those are business truths encoded as code.

That rule exists because energy planning has a dangerous temptation: collapse every problem into money. If the demand charge is high enough, the spreadsheet always finds a saving. Real facilities do not work that way. Operators know some processes cannot move, some comfort limits cannot bend, and some battery cycles are not worth spending for a small peak reduction. The planner has to encode that judgment before it calculates expected savings.
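
A minimal sketch of that ordering, assuming a hypothetical Battery type and scoreDischarge proc (neither is the planner's real name). The feasibility gate runs first, so a large savings figure can never rescue an infeasible discharge:

```nim
import std/options

type
  Battery = object
    # Hypothetical fields for illustration only.
    socKwh, minSocKwh: float

# Returns none when the discharge would breach the minimum state of charge.
# Savings are only computed after the physical check has passed.
proc scoreDischarge(b: Battery; requestedKwh, savingsPerKwh: float): Option[float] =
  if b.socKwh - requestedKwh < b.minSocKwh:
    return none(float)   # rejected on physical grounds; price is never consulted
  some(requestedKwh * savingsPerKwh)

let depleted = Battery(socKwh: 12.0, minSocKwh: 10.0)
echo scoreDischarge(depleted, 5.0, 1000.0).isNone   # true
```

Returning an Option rather than zero savings keeps "rejected" distinguishable from "feasible but worthless", which is the same kind of distinction the rest of this post is about.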

The bug came from treating stale forecasts like another physical constraint.

In a real infeasible window, do_nothing is useful. If every battery is depleted, every comfort limit blocks curtailment, and every flexible process is already at its limit, doing nothing is a valid operational recommendation. It tells the operator: the engine understood the window and found no feasible savings-positive action.

A stale forecast is different. It means the engine did not have permission to reason about the window at all. The input is invalid. The correct output is a failed plan with an explicit reason.

I got that boundary wrong in the first implementation.

The code had all the pieces in separate places. createPlan detected stale forecasts. generateCurtailmentPlan recorded a stale-forecast rejection. But the bottom of the planner had a broad fallback: if no selected actions exist, add do_nothing. That line was written for infeasible windows, not invalid input, but it had no way to know the difference.

The fix looks small because the hard part was naming the boundary.

Batch 009 had already moved the planner away from fake input. Forecast creation loads qualified interval readings from the database. Plans check that the selected forecast and tariff belong to the requested site. The selected band travels as forecastBandKw, and the action payload records both confidence_band and forecast_band_driver. Those pieces made the failure more embarrassing, not less. The system had evidence discipline at the edges, then lost it in one central fallback.

const staleForecastFailureReason* = "stale forecast cannot be used for a new plan without explicit override"

proc planStatusForDecision*(decision: PlannerDecision; stale: bool): (string, string) =
  # Returns (status, failure_reason). Input validity is checked before the action count.
  if stale: ("failed", staleForecastFailureReason)
  elif decision.selectedActions.len > 0: ("ready", "")
  else: ("failed", "no feasible planner action")

proc generateCurtailmentPlan*(ctx: TenantContext; input: PlannerInput; tariff: PlannerTariff; assets: seq[PlannerAsset]; staleForecast: bool): PlannerDecision =
  validateInput(ctx, input)
  var selected = newJArray()
  var rejected = newJArray()
  var binding = newJArray()
  let curtailAllowed = addStaleAndTariffRejections(tariff, staleForecast, rejected, binding)
  var selectedSavings = 0.0
  for asset in assets:
    if asset.rejectMissingAssetTelemetry(rejected):
      continue
    if asset.assetType == "battery":
      addBatteryAction(asset, input, tariff, staleForecast, selected, rejected, selectedSavings)
      addChargeBatteryAction(asset, input, tariff, staleForecast, selected, rejected, selectedSavings)
    elif asset.assetType in ["building_load", "flexible_process"]:
      addCurtailAction(asset, input, tariff, curtailAllowed, staleForecast, selected, rejected, selectedSavings)
      addShiftLoadAction(asset, input, tariff, staleForecast, selected, rejected, selectedSavings)
  # do_nothing is a domain decision for valid input only; a stale run never gets the no-op fallback.
  if selected.len == 0 and not staleForecast:
    selected.add(%*{"action_type": "do_nothing", "reason": "no feasible savings-positive action after physical constraints"})
  addBindingRejections(rejected, binding)
  makeDecision(input, tariff, selected, rejected, binding, selectedSavings)

The added "and not staleForecast" condition on the fallback is the visible change. The real design change is above it: planStatusForDecision owns the distinction between invalid input and feasible output.

Before that split, status came from selectedActions.len. If there was at least one selected action, the plan became ready. That is a bad proxy because selected actions can be generated by fallback logic. The status needs to know why the planner had no action.
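
To make the proxy problem concrete, here is a small sketch. It is a reconstruction under assumptions, not the original code: PlannerDecision is reduced to a list of strings, and withBroadFallback, oldStatus, and newStatus are hypothetical names.

```nim
type
  PlannerDecision = object
    # Simplified stand-in: the article's planner stores JSON actions.
    selectedActions: seq[string]

# Hypothetical reconstruction of the broad fallback: an empty selection
# always becomes do_nothing, whatever caused it.
proc withBroadFallback(actions: seq[string]): seq[string] =
  if actions.len == 0: @["do_nothing"] else: actions

# Old proxy: status derived from the action count alone.
proc oldStatus(decision: PlannerDecision): string =
  if decision.selectedActions.len > 0: "ready" else: "failed"

# Split version: input validity is checked before the action count.
proc newStatus(decision: PlannerDecision; stale: bool): string =
  if stale: "failed"
  elif decision.selectedActions.len > 0: "ready"
  else: "failed"

let staleRun = PlannerDecision(selectedActions: withBroadFallback(newSeq[string]()))
echo oldStatus(staleRun)                 # ready
echo newStatus(staleRun, stale = true)   # failed
```

The fallback guarantees a non-empty selection, so the old proxy can never report anything but ready once the fallback has run.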

The stale forecast flag now travels through the planning path as an input validity marker, not just another rejected-action reason. It still appears in rejectedActions so the UI can show the operator what blocked the run. But it also controls persisted status and failure_reason so API consumers and replay logic do not treat the plan as a valid no-op decision.

What surprised me was how much of the surrounding architecture existed because of this one boundary.

The same boundary shows up in the schema. curtailment_plans stores status, confidence_band, input_snapshot, plan_actions, rejected_actions, savings_estimate, risk_summary, and failure_reason as separate fields. That separation matters because a failed plan with rejected actions is not the same thing as a ready plan with rejected actions. The operator sees the human explanation either way, but the status tells the rest of the system whether the plan can be accepted, replayed, or escalated.

Forecasts store p10, p50, and p90 bands. Plans record confidence_band and forecast_band_kw. The service checks that a forecast belongs to the same site as the plan. It checks whether the tariff changed after the forecast. It checks whether the forecast is older than the allowed window. All of that is careful work, but one broad fallback at the bottom of the planner erased the meaning.

That is the part I was wrong about. I assumed a safe fallback was always safer than a hard failure.

In operational software, a false safe state can be worse than an error. An error asks for attention. A ready no-op plan closes the loop. It tells the operator they can move on.

The tests now capture the boundary directly. One unit test calls generateCurtailmentPlan with a stale forecast and asserts that no do_nothing action is selected. Another calls planStatusForDecision with stale input and asserts that the persisted status is failed, not ready. The Playwright journeys cover the other side of the behavior: when the inputs are valid but the constraints block action, the operator still sees recommended and rejected action sections, binding constraints, and decision controls.
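
The status half of that boundary might be pinned with tests shaped roughly like this. It is a sketch using std/unittest, not the project's actual test file: planStatusForDecision is copied from the listing above, while PlannerDecision is a simplified stand-in and the suite and test names are made up.

```nim
import std/unittest

type
  PlannerDecision = object
    # Stand-in for the real decision type, which holds JSON actions.
    selectedActions: seq[string]

const staleForecastFailureReason =
  "stale forecast cannot be used for a new plan without explicit override"

proc planStatusForDecision(decision: PlannerDecision; stale: bool): (string, string) =
  if stale: ("failed", staleForecastFailureReason)
  elif decision.selectedActions.len > 0: ("ready", "")
  else: ("failed", "no feasible planner action")

suite "stale forecast boundary":
  test "stale input fails even if an action slipped into the selection":
    let decision = PlannerDecision(selectedActions: @["do_nothing"])
    check planStatusForDecision(decision, stale = true) ==
      ("failed", staleForecastFailureReason)

  test "valid input with a deliberate no-op stays ready":
    let decision = PlannerDecision(selectedActions: @["do_nothing"])
    check planStatusForDecision(decision, stale = false) == ("ready", "")
```

The first case deliberately feeds a do_nothing action alongside the stale flag, so the test would still catch a regression in which the fallback leaks back in.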

That is why I like this failure as a design story. It did not ask for more code. It asked for a better state model. The planner needed two kinds of negative answer: one where the business should not act because no feasible action exists, and one where the software should not answer because its input has expired. Both are negative. Only one is a recommendation.

The same distinction shaped replay. Backtests compare planner, no-action, and threshold policies using the same historical input snapshot. A replay can include a no-action policy because it is an intentional baseline. That is different from a planner run falling into do_nothing because its forecast input had expired. Same words. Different contract.

It also shaped operator feedback. A user can accept, reject, or modify a ready plan, and operator_feedback.original_snapshot preserves the recommendation at the time of the decision. That only works if ready means ready. If stale input can still reach ready status, the audit trail becomes a record of the operator reacting to a recommendation the engine should never have issued. The database can preserve the snapshot perfectly and still preserve the wrong thing.

That is why I prefer status fields that carry domain meaning, even when they feel strict. A failed status is not a bad product outcome when it protects the operator from bad evidence. A failure reason such as "stale forecast cannot be used for a new plan without explicit override" gives the next workflow something honest to do: rebuild the forecast, refresh the tariff, or ask the operator for an override. A ready no-op gives downstream code no reason to pause.

I now treat do_nothing as a domain decision, not as an absence handler.

That rule carries across the codebase. Missing asset telemetry becomes ASSET_TELEMETRY_INVALID before scoring. A flat-rate tariff produces a tariff-matrix rejection before curtailment can enter selected actions. Battery state of charge and cycle limits reject discharge before expected savings are calculated. Each one is visible because the planner has to show what it refused to do.

The result is less forgiving code, and that is the point. A planner that fails with a clear reason is safer than a planner that returns a calm answer from bad inputs.

I would carry this further if I rebuilt the planner from scratch. staleForecast is still a boolean moving through function calls. It works, and the tests pin the behavior, but an explicit input-validity type would make the boundary harder to blur later. Something like PlanInputStatus could separate ready, stale forecast, tariff mismatch, and missing history before the planner sees any assets. That is a better shape for the next version because it makes invalid input impossible to confuse with an infeasible action set.
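
One possible shape for that type, as a sketch only. PlanInputStatus is the name floated above; the PlanInput fields, the check order, and every reason string except the stale-forecast one are assumptions made for illustration.

```nim
type
  PlanInputStatus = enum
    pisReady, pisStaleForecast, pisTariffMismatch, pisMissingHistory

  PlanInput = object
    # Hypothetical fields; the real service derives these from the database.
    forecastAgeMinutes, maxForecastAgeMinutes: int
    tariffChangedAfterForecast: bool
    intervalReadings: int

# Classify the input once, before the planner looks at any asset.
proc classifyInput(input: PlanInput): PlanInputStatus =
  if input.intervalReadings == 0: pisMissingHistory
  elif input.tariffChangedAfterForecast: pisTariffMismatch
  elif input.forecastAgeMinutes > input.maxForecastAgeMinutes: pisStaleForecast
  else: pisReady

# Every invalid state carries its own reason; only pisReady has none.
proc failureReason(status: PlanInputStatus): string =
  case status
  of pisReady: ""
  of pisStaleForecast:
    "stale forecast cannot be used for a new plan without explicit override"
  of pisTariffMismatch: "tariff changed after the forecast was created"
  of pisMissingHistory: "no qualified interval readings for the site"

let input = PlanInput(forecastAgeMinutes: 90, maxForecastAgeMinutes: 60,
                      tariffChangedAfterForecast: false, intervalReadings: 96)
echo classifyInput(input)   # pisStaleForecast
```

Because the case statement has to cover every enum value, adding a new invalid-input state forces a decision about its failure reason at compile time, which a boolean flag cannot do.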

The transferable lesson is narrow: fallback logic needs a domain name. If you cannot name the state it represents, it will eventually hide a state you meant to expose.
