When your optimizer silently returns the wrong answer (and how to catch it)

Numerical solvers have a failure mode that is worse than crashing: every so often they return status: Optimal and hand you a number that is simply wrong. No exception, no warning — just a confident, incorrect optimum. If that number drives a downstream decision (a schedule, an allocation, a price), you may never notice.

I ran into a clean example of this in HiGHS recently while reducing a bug that had surfaced through cvxpy, and the debugging path generalizes to any LP/QP/MILP stack. Here's the case, how I isolated it, and a short checklist you can apply to your own models.

The symptom: same model, two answers

A mixed-integer model that HiGHS solves to Optimal with objective 0.0 under default settings — but solve the same model with presolve turned off and you get Optimal with objective ≈ 6.68e8. Both runs report success. One of them is wrong.

When presolve-on and presolve-off disagree on a problem that has a well-defined, bounded optimum, that is not a tolerance issue — it means one of the reduction steps is mangling the model. (This particular case is an open, actively-investigated issue; a separate wrong-answer I reduced to a standalone .mps from a cvxpy program is filed here.)

The first diagnostic is free: flip presolve

Before anything else, re-solve with presolve disabled and compare the two objectives:

import highspy

def solve(path, presolve="on"):
    h = highspy.Highs()
    h.setOptionValue("presolve", presolve)
    h.setOptionValue("output_flag", False)
    h.readModel(path)
    h.run()
    return h.getObjectiveValue()

on  = solve("model.mps", "on")
off = solve("model.mps", "off")
print(on, off)        # disagree on a feasible, bounded model => bug in a reduction

The same idea works through the modeling layer — in cvxpy, compare prob.solve(solver=cp.HIGHS) against the same solve with {"presolve": "off"}. If the two disagree, a reduction step is the culprit, and you have already cut the search space in half.

Why scaling is so often the trigger

The common thread in this family of bugs is coefficient magnitude. HiGHS prints the coefficient ranges at the top of every run:

Coefficient ranges:
  Matrix  [4e-01, 5e+02]
  Cost    [2e+01, 3e+02]
  Bound   [1e+02, 1e+02]
  RHS     [3e+01, 2e+04]

When a single constraint mixes coefficients spanning many orders of magnitude, bound-tightening and substitution accumulate floating-point error, and integer-rounding logic ("this RHS rounds up to the next integer bound") can tip the wrong way. The minimal reproducer I extracted kept exactly the rows whose coefficients carried the large magnitudes — drop them and the collapse disappears.

The same root cause shows up across solvers, just wearing different clothes:

OSQP (QP): an open report where v1.0.0+ runs all the way to max-iterations with gap = -nan, even though the primal and dual residuals are already at 1e-14. The duality-gap termination criterion is poisoned by a NaN, so the solver never recognizes that it has already converged.
Clarabel (conic/QP): a report where a wildly ill-scaled QP (objective on the order of 1e9) returns a false PrimalInfeasible with equilibration on, but solves cleanly with equilibrate_enable=False. Ruiz equilibration is capped at equilibrate_max_scaling = 1e4 by default — about four orders short of a 1e8 dynamic range, so the post-scaling KKT system is still badly conditioned.

Different solvers, same lesson: magnitude is not cosmetic.

How to minimize a solver bug so it actually gets fixed

A 350-row model is not a bug report a maintainer can act on. The reduction loop is mechanical and worth automating:

Reproduce on the latest release first. Half of "bugs" are already fixed. Pin the version you tested.
Greedily drop rows and columns. Remove a chunk; if the wrong-answer signature survives, keep it removed; otherwise restore it and try a smaller chunk. Binary-search your way down. I took one case from 348×169 to 41×40 this way and it still collapsed.
Make the "still broken" check a predicate, not an eyeball. Here it was abs(on - off) > tol (or status == Infeasible while presolve-off says Optimal), re-evaluated after every removal.
Export the reduced model to a portable format (.mps) so the report is solver-version- and language-independent.
File with three things: the version, the exact on/off command delta, and the minimal .mps. That is a report that gets triaged in minutes instead of sitting untouched.

A scaling-hygiene checklist

Even when there is no solver bug, bad scaling silently erodes accuracy. Cheap habits that prevent most of it:

Read the coefficient ranges on every run. If the matrix or RHS spans more than ~1e6, treat the result with suspicion.
Rescale units before the solver sees them (dollars → millions, bytes → GB). Single highest-leverage fix.
Do not encode big-M larger than necessary. An M of 1e9 where 1e4 would do is how you manufacture these bugs.
Keep a presolve-off run in your test suite for any model whose output you trust blindly — a periodic on/off agreement check is a cheap regression guard.
For QP/conic, check the equilibration cap against your data's dynamic range, and prefer pre-scaling to relying on the solver to rescue pathological inputs.

The broader point

These bugs are dangerous precisely because the solver's contract — "I returned Optimal" — is exactly what you would normally trust. The on/off differential is so useful because it doesn't trust that contract: it cross-checks two code paths that are supposed to agree and flags the moment they don't. That "verify the thing that claims to be correct" instinct is worth wiring into any pipeline where a wrong number is expensive.

I work on reliability and verification for numerical and AI systems — minimal reproducers, determinism, and "prove the output is what it claims" tooling; the HiGHS reducer above came out of that. The issues referenced are linked inline. If you hit something in this family, I'm happy to compare notes — GitHub.

When your optimizer silently returns the wrong answer (and how to catch it)

The symptom: same model, two answers

The first diagnostic is free: flip presolve

Why scaling is so often the trigger

How to minimize a solver bug so it actually gets fixed

A scaling-hygiene checklist

The broader point

Tags

Author

Stats

Published

You Might Also Like

I Cut My Next.js + Supabase App Load Time by 73% - Here Are the 5 Techniques That Actually Worked

Next.js + Supabase Performance Optimization: From Slow to Lightning Fast

Streamline Web Development: Reusable Loading Indicators to Save Time Across Projects

Optimizing Kubernetes Clusters: Rightsizing and Autoscaling to Improve Resource Utilization While Minimizing Business Impact

Building a High-Performance E-Commerce Store for Visual Products: A Street Art Retailer's Guide

Building High-Converting E-Commerce Sites for Pet Products: A Developer's Technical Checklist