I shipped 30+ releases of an AI job-search dashboard through vibecoding — here's the doctrine

I'm Sergey. Over the last few months I built career-ops-ui — a local-only browser dashboard for managing job applications — entirely through Claude Code vibecoding sessions. 30+ releases, 9 locales, 1000+ unit tests, 70+ Playwright cases, MIT licensed.

This post is a writeup of the doctrine that made vibecoding actually scale to a production-grade codebase, and the footguns that almost killed it. Skip the marketing — I'm sharing the rules I codified after enough failures to know they matter.

What I built

A self-hosted web UI on top of career-ops (an AI job-search pipeline that hit 27K GitHub stars in 4 days). Original is CLI-only. I was processing 100+ job postings a week and triaging through the terminal was eating my evenings, so I built the visual layer:

Paste a JD URL → AI scores it A–F → tailored resume generated
Visual pipeline for hundreds of postings (filter, sort, dedupe inline)
Application tracker with funnel (applied → interview → offer)
CV editor with live markdown preview + PDF export
Deep company research mode (with brief warning if upstream prompt drifts)
Interview prep using STAR+R framework
Apply checklist with per-URL persistence
Salary range filter (currency-agnostic, NBSP-aware)
Real-time SSE scanner across 12 sources (Greenhouse, Ashby, Lever, Workday, hh.ru, Habr Career, Trudvsem, RSS, etc.)
9 locales: en, es, pt-BR, fr, ru, ja, ko, zh-CN, zh-TW
Provider-agnostic LLM routing (Anthropic / Gemini / OpenAI / Qwen / OpenRouter)
Runs at 127.0.0.1:4317, MIT licensed, no signup, no telemetry

The vibecoding doctrine (5 rules that survived)

1. ONE fix per release. Never batch.

Every release ships exactly one logical change, taken in priority order: HIGH, then MEDIUM, then LOW. Each release gets:

Version bump
CHANGELOG entries in all 9 locales (parity-gated)
A dedicated regression-lock test that must fail before the fix
Playwright verify on the specific surface
Pre-commit AI review approval
CI green on Node 18 / 20 / 22 + Playwright e2e

It sounds slow, but it is actually faster than batching: every bug is attributable to a single release, rollback is trivial, and the AI never loses track of what a session is shipping.
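
As an illustration of what a "parity-gated" CHANGELOG check could look like, here is a minimal sketch. The function name, the data shape (a list of released versions per locale), and the version numbers are all assumptions made for the example; the project's real gate may work differently.

```javascript
// Hypothetical parity gate: every locale must carry a CHANGELOG entry
// for the version being shipped.
const LOCALES = ['en', 'es', 'pt-BR', 'fr', 'ru', 'ja', 'ko', 'zh-CN', 'zh-TW'];

// changelogs: { [locale]: string[] } listing released versions (assumed shape).
function missingChangelogLocales(changelogs, version, locales = LOCALES) {
  return locales.filter((locale) => !(changelogs[locale] ?? []).includes(version));
}

const changelogs = {
  en: ['1.4.0', '1.4.1'],
  es: ['1.4.0', '1.4.1'],
  fr: ['1.4.0'], // the 1.4.1 entry was forgotten
};
const missing = missingChangelogLocales(changelogs, '1.4.1', ['en', 'es', 'fr']);
console.log(missing); // [ 'fr' ]
```

A gate like this would fail the release whenever the returned list is non-empty.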

2. TDD-first means "RED BAR MANDATORY"

The biggest lesson came from a regression I closed five times before it actually held. Each previous "close" had passing tests:

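// What the test asserted (wrong)
import { test } from 'node:test';
import assert from 'node:assert';
import { readFileSync } from 'node:fs';
test('scroll-spy implementation present', () => {
  const file = readFileSync('public/js/views/help.js', 'utf-8');
  assert(/IntersectionObserver/.test(file));   // ✅ passes, but proves nothing about behavior
});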

But the user-visible behavior was broken — active TOC entries never got highlighted on scroll. The test checked source-code shape, not behavior.

On the 6th attempt I forced myself to:

Write the failing Playwright test FIRST
Commit it on a branch
Push it
Screenshot the red bar in the PR
ONLY THEN write the fix

// What the test should have asserted
import { test, expect } from '@playwright/test';

test('TOC scroll-spy highlights active section', async ({ page }) => {
  await page.goto('http://127.0.0.1:4317/#/help');
  await page.evaluate(() => document.getElementById('help-h-5').scrollIntoView({ block: 'center' }));
  // toHaveCount() retries until its timeout, so no fixed sleep is needed.
  await expect(page.locator('.help-toc a.toc-current')).toHaveCount(1);
});

That cycle closed the bug in one shot.

The lesson: the AI will happily write tests that pass against your existing broken code. You have to force the red bar visible before any fix code touches the repo.
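
To make the gap between a shape probe and a behavior probe concrete, here is a small self-contained sketch. The broken implementation and the plain-object "links" are stand-ins invented for the example (no browser involved); only the class name toc-current and the id help-h-5 come from the test above.

```javascript
// A deliberately broken implementation: its source mentions
// IntersectionObserver, but it never marks any TOC entry as current.
const brokenSource = `
  function mountToc(links) {
    // TODO: wire up IntersectionObserver
    return links;
  }
`;
function brokenMountToc(links) {
  return links; // no entry ever receives the "toc-current" class
}

// Shape probe: only inspects the source text.
const shapeProbePasses = /IntersectionObserver/.test(brokenSource);

// Behavior probe: inspects what the code actually produced.
const links = brokenMountToc([{ id: 'help-h-5', classes: [] }]);
const behaviorProbePasses =
  links.filter((link) => link.classes.includes('toc-current')).length === 1;

console.log({ shapeProbePasses, behaviorProbePasses });
// { shapeProbePasses: true, behaviorProbePasses: false }
```

Only the behavior probe goes red against the broken code, which is exactly the red bar the fix needs to turn green.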

3. Methodology footguns get documented

After enough false-negative sign-offs, I started a §−1 Footguns section in the QA prompt. Three I hit hardest:

Footgun A: file-path assertions vs behavior assertions. I asserted the presence of a suggested extracted file, but the actual implementation was inlined into an existing one. git grep mountHelpToc public/ returned zero matches, which was not a regression, just a wrong probe. Assert behavior (class applied? element painted?), never file paths.

Footgun B: client-side URL normalization. fetch() and curl (without --path-as-is) normalize URLs before sending. They never exercise the server's raw .. traversal guard. To verify the middleware:

curl -s --path-as-is 'http://127.0.0.1:4317/api/jds/../../../etc/passwd'
# Expect: {"error":"invalid path"}
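
The normalization is easy to see from Node itself. The sketch below shows what the WHATWG URL parser (the one fetch() uses) does to that path, next to a hypothetical segment check of the kind a server-side guard might run on the raw path. The hasTraversalSegment helper is an illustration only, not the project's middleware.

```javascript
// What fetch() would actually request: the URL parser collapses dot segments
// before anything is sent over the wire.
const normalized = new URL('http://127.0.0.1:4317/api/jds/../../../etc/passwd');
console.log(normalized.pathname); // /etc/passwd

// Hypothetical guard, for illustration: flag any raw segment that decodes to "..".
function hasTraversalSegment(rawPath) {
  return rawPath.split('/').some((segment) => {
    try {
      return decodeURIComponent(segment) === '..';
    } catch {
      return true; // malformed percent-encoding: treat as suspicious
    }
  });
}

console.log(hasTraversalSegment('/api/jds/../../../etc/passwd')); // true
console.log(hasTraversalSegment(normalized.pathname)); // false
```

The last line is the footgun in miniature: once the client has normalized the URL, the guard has nothing left to catch, so the probe says nothing about whether the guard works.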

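Footgun C: vm-realm deepEqual. Objects built inside a node:vm context inherit from that context's Object.prototype, not the one in the test's realm. assert.deepStrictEqual compares prototypes, so it fails against a JSON snapshot even when every value is identical. Round-trip first: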

const snapshot = JSON.parse(JSON.stringify(assembledInVm));
assert.deepStrictEqual(snapshot, expected);   // now works
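
The realm mismatch can be observed directly. This is a minimal sketch with a made-up object literal standing in for the assembled dictionary:

```javascript
import vm from 'node:vm';

// An object literal evaluated in a fresh context gets that context's Object.prototype.
const assembledInVm = vm.runInNewContext('({ greeting: "hello", items: [1, 2] })');

const sameRealm = Object.getPrototypeOf(assembledInVm) === Object.prototype;
console.log(sameRealm); // false: the prototype belongs to the other realm

// The JSON round-trip rebuilds the value with this realm's prototypes.
const snapshot = JSON.parse(JSON.stringify(assembledInVm));
const roundTripped = Object.getPrototypeOf(snapshot) === Object.prototype;
console.log(roundTripped); // true
```

The round-trip only suits JSON-safe data; values such as Date, Map, or undefined would not survive it unchanged.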

Documenting these saved hours of debugging misleading probe results in later sessions. New AI sessions read §−1 first and apply the right probe technique.

4. Locale-aware everything (I18N-SPLIT architecture)

Originally every locale (eight at the time; French came later) lived in one 36KB JS dictionary. A community contributor (Mike from Discord) flagged:

".po files would be proper for translators. Your current setup is painful."

He was right. We refactored to per-locale files with an @alias mechanism for shared keys:

public/js/lib/locales/
  i18n-dict.en.js          // window.__I18N_DICT_EN = { ... }
  i18n-dict.es.js
  i18n-dict.fr.js          // added by community via Ollama
  i18n-dict.ja.js
  i18n-dict.ko.js
  i18n-dict.pt-BR.js
  i18n-dict.ru.js
  i18n-dict.zh-CN.js
  i18n-dict.zh-TW.js
  i18n-dict.aliases.js     // shared canonical keys
public/js/lib/i18n-dict.js // assembler

Load order in index.html: 9 locale files → aliases → assembler → i18n.js. The t() function never changed. Zero call-sites edited.
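
The post does not show the @alias syntax, so the following is only a sketch of how an assembler step could resolve shared keys. It assumes a convention where a value of the form "@other.key" means "reuse the string stored under other.key"; the key names and the resolveAliases function are hypothetical.

```javascript
// Hypothetical alias convention: "@other.key" points at another key's string.
// The real @alias mechanism may differ.
function resolveAliases(dict, maxDepth = 10) {
  const resolved = {};
  for (const key of Object.keys(dict)) {
    let value = dict[key];
    let depth = 0;
    // Follow alias chains, bailing out if they loop.
    while (typeof value === 'string' && value.startsWith('@') && Object.hasOwn(dict, value.slice(1))) {
      if (++depth > maxDepth) throw new Error(`alias cycle at "${key}"`);
      value = dict[value.slice(1)];
    }
    resolved[key] = value;
  }
  return resolved;
}

const fr = resolveAliases({
  'common.save': 'Enregistrer',
  'cv.save': '@common.save',
  'config.save': '@cv.save',
});
console.log(fr['config.save']); // Enregistrer
```

Resolving aliases once at assembly time is what lets a lookup function like t() stay unchanged: it only ever sees plain strings.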

The architecture was validated when French was added two months later by a community contributor using local Ollama + qwen2.5:14b for translation:

// translateChunk() called by the contributor
async function translateChunk(chunk) {
  const payload = Object.fromEntries(chunk.map(c => [c.key, c.english]));
  const prompt = [
    'Translate this UI locale JSON object from English to French.',
    'Return ONLY a valid JSON object with exactly the same keys.',
    'Preserve placeholders like {n}, {path}, {hotkey}, URLs, env vars.',
    'Keep English technical product names when natural in French.',
    'Use concise, natural French for a software interface.',
    '',
    JSON.stringify(payload, null, 2),
  ].join('\n');
  const r = await fetch('http://127.0.0.1:11434/api/generate', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      model: 'qwen2.5:14b-instruct',
      prompt,
      stream: false,
      options: { temperature: 0, num_ctx: 8192 },
    }),
  });
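  if (!r.ok) throw new Error(`Ollama request failed: HTTP ${r.status}`);
  // extractJsonObject() is the contributor's own helper (not shown here).
  return extractJsonObject((await r.json()).response);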
}

This same workflow now scales to a 10th, 11th locale without touching the main dict. Free local inference, zero coordination cost.
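
Because the prompt asks the model to keep keys and placeholders intact, a cheap check on the output is a natural companion. The sketch below is an assumption about how such a check could look, not part of the contributor's script; the sample keys and strings are invented.

```javascript
// Collect placeholders such as {n}, {path}, {hotkey} from a UI string.
function placeholders(text) {
  return (text.match(/\{[a-zA-Z_]+\}/g) ?? []).sort();
}

// Returns a list of problems; an empty list means the chunk looks safe to merge.
function validateTranslation(english, translated) {
  const problems = [];
  for (const key of Object.keys(english)) {
    if (typeof translated[key] !== 'string') {
      problems.push(`${key}: missing`);
    } else if (placeholders(english[key]).join() !== placeholders(translated[key]).join()) {
      problems.push(`${key}: placeholders changed`);
    }
  }
  for (const key of Object.keys(translated)) {
    if (!Object.hasOwn(english, key)) problems.push(`${key}: unexpected key`);
  }
  return problems;
}

const problems = validateTranslation(
  { 'jobs.count': '{n} jobs found', 'cv.saved': 'Saved to {path}' },
  { 'jobs.count': '{n} offres trouvées', 'cv.saved': 'Enregistré dans {chemin}' },
);
console.log(problems); // [ 'cv.saved: placeholders changed' ]
```

Here the model "helpfully" translated {path} to {chemin}, which would break interpolation at runtime; the check catches it before the chunk is merged.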

5. Read-only boundary tests catch destructive AI suggestions

The parent career-ops project holds the user's real, mutable data. The web UI has a hard rule: NEVER write to parent files except on explicit user actions (Pipeline +Add, CV Save, Config write).

Every test runs with CAREER_OPS_ROOT pointed at mktemp -d:

// tests/setup.mjs
import { mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const root = mkdtempSync(join(tmpdir(), 'career-ops-test-'));
process.env.CAREER_OPS_ROOT = root;
// PATHS resolves once per process, so this setup must load before any module that reads CAREER_OPS_ROOT.

If any test writes to the real parent, it fails immediately. This makes AI-assisted code review trivially safe — Claude can suggest any refactor; the boundary tests catch parent mutations automatically.
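
One way a boundary assertion could be expressed is a path-containment check against the temp root. This is a sketch under assumptions: the isInsideRoot helper and the sample paths are hypothetical and not taken from the project's test suite.

```javascript
import path from 'node:path';

// True when target resolves to root itself or to something beneath it.
function isInsideRoot(root, target) {
  const relative = path.relative(path.resolve(root), path.resolve(root, target));
  if (relative === '') return true;
  // A leading ".." segment (or an absolute result) means the target escaped root.
  return relative.split(path.sep)[0] !== '..' && !path.isAbsolute(relative);
}

const root = '/tmp/career-ops-test-abc'; // stand-in for the mkdtemp result
console.log(isInsideRoot(root, 'data/pipeline.md')); // true
console.log(isInsideRoot(root, '../real-career-ops/cv.md')); // false
console.log(isInsideRoot(root, '/etc/passwd')); // false
```

A test could run every path the code attempts to write through a check like this and fail on the first one that lands outside the temp root.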

What didn't work

Batching fixes "just this once": every bundled release I shipped as an exception to the doctrine later needed a follow-up fix release. Pure overhead.
"Implement and write tests" in one prompt — produces happy-path tests that pass against broken code. Split: write failing test → confirm red → only then code.
npm test 2>&1 | grep: a pipeline reports the exit status of its last command, and grep returns 0 on any match, so a failing npm test is masked. Run npm test, capture $?, then grep.
Static lock-tests for behavioral promises — git grep for a symbol doesn't prove the user-visible behavior works.
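
For scripts that wrap the test run in Node rather than a shell, the same "capture the status first, filter afterwards" idea can be sketched as below. The runAndFilter helper is hypothetical, and a tiny inline Node script stands in for npm test so the example runs anywhere.

```javascript
import { spawnSync } from 'node:child_process';

// Run a command, keep its real exit status, and filter the output afterwards.
function runAndFilter(command, args, pattern) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  const output = `${result.stdout ?? ''}${result.stderr ?? ''}`;
  return {
    status: result.status, // the command's own exit code, not the filter's
    matches: output.split('\n').filter((line) => pattern.test(line)),
  };
}

// Stand-in for `npm test`: prints one failing line and exits non-zero.
const fakeTestRun = ['-e', 'console.log("ok 1"); console.log("not ok 2"); process.exitCode = 1'];
const { status, matches } = runAndFilter(process.execPath, fakeTestRun, /^not ok/);
console.log(status, matches); // 1 [ 'not ok 2' ]
```

The filter can match or miss without ever changing the status the caller sees.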

What surprised me

Community pulled the project forward harder than I expected. Mike flagged the i18n architecture problem; the French contributor used local Ollama to translate the entire dict in 48 hours. Zero coordination cost.
9 locales × 19 routes × 75 H3 sections turned out to be testable. Every release cycle runs the same automated sweep over every locale and route. It took 30 minutes to write and saves hours every release.
Vibecoding scales to architecture-level decisions. The I18N-SPLIT refactor was a 12-hour Claude Code session. The AI walked through 8 locale files, suggested the @alias pattern, wrote the migration script, regenerated the snapshot, and pushed parity tests — all in one session.
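
The locale × route sweep mentioned above is essentially a Cartesian product of test cases. A minimal sketch of generating that matrix follows; only '#/help' is a route named in this post, the other two are placeholders, and how a locale is selected in the real sweep is not shown here.

```javascript
const LOCALES = ['en', 'es', 'pt-BR', 'fr', 'ru', 'ja', 'ko', 'zh-CN', 'zh-TW'];

// Cartesian product: one test case per (locale, route) pair.
function sweepCases(locales, routes) {
  return locales.flatMap((locale) => routes.map((route) => ({ locale, route })));
}

// '#/pipeline' and '#/tracker' are placeholder route names for the example.
const cases = sweepCases(LOCALES, ['#/help', '#/pipeline', '#/tracker']);
console.log(cases.length); // 27
console.log(cases[0]); // { locale: 'en', route: '#/help' }
```

With the full 19 routes the same function would yield 171 cases, each of which can feed one parameterized test.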

Stack

Backend: Node.js 18+ Express server, ~130 LOC server/index.mjs orchestrator + 15 route modules
Frontend: no framework, vanilla JS, hash-router SPA
Prod deps: express + js-yaml + multer only
Tests: node --test (1000+ unit), 70+ Playwright cases, 23 comprehensive e2e
LLM routing: Anthropic / Gemini / OpenAI / Qwen / OpenRouter (auto-route to whichever key is set; manual fallback works without any key)
i18n: 9 per-locale dict files + @alias mechanism, server-side fallback to English
Security: CSP without unsafe-inline/unsafe-eval, SSRF guard, stripDangerousMarkdown() on CV ingress, masked secrets, JSON-404 on /api/*
Streaming: Server-Sent Events for long ops (scan, auto-pipeline, batch)
Data: Markdown for all user state (CV, applications, reports) — version-controllable, cat-able, never proprietary format
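
The auto-routing rule in the stack list ("whichever key is set, manual fallback otherwise") can be sketched in a few lines. The environment variable names and the priority order below are assumptions for illustration; the project may use different names and a different order.

```javascript
// Env var names and ordering are assumptions, not the project's actual config.
const PROVIDER_KEYS = [
  ['anthropic', 'ANTHROPIC_API_KEY'],
  ['gemini', 'GEMINI_API_KEY'],
  ['openai', 'OPENAI_API_KEY'],
  ['qwen', 'QWEN_API_KEY'],
  ['openrouter', 'OPENROUTER_API_KEY'],
];

// Returns the first provider whose key is set, or 'manual' when none is.
function pickProvider(env) {
  const hit = PROVIDER_KEYS.find(([, keyName]) => Boolean(env[keyName]?.trim()));
  return hit ? hit[0] : 'manual';
}

console.log(pickProvider({ OPENAI_API_KEY: 'sk-example' })); // openai
console.log(pickProvider({})); // manual
```

Passing the environment in as an argument (for example pickProvider(process.env)) keeps the function easy to unit-test without touching real keys.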

Takeaways for other vibecoding builders

Write the failing test first, screenshot the red bar. The AI will happily produce green tests against broken code.
Document methodology footguns as you hit them. Future sessions read them first. Saves hours.
One-fix-per-release scales surprisingly well. Bugs are attributable; rollback is trivial; AI doesn't lose context.
Per-language-file i18n architecture is worth the cost. Community contributors can fork single files. Painless onboarding.
Read-only boundary tests catch destructive AI suggestions. The AI can suggest anything; the tests enforce the contract.

Try it

git clone https://github.com/Fighter90/career-ops-ui
cd career-ops-ui
npm install
npm start
# open http://127.0.0.1:4317

Free, MIT, no signup, runs locally. AI keys optional (works in manual-prompt mode too).

GitHub: Fighter90/career-ops-ui
LinkedIn (open for chat): sergey-emelyanov-in-job

Question for the dev.to community: what's your hardest-learned vibecoding lesson? I'm especially curious which doctrines got other builders through the moments when the AI confidently suggested broken solutions. Drop a comment or share your own architecture takeaways below.