I'm Sergey. Over the last few months I built career-ops-ui — a local-only browser dashboard for managing job applications — entirely through Claude Code vibecoding sessions. 30+ releases, 9 locales, 1000+ unit tests, 70+ Playwright cases, MIT licensed.
This post is a writeup of the doctrine that made vibecoding actually scale to a production-grade codebase, and the footguns that almost killed it. Skip the marketing — I'm sharing the rules I codified after enough failures to know they matter.
What I built
A self-hosted web UI on top of career-ops (an AI job-search pipeline that hit 27K GitHub stars in 4 days). The original is CLI-only. I was processing 100+ job postings a week, and triaging them in the terminal was eating my evenings, so I built the visual layer:
- Paste a JD URL → AI scores it A–F → tailored resume generated
- Visual pipeline for hundreds of postings (filter, sort, dedupe inline)
- Application tracker with funnel (applied → interview → offer)
- CV editor with live markdown preview + PDF export
- Deep company research mode (with a brief warning if the upstream prompt drifts)
- Interview prep using STAR+R framework
- Apply checklist with per-URL persistence
- Salary range filter (currency-agnostic, NBSP-aware)
- Real-time SSE scanner across 12 sources (Greenhouse, Ashby, Lever, Workday, hh.ru, Habr Career, Trudvsem, RSS, etc.)
- 9 locales: en, es, pt-BR, fr, ru, ja, ko, zh-CN, zh-TW
- Provider-agnostic LLM routing (Anthropic / Gemini / OpenAI / Qwen / OpenRouter)
- Runs at 127.0.0.1:4317, MIT licensed, no signup, no telemetry
The vibecoding doctrine (5 rules that survived)
1. ONE fix per release. Never batch.
Every release ships exactly one logical change, and fixes go out in priority order: HIGH, then MEDIUM, then LOW. Each ship gets:
- Version bump
- CHANGELOG entries in all 9 locales (parity-gated)
- A dedicated regression-lock test that must fail before the fix
- Playwright verify on the specific surface
- Pre-commit AI review approval
- CI green on Node 18 / 20 / 22 + Playwright e2e
Sounds slow. Actually faster than batching — bugs are attributable, rollback is trivial, and the AI never loses context about what we're shipping in a session.
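The parity gate from the checklist above could look roughly like this. It is a minimal sketch under stated assumptions, not the project's actual gate: the function name `localesMissingVersion`, the `## x.y.z` heading format, and the in-memory `changelogs` map are all hypothetical.

```javascript
// Hypothetical parity gate: every locale's changelog must mention the release.
// Assumes each changelog marks a release with a heading line like "## 1.4.2".
const LOCALES = ['en', 'es', 'pt-BR', 'fr', 'ru', 'ja', 'ko', 'zh-CN', 'zh-TW'];

function localesMissingVersion(changelogs, version) {
  const heading = `## ${version}`;
  return LOCALES.filter((locale) => {
    const text = changelogs[locale] ?? '';
    // A locale passes only if some line is exactly the version heading.
    return !text.split('\n').some((line) => line.trim() === heading);
  });
}

// Usage: one locale lags behind, so the gate reports it.
const changelogs = Object.fromEntries(LOCALES.map((l) => [l, '## 1.4.2\n- fix']));
changelogs.fr = '## 1.4.1\n- older entry';
console.log(localesMissingVersion(changelogs, '1.4.2')); // [ 'fr' ]
```

A release script would refuse to ship whenever the returned list is non-empty.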
2. TDD-first means "RED BAR MANDATORY"
The biggest lesson came from a regression I closed five times before it actually held. Each previous "close" had passing tests:
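// What the test asserted (wrong)
import { test } from 'node:test';
import assert from 'node:assert';
import { readFileSync } from 'node:fs';
test('scroll-spy implementation present', () => {
  const file = readFileSync('public/js/views/help.js', 'utf-8');
  // Only proves the identifier appears in the source, not that highlighting works.
  assert(/IntersectionObserver/.test(file)); // ✅ passes
});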
But the user-visible behavior was broken — active TOC entries never got highlighted on scroll. The test checked source-code shape, not behavior.
On the 6th attempt I forced myself to:
- Write the failing Playwright test FIRST
- Commit it on a branch
- Push it
- Screenshot the red bar in the PR
- ONLY THEN write the fix
// What the test should have asserted
import { test, expect } from '@playwright/test';
test('TOC scroll-spy highlights active section', async ({ page }) => {
  await page.goto('http://127.0.0.1:4317/#/help');
  // Scroll a known heading into the middle of the viewport.
  await page.evaluate(() => document.getElementById('help-h-5').scrollIntoView({ block: 'center' }));
  await page.waitForTimeout(800);
  // Behavior check: exactly one TOC link carries the active class.
  await expect(page.locator('.help-toc a.toc-current')).toHaveCount(1);
});
That cycle closed the bug in one shot.
The lesson: the AI will happily write tests that pass against your existing broken code. You have to force the red bar visible before any fix code touches the repo.
3. Methodology footguns get documented
After enough false-negative sign-offs, I started a §−1 Footguns section in the QA prompt. Three I hit hardest:
Footgun A: file-path assertions vs behavior assertions. I asserted that a suggested extracted file existed, when the actual implementation had been inlined into an existing file. git grep mountHelpToc public/ returned 0 matches: not a regression, just a wrong probe. Assert behavior (class applied? element painted?), never file paths.
Footgun B: client-side URL normalization. fetch() and curl (without --path-as-is) normalize URLs before sending. They never exercise the server's raw .. traversal guard. To verify the middleware:
curl -s --path-as-is 'http://127.0.0.1:4317/api/jds/../../../etc/passwd'
# Expect: {"error":"invalid path"}
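The normalization itself is easy to see from Node, since fetch() parses its input with the same WHATWG URL parser that `new URL()` uses. The helper `survivesNormalization` below is a hypothetical name introduced for illustration, and it deliberately ignores query strings and fragments.

```javascript
// The WHATWG URL parser resolves ".." segments before any request is built.
const raw = 'http://127.0.0.1:4317/api/jds/../../../etc/passwd';
const parsed = new URL(raw);

console.log(parsed.pathname); // "/etc/passwd"

// Hypothetical helper: does a probe URL reach the server exactly as written?
// Simplification: assumes the URL has no query string or fragment.
function survivesNormalization(url) {
  const pathAsWritten = url.replace(/^[a-z]+:\/\/[^/]+/i, '');
  return new URL(url).pathname === pathAsWritten;
}

console.log(survivesNormalization(raw)); // false
console.log(survivesNormalization('http://127.0.0.1:4317/api/jds/42')); // true
```

When the helper returns false, the request never carries the `..` segments, so a green result says nothing about the server-side guard.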
Footgun C: vm-realm deepEqual. Objects built inside node:vm have a foreign prototype. assert.deepStrictEqual against a JSON snapshot fails even on identical values. Round-trip first:
const snapshot = JSON.parse(JSON.stringify(assembledInVm));
assert.deepStrictEqual(snapshot, expected); // now works
Documenting these saved hours of false-positive debugging in later sessions. New AI sessions read §−1 first and apply the right probe technique.
4. Locale-aware everything (I18N-SPLIT architecture)
Originally every locale lived in one 36KB JS dictionary. A community contributor (Mike from Discord) flagged:
".po files would be proper for translators. Your current setup is painful."
He was right. We refactored to per-locale files with an @alias mechanism for shared keys:
public/js/lib/locales/
i18n-dict.en.js // window.__I18N_DICT_EN = { ... }
i18n-dict.es.js
i18n-dict.fr.js // added by community via Ollama
i18n-dict.ja.js
i18n-dict.ko.js
i18n-dict.pt-BR.js
i18n-dict.ru.js
i18n-dict.zh-CN.js
i18n-dict.zh-TW.js
i18n-dict.aliases.js // shared canonical keys
public/js/lib/i18n-dict.js // assembler
Load order in index.html: 9 locale files → aliases → assembler → i18n.js. The t() function never changed. Zero call-sites edited.
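An assembler along these lines would keep t() untouched. This is only a sketch: the shape of the alias map, the key names, and the functions `lookup` and `assemble` are assumptions, since the real files attach their data to window globals whose exact format is not shown here.

```javascript
// Hypothetical data: per-locale dicts plus a shared alias map (alias -> canonical key).
const dicts = {
  en: { 'common.save': 'Save', 'cv.title': 'CV editor' },
  es: { 'common.save': 'Guardar' },
};
const aliases = { 'cv.save': 'common.save', 'config.save': 'common.save' };

// Follow aliases until the key exists in the dict or the chain ends.
function lookup(dict, key) {
  const seen = new Set();
  let current = key;
  while (!Object.hasOwn(dict, current) && Object.hasOwn(aliases, current)) {
    if (seen.has(current)) throw new Error(`alias cycle at ${current}`);
    seen.add(current);
    current = aliases[current];
  }
  return dict[current];
}

// Build one flat dict per locale, falling back to English for missing keys.
function assemble(allDicts, fallbackLocale = 'en') {
  const fallback = allDicts[fallbackLocale];
  const keys = new Set([...Object.keys(fallback), ...Object.keys(aliases)]);
  const out = {};
  for (const [locale, dict] of Object.entries(allDicts)) {
    out[locale] = {};
    for (const key of keys) {
      out[locale][key] = lookup(dict, key) ?? lookup(fallback, key);
    }
  }
  return out;
}

const assembled = assemble(dicts);
console.log(assembled.es['cv.save']);  // "Guardar" (via alias)
console.log(assembled.es['cv.title']); // "CV editor" (English fallback)
```

Because the output is still one flat object per locale, call sites that use t() do not need to know aliases exist.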
The architecture was validated two months later, when a community contributor added French using local Ollama + qwen2.5:14b for translation:
// translateChunk() called by the contributor
async function translateChunk(chunk) {
  const payload = Object.fromEntries(chunk.map(c => [c.key, c.english]));
  const prompt = [
    'Translate this UI locale JSON object from English to French.',
    'Return ONLY a valid JSON object with exactly the same keys.',
    'Preserve placeholders like {n}, {path}, {hotkey}, URLs, env vars.',
    'Keep English technical product names when natural in French.',
    'Use concise, natural French for a software interface.',
    '',
    JSON.stringify(payload, null, 2),
  ].join('\n');
  const r = await fetch('http://127.0.0.1:11434/api/generate', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      model: 'qwen2.5:14b-instruct',
      prompt,
      stream: false,
      options: { temperature: 0, num_ctx: 8192 },
    }),
  });
  return extractJsonObject((await r.json()).response);
}
This same workflow now scales to a 10th, 11th locale without touching the main dict. Free local inference, zero coordination cost.
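The contributor's extractJsonObject() helper is not shown, so the version below is a hypothetical stand-in, paired with an equally hypothetical `validateChunk` check that the prompt's two promises (same keys, preserved placeholders) actually held. Treat it as a sketch of the idea rather than the workflow's real code.

```javascript
// Hypothetical stand-in for extractJsonObject(): take the outermost {...} span.
function extractJsonObject(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('no JSON object in model output');
  return JSON.parse(text.slice(start, end + 1));
}

// Placeholders such as {n} or {path} must survive translation unchanged.
function placeholders(s) {
  return (s.match(/\{[A-Za-z_]+\}/g) ?? []).sort().join(',');
}

// Compare a translated chunk against its English source and list every problem.
function validateChunk(source, translated) {
  const problems = [];
  for (const [key, english] of Object.entries(source)) {
    if (typeof translated[key] !== 'string') {
      problems.push(`${key}: missing`);
    } else if (placeholders(english) !== placeholders(translated[key])) {
      problems.push(`${key}: placeholder drift`);
    }
  }
  for (const key of Object.keys(translated)) {
    if (!Object.hasOwn(source, key)) problems.push(`${key}: unexpected key`);
  }
  return problems;
}

// Usage with a made-up model reply that wraps the JSON in chatter.
const source = { 'scan.found': 'Found {n} postings', 'cv.saved': 'Saved to {path}' };
const reply = 'Sure, here it is: {"scan.found": "{n} offres trouvées", "cv.saved": "Enregistré"} Done.';
console.log(validateChunk(source, extractJsonObject(reply)));
// [ 'cv.saved: placeholder drift' ]
```

Running a check like this per chunk catches dropped placeholders before they reach the locale file.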
5. Read-only boundary tests catch destructive AI suggestions
The parent career-ops project is mutable user data. The web-ui has a hard rule: NEVER write to parent files except on explicit user actions (Pipeline +Add, CV Save, Config write).
Every test runs with CAREER_OPS_ROOT pointed at mktemp -d:
// tests/setup.mjs
import { mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
const root = mkdtempSync(join(tmpdir(), 'career-ops-test-'));
process.env.CAREER_OPS_ROOT = root;
// PATHS resolves once per process; setup must run before any imports.
If any test writes to the real parent, it fails immediately. This makes AI-assisted code review trivially safe — Claude can suggest any refactor; the boundary tests catch parent mutations automatically.
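The setup file only redirects the root; how the boundary tests detect an out-of-root write is not shown here. One way to make such writes fail loudly is a path guard like the following sketch, where `isInsideRoot` and `assertWritable` are hypothetical names and the directory names are invented.

```javascript
const path = require('node:path');

// Hypothetical guard: is `target` inside `root` once both are resolved?
function isInsideRoot(root, target) {
  const rel = path.relative(path.resolve(root), path.resolve(root, target));
  if (rel === '') return true; // the root itself
  return rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
}

// Call this before any write; a test that trips it fails immediately.
function assertWritable(root, target) {
  if (!isInsideRoot(root, target)) {
    throw new Error(`refusing to write outside CAREER_OPS_ROOT: ${target}`);
  }
}

const root = path.join(path.sep, 'tmp', 'career-ops-test-abc');
console.log(isInsideRoot(root, 'data/applications.md'));               // true
console.log(isInsideRoot(root, '../real-career-ops/cv.md'));           // false
console.log(isInsideRoot(root, path.join(path.sep, 'etc', 'passwd'))); // false
```

Checking the relative path instead of a string prefix avoids treating a sibling directory with a similar name as inside the root.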
What didn't work
- Batching fixes "just this once" — every doctrine-exception bundled release later required a follow-up fix-ship. Pure overhead.
- "Implement and write tests" in one prompt — produces happy-path tests that pass against broken code. Split: write failing test → confirm red → only then code.
- npm test 2>&1 | grep: grep returns 0 on any match, masking the exit code. Run npm test, capture $?, then grep.
- Static lock-tests for behavioral promises: git grep for a symbol doesn't prove the user-visible behavior works.
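For the piped-grep item above, the same "capture the exit code first, filter second" idea can be scripted from Node. A minimal sketch: `runAndFilter` is a hypothetical helper, and the child script stands in for npm test so the example runs anywhere.

```javascript
const { spawnSync } = require('node:child_process');

// Run a command, keep its real exit status, and filter the output afterwards.
function runAndFilter(command, args, pattern) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  const output = `${result.stdout ?? ''}${result.stderr ?? ''}`;
  return {
    status: result.status, // the command's own exit code, not the filter's
    matches: output.split('\n').filter((line) => pattern.test(line)),
  };
}

// Stand-in for `npm test`: prints TAP-like lines, then exits non-zero.
const script = 'console.log("ok 1"); console.log("not ok 2"); process.exitCode = 3;';
const { status, matches } = runAndFilter(process.execPath, ['-e', script], /^not ok/);
console.log(status, matches); // 3 [ 'not ok 2' ]
```

A CI wrapper would propagate `status` as its own exit code, so a match in the filter can never turn a failing run green.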
What surprised me
- Community pulled the project forward harder than I expected. Mike flagged the i18n architecture problem; the French contributor used local Ollama to translate the entire dict in 48 hours. Zero coordination cost.
- 9 locales × 19 routes × 75 H3 sections are testable. Every cycle runs the same 9-locale × route-sweep automated test. Took 30 min to write, saves hours every release.
- Vibecoding scales to architecture-level decisions. The I18N-SPLIT refactor was a 12-hour Claude Code session. The AI walked through 8 locale files, suggested the @alias pattern, wrote the migration script, regenerated the snapshot, and pushed parity tests, all in one session.
Stack
- Backend: Node.js 18+ Express server, ~130 LOC server/index.mjs orchestrator + 15 route modules
- Frontend: no framework, vanilla JS, hash-router SPA
- Prod deps: express + js-yaml + multer only
- Tests: node --test (1000+ unit), 70+ Playwright cases, 23 comprehensive e2e
- LLM routing: Anthropic / Gemini / OpenAI / Qwen / OpenRouter (auto-route to whichever key is set; manual fallback works without any key)
- i18n: 9 per-locale dict files + @alias mechanism, server-side fallback to English
- Security: CSP without unsafe-inline / unsafe-eval, SSRF guard, stripDangerousMarkdown() on CV ingress, masked secrets, JSON-404 on /api/*
- Streaming: Server-Sent Events for long ops (scan, auto-pipeline, batch)
- Data: Markdown for all user state (CV, applications, reports): version-controllable, cat-able, never a proprietary format
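For readers who have not used Server-Sent Events, the streaming item above boils down to writing small text frames over a long-lived response. The sketch below shows the wire format only. The event names, payload fields, and the `streamScan` handler are assumptions for illustration, not the project's routes.

```javascript
// Format one SSE frame: optional event name, one data line, then a blank line.
function formatSseEvent(event, payload) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  // JSON.stringify without indentation never emits raw newlines,
  // so a single "data:" line is enough here.
  lines.push(`data: ${JSON.stringify(payload)}`);
  return lines.join('\n') + '\n\n';
}

// Hypothetical handler; `res` only needs writeHead/write/end, as in Node's http.
function streamScan(res, sources) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  });
  sources.forEach((source, i) => {
    res.write(formatSseEvent('progress', { source, done: i + 1, total: sources.length }));
  });
  res.write(formatSseEvent('done', { total: sources.length }));
  res.end();
}

// Usage with a fake response object, so the sketch runs without a server.
const chunks = [];
const fakeRes = { writeHead() {}, write: (c) => chunks.push(c), end() {} };
streamScan(fakeRes, ['greenhouse', 'lever']);
console.log(chunks[0]);
// event: progress
// data: {"source":"greenhouse","done":1,"total":2}
```

On the browser side, an EventSource listener for the progress and done events would consume these frames.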
Takeaways for other vibecoding builders
- Write the failing test first, screenshot the red bar. The AI will happily produce green tests against broken code.
- Document methodology footguns as you hit them. Future sessions read them first. Saves hours.
- One-fix-per-release scales surprisingly well. Bugs are attributable; rollback is trivial; AI doesn't lose context.
- Per-language-file i18n architecture is worth the cost. Community contributors can fork single files. Painless onboarding.
- Read-only boundary tests catch destructive AI suggestions. The AI can suggest anything; the tests enforce the contract.
Try it
git clone https://github.com/Fighter90/career-ops-ui
cd career-ops-ui
npm install
npm start
# open http://127.0.0.1:4317
Free, MIT, no signup, runs locally. AI keys optional (works in manual-prompt mode too).
- GitHub: Fighter90/career-ops-ui
- LinkedIn (open for chat): sergey-emelyanov-in-job
Question for the dev.to community: what's your hardest-learned vibecoding lesson? I'm especially curious about doctrines other builders shipped through when the AI confidently suggested broken solutions. Drop a comment or share your own architecture takeaways below.














