TL;DR for vibe coders: Shipped an app with Cursor, Claude Code, or v0 and got a scary Vercel or Neon bill? You probably don't need a bigger plan. You need a few fixes. Not technical? Copy the agent-rules folder from the companion repo into your project and tell your AI editor "apply these cost rules." That's the whole job.
You don't need Vercel Pro. You don't need to "Launch" on Neon. A lot of the time you need the opposite of an upgrade. You need your stack to go to sleep.
Here's the moment that started this. A Neon bill landed, across three small Next.js apps running on Vercel with a Neon Postgres database, and the breakdown was lopsided in a way that gave the whole game away. $32.65 of it was compute. Storage was 5 cents. History and data transfer were zero. The meter said 308 hours of compute time in about three weeks, which is a database awake for roughly fifteen hours a day, every day.
The instinct when a bill climbs is "I must be getting traffic, time for a bigger plan." So before doing that, I opened Google Analytics.
Zero users in the last 30 minutes. Meanwhile the database compute was pinned active, and had been more or less around the clock.
So this was never a storage problem or a traffic problem. Almost the entire bill was one thing: paying for a database to stay awake doing nothing.
Zero users and a database that never slept: both were true at the same time. Here's the part I'll say up front so nobody has to guess: I didn't hand-write the mistakes that caused this. AI coding agents did. I described features, the agent shipped working code, and that code carried a handful of patterns that quietly defeat scale-to-zero. That's not a confession of bad engineering. It's just what building looks like in 2026. Most of us are vibe-coding onto serverless and cloud infra now, and the agent optimizes for "make the feature work," not "keep the database asleep when no one's around." It has no idea your compute bills by the hour it stays awake, so it has no reason to care.
That's the whole reason this post exists, and why the fix at the end isn't a clever line of code. It's a set of rules the agent reads on every session. What follows is a short tour of the four patterns that kept the lights on, light on detail because you don't really need to memorize them. You need your agent to stop writing them.
The cost model nobody reads until the bill arrives
Two layers, two meters.
Neon bills compute by active time, which is roughly the hours the compute endpoint is running multiplied by its size in compute units. Storage is separate, and a suspended compute costs nothing. (pricing, plans)
Vercel bills builds (about $0.0035 per CPU-minute), functions (active CPU, memory, and invocations), bandwidth, and image optimization. (pricing)
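To make the Neon meter concrete, here is the arithmetic behind the bill above as a small TypeScript sketch. It assumes "about three weeks" means 21 days, and the per-hour figure is simply the compute charge divided by the metered hours, not a rate taken from Neon's pricing page.

```typescript
// Back-of-envelope check of the bill described above.
// Assumption: "about three weeks" is treated as 21 days.
const computeDollars = 32.65; // compute line on the bill
const activeHours = 308;      // metered compute hours
const daysInWindow = 21;      // assumed length of the billing window so far

// Average hours per day the compute was awake.
function hoursAwakePerDay(hours: number, days: number): number {
  return hours / days;
}

// Effective dollars per metered hour (bill / hours), not a published rate.
function effectiveRate(dollars: number, hours: number): number {
  return dollars / hours;
}

const perDay = hoursAwakePerDay(activeHours, daysInWindow); // about 14.7
const perHour = effectiveRate(computeDollars, activeHours); // about 0.106
```

Because storage and transfer were already near zero, the compute line should shrink roughly in proportion to however many of those hours the database spends suspended instead.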
The headline feature on both is scale to zero. When nothing is happening, the database suspends and your functions aren't running, so you pay close to nothing. Neon's compute auto-suspends after 5 minutes of inactivity by default and wakes again in a few hundred milliseconds. (scale to zero)
So why was an app with zero users never going to sleep? Because "inactivity" is carrying a lot of weight in that sentence, and four different things were quietly keeping the stack busy.
First, measure the right thing (GA is lying to you)
Before you fix anything, you have to be able to see it. There are two traps here.
The first trap is Google Analytics. GA is client-side JavaScript, so it only fires when a real browser loads your page and runs the script. The traffic that actually hits your functions and database, things like AI crawlers, search bots, uptime monitors, and scrapers, never runs that JavaScript. GA simply doesn't see it. (More on the crawlers in a minute, because they turned out to be the villain.)
The second trap is that Neon's active_time field lags. The cumulative active_time on the projects API updates on a coarse, roughly hourly cadence, so a quick before-and-after read will tell you nothing changed even when it did. The signal you want is the real-time endpoint state, current_state, which is either active or idle:
# Watch the compute go idle, then suspended, in real time
NEON_API_KEY=napi_xxx NEON_PROJECT_ID=your-project node measure/endpoint-state.mjs
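For a sense of what such a check involves, here is a minimal sketch. It is not the repo's script. It assumes Neon's v2 API lists a project's endpoints at /projects/{project_id}/endpoints with a current_state field on each, and it takes the HTTP call as a parameter (the hypothetical getJson) so the parsing can be exercised without a network.

```typescript
// Minimal sketch: read each endpoint's real-time state for one project.
// Assumptions: the URL shape and the `current_state` field follow Neon's
// v2 API; `getJson` is a hypothetical helper that you supply.
type EndpointState = { id: string; current_state: string };
type GetJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

const NEON_API = "https://console.neon.tech/api/v2";

// Pull { id, current_state } pairs out of the API response body.
function parseEndpointStates(body: unknown): EndpointState[] {
  const endpoints = (body as { endpoints?: unknown[] } | null)?.endpoints ?? [];
  return endpoints.map((e) => {
    const ep = e as { id?: unknown; current_state?: unknown };
    return { id: String(ep.id), current_state: String(ep.current_state) };
  });
}

// Fetch and parse the states once; call this on a timer to watch for idle.
async function readEndpointStates(
  projectId: string,
  apiKey: string,
  getJson: GetJson,
): Promise<EndpointState[]> {
  const url = `${NEON_API}/projects/${projectId}/endpoints`;
  const body = await getJson(url, { Authorization: `Bearer ${apiKey}` });
  return parseEndpointStates(body);
}
```

On Node 18 or later, getJson can be as small as (url, headers) => fetch(url, { headers }).then((r) => r.json()). Polling once a minute is enough to see whether idle stretches ever appear.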
For the cumulative proof, take an active_time snapshot, wait a few hours (overnight is best), and diff it against an unchanged control project, so you know a drop is your fix and not just a quiet hour. Both scripts are in the companion repo.
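The snapshot-and-diff idea can be sketched as below. This is an illustration rather than the repo's script: it assumes each snapshot holds cumulative active seconds plus the time it was taken, and the 0.2 margin is an arbitrary threshold picked for the example.

```typescript
// Compare active_time growth on the fixed project against a control.
// Assumption: snapshots hold cumulative active seconds read from the API.
type Snapshot = { activeSeconds: number; takenAtMs: number };

// Fraction of wall-clock time the compute was active between two snapshots.
function activeFraction(before: Snapshot, after: Snapshot): number {
  const wallSeconds = (after.takenAtMs - before.takenAtMs) / 1000;
  if (wallSeconds <= 0) throw new Error("snapshots must be in time order");
  return (after.activeSeconds - before.activeSeconds) / wallSeconds;
}

// True when the fixed project was clearly quieter than the control.
// `margin` is a hypothetical threshold; tune it to your own noise level.
function fixLooksReal(fixed: number, control: number, margin = 0.2): boolean {
  return control - fixed >= margin;
}
```

For example, one active hour over an eight-hour night gives a fraction of 0.125. If the control project sat near 1.0 over the same window, the drop is the fix and not a quiet night.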
One honest caveat. "Truly zero" has a small floor, because Neon's control plane checks availability periodically, so don't expect a perfectly flat line. What you're hunting for is long idle stretches, and those were completely missing here.
Lever 1: stop paying to build
This one isn't about sleep, it's just the easiest money on the table. By default Vercel runs a metered cloud build on every push and every PR preview. Build it yourself and upload the result, and Vercel skips the build entirely:
vercel pull --yes --environment=production # get project settings + prod env
vercel build --prod # build on YOUR machine into .vercel/output
vercel deploy --prebuilt --prod # upload the output; Vercel does not rebuild
Prebuilt-only costs you preview URLs, push-to-deploy, and Git rollback. Run the same step from GitHub Actions instead and you keep all of that with no billed build minutes (see examples/03-prebuilt-deploy).
Lever 2: the connection that never let go
The worst offender was one innocent-looking line the agent reaches for by default, a module-level connection pool.
// keeps the database awake
import { Pool } from "pg";
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
Serverless functions reuse module scope across warm invocations, so this pool isn't recreated per request. It lives on, holding a TCP connection to Neon, and a held connection keeps the compute from ever scaling to zero. An app with no users pays for a database awake 24/7. The fix Neon recommends is the stateless HTTP driver: each query is a single HTTPS round-trip with no persistent connection. (serverless driver)
// lets the database sleep
import { neon } from "@neondatabase/serverless";
const sql = neon(process.env.DATABASE_URL, { fullResults: true });
const { rows } = await sql.query("SELECT now()");
Keep a real Pool only in CLI scripts, migrations, and long-running servers. (If you genuinely need a pool inside functions, Vercel's Fluid Compute plus attachDatabasePool() closes the same leak.) Full before-and-after in examples/01 and 02.
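If the module-scope point feels abstract, this toy model shows the mechanism. FakePool is a stand-in written for this example, not the pg Pool, and the three calls simulate three requests landing on the same warm function instance.

```typescript
// Toy model of module scope surviving across warm invocations.
// FakePool is a stand-in, not the pg Pool; it only counts open connections.
class FakePool {
  openConnections = 0;
  query(): string {
    if (this.openConnections === 0) this.openConnections = 1; // connect lazily
    return "ok"; // the connection is left open for reuse, as pools typically do
  }
}

// Module scope: created once per warm instance, not once per request.
const pool = new FakePool();

function handler(): string {
  return pool.query();
}

// Three requests hit the same warm instance.
handler();
handler();
handler();
// pool.openConnections is still 1 here, after the last response was sent.
```

Nothing in the handler ever closes that connection, which is the leak described above. The HTTP driver avoids it by having no connection to hold in the first place.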
Lever 3: the cron that re-woke the database on a timer
One app had a cron polling a table every 5 minutes for work to do. Next to a 5-minute auto-suspend, the math is brutal: every time the database tries to sleep, the cron pokes it awake. Net result is ~100% active, forever, for a job that almost always found nothing.
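The math can be written down as a rough model. It assumes every cron run wakes the compute, that the compute then stays active until the suspend window passes with no activity, and that the query itself takes negligible time.

```typescript
// Rough model: every cron run wakes the compute, which then stays active
// until `suspendAfterMin` minutes pass with no activity.
// Assumption: query time is negligible next to the suspend window.
function activeShare(cronEveryMin: number, suspendAfterMin: number): number {
  if (cronEveryMin <= 0 || suspendAfterMin < 0) {
    throw new Error("cron interval must be positive and suspend window non-negative");
  }
  // Active time per cron cycle is capped by the cycle length itself.
  const activePerCycle = Math.min(cronEveryMin, suspendAfterMin);
  return activePerCycle / cronEveryMin;
}

const everyFive = activeShare(5, 5);     // 1: the compute never suspends
const hourly = activeShare(60, 5);       // 5/60, about 0.083
const nightly = activeShare(24 * 60, 5); // 5/1440, about 0.0035
```

Under this model, any cron that fires at least as often as the suspend window pins the database at 100% active, while the same job run nightly costs about five minutes of compute a day.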
If the work is event-driven, don't poll for it. Do it at request time, return the response immediately, and run the slow part in the background with after() from next/server. Then delete the cron, so the database only gets touched when something real happens.
import { after } from "next/server";
// createJob and processJob stand in for your own application code.
export async function POST(req: Request) {
  const job = await createJob(req); // fast: the user gets a response now
  after(async () => { await processJob(job.id); }); // slow part runs after the response
  return Response.json(job);
}
Keep crons for genuinely scheduled work like a nightly digest or a daily cleanup, and have them return early without touching the database when there's nothing to do.
Lever 4: the robots you can't see
With the pool and the cron fixed, one app still wouldn't sleep. The cause was the traffic GA can't see: LLM crawlers. The app had just shipped a big SEO surface (sitemap, llms.txt, per-entity llm.txt routes, public JSON), which is catnip for GPTBot, ClaudeBot, PerplexityBot and friends. They crawl hard, and none of them run your analytics JavaScript, so GA showed a clean zero while bots made up the majority of actual hits. (Cloudflare Radar, Vercel on AI bot traffic)
What made it a cost story: every bot hit touched the database twice. A write, because middleware logged each visit with an INSERT into a crawler_hits table. And a read, because the crawler routes were force-dynamic, so each fetch ran fresh queries instead of serving cache. Bots hammering DB-backed dynamic routes around the clock means a database that never sleeps. Three fixes:
- Stop the per-hit write. Crawler logging belongs in stdout or a log drain, not a synchronous INSERT on the request path.
- Cache the routes. Drop force-dynamic; a route with export const revalidate = N serves from cache and only hits the DB on revalidation. For [slug]/[id] routes, add an empty generateStaticParams() so they cache on demand instead of querying every time. (Next.js docs)
- Match revalidate to the data. This data refreshed daily and the source only changed quarterly, so hourly revalidation was ~24x more often than anything actually changed. Daily was plenty.
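As a sketch of the first fix, the helpers below classify a request by its User-Agent and build a JSON log line instead of issuing an INSERT. The token list only covers the three bots named above and is illustrative, not complete, and the function names are made up for this example.

```typescript
// Sketch: classify a request by User-Agent and build a log line for stdout
// instead of issuing an INSERT. The token list is illustrative, not complete.
const LLM_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

// Case-insensitive substring match against the known crawler tokens.
function isLlmCrawler(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return LLM_CRAWLER_TOKENS.some((token) => ua.indexOf(token.toLowerCase()) !== -1);
}

// One JSON object per line is easy for a log drain to parse.
function crawlerLogLine(userAgent: string, path: string, atMs: number): string {
  return JSON.stringify({
    event: "crawler_hit",
    userAgent,
    path,
    at: new Date(atMs).toISOString(),
  });
}
```

In middleware, that comes down to checking isLlmCrawler(request.headers.get("user-agent")) and passing the result of crawlerLogLine to console.log, so the request path never touches the database just to count a bot.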
After this, a bot crawl serves from the CDN instead of waking the database, and the compute can finally suspend between real events.
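For the caching fixes, the route-level config might look like the sketch below. It assumes a DB-backed [slug] route whose data changes at most daily, and maxRegenerationsPerDay is a helper added here only to show the arithmetic.

```typescript
// Sketch of route segment config for a DB-backed [slug] route.
// Assumption: the data changes at most daily, so one day is the window.
export const revalidate = 86400; // seconds; keep it a plain literal

// Empty list: prerender nothing at build time, cache each slug on first visit.
export function generateStaticParams(): { slug: string }[] {
  return [];
}

// Upper bound on DB-hitting regenerations per cached page per day.
function maxRegenerationsPerDay(revalidateSeconds: number): number {
  return Math.ceil(86400 / revalidateSeconds);
}

const hourlyWindow = maxRegenerationsPerDay(3600); // 24
const dailyWindow = maxRegenerationsPerDay(86400); // 1
```

Moving from an hourly to a daily window cuts the worst case from 24 database wake-ups per page per day to one, however hard the bots crawl in between.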
The result
The proof was in the real-time endpoint state and the forward active_time delta. Long idle stretches appeared where there had been none, the unchanged control project stayed put, and the apps did the same work while spending most of their lives asleep. No plan change required. You can run the same check on your own stack with the scripts in the repo. "Good" looks like your database sitting idle or suspended whenever no human is actively using the app.
The part that makes this stick: write it down as agent rules
Here's the payoff to what I said at the top. The agent wrote these patterns, so the agent will write them again next week unless something stops it. Fixing the code once doesn't hold. The durable fix is putting the lessons where the agent reads them: in its rules.
I dropped a compact set of rules into the repo, split into neon.md and vercel.md (one file per platform, so you can use just the one you need), with an AGENTS.md and a CLAUDE.md that point your editor at them. They say, in plain imperative terms: use the HTTP driver in handlers, don't poll the DB on a timer, never leave DB-backed routes force-dynamic, don't write to the DB on every bot hit, build prebuilt to skip build minutes, and measure with endpoint state instead of GA. Now the agent reads them on every session and prevents the regressions instead of shipping them.
That's also why the companion repo is built the way it is. If you're not deep in serverless billing (maybe you vibe-coded the app and just want the bill to stop), you don't have to understand any of the above. Copy the rules files into your project and tell your agent to apply them. Cost knowledge belongs in the agent's context, not just in a senior engineer's head.
The checklist
Copy and paste this into your next serverless project:
- Driver: HTTP neon() in request handlers; Pool only in CLI and migrations (or Fluid Compute with attachDatabasePool()).
- No DB-polling crons more frequent than the suspend window; event-driven work via after().
- No force-dynamic on DB-backed public routes; ISR with revalidate matched to the data's cadence; empty generateStaticParams() for [slug] routes.
- No DB write on every bot or analytics hit; log to stdout or sample.
- Build prebuilt with vercel deploy --prebuilt to skip remote build minutes.
- Measure with endpoint current_state plus active_time delta and the Vercel usage dashboard, not Google Analytics.
- Drop the agent-rules in so it stays fixed.
A surprise serverless bill is usually not a signal to upgrade. It's a signal that something in your code is keeping the lights on. Turn them off.
Built three apps on Vercel and Neon and want them to actually scale to zero? The full toolkit, including agent rules, runnable examples, and measurement scripts, is in the companion repo. If it helped, a star helps other people find it.













