Building an AI Clothes Changer: provider abstraction, async jobs, and a credit system that won't lose money

I recently launched Dressora, an AI clothes changer that swaps outfits onto a single photo for virtual try-on. The product side is fun, but the parts I actually sweated over were the boring backend bits: orchestrating multiple AI providers, handling long-running generation jobs, and building a credit system that never double-charges or loses money. Here's what I learned.

Stack

Next.js 15 (App Router) + React 19 + TypeScript
PostgreSQL + Drizzle ORM
Cloudflare R2 for media storage
Multiple AI image/video providers behind one interface

1. Don't marry a single AI provider

AI providers change pricing, rate limits, and quality constantly. Hardcoding one is a trap. I put everything behind a small factory:

const provider = getProvider("evolink");
const task = await provider.createTask({ prompt, aspectRatio });

Each provider implements the same interface (createTask, handleCallback, status mapping). Swapping or adding a provider is a new file, not a refactor. When one provider had an outage, switching the default was a one-line env change.
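
As a rough illustration of the shape described above (not the actual Dressora code), the interface and factory might look something like this. The type names, the `FakeProvider` stand-in, the raw status strings, and the `"backup"` registry entry are all assumptions made for the example; only `getProvider`, `createTask`, `handleCallback`, and the idea of status mapping come from the post.

```typescript
// Hypothetical normalized status shared by every provider.
type JobStatus = "pending" | "processing" | "succeeded" | "failed";

interface CreateTaskInput {
  prompt: string;
  aspectRatio: string;
  callbackUrl?: string;
}

interface ProviderTask {
  providerTaskId: string;
  status: JobStatus;
}

interface CallbackResult {
  providerTaskId: string;
  status: JobStatus;
  outputUrl?: string;
}

// The one interface the rest of the app depends on.
interface ImageProvider {
  readonly name: string;
  createTask(input: CreateTaskInput): Promise<ProviderTask>;
  handleCallback(payload: unknown): CallbackResult;
  mapStatus(raw: string): JobStatus;
}

// Stand-in provider that never touches the network. A real adapter would
// call the vendor's HTTP API here; the raw status strings are made up.
class FakeProvider implements ImageProvider {
  readonly name: string;
  private counter = 0;

  constructor(name: string) {
    this.name = name;
  }

  async createTask(_input: CreateTaskInput): Promise<ProviderTask> {
    this.counter += 1;
    return { providerTaskId: `${this.name}-${this.counter}`, status: "pending" };
  }

  handleCallback(payload: unknown): CallbackResult {
    const body = payload as { id: string; state: string; url?: string };
    return { providerTaskId: body.id, status: this.mapStatus(body.state), outputUrl: body.url };
  }

  mapStatus(raw: string): JobStatus {
    switch (raw.toLowerCase()) {
      case "queued":
        return "pending";
      case "running":
        return "processing";
      case "completed":
        return "succeeded";
      default:
        return "failed"; // unknown states are treated as failures
    }
  }
}

// Adding a provider means adding one entry here plus one adapter file.
const registry: Record<string, () => ImageProvider> = {
  evolink: () => new FakeProvider("evolink"),
  backup: () => new FakeProvider("backup"),
};

// In a real app this default would be read from an environment variable.
const DEFAULT_PROVIDER = "evolink";

function getProvider(name: string = DEFAULT_PROVIDER): ImageProvider {
  const factory = registry[name];
  if (!factory) throw new Error(`Unknown provider: ${name}`);
  return factory();
}
```

Because callers only ever see `ImageProvider`, changing the default is a configuration change rather than a code change.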

2. Generation is async — embrace callbacks

AI generation takes anywhere from about 10 seconds to several minutes, so blocking an HTTP request on it is a non-starter. The flow:

generate() — create a DB record, freeze credits, call the provider with a callback URL
Provider processes and hits my webhook when done
handleCallback() — download the result, re-upload to R2, mark complete, settle credits

The frontend just polls a lightweight status endpoint. The webhook is the source of truth.
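
One detail worth sketching is what the webhook does to a job record. The example below assumes the provider may deliver the same callback more than once (the post does not say whether it does), so a job that has already reached a terminal state ignores later callbacks. The in-memory `jobs` map stands in for the database table, and `applyCallback`, `getStatus`, and `CreditAction` are hypothetical names.

```typescript
type JobStatus = "pending" | "processing" | "succeeded" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  resultUrl?: string;
}

// What the caller should do with the frozen credits after this callback.
type CreditAction = "settle" | "release" | "none";

// Stand-in for the jobs table.
const jobs = new Map<string, Job>();

function isTerminal(status: JobStatus): boolean {
  return status === "succeeded" || status === "failed";
}

// Applies one webhook delivery to a job and reports the credit action.
function applyCallback(jobId: string, status: JobStatus, resultUrl?: string): CreditAction {
  const job = jobs.get(jobId);
  if (!job) throw new Error(`Unknown job: ${jobId}`);

  // Duplicate or late callback: the job is already finished, so do nothing.
  if (isTerminal(job.status)) return "none";

  if (status === "succeeded" && resultUrl !== undefined) {
    job.status = "succeeded";
    job.resultUrl = resultUrl;
    return "settle";
  }

  // A failure, or a "success" with no output, gives the credits back.
  if (status === "failed" || status === "succeeded") {
    job.status = "failed";
    return "release";
  }

  // Intermediate progress update: record it, leave the credits frozen.
  job.status = status;
  return "none";
}

// What the polling endpoint would return to the frontend.
function getStatus(jobId: string): Job | undefined {
  return jobs.get(jobId);
}
```

Returning the credit action from the state transition keeps "settle exactly once" tied to "the job moved to a terminal state exactly once".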

A gotcha: always re-upload the provider's output to your own storage. Provider URLs expire. Downloading and pushing to R2 on completion saved me from dead links later.
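
A minimal sketch of that re-hosting step, under some assumptions: `ObjectStore` is a hypothetical wrapper around whatever client talks to R2, `Downloader` is injected so the function can be exercised without a network, and the `generations/` key layout is invented for the example.

```typescript
// Hypothetical storage interface; in production this could wrap an
// S3-compatible client pointed at the R2 bucket.
interface ObjectStore {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
}

type Downloader = (url: string) => Promise<{ body: Uint8Array; contentType: string }>;

// Picks a file extension from the content type (parameters such as charset are ignored).
function buildObjectKey(jobId: string, contentType: string): string {
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const extensions: Record<string, string> = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/webp": "webp",
    "video/mp4": "mp4",
  };
  const ext = extensions[mime] ?? "bin";
  return `generations/${jobId}.${ext}`;
}

// Copies the provider's output into our own bucket and returns the URL we control.
async function rehostResult(
  jobId: string,
  providerUrl: string,
  download: Downloader,
  store: ObjectStore,
  publicBaseUrl: string,
): Promise<string> {
  const { body, contentType } = await download(providerUrl);
  if (body.byteLength === 0) throw new Error("Provider returned an empty file");
  const key = buildObjectKey(jobId, contentType);
  await store.put(key, body, contentType);
  return `${publicBaseUrl}/${key}`;
}

// Default downloader built on the global fetch (Node 18+ and edge runtimes).
const fetchDownloader: Downloader = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
  const contentType = res.headers.get("content-type") ?? "application/octet-stream";
  return { body: new Uint8Array(await res.arrayBuffer()), contentType };
};
```

Only the URL returned by `rehostResult` would be stored on the job record, so nothing user-facing ever points at the provider's temporary link.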

3. The credit system was the hardest part

Money + concurrency + async failures = the scariest combination. The pattern that worked: freeze → settle / release.

On request: freeze(credits) — move credits to a "held" state
On success: settle() — actually consume them
On failure/timeout: release() — give them back

freeze  -> hold created, balance reserved
settle  -> hold consumed (success)
release -> hold returned (failure)
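
The three transitions can be sketched as a small in-memory ledger. This is an illustration under stated assumptions, not the production schema: `CreditLedger`, `Hold`, and the method signatures are hypothetical, and a Map stands in for the holds table. In PostgreSQL the check-then-insert inside `freeze` would likely need to run in a transaction that locks the user's balance row, otherwise two concurrent requests could both pass the availability check.

```typescript
type HoldStatus = "held" | "settled" | "released";

interface Hold {
  id: string;
  userId: string;
  amount: number;
  status: HoldStatus;
}

class CreditLedger {
  private balances = new Map<string, number>(); // credits owned, including held ones
  private holds = new Map<string, Hold>(); // stand-in for the holds table
  private nextHoldId = 1;

  deposit(userId: string, amount: number): void {
    this.balances.set(userId, this.balance(userId) + amount);
  }

  balance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }

  // Credits that can still be frozen: owned credits minus active holds.
  available(userId: string): number {
    let held = 0;
    for (const hold of this.holds.values()) {
      if (hold.userId === userId && hold.status === "held") held += hold.amount;
    }
    return this.balance(userId) - held;
  }

  // Reserves credits for a job; fails if they are not available.
  freeze(userId: string, amount: number): Hold {
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("Amount must be a positive integer");
    }
    if (this.available(userId) < amount) throw new Error("Insufficient credits");
    const hold: Hold = { id: `hold-${this.nextHoldId++}`, userId, amount, status: "held" };
    this.holds.set(hold.id, hold);
    return hold;
  }

  // Consumes a hold. Returns false if it was already settled or released,
  // so calling it twice never charges twice.
  settle(holdId: string): boolean {
    const hold = this.activeHold(holdId);
    if (!hold) return false;
    this.balances.set(hold.userId, this.balance(hold.userId) - hold.amount);
    hold.status = "settled";
    return true;
  }

  // Returns a hold's credits to the available pool. Also a no-op when repeated.
  release(holdId: string): boolean {
    const hold = this.activeHold(holdId);
    if (!hold) return false;
    hold.status = "released";
    return true;
  }

  private activeHold(holdId: string): Hold | undefined {
    const hold = this.holds.get(holdId);
    if (!hold) throw new Error(`Unknown hold: ${holdId}`);
    return hold.status === "held" ? hold : undefined;
  }
}
```

Making `settle` and `release` no-ops on an already-finished hold is what lets retried or duplicated callbacks stay harmless.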

This way a failed generation never costs the user, and a user can't fire 10 concurrent jobs with credits for one, because frozen credits no longer count toward the available balance. I also did FIFO-style consumption across credit packages, ordered by expiry, so credits with the nearest expiry get used first, which is fairer for users and simpler for accounting.
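
To make the package ordering concrete, here is a small sketch of nearest-expiry-first allocation. `CreditPackage`, `Allocation`, and `allocateCredits` are hypothetical names, and the function only computes a plan without mutating the packages; how the real system persists the result is not described in the post.

```typescript
interface CreditPackage {
  id: string;
  remaining: number;
  expiresAt: Date;
}

interface Allocation {
  packageId: string;
  amount: number;
}

// Splits a charge across packages, draining the soonest-to-expire first.
function allocateCredits(packages: CreditPackage[], amount: number, now: Date): Allocation[] {
  if (!Number.isInteger(amount) || amount <= 0) {
    throw new Error("Amount must be a positive integer");
  }

  // Ignore empty and already-expired packages, then sort by expiry (ascending).
  const usable = packages
    .filter((p) => p.remaining > 0 && p.expiresAt.getTime() > now.getTime())
    .sort((a, b) => a.expiresAt.getTime() - b.expiresAt.getTime());

  const total = usable.reduce((sum, p) => sum + p.remaining, 0);
  if (total < amount) throw new Error("Insufficient credits");

  const allocations: Allocation[] = [];
  let needed = amount;
  for (const pkg of usable) {
    if (needed === 0) break;
    const take = Math.min(pkg.remaining, needed);
    allocations.push({ packageId: pkg.id, amount: take });
    needed -= take;
  }
  return allocations;
}
```

Recording the per-package allocation on the hold would let a later release return credits to the same packages they came from.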

4. Lessons

Put external dependencies behind interfaces before you think you need to.
For async jobs, design the failure path first (release credits, retry, timeout) — the happy path is easy.
Re-host anything an external API generates.
A "frozen" intermediate state for credits/money is worth the extra table.

If you want to see the end result, it's live at aiclotheschanger.me. Happy to answer questions about the architecture in the comments.