AI image generation can look like a simple request-response feature.
A user enters a prompt, clicks generate, and waits for an image.
For a prototype, that can work. For a production product, it usually becomes fragile.
Image generation may take several seconds or minutes. A provider may return a job ID first and the final result later. Some results arrive through webhooks. Others need polling. Requests can fail, time out, or finish after the user has already left the page.
That is why AI image generation is usually better designed as an asynchronous workflow.
This is the approach I use while building Image 2, a multi-model AI image generation and editing platform.
The Simple Version
The most direct implementation looks like this:
User -> API route -> AI provider -> result -> user
This is easy to understand, but it has several problems:
- the HTTP request may time out
- retries can create duplicate jobs
- the frontend depends on provider latency
- billing or credit logic becomes harder to protect
- generated media may live on temporary provider URLs
- failures are difficult to repair after the request ends
This pattern is fine for demos. It is not ideal once real users, payments, storage, and retries are involved.
A Better Shape
A more reliable version separates the user request from the generation work:
User request
|
v
Create generation record
|
v
Push message to queue
|
v
Background worker submits job
|
v
Webhook or polling gets result
|
v
Store asset and update status
The user-facing request returns as soon as the generation record is created and the message is queued. The UI can then show the task's status: queued, processing, completed, or failed.
The slow work happens in the background.
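A minimal sketch of that split, assuming an in-memory dict in place of a database table and Python's `queue.Queue` in place of a real message queue. The names `TASKS`, `JOBS`, `create_generation`, and `run_worker_once` are hypothetical, and `submit_to_provider` stands in for whatever call your provider needs.

```python
import queue
import uuid

# Stand-ins for a database table and a message queue (assumptions for this sketch).
TASKS = {}
JOBS = queue.Queue()


def create_generation(user_id: str, prompt: str) -> dict:
    """User-facing request: record the task, enqueue it, and return at once."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {
        "id": task_id,
        "user_id": user_id,
        "prompt": prompt,
        "status": "created",
        "asset_url": None,
    }
    JOBS.put(task_id)  # schedule the background work
    TASKS[task_id]["status"] = "queued"
    # No provider call happens here, so the response time does not
    # depend on provider latency.
    return {"task_id": task_id, "status": "queued"}


def run_worker_once(submit_to_provider) -> str:
    """Background worker step: take one task off the queue and submit it."""
    task_id = JOBS.get_nowait()
    task = TASKS[task_id]
    task["provider_job_id"] = submit_to_provider(task["prompt"])
    task["status"] = "processing"
    return task_id
```

The request handler only touches the record and the queue. Everything that can be slow sits behind `run_worker_once`, which can run in a separate process or a queue consumer.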
Why Async Helps
Async generation gives the system more room to recover.
If the provider is slow, the task can remain in processing.
If the provider fails, the system can mark the task as failed and roll back credits.
If a webhook is missed, a scheduled job can poll the provider later.
If both a webhook and a polling job see the same final result, the system can ignore the second one instead of settling the task twice.
That last point matters. In production, the same generation result may be observed more than once. Settling a task into a final state such as completed or failed should be idempotent: applying the same result a second time must change nothing.
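One way to get that behavior is to route both the webhook handler and the polling job through a single settlement function. The sketch below is illustrative only: `settle`, the task fields, and the `credits` dict are hypothetical names, and it assumes credits were reserved when the task was created.

```python
from typing import Optional

TERMINAL = {"completed", "failed"}


def settle(task: dict, credits: dict, outcome: str,
           asset_url: Optional[str] = None) -> bool:
    """Apply a final provider outcome once. Returns False for a duplicate."""
    if outcome not in TERMINAL:
        raise ValueError(f"not a final outcome: {outcome}")
    if task["status"] in TERMINAL:
        # Already settled by the webhook or the poller: do nothing.
        return False
    task["status"] = outcome
    if outcome == "completed":
        task["asset_url"] = asset_url
    else:
        # Roll back the credits reserved for this task, exactly once.
        credits[task["user_id"]] += task["cost"]
    return True
```

With a real database, the check and the update would need to be a single atomic operation, for example a conditional update that only matches rows still in processing. Otherwise two concurrent callers could both pass the check.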
A Small State Model
You do not need a complicated state machine to start. A simple model is often enough:
created -> queued -> processing -> completed
                         |
                         -> failed
Each state should mean something clear:
- created: the request was accepted
- queued: background work has been scheduled
- processing: the provider job has started
- completed: the final asset is available
- failed: the task cannot complete
The important rule is that terminal states should be protected. Once a task is completed or failed, retries and duplicate callbacks should not apply the same result again.
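The state model can be written as a small transition table. This is a sketch, not a full state machine: the names `Status`, `ALLOWED`, and `transition` are hypothetical, and it assumes a task may fail from any non-terminal state.

```python
from enum import Enum


class Status(str, Enum):
    CREATED = "created"
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed moves. Terminal states map to an empty set, so nothing can leave them.
ALLOWED = {
    Status.CREATED: {Status.QUEUED, Status.FAILED},
    Status.QUEUED: {Status.PROCESSING, Status.FAILED},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or the unchanged one if the move is not allowed."""
    if target in ALLOWED[current]:
        return target
    return current
```

Because completed and failed have no outgoing moves, a late retry or duplicate callback leaves the task exactly as it was.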
Store the Result Yourself
Many AI providers return a URL for the generated image. That URL may be temporary or provider-controlled.
For a real product, it is often safer to copy the result into your own storage:
Provider result URL -> app storage -> stable asset URL
On Cloudflare, that might mean storing the final image in R2 and serving it from your own CDN domain.
This makes future product behavior easier:
- user ownership checks
- downloads
- cleanup
- moderation
- stable previews
- billing history
The AI provider creates the image. Your application should own the product workflow around that image.
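The copy step might look like the sketch below. It makes several assumptions: `download` and `put_object` are hypothetical callables you would back with an HTTP client and your storage client, the output is a PNG, and `https://cdn.example.com` is a placeholder for your own domain.

```python
from typing import Callable


def persist_asset(
    task_id: str,
    provider_url: str,
    download: Callable[[str], bytes],
    put_object: Callable[[str, bytes], None],
    public_base: str = "https://cdn.example.com",
) -> str:
    """Copy provider output into app-owned storage and return a stable URL."""
    data = download(provider_url)
    if not data:
        raise ValueError("provider returned an empty file")
    # The key is derived from the task ID, so running this step again
    # overwrites the same object instead of creating a second copy.
    key = f"generations/{task_id}.png"
    put_object(key, data)
    return f"{public_base}/{key}"
```

Passing the download and upload functions in as arguments keeps the step easy to test and independent of any one storage vendor. Since R2 exposes an S3-compatible API, `put_object` could wrap an S3-style client there.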
Where Multi-Model Apps Get More Complex
Async workflows become even more useful when an app supports more than one model or generation style.
A text-to-image model, an image editing model, and a reference-image workflow may all behave differently. Some may return results quickly. Others may need a provider-side job ID. Some may support high-resolution output. Some may have different input limits.
A product like Image 2 can expose those workflows through a simpler user interface while keeping provider-specific details in the backend. For example, separate pages such as the GPT Images 2.0 image generator or the Nano Banana 2 AI image generator can still share the same general task lifecycle.
That is the main benefit of designing around the workflow instead of designing around one provider API.
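As an illustration of that idea, the sketch below hides two made-up providers behind one small interface so that a single lifecycle function can drive both. `ProviderAdapter`, `JobResult`, `InstantProvider`, `SlowProvider`, and `advance` are all hypothetical names and do not describe any real provider's API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class JobResult:
    done: bool
    image_url: Optional[str] = None
    error: Optional[str] = None


class ProviderAdapter(Protocol):
    def submit(self, prompt: str) -> str: ...
    def check(self, job_id: str) -> JobResult: ...


class InstantProvider:
    """Hypothetical provider whose job is done on the first check."""

    def submit(self, prompt: str) -> str:
        return "instant-1"

    def check(self, job_id: str) -> JobResult:
        return JobResult(done=True, image_url=f"https://provider.example/{job_id}.png")


class SlowProvider:
    """Hypothetical provider that needs several checks before it is done."""

    def __init__(self, checks_needed: int = 2) -> None:
        self.remaining = checks_needed

    def submit(self, prompt: str) -> str:
        return "slow-1"

    def check(self, job_id: str) -> JobResult:
        self.remaining -= 1
        if self.remaining > 0:
            return JobResult(done=False)
        return JobResult(done=True, image_url=f"https://provider.example/{job_id}.png")


def advance(task: dict, adapter: ProviderAdapter) -> dict:
    """One lifecycle step that is identical for every provider."""
    if task["status"] == "queued":
        task["job_id"] = adapter.submit(task["prompt"])
        task["status"] = "processing"
    elif task["status"] == "processing":
        result = adapter.check(task["job_id"])
        if result.done:
            task["status"] = "failed" if result.error else "completed"
            task["image_url"] = result.image_url
    return task
```

Adding a new model then means writing one more adapter. The status values, the storage step, and the UI stay the same.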
Final Thought
AI image generation is not just a model call. It is a product workflow.
For experiments, a synchronous API route is enough. For production, async architecture gives you a cleaner way to handle slow jobs, duplicate callbacks, retries, storage, moderation, and credit accounting.
The model creates the image. The workflow makes the product reliable.













