You're shipping an AI chat product. The LLM streams ~40 tokens/sec per user.
50,000 concurrent users on launch day. Browser clients only. Tokens flow one way: server → user.
Your team meets to pick the transport. Everyone shows up with a strong opinion.
Here's the setup:
• Frontend: React in the browser
• Backend: Python (FastAPI) behind an AWS Application Load Balancer (ALB)
• Payload: UTF-8 text tokens, ~5–20 bytes each
• Direction: server pushes, client just renders
• Reconnects must be invisible (mobile networks drop constantly)
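Before anyone argues protocols, it helps to put the numbers above on one line. This is a back-of-envelope sketch that uses only the figures from the post. The helper name launch_load is made up for illustration, and it counts raw payload only, ignoring framing, headers, and TLS overhead.

```python
# Back-of-envelope load for the launch-day numbers in the post.
USERS = 50_000          # concurrent users (from the post)
TOKENS_PER_SEC = 40     # tokens streamed per user per second (from the post)
TOKEN_BYTES = (5, 20)   # payload size range per token, in bytes (from the post)


def launch_load(users=USERS, rate=TOKENS_PER_SEC, token_bytes=TOKEN_BYTES):
    """Return (messages/sec, low bytes/sec, high bytes/sec) across all users."""
    msgs_per_sec = users * rate
    low, high = (msgs_per_sec * size for size in token_bytes)
    return msgs_per_sec, low, high


msgs, low, high = launch_load()
print(f"{msgs:,} tokens/sec, {low / 1e6:.0f}-{high / 1e6:.0f} MB/s of raw payload")
# prints: 2,000,000 tokens/sec, 10-40 MB/s of raw payload
```

The bandwidth is modest. The harder parts are 50,000 long-lived connections and 2 million tiny messages per second, where per-message overhead can outweigh the payload itself.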
The team lead says "WebSockets, obviously." The platform engineer pushes back. What do you ship?
A) WebSockets — the default for "real-time," full-duplex, every chat app uses it.
B) Server-Sent Events (SSE) — one-way HTTP stream, native browser EventSource, auto-reconnect built in.
C) gRPC server streaming — HTTP/2, binary frames, backpressure handled for you.
D) Long polling — boring, battle-tested, works through every proxy on earth.
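One detail worth having on the table for the "reconnects must be invisible" requirement: option B's auto-reconnect means the browser's EventSource reconnects on its own and sends the last event id it saw in a Last-Event-ID header. Replaying what the client missed is still the server's job. Here is a rough sketch of that framing and resume logic. The function names are hypothetical, and it assumes the server keeps each response's tokens in a list so they can be replayed.

```python
from typing import List, Optional


def sse_event(token: str, event_id: int) -> str:
    """Format one token as a Server-Sent Events frame with an id."""
    # A payload containing newlines must be sent as one "data:" line per line.
    data_lines = "".join(f"data: {line}\n" for line in token.split("\n"))
    # A blank line terminates the event.
    return f"id: {event_id}\n{data_lines}\n"


def resume_from(tokens: List[str], last_event_id: Optional[str]) -> List[str]:
    """Frames to send on (re)connect.

    last_event_id is the value of the Last-Event-ID request header,
    or None on a first connection.
    """
    start = int(last_event_id) + 1 if last_event_id else 0
    return [sse_event(t, i) for i, t in enumerate(tokens) if i >= start]


# The client saw event 0, dropped off, and reconnected:
print(resume_from(["Hello", " world", "!"], "0"))
```

The same "number every token and replay from the last one seen" idea applies to the other three options too. With them you write the client half yourself.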
Three of these are real production patterns. One is what most teams default to and quietly regret six months in.
Pick one (A, B, C, or D) and tell me why. I'll drop the full breakdown in the comments, including why the most popular answer is the senior-engineer trap.
If your team is about to argue this exact tradeoff, send them this post before the meeting. Save yourself a whiteboard session.
Drop your answer 👇











