Your checkout endpoint has a 400ms P95. Profiling shows 70% of that is DB reads.
You add a read replica and point all SELECT queries at it. P95 drops to 90ms. The team celebrates.
Two hours later, support tickets flood in. Customers update their shipping address but see the old one on the confirmation screen. One customer gets charged twice because the "order already exists" check read stale data and missed the duplicate.
Here's the setup:
• Primary → handles all writes, replication lag ~200ms
• Replica → handling 100% of reads
• Affected flows → profile updates, order dedup, payment idempotency
The replica is working exactly as designed. That's the problem.
What do you do?
A) Read-your-writes consistency: route a user's reads to primary for a short window after they write.
B) Synchronous replication: make primary wait for replica to confirm before ACKing the write.
C) Monitor replica lag + retry: detect when lag exceeds a threshold and fall back to primary.
D) Route critical reads to primary: replicas only serve non-critical reads like analytics.
All four are real patterns running in production. Only one solves the stale-read problem without killing the performance win you just shipped.
Pick one (A, B, C, or D) and tell me why. Full breakdown in the comments, including which answer is the senior engineer trap.
If your team has ever added a read replica and spent a week debugging stale data, share this with them.
Drop your answer 👇













