I Built PromptShark in One Evening to Stop My AI Agents from Burning Money on Infinite Loops

If you've ever built an AI agent using function calling (tool use), you probably know the exact feeling of pure panic. You test your agent, it seems fine, you go grab a coffee, and come back to see it got stuck in an infinite loop—calling the exact same tool, failing, and retrying 500 times.

Suddenly, your OpenAI API balance is completely drained. 💸

After burning my own credits one too many times, I decided enough was enough. I built PromptShark—an open-source MITM proxy designed specifically to catch these loops, save your wallet, and make debugging agents painless.

🦈 What is PromptShark?

PromptShark is a drop-in local proxy that sits between your code and the LLM provider. The best part? Zero code changes required.

You don't need to rewrite your agent or install heavy SDKs. Just change your base_url to http://localhost:8080/v1 and keep your API key as is. PromptShark intercepts the traffic and does the magic.

Here are the main features:

1. 🛑 Infinite Loop Detection

PromptShark tracks your agent's sessions and hashes the request payloads. If it detects that your agent is stuck in a repetitive loop (e.g., repeatedly passing the exact same wrong arguments to a tool and failing), it instantly blocks the request before it hits the OpenAI API. No more burned money.

2. ⏪ Time-Travel Replay & Caching (The Killer Feature)

Debugging multi-step agents is expensive because when it fails at step 10, you usually have to restart and pay for steps 1-9 all over again.

PromptShark caches the session locally in SQLite. If your agent makes a mistake, you can:

Go to the UI and rewind the session.
Edit the JSON payload (tweak the prompt or fake a tool response).
Resume the execution. Steps 1-9 will be instantly served from the cache for free, and the API will only be called for step 10.

3. 📊 Real-time Dashboard

It comes with a clean, dark-mode UI where you can track Time-To-First-Token (TTFT), monitor token usage (prompt vs. completion tokens separated per step), and see your exact cost in USD in real-time.

🛠️ Under the Hood

I wanted this tool to be fast, reliable, and easy to run locally:

Go handles the proxying, streaming, and WebSockets (perfect for network IO).
C++ Engine powers the fuzzy loop-detection logic via IPC.
SQLite (WAL mode) ensures blazingly fast caching without needing to spin up Postgres.