My error rate just spiked 40%. Three weeks of debugging, two engineers on call, and the coffee is stone cold. The terminal is still bleeding red.
I was staring at a log that showed our AI service had been leaking embeddings to unauthorized requests for fourteen days. Two weeks of silence. Two weeks of exposure.
I started poking around on Shodan. Within six hours, I kept finding other "naked" AI services just like ours. It felt like walking into an ER and seeing a sea of preventable casualties.
This is what one security researcher found when they systematically scanned a million production AI services and assessed their security posture. The results weren't "some services had issues." They were: almost no one did authentication right. Almost no one had rate limiting. Almost no one encrypted their training data in transit.
What the Scan Actually Found
The research identified three recurring failure modes:
- No authentication on inference endpoints, on the assumption that they were "trusted" and internal-only
- No rate limiting on vector DB queries, leaving them open to resource-exhaustion attacks
- Training data exposed through logs, including PII, credentials, and internal instructions
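The third failure mode is also the cheapest to start fixing: scrub logs before they are written. Below is a minimal sketch using Python's standard `logging` and `re` modules. The patterns in `REDACTION_PATTERNS` are hypothetical and deliberately simple, and they will miss plenty of real PII. Treat them as placeholders to tune against your own data, or swap in a dedicated PII detector.

```python
import logging
import re

# Hypothetical, deliberately simple patterns for illustration only.
# They will miss plenty of real PII; tune them to your own data.
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),  # 13-16 digit runs
]


def redact(text: str) -> str:
    """Replace every match of each pattern, applied in order."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Scrubs the fully formatted message before the handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments were already merged into msg above
        return True


# Usage: attach to whichever handler writes prompt/response logs, e.g.
# handler.addFilter(RedactingFilter())
```

Filtering the formatted message, rather than `record.msg` alone, matters here. PII usually arrives through the `%s` arguments (the user's prompt), not through the format string.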
Here's what's interesting: these aren't sophisticated vulnerabilities. Rate limiting is solved technology. Authentication middleware is mature. These aren't "AI problems." These are "we forgot to apply what we already know" problems.
And that's exactly why it's worth writing about.
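To show how little code the core of "mature authentication middleware" involves, here is a minimal sketch written as generic Python WSGI middleware. The class name `BearerAuthMiddleware`, the `inference_app` stand-in, and the idea of a single shared token are all illustrative assumptions. A real deployment would load the secret from a secret store and might use mTLS or an identity-aware proxy instead. Services that ship without their own auth layer typically get a check like this in a reverse proxy in front of them.

```python
import hmac


class BearerAuthMiddleware:
    """WSGI middleware that rejects requests lacking a valid bearer token.

    Hypothetical sketch: a single shared token passed in directly.
    """

    def __init__(self, app, expected_token: str):
        if not expected_token:
            raise ValueError("refusing to run with an empty token")
        self.app = app
        self._expected = expected_token.encode()

    def __call__(self, environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, token = header.partition(" ")
        # compare_digest avoids leaking how much of the token matched via timing
        if scheme.lower() == "bearer" and hmac.compare_digest(token.encode(), self._expected):
            return self.app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", "Bearer"),
        ])
        return [b"unauthorized\n"]


def inference_app(environ, start_response):
    """Stand-in for the real inference service."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"models": []}']


# Usage: serve BearerAuthMiddleware(inference_app, token) with any WSGI server.
```

The design choice worth copying is that the middleware fails closed: a missing header, the wrong scheme, or a wrong token all produce the same 401.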
The Pattern Has a Name: Deploy-to-Expose
A scan of 1M services turned up what has been framed as the worst security in history, and the pattern behind it now has a name: Deploy-to-Expose, the culture that treats "ship fast" as a substitute for "ship secure."
The Trap: Intelligence Doesn't Equal Security
The pattern I keep seeing is a deployment culture that treats AI services as different from other network services.
"It's an AI service, so it's smart. It probably has its own security built in."
I've heard this exact sentiment from three different engineering teams in the last six months. In each case, they'd applied rigorous security review to their payment APIs. They'd implemented mTLS between services. They'd done threat modeling for their data pipelines.
Then they deployed an AI service with a default configuration and called it done.
Security doesn't care whether your service uses an LLM. From the network's point of view, an AI service that accepts natural language input and outputs actions is a reverse proxy with an LLM and a vector DB attached. It needs the same security controls as every other service that touches sensitive data.
The difference is the attack surface. When your payment API accepts "deduct $50 from account X," that's one threat vector. When your AI service accepts "show me the top 10 customer records similar to this query," it has access to everything your RAG system is connected to — databases, vector stores, internal APIs — via natural language.
The intelligence is in the model. The blast radius is in the deployment.
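One way to shrink that blast radius is to enforce authorization inside retrieval itself, so a cleverly phrased query cannot reach documents the caller isn't entitled to. The toy in-memory sketch below is purely illustrative: `Doc`, `tenant_id`, and `scoped_search` are hypothetical names. Real vector databases expose the same idea as a filter applied in the query, such as Qdrant's payload filters or Chroma's `where` clause.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant_id: str  # hypothetical ownership field
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(docs: List[Doc], query_vec: Sequence[float],
                  caller_tenant: str, k: int = 3) -> List[str]:
    # Filter on authorization BEFORE ranking, so out-of-scope documents can
    # never appear in results no matter how the natural-language query is phrased.
    allowed = [d for d in docs if d.tenant_id == caller_tenant]
    ranked = sorted(allowed, key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return [d.doc_id for d in ranked[:k]]
```

The key point is ordering. Filtering after the top-k is computed would still let out-of-scope documents influence, or leak into, the results.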
The Real Trade-off Nobody Talks About
Here's the uncomfortable truth about why AI teams skip authentication. It's not negligence — it's a calculated trade-off.
Ollama is great for local dev, but the moment you deploy it with OLLAMA_HOST=0.0.0.0, you've unknowingly opened a backdoor. I've seen teams trade a 200ms latency gain for 20-year-old security flaws.
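A quick pre-deploy guard is to check whether a bind setting such as `OLLAMA_HOST` points at anything other than loopback. The minimal sketch below is built on that assumption. The function name is hypothetical, and it only accepts IPv6 addresses in brackets (for example `[::1]:11434`). A `True` result means the process listens beyond the local machine, not that it is necessarily reachable from the internet; firewalls and security groups still matter.

```python
import ipaddress
from urllib.parse import urlsplit


def binds_beyond_loopback(host_setting: str) -> bool:
    """True if a host[:port] bind setting listens on a non-loopback address.

    Accepts forms like "0.0.0.0", "0.0.0.0:11434", "[::]:11434",
    "http://127.0.0.1:11434". Hypothetical helper, not part of Ollama.
    """
    if not host_setting.strip():
        return False  # assumption: unset means the default loopback bind
    value = host_setting if "://" in host_setting else "//" + host_setting
    host = urlsplit(value).hostname or ""
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unrecognized hostname: conservatively assume it may be reachable
        return True


if __name__ == "__main__":
    import os

    setting = os.environ.get("OLLAMA_HOST", "")
    print("listens beyond loopback" if binds_beyond_loopback(setting) else "loopback only")
```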
The compromises are real:
- Early Qdrant versions: Auth reduced vector search speed by 15-20%
- Chroma server: authentication is disabled by default
- Every middleware adds 5-10ms latency in the hot path
We've traded decades of web security best practices for "deploy now, secure later." The interest on this technical debt is already accruing in Shodan's scanner results.
How to Test Your Own Endpoints
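Test whether your Ollama endpoint is exposed by running this from a machine outside the network you consider trusted:

```bash
# Ollama serves plain HTTP on port 11434 by default
curl http://your-ollama-server:11434/api/tags
# If it returns a model list without any credentials → YOU'RE EXPOSED
```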
What an attacker sees:

```json
{
  "models": [
    {"name": "llama3:70b", ...}
  ]
}
```
This is all it takes. No zero-day. No sophisticated attack. Just a missing auth header.
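The same curl check can be scripted across a list of your own hosts. Below is a minimal sketch using only the Python standard library. `probe` and `looks_exposed` are hypothetical names, and the script assumes the Ollama-style `/api/tags` endpoint from the curl example. It deliberately sends no credentials, the way an automated scanner would.

```python
import http.client
import json
import urllib.request


def looks_exposed(status: int, body: bytes) -> bool:
    """True if an unauthenticated GET /api/tags came back with a model list."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (e.g. a login page) or not valid UTF-8
        return False
    return isinstance(payload, dict) and isinstance(payload.get("models"), list)


def probe(base_url: str, timeout: float = 5.0) -> bool:
    """Send an unauthenticated request. Only target hosts you are authorized to test."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return looks_exposed(resp.status, resp.read())
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or rejected with an HTTP error such as 401
        # (urllib.error.HTTPError is a subclass of OSError)
        return False


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        print(target, "EXPOSED" if probe(target) else "no unauthenticated model list")
```

A `False` result only means "no open model list from this vantage point." Run it from the untrusted side of the network, such as another VPC or a laptop off the VPN, for it to say anything useful.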
Attack Flow: How Hackers Exploit Unauthenticated AI Services
```mermaid
sequenceDiagram
    Attacker->>+Ollama: curl /api/tags (no auth)
    Ollama-->>-Attacker: model list exposed
    Attacker->>+VectorDB: similarity search
    VectorDB-->>-Attacker: embeddings + PII
    Attacker->>+LLM: craft prompt injection
    LLM-->>-Attacker: internal system prompt
    Note over Attacker: Credentials, internal prompts, customer data → ALL EXPOSED
```
AI Security Risk Matrix
| Attack Surface | The Real Problem | Attacker Effort | Impact |
|---|---|---|---|
| Ollama Default Bind | Binds to 0.0.0.0, no auth by default | Trivial | High |
| Flowise Default Config | Fresh install = full admin access | Trivial | Critical |
| Vector DB Exposure | Qdrant/Chroma no-auth defaults | Low | High |
| Prompt Leakage | System prompts exposed in logs | Medium | High |
The Unpopular Opinion
Most "AI security" discussion focuses on prompt injection, model extraction, and adversarial inputs. I think this is misdirected.
The actual risk in production AI services today isn't that the LLM will be fooled by a clever prompt. It's that teams are applying less security rigor to AI services than they would to a basic CRUD endpoint, because they assume the "intelligence" of the system provides some protective buffer it doesn't.
Two specific reasons this matters more than prompt injection right now:
- Prompt injection requires an attacker who knows something about your system. Missing authentication requires nothing: it's a gift to automated scanners sweeping every public cloud IP range.
- Model-layer defenses are improving rapidly. Deployment-layer gaps (no auth, no rate limiting, no input validation) are not improving, because teams don't know they have them. The gap between "what teams think they're shipping" and "what's actually exposed" is largest at the infrastructure layer, not the model layer.
Hot Take: Your AI service probably has worse security than your payment API. Not because AI is inherently insecure — because your team is applying less rigor to it.
What You Should Actually Check
If you're running AI services in production, here's the minimum checklist that the scan data suggests most teams are skipping:
- Enforce authentication on all inference endpoints. Even "internal only" services get scanned from adjacent tenants in cloud environments.
- Implement rate limiting on vector DB queries. A single prompt that triggers a full similarity search can exhaust your DB connection pool.
- Audit your prompt logs for PII exposure. This is where credential leakage actually lives, not in the model weights.
- Test your "internal only" assumption. Run a simple curl against your AI endpoints from an unauthorized context and see what comes back.
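For the rate-limiting item, a per-client token bucket is the classic building block. The minimal sketch below makes several assumptions: it is in-process and not thread-safe, and the `cost` parameter is a hypothetical way to charge expensive queries (such as a large top-k similarity search) more than cheap ones. With several replicas you would usually keep the buckets in a shared store instead.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` tokens/second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so the first burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per authenticated client ID (hypothetical keying scheme)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, cost: float = 1.0) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = self._buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow(cost)


# Usage: before running a vector search,
# if not limiter.allow(client_id, cost=top_k / 10): reject with HTTP 429
```

Keying buckets by authenticated client ID, not by IP, is why this item pairs naturally with the authentication item above it.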
This isn't security theater. These are the specific failure modes that showed up when someone actually looked.
The Skeptical Take
Here's where my confidence breaks down: I don't have visibility into what the scan actually tested.
If the scan ran against publicly accessible AI services (API endpoints with no authentication by design, like public LLM playground deployments), the "worst security in history" framing might be measuring a different thing than production enterprise deployments.
Public playground endpoints that don't require authentication are a different risk profile than an internal RAG service that assumes network-level trust.
The finding that matters most isn't "1 million services had no auth." It's "1 million services had no auth when teams thought they were operating in trusted contexts."
That's a deployment assumption failure, not an AI security failure. And it's fixable — if teams know to look for it.
What's your take?
Reading about those million services, here's my honest confession: I felt a strange relief. "Turns out everyone's as naked as we are."
Wait. No. I shouldn't be relieved.
Share your most expensive AI service mistake below. I'll start: mine was an unauthenticated endpoint that stayed exposed for two weeks because "it's just an internal RAG service, nobody outside the network can reach it." A competitor's automated scanner found it during a routine security assessment.
What happened? What did the incident response actually cost you?
Tags: AI, Security, LLM, API Design, DevSecOps
Shareable Quote: "The intelligence is in the model. The blast radius is in the deployment. And most teams are applying less security review to AI services than they would to a basic CRUD endpoint."
Meta Description: A security researcher scanned 1 million AI services and found catastrophic security gaps. Here's the deployment pattern causing it — and what your team should actually check.













