My Company Spent $500K on an AI Platform. I Proposed an Open-Source Alternative and Got Fired. 6 Months Later, the Demo Crashed at 4 AM.

Based on real benchmark data. About hubris, technical bias, and one last weekend project.

1

3:47 AM. My phone buzzed me awake.

Unknown number. Beijing area code. I hesitated for three seconds, then picked up.

"Hey… is this Chen Yuan? It's Zhao, from the tech center."

Zhao. My old team lead. I'd been gone for six months. He'd never reached out. Not even a Like on my Moments posts.

But I still had his number saved.

"The customer demo crashed." His voice was low, like he didn't want anyone around to hear. "The AI customer service system. 200+ people on site. Every response is gibberish. We… we can't fix it. Can you come take a look?"

In the background, I could hear chaos. Someone shouting "it's down again" and "logs are all OOM." Another voice kept repeating "the vendor's remote session still hasn't connected."

Zhao didn't hang up. He didn't push. Just held the line.

I stayed quiet.

Not out of resentment — that's not how it works in this industry. But because I knew this was coming. I'd known it a month ago, when I saw someone's WeChat post celebrating the system's launch.

It's a strange feeling. Like watching a car slowly drift toward a cliff while you're yelling from behind and nobody listens. And then one day, the phone rings.

"Still at the same address?" I asked.

"Yeah."

"Forty minutes."

I hung up and turned on the desk lamp. My cat stirred beside me, then went back to sleep.

I pulled my mini workstation from under the desk — the one I'd built myself six months ago. When the screen lit up, the benchmark data from that time was still sitting on the desktop.

2.3 seconds. 180 milliseconds.

Two numbers. I hadn't forgotten either of them.

2

Here's the irony: I was an iOS developer. Not an ML engineer. Not an AI architect. I studied software engineering at a second-tier university, joined this company right after graduation, and stayed for three and a half years.

Three and a half years — enough to go from "how do I align this button" to owning an entire module. But at this company, three and a half years was an awkward spot. Not new enough to be cheap. Not old enough to have a voice.

I was invisible. The kind who never spoke in meetings and never broke prod. Performance reviews: B+, steady. Bonuses: median. Year-end chat: "Strong execution. We'd like to see more leadership." Same script, every year.

Then the company launched Project Smart Service — an LLM-powered customer service platform that was supposed to handle 80% of pre-sales inquiries. ¥20 million budget. One-year timeline. The procurement team's KPI was "bring in a world-class AI platform."

They bought a full enterprise suite from a major vendor. ¥5 million in licensing fees. 30% annual maintenance on top. Eight GPU servers, another ¥800K. The product director personally approved it. The slide deck said: "The industry's first end-to-end intelligent solution."

At the kickoff meeting, the big conference room was packed. The product director stood on stage, beaming. Everyone clapped. I clapped too. Afterward, I was smoking by the gate and old Zhang from Ops walked past and muttered: "¥5 million for an API token. Easy money."

I didn't reply. But that night, I went home and opened Hugging Face.

Over the next two weeks, I ran 28 benchmark tests on nights and weekends. I wasn't deep into LLMs yet, but I found the vendor's solution ranked 17th on a public Chinese NLP benchmark. The top 16 were all open-source.

I hooked into their API and ran my own tests. Then I pulled a couple of open-source models onto my workstation and picked a lightweight quantized version. It used less than half of the GPU's VRAM and ran as smoothly as any desktop app.

When the results came in, I ran the tests a second time to be sure.

I hadn't made a mistake.

It was real — the ¥5 million platform couldn't even beat a lightweight quantized open-source model.
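
For readers who want to run this kind of comparison themselves, the measurement is simple. Below is a minimal sketch, not the harness from the story: `benchmark`, `toy_classifier`, and `SAMPLES` are hypothetical names, and the toy keyword classifier stands in for whatever API call or local model is being measured. It reports average latency and intent accuracy over a small labeled set.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Tuple


def benchmark(classify: Callable[[str], str],
              samples: Iterable[Tuple[str, str]]) -> dict:
    """Time each call and count how often the predicted intent matches."""
    latencies = []
    correct = 0
    for text, expected in samples:
        start = time.perf_counter()
        predicted = classify(text)
        latencies.append(time.perf_counter() - start)
        correct += int(predicted == expected)
    # Note: an empty sample set would make mean() raise StatisticsError.
    return {
        "avg_latency_ms": 1000 * mean(latencies),
        "accuracy": correct / len(latencies),
    }


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a real model or API call."""
    return "refund" if "refund" in text.lower() else "pricing"


# Hypothetical labeled examples: (user message, expected intent).
SAMPLES = [
    ("How much is the annual plan?", "pricing"),
    ("I want a refund", "refund"),
    ("Refund my order please", "refund"),
    ("Where is my package?", "shipping"),
]

result = benchmark(toy_classifier, SAMPLES)
print(f"accuracy={result['accuracy']:.0%}, "
      f"avg latency={result['avg_latency_ms']:.3f} ms")
```

Running the same labeled set through both systems, and then running it a second time, is what makes the two numbers comparable.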

3

Two weeks later, I presented at a technical review.

I had a full report:

The vendor platform's API performance: 2.3s average latency, 67% Chinese intent accuracy
My open-source baseline: a lightweight Chinese model, 4-bit quantized, served through a local inference framework — 180ms latency, 89% accuracy
Hardware cost comparison: their solution needed 8 GPU servers (~¥800K); mine ran on a single GPU in the mini workstation I built myself, using less than half of its VRAM
Cost projection at 100K daily calls: ¥137 per 10K calls for their solution, ¥1.5 for mine
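
The gaps in that list reduce to a few lines of arithmetic. The sketch below uses only the figures quoted above and assumes cost scales linearly with call volume; the function and variable names are illustrative.

```python
# Figures quoted in the report.
DAILY_CALLS = 100_000
VENDOR_COST_PER_10K = 137.0  # yuan per 10K calls
LOCAL_COST_PER_10K = 1.5     # yuan per 10K calls
VENDOR_LATENCY_S = 2.3
LOCAL_LATENCY_S = 0.180


def daily_cost(cost_per_10k: float, daily_calls: int = DAILY_CALLS) -> float:
    """Daily spend in yuan, assuming the per-10K price scales linearly."""
    return cost_per_10k * daily_calls / 10_000


vendor_daily = daily_cost(VENDOR_COST_PER_10K)  # 1370.0 yuan per day
local_daily = daily_cost(LOCAL_COST_PER_10K)    # 15.0 yuan per day
cost_ratio = vendor_daily / local_daily
latency_ratio = VENDOR_LATENCY_S / LOCAL_LATENCY_S

print(f"Vendor: ¥{vendor_daily:,.0f}/day, open-source: ¥{local_daily:,.0f}/day")
print(f"Cost ratio: about {cost_ratio:.0f}x, latency ratio: about {latency_ratio:.1f}x")
```

At 100K calls a day that is ¥1,370 against ¥15, roughly a 91x difference in cost and close to 13x in latency.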

I presented for about 15 minutes. When I put the comparison table on the screen, the room went quiet for a few seconds.

Then the product director laughed.

"A front-line dev knows AI now?"

The laughter spread like a lit fuse. A couple of colleagues glanced at me and chuckled.

"Xiao Chen." He leaned back, his tone like he was talking to an intern. "Your little DIY thing has no SLA, no technical support. If something breaks, who takes responsibility? Their product is Fortune 500-grade, ISO-certified, compliance-audited, used globally. What do you have that compares?"

I said: "Accuracy and cost."

He didn't respond to that.

"I'm not targeting you personally," he went on, "but think about it: a ¥20 million project. If something goes wrong, does the partner take the hit? Or does your weekend project?"

I opened my mouth. Closed it. Because I knew — in his logic, this was never a technical question. It was a question of trust. Procurement's trust. The client's trust. The compliance red lines of a listed company.

A rank-and-file dev couldn't carry that.

I put my report away. I didn't say another word.

4

Over the next two months, I was slowly pushed to the margins.

Code reviews got strict. A merge request that used to pass in half an hour now took two days. Tickets dried up — some weeks I got only two bug reports. My desk was moved from the window — good light — to a corner next to the printer. Humming all day.

I didn't appeal. Instead, I spent more time on my own solution. I swapped the model for a smaller variant and ran A/B tests. For the specific task of intent classification, the smaller model was only 2% less accurate, with nearly half the latency. The main model handled primary inference; the smaller one served as a fallback. I even built a simple chat demo with Streamlit.
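
A primary-plus-fallback arrangement like this can be sketched in a few lines. This is an illustration under assumptions, not the code from the story: the trigger used here (the primary model timing out or raising an error) is a guess, and `classify_with_fallback`, `fast_primary`, `slow_primary`, and `keyword_fallback` are hypothetical stand-ins for real model calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

Classifier = Callable[[str], str]

# Shared worker pool so a slow primary call can be abandoned after a deadline.
_pool = ThreadPoolExecutor(max_workers=4)


def classify_with_fallback(text: str,
                           primary: Classifier,
                           fallback: Classifier,
                           timeout_s: float = 0.5) -> Tuple[str, str]:
    """Return (intent, source). Use the fallback if the primary is slow or fails."""
    future = _pool.submit(primary, text)
    try:
        return future.result(timeout=timeout_s), "primary"
    except Exception:
        # Covers both a timeout and an error raised inside the primary model.
        future.cancel()  # has no effect if the call is already running
        return fallback(text), "fallback"


def fast_primary(text: str) -> str:
    """Hypothetical stand-in for the main model responding in time."""
    return "refund" if "refund" in text.lower() else "other"


def slow_primary(text: str) -> str:
    """Hypothetical stand-in for the main model under heavy load."""
    time.sleep(0.2)
    return "refund"


def keyword_fallback(text: str) -> str:
    """Hypothetical stand-in for the smaller fallback model."""
    return "refund" if "refund" in text.lower() else "other"


print(classify_with_fallback("I want a refund", fast_primary, keyword_fallback))
print(classify_with_fallback("I want a refund", slow_primary, keyword_fallback,
                             timeout_s=0.05))
```

One caveat with this design: a timed-out primary call keeps running in its worker thread until it finishes, so a production version would also need a way to bound or cancel that work.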

Month four, HR called me in.

"The company is restructuring. iOS demand is shrinking. Your position has been eliminated."

N+1 severance. Sign and leave. From the start of the conversation to signing, it took less than 20 minutes.

As I cleared my desk, I found a sticky note in a drawer — the API key for that enterprise platform. I tore it in half and threw it in the trash.

As I was walking out, the product director stepped out of a meeting room to take a call. He saw me, nodded, said nothing, and turned away.

I carried a cardboard box with three things: my badge, a succulent plant, and a charger.

That afternoon at home, I opened my workstation and ran the full benchmark one more time. Stared at the numbers. Packed the code and uploaded it to a private Git repo.

Not for revenge. Just because those nights I'd spent deserved to exist somewhere.

5

Forty minutes later. 4:27 AM.

I was sitting in a side seat in the old company's big conference room. Across from me: a row of panicked faces. The CTO. The product director. Three Ops engineers on night duty. On the main screen, the real-time monitoring panel of the AI customer service system was blood-red — all 5xx errors.

From the breakout room next door, I could hear 200+ real users venting in the discussion forum. "This AI is artificial stupidity." "It can't answer three questions in a row." "I'm out, I'm out."

"It's been barely usable for three months." The Ops lead wiped his forehead with his sleeve. "Today's stress test went from 37 concurrent users to 65. The load balancer collapsed. Cascading failure: three distributed nodes all OOM."

He paused.

"The vendor's support said to wait until morning. The India team… it's midnight there."

The product director sat in the corner, staring at his phone. The screen was on, but I noticed he wasn't actually typing — just scrolling back and forth. He didn't look at me.

The CTO cleared his throat. He spoke slowly, like he was choosing every word: "Chen Yuan. I heard you had a proposal before you left."

I pulled a portable SSD from my bag.

"Give me 15 minutes."

6

I plugged the drive into a spare machine in the conference room.

Installed the inference framework. Loaded the 4-bit quantized model weights. Routed traffic to the demo environment.

When the first log line appeared, the Ops lead leaned in to read it.

I kept typing.

5 minutes. The error rate on the monitoring panel started dropping from 100%. The first successfully routed request appeared in the logs — 200, 180ms.

8 minutes. Error rate down to 3%. Users in the discussion forum started receiving coherent replies. Someone typed: "Finally, it works."

14 minutes 30 seconds. Zero errors. Average response time: 170ms.

I pulled the full data from the logs and projected it onto the big screen:

Period	Avg Response	Accuracy	Concurrent Users
Vendor platform (before crash)	2.1s	64%	37
Vendor platform (during crash)	Timeout	0%	62
Open-source solution (after fix)	170ms	90%	65

Silence.

The CTO spoke first: "How long did this take you?"

"Two weekends for the prototype. Three months of nights for optimization."

"And the cost?"

I tapped the SSD. "Everything on here is open-source. The hardware is my own workstation at home."

The CTO turned and looked at the product director, whose phone screen had gone dark.

Later that morning, the CTO opened a bottle of whiskey from the conference room cabinet — left over from last year's annual party, apparently. He poured two glasses.

"Would you consider coming back?"

I took a sip.

"Not as an employee. But I'd consider being an independent technical consultant on the client project."

The CTO was quiet for a moment.

"Name your rate."

I named a number — my old monthly salary broken down to an hourly rate. He finished his whiskey and extended his hand.

7

I heard later that the ¥5 million platform was decommissioned after the project ended. The contract wasn't renewed. The client's audit raised two questions: first, the third-party benchmark ranking of the solution; second, the P0 incident report from that night.

The product director was transferred to a marginal department. Old Zhang from Ops messaged me: "You should've seen his face."

I replied with a smiley emoji.

Last week, the power supply fan on my workstation started making noise. I opened it up, blew out the dust, and organized the old benchmark data into a folder.

I didn't delete it.

It sits on the drive like an engine no one knows is there.

Sometimes the best backup plan isn't the most expensive one. It's the one that got laughed at in a corner, the one you still believed was right.

The same goes for your plan. Maybe not at a conference, maybe not in a meeting room, but there's always a 4 AM somewhere that will need it.

If you've been in a similar situation, pushing a cheaper, better solution while the org went with the expensive vendor, how did it end? Did they come around, or did you move on?
Drop your story below 👇