I've been working with AI agents daily for the past few months – building them, testing them, using them for everything from email triage to meeting prep. I thought I had a pretty clear picture of what they're good at and where they fall short. I designed the skill system myself, reviewed every integration, mapped out every capability 😄
But a few times the agent genuinely surprised me. I'd throw some random life problem at it – something I never designed it for – and watch it figure things out using tools that were originally built for completely different purposes.
Here are five of those moments.
1. "Can you add subtitles?"
I record short videos for my personal social media – just me talking to a camera for a minute or two. Subtitles were always the most tedious part of the process: export the video, upload it to a transcription service, wait, download the subtitle file, import it into a video editor, adjust the timing, re-export. Twenty minutes minimum for a two-minute clip. I'd skip it half the time, and every video that went out without subtitles got lower engagement.
One morning I was running late and just sent the raw video to my assistant in Telegram: "Can you add subtitles?" I didn't think it through. I just asked.
Three minutes later, the assistant sent the video back. Subtitles burned in. Timing synced. Clean.
What happened under the hood: the assistant extracted the audio track from the video with ffmpeg, sent it to Whisper for transcription with timestamps, wrote a Python script on the fly to convert the Whisper output into .srt subtitle format, then ran ffmpeg again to burn the subtitles into the video in a white font with adjusted positioning. Four steps across three tools, none of which were designed for "subtitling." Whisper was built for transcription. ffmpeg was available in the sandbox for general media processing. The assistant connected them because the request made the connection obvious.
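For the curious, the "convert the Whisper output into .srt" step is small enough to sketch. This is a minimal illustration, not the script the assistant actually wrote: it assumes segments shaped like the `segments` list that the open-source `whisper` package returns from `transcribe()` (each with `start` and `end` in seconds and a `text` string), and the function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments ({"start", "end", "text"}) as an .srt document.

    Each cue is: a 1-based index, a "start --> end" line, the text,
    and a blank line separating it from the next cue.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()  # Whisper segment text usually has a leading space
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

The resulting file can then be handed to ffmpeg's `subtitles` filter to burn it in (for example `ffmpeg -i input.mp4 -vf subtitles=subs.srt output.mp4`, which needs an ffmpeg build with libass).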
Nobody on my team designed a subtitle feature. There's no "subtitle skill" in our catalog. The assistant invented the workflow on the spot, and now I use it three or four times a week. The whole process takes less time than it used to take me to open the video editor.
The thing that stuck with me: the most useful "features" aren't features at all. They're combinations of existing capabilities that emerge when you give an agent enough tools and let it reason about how to chain them.
2. "I'm here – plan my afternoon"
I was in Croatia a few weeks ago, on Korčula. Not the busiest tourist island – the kind of place that doesn't show up in many guidebooks, but has beautiful coastline and old villages scattered across the hills. I'd finished what I came to do and suddenly had four free hours. I dropped a GPS pin in Telegram and typed: "I want to plan a road trip by car from here. Drive around the area and stop at the best sights, viewpoints, hidden gems, and local landmarks. Give me a circular route."
Five minutes later, the assistant sent back a Google Maps link. A full route with seven waypoints: Vela Spila cave, the hilltop viewpoint at Hum, the coastal villages of Prigradica and Prižba, a stop in Brna, Žitna Beach near Zavalatica, and Pupnatska Luka to finish. Total drive time between all stops: under 80 minutes. Every stop came with its rating, and the order minimized backtracking.
I followed the route. It was genuinely good – better than what I'd have cobbled together from TripAdvisor in 30 minutes of Googling.
[Image: the route my AI agent created for me]
There's no "trip planner" skill. The assistant used web search to find attractions near the coordinates, filtered them by rating and travel time, and formatted a Google Maps URL with the waypoints in an order that minimized backtracking. Search, reasoning, and URL construction.
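Here is a rough sketch of the last two steps, assuming the stops have already been found and geocoded to latitude/longitude pairs. It orders them with a greedy nearest-neighbour heuristic (a simple stand-in, not necessarily how the assistant did it) and builds a directions link in Google's documented Maps URLs format (`api=1`, `origin`, `destination`, `waypoints` separated by `|`, `travelmode`). Setting the destination equal to the origin is what makes the route circular. The function names are hypothetical.

```python
import math
from urllib.parse import urlencode

Point = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def order_stops(start: Point, stops: list[Point]) -> list[Point]:
    """Greedy nearest-neighbour ordering: always drive to the closest unvisited stop."""
    remaining = list(stops)
    route: list[Point] = []
    current = start
    while remaining:
        nxt = min(remaining, key=lambda p: haversine_km(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route


def format_point(p: Point) -> str:
    """Maps URLs accept plain "lat,lon" strings as locations."""
    return f"{p[0]},{p[1]}"


def circular_maps_url(start: Point, stops: list[Point]) -> str:
    """Build a Google Maps directions URL that starts and ends at `start`."""
    params = {
        "api": "1",
        "origin": format_point(start),
        "destination": format_point(start),  # same as origin: a loop
        "travelmode": "driving",
        "waypoints": "|".join(format_point(p) for p in order_stops(start, stops)),
    }
    return "https://www.google.com/maps/dir/?" + urlencode(params)
```

Greedy nearest-neighbour won't always find the shortest loop, but for a handful of stops it is a cheap way to avoid the most obvious backtracking.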
The next time I traveled – different country, different mood – I did the same thing with different constraints. "I have 6 hours, I want good food and architecture, skip the tourist traps." Completely different route, same pattern, same quality.
This is the thing that specialized apps get wrong. They build a trip planner with a database of attractions and a fixed UI. The assistant needs search, a sense of what "good" means in context, and the ability to format a URL. General intelligence plus tools beats a purpose-built app more often than you'd expect.
3. "Find someone who hauls away furniture, and get me prices"
I was on Korčula and needed to clear out a place – bags of garden waste, some old furniture, and a few broken appliances. The old way: Google "odvoz otpada Korčula" ("waste removal Korčula"), try to parse Croatian results, call local numbers, stumble through a conversation in a language I don't speak. An hour minimum, assuming anyone picks up.
Instead, I told the assistant what I needed gone and where. "Find companies that can deliver a waste container to this address. Write to them in Croatian."
The assistant searched for local waste removal services, found three with contact details, and drafted a personalized inquiry email to each – in Croatian. Not a generic "I need junk removed" message, but a properly structured request: the address, a breakdown of the waste types (garden waste, old furniture, appliances), and questions about container delivery, pricing, how long the container could stay, and whether different waste types needed to be separated. Five specific questions, in polite, formal Croatian. Better than I could have written in English, honestly.
The next morning I asked if anyone had replied. Two had – in Croatian. The assistant translated both: Company A could deliver a container on Wednesday for €70. Company B quoted €90 but offered next-day delivery. I told the assistant to go with A but try to get them to come a day earlier. It wrote back in Croatian, negotiated, and confirmed – Tuesday afternoon, €70.
The whole interaction cost me maybe five minutes of actual attention, spread across two days. I don't speak Croatian. The assistant handled the entire negotiation without me reading a single word of the original emails.
[Image: my AI assistant searched for local waste removal services and drafted a personalized inquiry email to each]
What the assistant did wasn't complex. Search, draft, translate, send, check, negotiate, confirm. Any bilingual human assistant could do it. But that's exactly the point – this is work that's too small to hire someone for and too tedious to do yourself, especially across a language barrier. It sits in a dead zone that no app addresses because it's too specific to productize. "Find me a waste removal service on a Croatian island and negotiate in Croatian" isn't a market. But "handle the tedious procurement of any one-off service, in any language" absolutely is.
The assistant doesn't care whether I need a plumber, an electrician, or a babysitter – or what language they speak. The pattern is identical: find options, reach out, collect responses, present a comparison. The specificity of the task is what makes it valuable, and what makes it impossible for a specialized app to cover.
4. "Wait – check that again"
We built a skill called Second Opinion. The idea: after the assistant gives you a response, you can ask a different LLM from a different model family to weigh in on the same question. Like asking a second colleague – they have access to the same information, but they might notice different things.
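The core of a skill like this fits in a few lines. The sketch below assumes a hypothetical `Model` interface (a function from prompt to reply) rather than any particular provider's SDK, and the prompt wording is illustrative, not the one Second Opinion actually uses. The design choice that matters is that the reviewer sees the original question as well as the first answer, so it can disagree with the framing instead of just proofreading.

```python
from typing import Callable

# Hypothetical model interface: takes a prompt string, returns the model's reply.
Model = Callable[[str], str]


def build_review_prompt(question: str, first_answer: str) -> str:
    """Give the reviewer the same information plus the first model's answer."""
    return (
        "A colleague was asked the question below and gave the answer that follows.\n"
        "Don't just check for errors: say what you would emphasize, restructure, "
        "or question, and why.\n\n"
        f"Question:\n{question}\n\nFirst answer:\n{first_answer}\n"
    )


def ask_with_second_opinion(question: str, primary: Model, reviewer: Model) -> tuple[str, str]:
    """Answer with the primary model, then ask a model from another family to weigh in."""
    first_answer = primary(question)
    critique = reviewer(build_review_prompt(question, first_answer))
    return first_answer, critique
```

In practice `primary` and `reviewer` would wrap calls to two different model families; keeping them as plain callables makes it easy to swap either one.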
I was preparing talking points for a partner call. I asked the assistant to help me structure the pitch: what Amplify does, our technical edge, integration options. It came back with a solid, logical breakdown – capabilities first, then differentiators, then next steps. Well organized, easy to follow.
Then I ran Second Opinion.
The second model didn't find errors. It suggested a completely different structure: lead with the partner's problem, show how we solve it, save the technical details for the Q&A. Same information, rearranged around the audience instead of around the product. I used that version. The call went well.
I built this feature thinking developers would use it for code review. Instead, the most common use case turned out to be decision validation. People run Second Opinion on proposals, on important emails before sending, on plans they've already committed to. They use it to get a different perspective on something they've already thought through. It's the same instinct that makes you ask a coworker "does this look right?" – except it's instant and available at 2 AM.
What makes a second AI valuable is that it genuinely reasons differently. Two models reading the same brief will emphasize different things, and that diversity of attention is useful.
5. "Should I rent this apartment?"
My friend was apartment hunting and asked for help. If you've done this recently, you know the drill: find a listing, check the price, try to figure out if the neighborhood is good, wonder if you're overpaying, spend 30 minutes on Google Street View and forum threads, and still feel uncertain.
He sent me a listing URL from Daft.ie. I forwarded it to the assistant with one line: "Check this apartment – worth viewing?"
The assistant parsed the listing page – price, square meters, BER energy rating, listed amenities, location. Then it searched for average rental prices in that specific neighborhood and compared. This listing was about 15% above the area median for a similar-sized flat. It checked the neighborhood: walkability score, public transport connections, nearby shops and schools, any notable mentions in recent local news. It looked at how long the listing had been up and whether the price had been reduced – it had, twice, from a starting price €400 higher.
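The "15% above the area median" figure is ordinary arithmetic once comparable rents have been collected. A minimal sketch with hypothetical function names; the figures in the usage note are made up for illustration and are not the listing's real numbers.

```python
from statistics import median


def premium_over_median(price: float, comparable_rents: list[float]) -> float:
    """Percent by which `price` sits above (positive) or below (negative) the median comparable rent."""
    area_median = median(comparable_rents)
    return (price - area_median) * 100 / area_median


def count_reductions(price_history: list[float]) -> int:
    """Number of times the asking price dropped, given prices in listing order."""
    return sum(1 for before, after in zip(price_history, price_history[1:]) if after < before)


def total_reduction(price_history: list[float]) -> float:
    """How far the asking price has fallen from the first listed price to the current one."""
    return price_history[0] - price_history[-1]
```

For example, an asking rent of €2,300 against comparables of €1,900, €2,000, and €2,100 gives `premium_over_median(2300, [1900, 2000, 2100])` of 15.0, and a price history of `[2700, 2550, 2300]` shows two reductions totalling €400.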
What it sent back was a one-page verdict: "Above market rate even after two price drops. The neighborhood scores well for transport and walkability but has limited grocery options within walking distance. BER rating D2 – expect above-average heating costs. Recommend viewing only if they're open to negotiating below €X."
I sent him the summary. He didn't rent that apartment.
This isn't a real estate tool. The assistant has no property database, no MLS access, no special API. It used web search, basic arithmetic, and structured reasoning to compress 30 minutes of manual research into a 60-second read. The same approach works for evaluating a used car listing, a freelancer's portfolio, or a conference you're considering attending.
The common thread is due diligence – the boring, essential research that most people skip because it takes too long.
What connects all five
None of these are features. No product manager specced a subtitle tool. No designer mocked up a trip planner. No one wrote a PRD for "apartment due diligence." They emerged because a general-purpose agent with access to the right tools can break a surprisingly wide range of problems down into search, reasoning, and action.
The pattern across all five is the same: I had a real problem, I described it in plain language, the assistant figured out what tools to chain, executed, and came back with a usable result.
Chatbots give you information. Agents do work. That's the shift.
I should be honest about the limits. It doesn't always get it right. On one trip, it recommended a restaurant that had closed months earlier – great reviews online, just no longer open. I've learned to trust but verify, especially when the stakes are high.
But the ratio of useful to imperfect is high enough that I now default to asking the assistant first, for almost everything. Even when it's imperfect, it compresses an hour of tedious work into five minutes of reviewing a result. And five minutes of review is always faster than an hour of doing it yourself.
That's why we built Amplify – so anyone can set up a personal AI assistant and discover their own unexpected use cases. The five I listed here are just mine. If you build something unexpected with it, I'd genuinely like to hear about it.
Yevhen Fychak is CTO & co-founder of Amplify. He writes about building AI agents and the things they do that he didn't expect.