A few weeks ago I wrote about the information gap between what AI search engines confidently tell people and what is actually happening inside a local business right now. The response to that post — especially one exchange with a researcher named Cheng — pushed me somewhere I didn't expect to go this fast: out of the whiteboard and into an actual kitchen.
This is an update on where things stand, and on two open questions I still don't have good answers to.
Out of the Lab, Into the Floor
Komiru is no longer just a framework on paper. We've started a live pilot with a real, operating local business in Nagano — not a demo environment, not a mockup, but a place with actual customers, actual staff, and a actual weekly rhythm of writing down what came in, what's running low, and what changed since last week.
I won't go into the mechanics of how the system works under the hood. That's deliberate. What I can say is that the shift from "this should work in theory" to "a person has to actually do this every week, in between serving customers" has been the most clarifying part of the whole project so far.
A few things became obvious almost immediately that no amount of whiteboarding surfaced:
- The friction of habit formation is real. Writing a structured, timestamped observation every week is a behavior change, not a feature toggle. The first few weeks are the hardest, and that's exactly the period where the corpus is most fragile and most valuable.
- "Specificity" has a human cost. Asking someone to write "5kg of bracken from the Ōoka cooperative, no restock expected" instead of "fresh local vegetables" is asking them to think differently about their own business. Some people find this energizing. Others find it exhausting. Both reactions are useful data.
- The gap between intention and output is where the real design work lives. Most of what I've been iterating on isn't the data layer — it's the human layer. How do you make it easy, fast, and even satisfying for a busy owner to produce something an AI can later cite?
None of this invalidates the original thesis. If anything, watching it happen in a real space made the thesis feel more urgent, not less. But it also reframed the problem: this isn't purely an infrastructure problem anymore. It's an infrastructure problem and a habit-formation problem, running on the same clock.
The Conversation That Wouldn't Let Me Off the Hook
The other thing that's happened since the last post is an ongoing exchange with a researcher in Ireland who works on production-scale LLM deployment. I'm going to call him by his first name, Cheng, since that's how the conversation has felt — less like a review and more like an ongoing argument I'm grateful for.
Cheng raised two points that I haven't been able to stop thinking about, and I want to be honest that I don't think either is fully resolved.
The first is about fabrication. My original framing leaned on the idea that sustaining 52 weeks of internally consistent, hyper-specific false data would be too costly for a bad actor to bother with. Cheng's pushback was direct: that assumption is already out of date. Generating a year's worth of plausible, weather-adjusted, internally consistent "facts" via automation is not hard anymore. If specificity alone was supposed to be the integrity mechanism, it isn't enough on its own.
I think he's right, and I think the honest answer is that specificity was never meant to be a wall — it was meant to be a cost. The question I'm sitting with now is: what raises the cost further, without turning the whole system into a verification bureaucracy that defeats the purpose? I don't have a clean answer. I have some directions I'm exploring, but nothing I'd call a solution yet, and I'd rather say that plainly than pretend otherwise.
The second point is about trust — what Cheng called the "Yelp problem." Even with perfectly authentic, perfectly structured data, why would an LLM (or the retrieval system underneath it) prefer a small, newly-published source over the accumulated authority of an established platform? Domain authority isn't just a search ranking artifact — it's baked into how these systems reason about what's worth citing at all.
This one stings a bit more, because it's not something a better data format can fix. It's closer to a chicken-and-egg problem: the corpus needs time and consistency to earn trust, but trust is exactly what determines whether anyone — human or AI — ever encounters the corpus in the first place.
Where That Leaves Things
I don't have a tidy resolution to either of these, and I think that's the honest state of the project right now. What I do have is a live pilot that's forcing both questions to stop being abstract. Every week that passes is either evidence for the thesis or evidence against it, and for the first time that evidence is coming from a real place with real stakes, not from my own assumptions about how busy people behave.
If you've worked on problems at the intersection of provenance, trust calibration in retrieval systems, or getting non-technical people to sustain a data-producing habit over months — I'd genuinely like to hear from you. Cheng's questions opened up more than they closed, and I suspect the people who can help me think through them aren't all in one field.
More updates as the weeks accumulate. That's rather the point.













