25 points | 17 comments
Author: epitrochoid413

Top Comments

epitrochoid413 · May 29
I built a context-aware furigana converter for Japanese text, files, and web pages.

The main problem I wanted to solve is that simple dictionary-based furigana works well for common cases but breaks on words whose reading depends on context:

* 市場: いちば or しじょう

* 大分: おおいた or だいぶ

* 人気: にんき or ひとけ

* 最中: さいちゅう or さなか or もなか

* 方: かた or ほう
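
To show why a plain dictionary lookup can't handle these, here is a toy sketch (not the tool's code). It is a greedy longest-match annotator over a hypothetical five-entry dictionary, `TOY_DICT`, that always takes the first listed reading. Because it never looks at context, 人気 gets にんき both in 人気の歌手, where that is right, and in 人気のない道, where ひとけ is correct.

```python
# Toy illustration only: a hypothetical five-entry dictionary, not the tool's data.
TOY_DICT: dict[str, list[str]] = {
    "市場": ["いちば", "しじょう"],
    "大分": ["おおいた", "だいぶ"],
    "人気": ["にんき", "ひとけ"],
    "最中": ["さいちゅう", "さなか", "もなか"],
    "方": ["かた", "ほう"],
}
MAX_LEN = max(len(word) for word in TOY_DICT)


def naive_furigana(text: str) -> str:
    """Greedy longest-match lookup that always takes the first listed reading."""
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest dictionary word starting at position i first.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in TOY_DICT:
                out.append(f"{chunk}({TOY_DICT[chunk][0]})")
                i += length
                break
        else:
            # No dictionary word starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(naive_furigana("人気の歌手"))    # 人気(にんき)の歌手 (correct)
print(naive_furigana("人気のない道"))  # 人気(にんき)のない道 (should be ひとけ)
```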

The engine is a hybrid system:

* Sudachi for tokenization, base forms, POS, and candidate readings

* Expanded dictionary coverage for compounds and fixed expressions

* Custom rules for counters, suffixes, rendaku patterns, and phrase overrides

* ModernBERT fallback for 144 especially context-dependent target words
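
Here is a minimal sketch of how layers like these might be combined, assuming the tokenizer has already produced tokens with candidate readings. Everything in it is hypothetical: the `Token` type, `PHRASE_OVERRIDES`, `TARGET_WORDS`, `resolve`, and `toy_classifier` are illustrative names, and the ordering (phrase rules first, then a model only for listed target words, then the dictionary default) is an assumption rather than a description of the real engine. `toy_classifier` is a hard-coded stand-in for the ModernBERT model, not a real one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    surface: str
    candidates: list[str]  # readings the dictionary allows; index 0 is the default


# Hypothetical example data, not the tool's real rule set or target list.
PHRASE_OVERRIDES: dict[tuple[str, ...], tuple[str, ...]] = {
    ("如何", "程"): ("いか", "ほど"),
}
TARGET_WORDS: set[str] = {"市場", "大分", "人気", "最中", "方"}

# A classifier sees the whole token list plus the index it must disambiguate.
Classifier = Callable[[list[Token], int], str]


def resolve(tokens: list[Token], classify: Classifier) -> list[str]:
    # 1. Rule layer: phrase overrides pin readings before anything else runs.
    fixed: dict[int, str] = {}
    for phrase, phrase_readings in PHRASE_OVERRIDES.items():
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(t.surface for t in tokens[start:start + n]) == phrase:
                for offset, reading in enumerate(phrase_readings):
                    fixed[start + offset] = reading
    readings: list[str] = []
    for i, tok in enumerate(tokens):
        if i in fixed:
            readings.append(fixed[i])
        elif tok.surface in TARGET_WORDS and len(tok.candidates) > 1:
            # 2. Model fallback, only for known context-dependent words.
            readings.append(classify(tokens, i))
        else:
            # 3. Otherwise trust the dictionary's default reading.
            readings.append(tok.candidates[0])
    return readings


def toy_classifier(tokens: list[Token], i: int) -> str:
    """Hard-coded stand-in for a model: 人気 followed by の + ない -> ひとけ."""
    following = [t.surface for t in tokens[i + 1:i + 3]]
    if tokens[i].surface == "人気" and following == ["の", "ない"]:
        return "ひとけ"
    return tokens[i].candidates[0]


road = [Token("人気", ["にんき", "ひとけ"]), Token("の", ["の"]),
        Token("ない", ["ない"]), Token("道", ["みち"])]
print(resolve(road, toy_classifier))  # ['ひとけ', 'の', 'ない', 'みち']
```

Keeping the model behind a fixed word list means most tokens never pay for inference, and a misreading such as the 如何程 one reported in the comments could be patched with a single override entry (the 如何 + 程 token split here is itself an assumption).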

I have been testing it against an LLM-assisted benchmark of 7,500 Japanese lines, on which it currently makes about 12 wrong readings per 1,000 tokens. I treat this as a practical regression test rather than a formal academic evaluation, but it has been useful for comparing versions and catching regressions.
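
The post doesn't say exactly how the errors-per-1,000-tokens figure is computed. Below is one plausible reading, assuming each benchmark line comes with a token-aligned list of reference readings. `line_errors`, `errors_per_thousand`, and `regressed_lines` are hypothetical names, and `regressed_lines` shows one way to flag lines that got worse between two versions.

```python
def line_errors(pred: list[str], ref: list[str]) -> int:
    """Count tokens whose predicted reading differs from the reference."""
    if len(pred) != len(ref):
        raise ValueError("predicted and reference tokens must be aligned")
    return sum(p != r for p, r in zip(pred, ref))


def errors_per_thousand(predicted: list[list[str]], reference: list[list[str]]) -> float:
    """Wrong readings per 1,000 reference tokens across all lines."""
    if len(predicted) != len(reference):
        raise ValueError("line counts differ")
    wrong = sum(line_errors(p, r) for p, r in zip(predicted, reference))
    total = sum(len(r) for r in reference)
    return 1000 * wrong / total if total else 0.0


def regressed_lines(old: list[list[str]], new: list[list[str]],
                    reference: list[list[str]]) -> list[int]:
    """Indices of lines where the new version makes more errors than the old."""
    return [i for i, (o, n, r) in enumerate(zip(old, new, reference))
            if line_errors(n, r) > line_errors(o, r)]


# 1,000 tokens with 12 wrong readings -> 12.0
print(errors_per_thousand([["a"] * 500, ["a"] * 488 + ["b"] * 12],
                          [["a"] * 500, ["a"] * 500]))
```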

The hardest remaining cases are personal names, place names, rendaku, rare vocabulary, and domain-specific terms.

I would especially appreciate examples where it gets the reading wrong, since those are the most useful for improving the system.

uasi · May 29
Got an incorrect result on my first try. Input was 振り仮名変換器の性能が如何程か試してみよう. It returned 如何(どう)程(ほど) instead of 如何(いか)程(ほど).

Regardless, I'm impressed with the tool!

bluechair · May 29
Fantastic tool, and I love the delivery: no sign-up required. I'm interested to hear how you pulled that off.

I'm also interested to hear whether you plan to eventually support an option to add pitch accent; I've never seen what training material exists for that, or how it is supported in Unicode.

altilunium · May 29
It really works. Very cool. I've been looking for this kind of service ever since I started learning Japanese, and I've rarely been satisfied with the available services.

k-taro56 · May 29
I'm Japanese. I was surprised that it read correctly even the commonly seen, hard-to-read place names I entered. However, there seem to be cases where it misreads “今日” when it should be read as “こんにち.” Example: 今日の日本社会では、少子高齢化が大きな課題となっている。

Also, it’s disappointing that Japanese does not appear even when I select it.

Please let me know if there’s anything I can do to help.

sollniss · May 29
Uh, in 田中さんは今何をしている, 今何 comes out as こんなに.

Source: ezfurigana.com
Author: epitrochoid413
Posted: May 29, 2026 at 12:24 PM