mehen’s Tier-0 Japanese readability score is the Tateishi simplified RS (Tateishi, Ono & Yamada 1988). It needs only sentence boundaries and script-run features, both of which can be computed without a tokenizer.
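Both inputs can be gathered in a single pass over the characters. The following is a minimal Rust sketch of that idea, not mehen's actual implementation: the names Script, classify, mean_run_length and sentence_count are hypothetical, the Unicode ranges cover only the common blocks, and treating 。！？ as the only sentence terminators is an assumption.

```rust
/// Coarse script class of a single character.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Hiragana,
    Katakana,
    Kanji,
    Latin,
    Other,
}

/// Classify a character by Unicode block (common blocks only; an assumption).
pub fn classify(c: char) -> Script {
    match c {
        '\u{3041}'..='\u{309F}' => Script::Hiragana,
        '\u{30A0}'..='\u{30FF}' => Script::Katakana,
        '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}' => Script::Kanji,
        'A'..='Z' | 'a'..='z' => Script::Latin,
        _ => Script::Other,
    }
}

/// Mean length, in characters, of maximal runs of `target` in `text`.
/// Returns 0.0 when the script does not occur at all.
pub fn mean_run_length(text: &str, target: Script) -> f64 {
    let mut runs = 0usize;
    let mut chars = 0usize;
    let mut in_run = false;
    for c in text.chars() {
        if classify(c) == target {
            chars += 1;
            if !in_run {
                runs += 1;
                in_run = true;
            }
        } else {
            in_run = false;
        }
    }
    if runs == 0 {
        0.0
    } else {
        chars as f64 / runs as f64
    }
}

/// Count sentences by terminal punctuation; a trailing unterminated
/// fragment counts as one sentence.
pub fn sentence_count(text: &str) -> usize {
    let mut count = 0;
    let mut pending = false; // non-whitespace content seen since the last terminator
    for c in text.chars() {
        match c {
            '。' | '！' | '？' => {
                if pending {
                    count += 1;
                    pending = false;
                }
            }
            c if c.is_whitespace() => {}
            _ => pending = true,
        }
    }
    if pending {
        count += 1;
    }
    count
}
```

For the text 今日は雨です。カメラを買いました。 this yields two sentences, a mean hiragana run of 2.0 characters and a mean katakana run of 3.0. No dictionary or morphological analysis is involved, which is what makes the score usable at Tier 0.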
Tateishi RS
mehen uses the simplified 6-variable form and reports it as tateishi_rs, with two sanity guards:
- Refuse when hiragana_ratio > 0.90 (the formula is gamed upward).
- Refuse when character count < 300.
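The two guards can be sketched as a pre-check that runs before any score is reported. This is an illustrative Rust sketch only: Refusal, check_guards and the two constants are hypothetical names, and counting non-whitespace characters for both the length and the ratio denominator is an assumption, since the page does not say how mehen counts.

```rust
/// Reason a Tateishi RS value is withheld (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Refusal {
    /// Fewer than MIN_CHARS characters.
    TooShort { chars: usize },
    /// Hiragana share above MAX_HIRAGANA_RATIO.
    HiraganaHeavy { ratio: f64 },
}

pub const MIN_CHARS: usize = 300;
pub const MAX_HIRAGANA_RATIO: f64 = 0.90;

/// Returns Ok(()) when the text is eligible for scoring.
/// Assumption: whitespace is excluded from the character count.
pub fn check_guards(text: &str) -> Result<(), Refusal> {
    let chars = text.chars().filter(|c| !c.is_whitespace()).count();
    if chars < MIN_CHARS {
        return Err(Refusal::TooShort { chars });
    }

    // Hiragana block U+3041..=U+309F.
    let hiragana = text
        .chars()
        .filter(|c| ('\u{3041}'..='\u{309F}').contains(c))
        .count();
    let ratio = hiragana as f64 / chars as f64;
    if ratio > MAX_HIRAGANA_RATIO {
        return Err(Refusal::HiraganaHeavy { ratio });
    }
    Ok(())
}
```

Returning a typed refusal rather than a sentinel number lets a caller tell "not scored" apart from a genuinely low score.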
Jōyō grade proxy
The 2,136-character 2010 Jōyō list is used to map each kanji to a grade 1–8 (1–6 elementary, 7 = secondary Jōyō, 8 = non-Jōyō, i.e. hyōgai). It ships as a ~6 KB static table behind --features japanese-jouyou.
jouyou_grade_mean is a direct school-grade analogue to Flesch-Kincaid: < 3 indicates elementary
reading; > 6 indicates high-school+ technical prose.
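As a rough illustration of how such a mean could be computed, here is a minimal Rust sketch. It is not mehen's code: GRADES is a hypothetical four-entry excerpt standing in for the full 2,136-character table, grade_of is an invented helper, and averaging over kanji only (ignoring kana, Latin and punctuation) is an assumption.

```rust
/// Hypothetical excerpt of the grade table; a real table would cover
/// every Jōyō character.
const GRADES: &[(char, u8)] = &[('山', 1), ('川', 1), ('読', 2), ('鬱', 7)];

/// Grade assigned to kanji that are not on the Jōyō list.
const NON_JOUYOU: u8 = 8;

fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}')
}

/// Grade 1..=8 for a kanji, or None for any non-kanji character.
pub fn grade_of(c: char) -> Option<u8> {
    if !is_kanji(c) {
        return None;
    }
    let grade = GRADES
        .iter()
        .find(|(k, _)| *k == c)
        .map(|(_, g)| *g)
        .unwrap_or(NON_JOUYOU);
    Some(grade)
}

/// Mean grade over the kanji in `text`; None when the text has no kanji.
pub fn jouyou_grade_mean(text: &str) -> Option<f64> {
    let grades: Vec<u8> = text.chars().filter_map(grade_of).collect();
    if grades.is_empty() {
        return None;
    }
    let sum: u32 = grades.iter().map(|&g| u32::from(g)).sum();
    Some(f64::from(sum) / grades.len() as f64)
}
```

With this excerpt, 山を読む averages to 1.5 (grades 1 and 2), while a single 鬱 gives 7.0. Any kanji missing from the table falls through to grade 8, which is only meaningful once the table is complete.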
Higher-tier formulas (gated)
Behind --features japanese-morph (Lindera + IPADIC) or --features japanese-unidic (Vibrato + UniDic):
- Shibasaki & Hara — adds bunsetsu and morphological inputs.
- Lee & Hasebe jReadability — modern web-corpus-trained formula.
- Obi / Obi2 — Japanese textbook-grade analogue.
- Mizuno / Goda — alternative readability work.
JLPT-related data ships behind --features japanese-jlpt (~300 KB).
References
- Tateishi, K., Ono, Y. & Yamada, H. (1988). A Computer Readability Formula of Japanese Texts for Machine Scoring. Proceedings of COLING-1988: 649–654. ACL Anthology.
- 文化庁 (Agency for Cultural Affairs of Japan, 2010). 常用漢字表. (Jōyō kanji list, 2010 revision.) Official notice (PDF).
- Lee, J. & Hasebe, Y. (2017). jReadability — a web-based Japanese text-readability indexing system. jReadability.
- Sato, S., Matsuyoshi, S. & Kondoh, Y. (2008). Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus. Proceedings of LREC 2008. LREC PDF.
See also
- Japanese script composition — provides the inputs.
- JTF rules — Japan Translation Federation conformance.