mehen’s Tier-0 Japanese readability score is the Tateishi simplified RS (Tateishi, Ono & Yamada 1988). It needs only sentence boundaries and script-run features — both computable without a tokenizer.
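As a rough illustration of why no tokenizer is needed, the sketch below derives sentence boundaries and script runs from Unicode code-point ranges alone. It is not mehen's implementation: the names `Script`, `classify`, `script_runs`, `mean_run_len` and `sentences` are hypothetical, the code-point ranges are a simplified classification, and the set of sentence terminators is an assumption.

```rust
/// Script classes used for run detection (a simplified, assumed classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Hiragana,
    Katakana,
    Kanji,
    Alpha,
    Other,
}

/// Classify one character by Unicode range (hypothetical helper).
fn classify(c: char) -> Script {
    match c {
        '\u{3041}'..='\u{3096}' => Script::Hiragana,
        // Katakana letters plus the prolonged sound mark.
        '\u{30A1}'..='\u{30FA}' | '\u{30FC}' => Script::Katakana,
        // CJK Unified Ideographs, Extension A, and the iteration mark.
        '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}' | '\u{3005}' => Script::Kanji,
        // ASCII and full-width Latin letters.
        'A'..='Z' | 'a'..='z' | '\u{FF21}'..='\u{FF3A}' | '\u{FF41}'..='\u{FF5A}' => Script::Alpha,
        _ => Script::Other,
    }
}

/// Collapse text into (script, run_length) pairs; `Other` characters end a run.
fn script_runs(text: &str) -> Vec<(Script, usize)> {
    let mut runs: Vec<(Script, usize)> = Vec::new();
    let mut prev = Script::Other;
    for c in text.chars() {
        let s = classify(c);
        if s != Script::Other {
            if prev == s {
                // Same script as the previous character: extend the current run.
                if let Some(last) = runs.last_mut() {
                    last.1 += 1;
                }
            } else {
                runs.push((s, 1));
            }
        }
        prev = s;
    }
    runs
}

/// Mean length of all runs of `target`; 0.0 when the script does not occur.
fn mean_run_len(runs: &[(Script, usize)], target: Script) -> f64 {
    let (total, count) = runs
        .iter()
        .filter(|(s, _)| *s == target)
        .fold((0usize, 0usize), |(t, n), (_, len)| (t + len, n + 1));
    if count == 0 {
        0.0
    } else {
        total as f64 / count as f64
    }
}

/// Split on sentence-final punctuation (an assumed terminator set).
fn sentences(text: &str) -> Vec<&str> {
    text.split(|c: char| matches!(c, '。' | '！' | '？' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}
```

For the input `これは漢字テストです。` this yields four runs: hiragana (3), kanji (2), katakana (3), hiragana (2), so the mean hiragana run length is 2.5.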

Tateishi RS

Simplified 6-variable form:
RS = −0.12·ls − 1.37·la + 7.4·lh − 23.18·lc − 5.4·lk − 4.67·cp + 115.79
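Here ls is the mean sentence length in characters; la, lh, lc and lk are the mean run lengths of alphabetic, hiragana, kanji and katakana characters; cp is the number of tōten (commas) per kuten (sentence-final period).
The score is calibrated so that mean ≈ 50 and SD ≈ 10; higher means easier. mehen emits it as tateishi_rs, with two sanity guards: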
  • Refuse when hiragana_ratio > 0.90 (hiragana-heavy text inflates the +7.4·lh term, so the score can be gamed upward).
  • Refuse when character count < 300.
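A minimal sketch of the formula and the two guards above. The coefficients and thresholds come from this page; everything else is an assumption for illustration: the `TateishiFeatures` struct, the `Refusal` enum, the function names, the order in which the guards are checked, and the reading of the six variables (ls as mean sentence length in characters, la/lh/lc/lk as mean run lengths of alphabetic, hiragana, kanji and katakana characters, cp as commas per sentence-final period). How mehen computes hiragana_ratio is not specified here, so it is passed in as a precomputed value.

```rust
/// The six inputs of the simplified Tateishi formula (hypothetical struct).
struct TateishiFeatures {
    ls: f64, // mean sentence length, in characters (assumed definition)
    la: f64, // mean alphabetic run length
    lh: f64, // mean hiragana run length
    lc: f64, // mean kanji run length
    lk: f64, // mean katakana run length
    cp: f64, // commas per sentence-final period (assumed definition)
}

/// Reasons for declining to emit a score (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Refusal {
    TooShort,
    HiraganaHeavy,
}

/// Plain evaluation of the simplified 6-variable form.
fn tateishi_rs(f: &TateishiFeatures) -> f64 {
    -0.12 * f.ls - 1.37 * f.la + 7.4 * f.lh - 23.18 * f.lc - 5.4 * f.lk - 4.67 * f.cp + 115.79
}

/// Apply the two sanity guards before scoring.
fn guarded_tateishi_rs(
    f: &TateishiFeatures,
    char_count: usize,
    hiragana_ratio: f64,
) -> Result<f64, Refusal> {
    if char_count < 300 {
        return Err(Refusal::TooShort);
    }
    if hiragana_ratio > 0.90 {
        return Err(Refusal::HiraganaHeavy);
    }
    Ok(tateishi_rs(f))
}
```

Returning a `Result` rather than a sentinel number keeps a refused score from being mistaken for a real one downstream; whether mehen reports refusals this way is not stated in the source.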

Jōyō grade proxy

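Using the 2,136-character 2010 Jōyō list, mehen assigns each kanji a grade from 1 to 8: grades 1–6 are the elementary-school grades, 7 covers the remaining (secondary-school) Jōyō kanji, and 8 marks non-Jōyō (hyōgai) kanji. The mapping ships as a ~6 KB static table behind --features japanese-jouyou: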
jouyou_grade_mean = mean(grade(c) for each kanji c)
hyougai_ratio     = non_jouyou_kanji_chars / kanji_chars
jouyou_grade_mean is a school-grade analogue of the Flesch-Kincaid grade level: values below 3 indicate elementary-level reading; values above 6 indicate high-school-level or technical prose.
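The two metrics can be sketched as follows. This is an illustration under stated assumptions, not mehen's code: `toy_table` is a hypothetical three-entry stand-in for the real static table, `is_kanji` checks only the main CJK Unified Ideographs block, and hyōgai kanji are assumed to count as grade 8 inside the mean.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the static grade table (three toy entries only).
fn toy_table() -> HashMap<char, u8> {
    HashMap::from([('一', 1u8), ('海', 2u8), ('鬱', 7u8)])
}

/// Simplified kanji test: main CJK Unified Ideographs block only (assumption).
fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// Grade lookup; kanji missing from the table are treated as hyōgai (grade 8).
fn grade(table: &HashMap<char, u8>, c: char) -> u8 {
    *table.get(&c).unwrap_or(&8)
}

/// Returns (jouyou_grade_mean, hyougai_ratio), or None if the text has no kanji.
fn jouyou_metrics(text: &str, table: &HashMap<char, u8>) -> Option<(f64, f64)> {
    let grades: Vec<u8> = text
        .chars()
        .filter(|c| is_kanji(*c))
        .map(|c| grade(table, c))
        .collect();
    if grades.is_empty() {
        return None; // avoid dividing by zero on kana-only text
    }
    let n = grades.len() as f64;
    let mean = grades.iter().map(|&g| g as f64).sum::<f64>() / n;
    let hyougai = grades.iter().filter(|&&g| g == 8).count() as f64 / n;
    Some((mean, hyougai))
}
```

With the toy table, the text `一海鬱嘘` has grades 1, 2, 7 and 8, giving a mean of 4.5 and a hyōgai ratio of 0.25.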

Higher-tier formulas (gated)

Behind --features japanese-morph (Lindera + IPADIC) or --features japanese-unidic (Vibrato + UniDic):
  • Shibasaki & Hara — adds bunsetsu and morphological inputs.
  • Lee & Hasebe jReadability — modern web-corpus-trained formula.
  • Obi / Obi2 — Japanese textbook-grade analogue.
  • Mizuno / Goda — alternative readability work.
JLPT bands (N5–N1, for both words and kanji) are optional behind --features japanese-jlpt (~300 KB).

References

  • Tateishi, K., Ono, Y. & Yamada, H. (1988). A Computer Readability Formula of Japanese Texts for Machine Scoring. Proceedings of COLING-1988: 649–654. ACL Anthology.
  • 文化庁 (Agency for Cultural Affairs of Japan, 2010). 常用漢字表. (Jōyō kanji list, 2010 revision.) Official notice (PDF).
  • Lee, J. & Hasebe, Y. (2017). jReadability — a web-based Japanese text-readability indexing system. jReadability.
  • Sato, S., Matsuyoshi, S. & Kondoh, Y. (2008). Automatic Assessment of Japanese Text Readability Based on a Textbook Corpus. Proceedings of LREC 2008. LREC PDF.

See also