mehen emits every formula's raw score with provenance rather than averaging them. Two formulas applied to the same text routinely disagree by 2–4 grade levels because they are calibrated to different comprehension thresholds (SMOG ~100%, FKGL ~75%, Dale-Chall in between). An average of scores calibrated to different criteria has no clear interpretation, so mehen does not report one.

Formulas

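| Formula | Syllables | Key notes |
| --- | --- | --- |
| Flesch Reading Ease | yes | 206.835 − 1.015·ASL − 84.6·ASW. Higher = easier. |
| Flesch-Kincaid Grade | yes | 0.39·ASL + 11.8·ASW − 15.59. MIL-M-38784A standard. |
| Gunning Fog | yes | 0.4·(ASL + 100·P_complex). Target grade 7–12 for business writing. |
| SMOG | yes | 1.0430·sqrt(poly·30/sentences) + 3.1291. null below 30 sentences. |
| ARI | no | 4.71·CPW + 0.5·ASL − 21.43. Syllable-free. |
| Coleman-Liau | no | 0.0588·L − 0.296·S − 15.8. Syllable-free. |
| New Dale-Chall | no | 0.1579·PDW + 0.0496·ASL (+ 3.6365 if PDW > 5%). |
| FORCAST | counts 1-syllable | 20 − (N/10). Non-narrative text. |
| LIX | no | ASL + 100·(long_words/words). |
| RIX | no | long_words / sentences. |

Symbols: ASL = words per sentence; ASW = syllables per word; CPW = characters per word; P_complex = fraction of complex words (three or more syllables); poly = count of words with three or more syllables; L and S = letters and sentences per 100 words; PDW = percentage of words not on the familiar-word list; N = one-syllable words in a 150-word sample; long_words = words longer than six letters.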
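
As an illustration of how the formulas above turn counts into scores, here is a minimal Rust sketch covering five of them. The `Counts` struct, its field names, and the method names are assumptions made for this example and are not mehen's actual API. SMOG returns `None` below 30 sentences, mirroring the "null" note in the table.

```rust
/// Raw counts for one document. Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Counts {
    pub sentences: f64,
    pub words: f64,
    pub syllables: f64,
    /// Characters counted for ARI's characters-per-word term.
    pub characters: f64,
    /// Words with three or more syllables (used by Fog and SMOG).
    pub complex_words: f64,
}

impl Counts {
    /// ASL: average sentence length in words.
    fn asl(&self) -> f64 {
        self.words / self.sentences
    }

    /// ASW: average syllables per word.
    fn asw(&self) -> f64 {
        self.syllables / self.words
    }

    pub fn flesch_reading_ease(&self) -> f64 {
        206.835 - 1.015 * self.asl() - 84.6 * self.asw()
    }

    pub fn flesch_kincaid_grade(&self) -> f64 {
        0.39 * self.asl() + 11.8 * self.asw() - 15.59
    }

    pub fn gunning_fog(&self) -> f64 {
        0.4 * (self.asl() + 100.0 * self.complex_words / self.words)
    }

    pub fn ari(&self) -> f64 {
        4.71 * (self.characters / self.words) + 0.5 * self.asl() - 21.43
    }

    /// SMOG is undefined here for fewer than 30 sentences.
    pub fn smog(&self) -> Option<f64> {
        if self.sentences < 30.0 {
            return None;
        }
        Some(1.0430 * (self.complex_words * 30.0 / self.sentences).sqrt() + 3.1291)
    }
}
```

For a 10-sentence, 150-word sample with 225 syllables, FKGL comes out near 7.96 while Fog (with 15 complex words) is 10.0, which shows the kind of spread between formulas described earlier.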

Ensemble reporting

  1. Emit every formula with provenance.
  2. Compute an ensemble grade band as [min(FKGL, Fog, ARI, CLI), max(…)]: the range spanned by those four "running-prose" formulas. A narrow band means they agree; a wide band means they do not.
  3. Emit FORCAST separately as the preferred single score for non-narrative docs.
  4. Suppress SMOG when sentences < 30.
  5. Report Dale-Chall only with an explicit list: provenance tag (NGSL 1.2 by default — Browne et al., 2013).
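
Steps 2 and 4 above can be sketched in a few lines of Rust. The names `ProseGrades`, `ensemble_band`, and `smog_or_suppressed` are hypothetical and chosen only for this example; mehen's real types may differ.

```rust
/// Grade-level scores from the four running-prose formulas.
/// Hypothetical type for illustration only.
pub struct ProseGrades {
    pub fkgl: f64,
    pub fog: f64,
    pub ari: f64,
    pub cli: f64,
}

/// Returns (lowest, highest) grade across the four formulas.
pub fn ensemble_band(g: &ProseGrades) -> (f64, f64) {
    let scores = [g.fkgl, g.fog, g.ari, g.cli];
    let lo = scores.iter().copied().fold(f64::INFINITY, f64::min);
    let hi = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    (lo, hi)
}

/// Drops the SMOG score when the text has fewer than 30 sentences.
pub fn smog_or_suppressed(smog: f64, sentences: usize) -> Option<f64> {
    if sentences < 30 {
        None
    } else {
        Some(smog)
    }
}
```

Reporting the band as a pair keeps both extremes visible instead of collapsing them into one number.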

Syllable counting

The Tier 0 default is a vowel-group heuristic (~85% agreement with CMU on open-domain text). Behind --features syllables-cmu, mehen links the CMU Pronouncing Dictionary for exact counts on ~134k words, with the heuristic as the fallback for out-of-vocabulary (OOV) words.
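
To make "vowel-group heuristic" concrete, the sketch below counts runs of consecutive vowels and discounts a trailing silent "e". This is one common form of the technique and an assumption on my part; mehen's actual rules are not specified here and may differ. The function name `count_syllables` is hypothetical.

```rust
/// Estimates syllables by counting groups of consecutive vowels.
/// Illustrative only: not mehen's actual heuristic.
pub fn count_syllables(word: &str) -> usize {
    // Keep ASCII letters only, lowercased.
    let letters: Vec<char> = word
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    if letters.is_empty() {
        return 0;
    }

    let is_vowel = |c: char| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u' | 'y');

    // Each run of adjacent vowels counts as one syllable.
    let mut groups: usize = 0;
    let mut prev_vowel = false;
    for &c in &letters {
        let v = is_vowel(c);
        if v && !prev_vowel {
            groups += 1;
        }
        prev_vowel = v;
    }

    // A final "e" after a consonant other than "l" is treated as silent
    // ("make"), while "-le" keeps its syllable ("table").
    let n = letters.len();
    let silent_e = n >= 2
        && letters[n - 1] == 'e'
        && !is_vowel(letters[n - 2])
        && letters[n - 2] != 'l';
    if silent_e && groups > 1 {
        groups -= 1;
    }

    // Every word with letters has at least one syllable.
    groups.max(1)
}
```

A heuristic like this miscounts words such as "rhythm", which is why a dictionary lookup with heuristic fallback gives better results.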

Sentence segmentation

UAX #29 (unicode-segmentation) plus:
  • A bundled ~150-entry English abbreviation list (Mr., e.g., i.e., U.S., v1.2.3).
  • No split when the period is followed by a lowercase letter, a digit, or <space><digit>.
  • Markdown block boundaries (blank line, heading, fence open/close, list item start) are hard terminators regardless of punctuation.
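
The period-handling rules above could be approximated as follows. This sketch covers only the abbreviation check and the "no split" rule; it does not implement UAX #29 or the Markdown block boundaries. The function `is_sentence_end` and the six-entry `ABBREVIATIONS` list are hypothetical stand-ins, not the bundled list.

```rust
/// Tiny stand-in for the bundled abbreviation list (hypothetical subset).
const ABBREVIATIONS: &[&str] = &["mr.", "mrs.", "dr.", "e.g.", "i.e.", "u.s."];

/// Decides whether a period ends a sentence.
/// `before` is the token ending in the period (for example "Mr.");
/// `after` is the text immediately following that period.
pub fn is_sentence_end(before: &str, after: &str) -> bool {
    // Known abbreviations never end a sentence in this sketch.
    if ABBREVIATIONS.iter().any(|a| a.eq_ignore_ascii_case(before)) {
        return false;
    }

    let mut chars = after.chars();
    match chars.next() {
        // Period directly followed by a lowercase letter or a digit.
        Some(c) if c.is_lowercase() || c.is_ascii_digit() => false,
        // Period followed by a space and then a digit.
        Some(' ') => !matches!(chars.next(), Some(d) if d.is_ascii_digit()),
        // Anything else, including end of input, is a split point.
        _ => true,
    }
}
```

For example, `is_sentence_end("v1.", "2.3")` is false because a digit follows the period, while `is_sentence_end("done.", " Next")` is true.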

Doc-type thresholds

| Doc type | FKGL | Fog | Passive max | Max sentence words |
| --- | --- | --- | --- | --- |
| README / overview | ≤ 10 | ≤ 12 | 15 % | 30 |
| Tutorial | ≤ 9 | ≤ 11 | 10 % | 25 |
| API reference | ≤ 12 | ≤ 14 | 20 % | 35 |
| ADR / design | ≤ 12 | ≤ 14 | 25 % | 40 |
| Error messages | ≤ 7 | ≤ 9 | 5 % | 15 |
| Release notes | ≤ 11 | ≤ 13 | 15 % | 30 |

These are conventions synthesized from Google, Microsoft, and 18F style guides. They are tunable profile defaults.
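
One way to picture these defaults is as a lookup from doc type to a threshold record, followed by a comparison against measured values. The sketch below uses the numbers from the table; the type names, the string keys such as "tutorial", and the `violations` function are assumptions for illustration and do not reflect mehen's profile format.

```rust
/// Per-doc-type limits. Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Thresholds {
    pub fkgl_max: f64,
    pub fog_max: f64,
    pub passive_max_pct: f64,
    pub max_sentence_words: usize,
}

/// Measured values for one document. Hypothetical type.
pub struct Measured {
    pub fkgl: f64,
    pub fog: f64,
    pub passive_pct: f64,
    pub longest_sentence_words: usize,
}

/// Default thresholds from the table; the keys are made up for this sketch.
pub fn defaults_for(doc_type: &str) -> Option<Thresholds> {
    let (fkgl_max, fog_max, passive_max_pct, max_sentence_words) = match doc_type {
        "readme" => (10.0, 12.0, 15.0, 30),
        "tutorial" => (9.0, 11.0, 10.0, 25),
        "api-reference" => (12.0, 14.0, 20.0, 35),
        "adr" => (12.0, 14.0, 25.0, 40),
        "error-messages" => (7.0, 9.0, 5.0, 15),
        "release-notes" => (11.0, 13.0, 15.0, 30),
        _ => return None,
    };
    Some(Thresholds { fkgl_max, fog_max, passive_max_pct, max_sentence_words })
}

/// Lists the names of every limit the document exceeds.
pub fn violations(t: &Thresholds, m: &Measured) -> Vec<&'static str> {
    let mut out = Vec::new();
    if m.fkgl > t.fkgl_max {
        out.push("fkgl");
    }
    if m.fog > t.fog_max {
        out.push("fog");
    }
    if m.passive_pct > t.passive_max_pct {
        out.push("passive");
    }
    if m.longest_sentence_words > t.max_sentence_words {
        out.push("sentence-length");
    }
    out
}
```

Returning every exceeded limit, rather than stopping at the first, lets a report show all problems in one pass.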

References

  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology 32(3): 221–233.
  • Kincaid, J. P., Fishburne, R. P., Rogers, R. L. & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Research Branch Report 8-75, Naval Technical Training Command.
  • McLaughlin, G. H. (1969). SMOG grading: a new readability formula. Journal of Reading 12(8): 639–646.
  • Gunning, R. (1952). The Technique of Clear Writing. McGraw-Hill.
  • Coleman, M. & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology 60(2): 283–284.
  • Senter, R. J. & Smith, E. A. (1967). Automated Readability Index. AMRL-TR-66-220.
  • Chall, J. S. & Dale, E. (1995). Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.
  • Caylor, J. S. & Sticht, T. G. (1973). Development of a Simple Readability Index for Job Reading Material. HumRRO Professional Paper 1-73 (FORCAST).
  • Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. Journal of Reading 26(6): 490–496.
