Post-Hoc Morphology Correction for Quantised LLMs

2026-04-20 · fluency llm quantisation safety language

Quantised and distilled language models lose irregular morphology first ('runned', 'childs', 'mouses') because the irregulars are carried by a smaller fraction of parameters. A 450-entry irregulars table plus a short repair function catches and corrects these without retraining, at negligible latency cost, and without touching the model. Closed-form failures should not be solved by statistical learners.

A methodology note on rehabilitating degraded LLM output

Motivation

Language models that have been quantised, distilled, or otherwise compressed for edge deployment exhibit a characteristic failure mode on low-frequency morphology. The irregular forms (ran, children, mice, brought, written) degrade before the regular patterns do, because they are carried by a smaller fraction of parameters and those parameters are more sensitive to numerical precision loss. A 7B-parameter model at full precision rarely produces runned; the same model at 4-bit quantisation may produce it under some decoding settings; a further-distilled 3B-parameter version may produce it routinely, even at low temperatures.

The standard remediation is more training: improved quantisation-aware distillation, better fine-tuning corpora, larger parameter budgets. All of these are expensive relative to the size of the defect. The defect itself is a closed-form problem: the set of English irregular morphology is a finite list of a few hundred entries, and it is the same list in every quantised variant of every model.

This note describes the alternative remediation: a deterministic post-hoc corrector that catches and repairs morphology errors in language-model output without touching the model.

Method

The approach requires three components:

A forward morphology generator. Given a lemma and a target inflection, return the correct surface form. Implemented in the vine_mouth package as inflect(lemma, form), covering past tense, past participle, present participle, third-person singular present, plural, comparative, and superlative. Approximately 450 irregular entries plus regular rules covering consonant doubling, -y to -ies, -f to -ves, and final-e handling.
A lemmatiser. Given a surface form, return its lemma and grammatical features. Implemented as lemmatise(surface, vocab), with handling for regular inflection, irregular inflection, and derivational morphology. Essentially the inverse of the forward generator.
A repair rule. For each surface form in the model's output, ask: is this a plausible English word? If not, is it a near-miss to a plausible word via a known regular-rule error?
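
As an illustration of the first component, the sketch below shows how an irregulars table can sit in front of a handful of regular spelling rules. It is not the vine_mouth implementation: the name toy_inflect, the five-entry table, the F_TO_VES set, and the form labels 'past' and 'plural' are assumptions made for this example, and only two of the seven inflections are covered.

```python
VOWELS = "aeiou"

# Illustrative subset; the table described above has roughly 450 entries.
IRREGULAR = {
    ("run", "past"): "ran",
    ("go", "past"): "went",
    ("bring", "past"): "brought",
    ("child", "plural"): "children",
    ("mouse", "plural"): "mice",
}

# -f to -ves is lexical (roof -> roofs), so it needs a word list, not a pure rule.
F_TO_VES = {"leaf", "wolf", "knife", "wife"}


def is_short_cvc(lemma):
    """True for one-vowel-group words ending consonant-vowel-consonant (stop, plan)."""
    if len(lemma) < 3:
        return False
    c1, v, c2 = lemma[-3], lemma[-2], lemma[-1]
    ends_cvc = c1 not in VOWELS and v in VOWELS and c2 not in VOWELS + "wxy"
    vowel_groups = sum(
        1 for i, ch in enumerate(lemma)
        if ch in VOWELS and (i == 0 or lemma[i - 1] not in VOWELS)
    )
    return ends_cvc and vowel_groups == 1


def toy_inflect(lemma, form):
    """Irregular lookup first, regular spelling rules second."""
    if (lemma, form) in IRREGULAR:
        return IRREGULAR[(lemma, form)]
    consonant_y = lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in VOWELS
    if form == "plural":
        if lemma in F_TO_VES:
            return lemma.rstrip("e")[:-1] + "ves"   # knife -> knives
        if consonant_y:
            return lemma[:-1] + "ies"               # city -> cities
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        return lemma + "s"
    if form == "past":
        if lemma.endswith("e"):
            return lemma + "d"                      # bake -> baked
        if consonant_y:
            return lemma[:-1] + "ied"               # try -> tried
        if is_short_cvc(lemma):
            return lemma + lemma[-1] + "ed"         # stop -> stopped
        return lemma + "ed"
    raise ValueError(f"unsupported form: {form}")
```

The ordering is the point of the design: because the table is consulted before any rule runs, a regular rule can never shadow an irregular entry.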

The repair logic is:

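# Assumes both functions are exported at the top level of vine_mouth.
from vine_mouth import inflect, lemmatise

# is_proper, is_function_word and regular_over_irregular_variants are
# helpers that are not shown here.

def repair(word, vocab):
    """Return word unchanged, or its irregular form if it is an over-regularised error."""
    # Is this form already correct?
    lemma, tag, _layer = lemmatise(word, vocab)
    if tag != 'base':
        return word                     # recognised inflected form, already correct
    if lemma in vocab or is_proper(word) or is_function_word(word):
        return word                     # known base form, no repair needed

    # Unknown word: check for regular-over-irregular errors
    candidates = regular_over_irregular_variants(word)
    for candidate_lemma, candidate_form in candidates:
        if candidate_lemma in vocab:
            return inflect(candidate_lemma, candidate_form)
    return word                         # no confident repair: pass through unchanged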

The regular_over_irregular_variants function enumerates the common overregularisation patterns:

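runned → (run, past): strip -ed, check if irregular past exists
goed → (go, past)
thinked, bringed, eated, writed, taked → (think | bring | eat | write | take, past)
taughted → (teach, past): regular -ed stacked on an already-irregular past (double-past)
childs → (child, plural): strip -s, check if irregular plural exists
mouses, foots, teeths, mans, womans → (mouse | foot | tooth | man | woman, plural)
beeing → (be, ing): wrong stem for irregular -ing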

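These are the systematic errors made by under-trained or compressed language models. Each maps to a single table entry because the irregular-forms table is authoritative, with one caveat: where a verb's past tense and past participle differ, the -ed form alone does not say which was intended (writed may stand for wrote or written).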
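
One way to enumerate those patterns is sketched below. The function name overregularisation_candidates, the small IRREGULAR table, and the form labels are assumptions for illustration and are not taken from vine_mouth; the vocabulary gate and lemmatiser used by repair are left out so the example stays self-contained. A candidate only counts when the table actually holds an irregular form for it, which is what keeps regular words such as walked untouched.

```python
# Illustrative subset of the irregulars table: (lemma, form) -> surface form.
IRREGULAR = {
    ("run", "past"): "ran",
    ("go", "past"): "went",
    ("think", "past"): "thought",
    ("teach", "past"): "taught",
    ("take", "past"): "took",
    ("child", "plural"): "children",
    ("mouse", "plural"): "mice",
}
# Reverse lookup, used to trace double-marked forms such as "taughted".
BY_SURFACE = {surface: key for key, surface in IRREGULAR.items()}


def overregularisation_candidates(word):
    """Yield (lemma, form) guesses for a word that may be over-regularised."""
    if word.endswith("ed"):
        stem = word[:-2]
        yield stem, "past"                      # goed -> go
        yield word[:-1], "past"                 # taked -> take
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            yield stem[:-1], "past"             # runned -> run
        if stem in BY_SURFACE:
            yield BY_SURFACE[stem]              # taughted -> taught -> teach
    if word.endswith("s"):
        yield word[:-1], "plural"               # childs -> child


def toy_repair(word):
    """Substitute the table form when a candidate has an irregular entry."""
    for key in overregularisation_candidates(word):
        if key in IRREGULAR:
            return IRREGULAR[key]
    return word                                 # no irregular entry: leave as is


for w in ["runned", "goed", "taked", "taughted", "childs", "walked"]:
    print(w, "->", toy_repair(w))
```

Because the sketch has no context, it always picks the past-tense entry for an -ed form; choosing a past participle instead would need the surrounding words.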

Performance characteristics

The repair pass is O(n) in tokens and has no network or GPU dependencies. On a standard CPU, the repair throughput is in the high hundreds of thousands of tokens per second, dominated by tokenisation rather than the repair logic itself.

The corrector is complete for the errors it targets: if a language model emits runned, the irregular-forms table is consulted, run → ran is found, and ran is substituted. The correction does not depend on training, random sampling, or model state. The same word is always corrected the same way.

The corrector is conservative on unknown input: words not present in the vocabulary and not matching a known overregularisation pattern are passed through unchanged. This is important in practice because language models frequently produce legitimate uncommon words (proper nouns, technical terms, neologisms, creative coinages) that the corrector should not touch. The correction only fires when the corrector has high confidence the output is wrong.

The corrector is insensitive to quantisation level: the 450-entry irregulars table has no floating-point state. A 4-bit quantised model paired with a full-precision corrector produces fully-correct irregular morphology, because the morphology decision has been offloaded from the model entirely.

Applicable error categories

The corrector handles:

Overregularised past tense (runned, goed, thinked, bringed) and double-marked past tense (caughted)
Overregularised plurals (childs, mans, womans, mouses, foots, teeths, sheeps)
Overregularised -ing forms (beeing, dieing rather than being, dying)
Overregularised past participles (writed, eated as pp rather than written, eaten)
Incorrect f→v plurals (leafs, wolfs, knifes; a correct form such as wives is left unchanged)
Final-y plural errors (trys, citys, familys)
Consonant-doubling over-application (openned, visitted); the fix requires syllable-stress awareness, which is included
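
The consonant-doubling case can be sketched as follows. English doubles a final consonant before -ed or -ing when the base ends consonant-vowel-consonant and the final syllable is stressed, so undoing an over-applied doubling needs stress information. Here that information is a tiny hand-written set, FINAL_STRESSED, standing in for a real stress lexicon; the set, the function names, and the British treatment of final -l (travelled) are all assumptions of this example rather than a description of vine_mouth.

```python
VOWELS = "aeiou"

# Hypothetical stand-in for a stress lexicon: polysyllables stressed on the last syllable.
FINAL_STRESSED = {"admit", "prefer", "occur", "begin", "regret"}


def vowel_groups(word):
    """Rough syllable count: number of maximal runs of vowel letters."""
    return sum(
        1 for i, ch in enumerate(word)
        if ch in VOWELS and (i == 0 or word[i - 1] not in VOWELS)
    )


def should_double(lemma):
    """True if the final consonant of lemma doubles before -ed / -ing."""
    if len(lemma) < 3:
        return False
    c1, v, c2 = lemma[-3], lemma[-2], lemma[-1]
    if c1 in VOWELS or v not in VOWELS or c2 in VOWELS + "wxy":
        return False                        # does not end consonant-vowel-consonant
    if c2 == "l":
        return True                         # British spelling: travel -> travelled
    return vowel_groups(lemma) == 1 or lemma in FINAL_STRESSED


def fix_overdoubling(word, vocab):
    """Undo a doubled consonant that the stress rule does not license."""
    for suffix in ("ed", "ing"):
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            lemma = stem[:-1]
            if lemma in vocab and not should_double(lemma):
                return lemma + suffix       # openned -> opened
    return word


vocab = {"open", "visit", "admit", "run", "travel"}
for w in ["openned", "visitted", "admitted", "running", "added"]:
    print(w, "->", fix_overdoubling(w, vocab))
```

The vocabulary check matters here as elsewhere: added and telling contain a doubled consonant, but ad and tel are not in the vocabulary, so they are left alone.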

The corrector does not handle:

Agreement errors (she run for she runs): this is a syntactic failure, not morphological
Tense errors (yesterday she runs for yesterday she ran): the model has emitted the wrong tense of a valid form; the morphology itself is correct
Word choice errors (she writes her name where wrote was intended): the emitted form is valid, so the error is indistinguishable from the tense case above without parsing
Discourse-level coherence failures

The scope is deliberately narrow. The corrector replaces forms with their correct morphological realisation; it does not attempt to infer what the model meant to say.

Deployment

The recommended deployment pattern is a filter in the sampling loop:

while generating:
    tok = sample_next_token(model, context)
    context.append(tok)

    # At word boundary, repair the just-completed word
    if is_word_boundary(tok):
        word = extract_last_word(context)
        repaired = repair(word, vocab)
        if repaired != word:
            context = replace_last_word(context, repaired)

For streaming output, the repair runs at each whitespace boundary and does not delay emission beyond one word's worth of buffer. For batch output the repair runs once on the completed text.
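
The one-word buffer for streaming can be sketched as a generator. This covers only the emission side: it assumes the model's tokens have already been decoded to text pieces, treats any non-letter character as a word boundary (a simplification of the whitespace boundary described above), and takes the repair step as a callable. The name stream_with_repair and the dictionary used in the demo are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def stream_with_repair(
    pieces: Iterable[str], repair_word: Callable[[str], str]
) -> Iterator[str]:
    """Yield text chunks, holding back at most the word currently being built."""
    buffer = ""
    for piece in pieces:
        for ch in piece:
            if ch.isalpha():
                buffer += ch                    # still inside a word
                continue
            if buffer:
                yield repair_word(buffer)       # word just completed
                buffer = ""
            yield ch                            # emit the boundary character
    if buffer:
        yield repair_word(buffer)               # flush the final word


# Demo with a stand-in repair function; pieces split words as subword tokens do.
fixes = {"runned": "ran", "mouses": "mice"}
pieces = ["She run", "ned to the mouse", "s."]
print("".join(stream_with_repair(pieces, lambda w: fixes.get(w, w))))
```

Nothing is emitted for a word until its boundary arrives, so a repaired word never has to be retracted from output the user has already seen.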

The vocabulary used by the corrector should be large enough to avoid false repairs on legitimate uncommon words. A reasonable starting point is a list of 30,000-60,000 common English base forms plus any domain-specific vocabulary the application uses. This is small enough to be loaded into memory at startup and has a negligible contribution to the package's overall size.

Implementation

The package containing inflect(), lemmatise(), and the supporting tables is vine_mouth. The full repair logic described here is not currently packaged as a turnkey helper — it is a short function (~30 lines) that application code can include directly. Integrating the repair pass into a given model server takes approximately an afternoon. No model retraining is involved.

Total additional memory footprint for a typical deployment: 50 KB for the vine_mouth package itself, plus the vocabulary used for the gate check. No GPU, no training data, no model updates, no additional latency beyond microseconds per word.

Evaluation

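The approach is correct by construction on its scope: when the corrector fires on a genuine overregularisation, the repair it makes is the one the irregular-forms table dictates, and the overregularisation patterns are a closed set. There is no probabilistic component to the decision. What the table cannot settle is whether a matching string is an error at all: mans, foots, leafs, knifes, and wolfs are also valid third-person verb forms (she leafs through the book), so the gate in front of the repair needs to account for them.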

What requires empirical evaluation is the rate at which quantised models produce morphology errors in the first place, as this determines the practical value of the corrector. That rate varies by model family, quantisation method, prompt, and decoding parameters. In our internal testing, 4-bit quantised variants of popular 7B models emit morphology errors on roughly 0.1-0.4% of generated words in everyday prose, rising to 2-5% on prompts that induce unusual syntactic structures or low-frequency vocabulary. An error rate of that magnitude is small in absolute terms but large in terms of what users notice: a single runned in an otherwise coherent paragraph is immediately visible and undermines trust in the whole output.
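
To make the per-word figures concrete, the short calculation below converts them into the chance that a passage contains at least one error. The 200-word paragraph length and the assumption that errors occur independently are ours, chosen only for illustration; the per-word rates are the everyday-prose range quoted above.

```python
def p_at_least_one_error(per_word_rate: float, words: int) -> float:
    """Probability that a passage of `words` words contains one or more errors,
    assuming each word errs independently at `per_word_rate`."""
    return 1.0 - (1.0 - per_word_rate) ** words


for rate in (0.001, 0.004):
    share = p_at_least_one_error(rate, 200)
    print(f"{rate:.1%} per word -> {share:.0%} of 200-word paragraphs affected")
```

Under these assumptions, roughly 18% of paragraphs at the low end of the range and roughly 55% at the high end would contain a visible error.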

The corrector eliminates these errors at negligible cost. It does not improve models on any other axis.

Summary

Language-model morphology errors are closed-form failures that do not benefit from the probabilistic apparatus that causes them in the first place. A small deterministic corrector encodes the relevant knowledge authoritatively and applies it reliably regardless of model state. The approach is complementary to model-side improvements, not a replacement for them, and it is particularly useful in deployment contexts — edge devices, quantised models, tightly latency-constrained inference — where retraining is expensive or impossible.

Analogous correctors can be built for other closed-form subproblems (article selection, basic punctuation, capitalisation). The broader principle is that failures with a finite, enumerable correct-answer set should not be solved by a general-purpose statistical learner when a table lookup will do.

Raychell Langan · NEXICOG Ltd · Hampshire, UK