The Readability Mandate
All Sectors
Most firms have a readability policy: documents must hit a target Flesch-Kincaid reading age, avoid listed jargon, and fit a plain-English template. The Consumer Understanding outcome makes this insufficient. Readability tools measure word length and sentence complexity; they do not measure whether a customer can identify the consequence of what they read. Yet across letters, emails, app screens, contractual terms, regulated disclosures, and marketing, the score is what gets measured and the consequence is what gets missed. The structural problem is that the document factory has no enforced comprehension standard between drafting and dispatch.
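For reference, the standard Flesch-Kincaid grade-level formula shows how little the score can see. Its only inputs are counts of words, sentences, and syllables:

```latex
\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```

Two letters with the same counts receive the same score whether or not either one states the deadline, the fee, or the consequence of inaction.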
The structural move is to make comprehension testing the operating standard for the document factory — proportionate, gated, evidence-driven, and consequential — so that no customer-facing artefact reaches dispatch without a defensible answer to whether the customer can identify what it means for them:
Comprehension standard, not readability score
The mandate sets a comprehension outcome for every class of customer-facing document (what the customer must be able to identify, decide, or act on after reading) and tests against that outcome with representative samples including vulnerability cohorts. Readability metrics are an input, not the standard; a document that scores well on Flesch-Kincaid but fails comprehension testing fails the mandate. The design test: for every major document class, can the firm produce a defined comprehension outcome, the test design, the sample composition, the most recent test result, and the action taken when the test failed?
Sign-off gate with authority
The mandate operates through a gate in the document production workflow: no in-scope customer-facing artefact reaches dispatch without either a comprehension-test pass or a documented exception authorised at a level that owns the consequence. When a commercial deadline collides with a test failure, the gate is enforced, not waived. Regulated-format documents (KIDs, summary boxes, arrears templates) sit inside the gate, and where the prescribed text itself is the failure point, the evidence is fed back to the regulator. The design test: in the last twelve months, how many documents did the gate stop, revise, or escalate, and what were the outcomes for those dispatched under exception?
Proportionate testing, continuous evidence
Testing depth scales to consequence: high-stakes, low-volume documents (suitability reports, mortgage offers, claims decisions, drawdown recommendations) receive deep comprehension testing; high-volume, repeated documents (transactional letters, app screens, marketing emails) receive sample-based and synthetic-user testing with periodic real-user retesting. Post-deployment evidence (call drivers, complaint themes, chat transcripts) feeds back into the test cycle, so a document that passes pre-launch testing but generates downstream misunderstanding is identified and revised. The design test: can the firm show, for any document, the link from drafting through testing through post-deployment evidence to the next revision?
Every in-scope document class has a defined comprehension outcome, a sample design that includes vulnerability cohorts, a most-recent test result, and a documented action where the test failed — not a readability score reported as if it were comprehension evidence.
The sign-off gate produces a record: documents stopped, documents revised, documents dispatched under authorised exception. The exception register is reviewed at governance level, and recurrent exceptions trigger redesign of the production workflow rather than a routine waiver.
Post-deployment evidence — call drivers, complaint themes, chat transcripts, frontline-agent feedback — is attributed back to the document and clause that generated it, and the next revision cycle addresses the attributed clauses with re-testing, not cosmetic edits.
Comprehension performance is reported to the board as a customer-outcome metric with cohort breakdowns (including vulnerable customers), not as a count of documents that passed an internal readability check, and the trend over time is visible in the firm's Consumer Duty board reporting.
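A cohort breakdown of comprehension results can be sketched in a few lines. This is a minimal illustration, assuming each test response records a cohort label, the outcome tested, and whether the respondent identified it correctly; the `Response` type, function names, cohort labels, and the toy responses are all hypothetical and invented for illustration. The point it shows is that a blended "average customer" figure can hide a large gap between cohorts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    cohort: str    # e.g. "general", "vulnerable", "low financial confidence"
    outcome: str   # the defined comprehension outcome being tested
    correct: bool  # did the respondent identify the outcome correctly?


def pass_rates_by_cohort(responses: list[Response]) -> dict[tuple[str, str], float]:
    """Share of respondents identifying each outcome, broken down by cohort."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    passes: dict[tuple[str, str], int] = defaultdict(int)
    for r in responses:
        key = (r.cohort, r.outcome)
        totals[key] += 1
        passes[key] += int(r.correct)
    return {key: passes[key] / totals[key] for key in totals}


def blended_rate(responses: list[Response], outcome: str) -> float:
    """The single 'average customer' figure, which hides cohort differences."""
    relevant = [r for r in responses if r.outcome == outcome]
    return sum(r.correct for r in relevant) / len(relevant)


def weakest_cohort(rates: dict[tuple[str, str], float]) -> dict[str, tuple[str, float]]:
    """For each outcome, the cohort with the lowest pass rate."""
    worst: dict[str, tuple[str, float]] = {}
    for (cohort, outcome), rate in rates.items():
        if outcome not in worst or rate < worst[outcome][1]:
            worst[outcome] = (cohort, rate)
    return worst


# Toy responses, invented purely for illustration.
responses = (
    [Response("general", "consequence of inaction", True)] * 3
    + [Response("general", "consequence of inaction", False)]
    + [Response("vulnerable", "consequence of inaction", True)]
    + [Response("vulnerable", "consequence of inaction", False)] * 3
)
rates = pass_rates_by_cohort(responses)
print(blended_rate(responses, "consequence of inaction"))  # 0.5
print(weakest_cohort(rates))  # {'consequence of inaction': ('vulnerable', 0.25)}
```

Here the blended figure of 50% conceals a 75% pass rate in the general cohort and 25% in the vulnerable cohort, which is why the board metric is reported per cohort rather than as one number.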
After the Consumer Understanding Review (March 2026) named reliance on sales data and the absence of complaints as inadequate evidence of understanding, a retail bank rebuilt its arrears-communications mandate. The bank's standard arrears letter had been Flesch-Kincaid-compliant for years and was issued at scale. Comprehension testing with a sample of customers in early arrears, including low-financial-confidence and vulnerable cohorts, found that only 38% correctly identified that the letter required action within fourteen days, only 29% understood that ignoring the letter would trigger a default registration, and only 18% recognised that free debt advice was available through a route that did not involve calling the bank. The bank rewrote the letter against a defined comprehension outcome (the customer must identify the action required, the consequence of inaction, and at least one independent advice route) and re-tested with the same cohort. Comprehension on each measure rose above 75%. The mandate then formalised the gate: no future arrears template variant could be dispatched without comprehension testing against the same three outcomes, and no commercial revision (such as adding a payment-promotion message) could override the gate without sign-off at risk-committee level. Within two quarters, calls to the bank's collections line citing letter confusion fell by more than a third, and engagement with the named debt-advice route rose materially.
A wealth firm applied the BIT investment fee disclosure work and the ESMA behavioural insights workshop methodology to its suitability reports. It found that its standard advice-suitability template, drafted within firm-wide regulatory templates and Flesch-Kincaid-checked, did not produce reliable comprehension of the recommendation's most consequential elements. Testing with a sample of advised clients, including those advised in vulnerable circumstances, found that 41% could not state the five-year cost of the recommended portfolio in pounds, 34% misidentified its liquidity profile, and 26% believed an income recommendation was guaranteed when it was a projection. The firm restructured its suitability template around three tested comprehension outcomes (projected cost in pounds, liquidity, and projection versus guarantee), placed each at the front of the relevant section, and added a tested plain-language explainer panel. Sign-off authority was moved to the firm's Consumer Duty Champion, and the gate held any template variant that failed re-testing. Both the Digital Design review (July 2025) and the Consumer Understanding Review (March 2026) had flagged the suitability report as a high-stakes document where comprehension testing was rare. After the redesign, complaint themes citing 'I didn't realise the cost' or 'I thought the income was guaranteed' fell substantially across the advised book.
Common failure modes
The most common failure mode is treating the readability score as the standard. Flesch-Kincaid and equivalent tools measure surface complexity; they say nothing about whether a customer can identify the consequence — the rate change, the exclusion, the fee, the irreversibility — of what they read. The Consumer Understanding Review (March 2026) explicitly named reliance on readability metrics, sales data, or absence of complaints as inadequate evidence of understanding, and the mandate must reject these substitutions. A second is regulated-format defeatism: firms treat KIDs, KFIs, summary boxes, and arrears templates as untestable because the format is prescribed, when the mandate's job at exactly those moments is to test comprehension within the format and feed evidence back to the regulator where the prescribed text itself is the failure point. A third is sign-off without consequence: a comprehension test that produces a low score but does not stop dispatch is theatre, and the mandate must include the gate that holds when commercial deadlines collide with comprehension failure. A fourth is testing only the average customer: comprehension data that excludes vulnerable cohorts, low financial confidence, low digital confidence, or non-native English speakers reproduces exactly the inequality the Duty was written to address. A fifth is one-shot testing: the mandate is broken if comprehension is tested at launch and never retested as the document is revised, the customer base shifts, or post-deployment evidence accumulates.