BrailleKit

Braille Translation for Everyone

All notable changes to BrailleKit will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.8.0] — 2026-06-03

Added

  • Auto-update on Windows via WinSparkle 0.9.3. BrailleKit now checks for updates daily in the background and offers signed installer updates through a standard "Software Update" dialog. A "Check for Updates…" entry is also available in the Help menu. Update appcast hosted at dl.slohmaier.com/appcast/braillekit-win.xml with EdDSA-signed release artifacts. WinSparkle.dll bundled with the Inno installer.

  • Spanish G2 (es-g2): full coverage of CBE Documento Técnico B-16-1 (V3 2025) — the authoritative standard for Spanish contracted braille (Estenografía española). Discovered and bundled the previously-missing standard PDF from the official ONCE website (CC BY-NC-ND). Programmatically extracted every contraction:

    • Tables 1, 2, 3 (51 single-cell abbreviations)
    • Tabla 5 (15 gender/number variants)
    • Tabla 6 (5 number-plural variants)
    • Tabla 8 (159 two-cell abbreviations)
    • Tabla 9 (48 three-cell abbreviations)
    • Tabla 10 (21 Iberoamerican country names — including Costa Rica + República Dominicana via the multi-word phrase_contractions engine path)
    • §2.3 (10 standard Spanish abbreviation variants — doctora, señoras, ustedes, etc.)
    • Apéndice C (full alphabetical listing including ~700 derivative gender/number/verbal/-al/-mente/-ble/-dad forms)

    es-g2 grew from 128 word_contractions to 749 word_contractions + 2 phrase_contractions (5.85× larger), with 733/733 verified gold entries at 100.00% match rate. Promoted beta → STABLE.

  • Portuguese G2 (pt-g2): full coverage of IBC Estenografia Braille 2006 Section I — the authoritative Brazilian standard (ISBN 978-85-60331-05-5, 70-page PDF bundled). Programmatically extracted all 140 entries from Sections I.1 (Sinais Simples / Duplos / Triplos / Quádruplos by total and partial syllabic representation), I.2 (Contração Apoiada / Pura / Emergência), I.3 (Suspensão), and I.4 (Convenção Relativa).

    pt-g2 grew from 81 word_contractions to 149, with 140/140 verified gold entries at 100.00% match rate. Promoted beta → STABLE.

Fixed

  • es-g2: 9 broken/noise entries removed (foreign-language paste error 'tak', 3 demonstrative collisions that silently truncated the 't' character, 5 literal-only entries with no compression value). 16 collisions against B-16-1 corrected (alguno/alguna/algunos/algunas, ninguno-family, otra-family, aquel-family, según, qué/cuál/dónde/quién question-form @ prefix, mediante, mientras).

  • pt-g2: 5 legacy mismatches against IBC standard corrected (jamais ⠚⠁⠍⠁⠮ → ⠚⠍, você ⠧⠕⠉⠣ → ⠧⠉, qual ⠟⠅ → ⠟⠇, quem ⠟⠲ → ⠟⠍, tudo ⠞⠥⠙⠕ → ⠞⠕).

  • Benchmark runner (scons benchmark): set BRAILLEKIT_CLI_ALLOW_NONPRO=1 so the Pro-tier CLI gate doesn't cause empty CLI output that scored every table 0% (the symptom was every table showing 0% raw match in the summary despite working translation).

  • Braille music from MusicXML (alpha): selecting a MusicXML score now produces international braille music notation following the New International Manual of Braille Music Notation (World Blind Union, 1996). The note/rhythm/interval signs are language-independent; the score's text (title, composer, part names, lyrics, directions) is still translated through your selected language table. New MusicTranslator API and batch-musicxml CLI command. v1 covers notes, rests, octave marks, dotted notes, accidentals, key/time signatures, bar lines and chords (via intervals); precise lyric-to-note alignment for vocal scores is planned for a later release.

[0.7.0] — 2026-04-29

Added

  • Inline-MathML auto-dispatch from prose tables (P1, commit aa869015). When a prose table declares mappings.math_table (e.g. en-ueb-g2 → en-nemeth) and the caller has not opted out, Translator::translate segments the input on <math>…</math> fragments and routes each math fragment through MathTranslator::translate_mathml. Switch indicators (BANA UEB-with-Nemeth 2014: open ⠸⠩, close ⠸⠱) are inserted at each prose↔math boundary. Opt out per call via TranslationOptions.disable_math_dispatch. See src/core/src/text_math_segmenter.{h,cpp} and the 8 integration tests in tests/core/test_translator_math_dispatch.cpp.

  • phrase_contractions engine path for cross-word phrase abbreviations (W3 / P2). Per-table top-level array of {pattern, braille, priority?, reference?, group?} mapping a whitespace-separated multi-word pattern to a single braille output (e.g. French il y a → ⠽⠁ per AVH AOÉ). Pre-translation scanner in src/core/src/phrase_scanner.{h,cpp} walks the input once and returns word-boundary-aligned, non-overlapping hits; the translator splices the phrase braille verbatim and translates the surrounding text segments via the normal pipeline. Match rules: longest-token-count wins, ties by descending priority, ASCII case-insensitive, wrapping punctuation stripped. Opt out per call via TranslationOptions.disable_phrase_dispatch. 30 new unit tests. Tables that don't define the array are byte-identical with prior behaviour.

  • settings.lookahead_strategy (P4) — per-table opt-in that controls which overlapping peek matches may suppress the current greedy candidate when lookahead_window > 0. Two values: "suffix-only" (default — preserves canonical S2 behaviour) and "all" (any context-applicable peek with strictly higher priority may suppress). Empty / unknown values fall back to default.

  • NFB Nemeth lesson extraction infrastructure (P3). New benchmark/golden/extract_nfb_lessons.py (anchor-and-block parser walks all 15 lesson PDFs → 556 candidate Nemeth fragments) and benchmark/golden/reconstruct_nfb_mathml.py (conservative MathML reconstructor + batched MathTranslator verifier). Drove the en-nemeth gold expansion below.
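
The phrase_contractions match rules listed above (longest token count first, then descending priority, ASCII case-insensitive, wrapping punctuation stripped, non-overlapping hits) can be pictured with a small Python sketch. This is not the code in phrase_scanner.cpp: the function name scan_phrases, the dict-based entry shape and the use of string.punctuation are assumptions made for illustration. Only the il y a → ⠽⠁ pair comes from the entry above.

```python
import string

# Hypothetical table fragment; only the "il y a" pair comes from the changelog.
PHRASES = [
    {"pattern": "il y a", "braille": "\u283d\u2801", "priority": 0},
]


def _norm(token):
    """Strip wrapping ASCII punctuation and fold ASCII case for matching."""
    stripped = token.strip(string.punctuation)
    return "".join(c.lower() if c.isascii() else c for c in stripped)


def scan_phrases(text, phrases=PHRASES):
    """Return non-overlapping (first_token, last_token, braille) hits.

    Token indices refer to text.split(). At each position the candidate
    with the most tokens wins; ties go to the higher priority.
    """
    tokens = [_norm(t) for t in text.split()]
    compiled = [
        ([_norm(t) for t in p["pattern"].split()], p.get("priority", 0), p["braille"])
        for p in phrases
    ]
    hits, i = [], 0
    while i < len(tokens):
        candidates = [
            (len(pat), prio, braille)
            for pat, prio, braille in compiled
            if tokens[i:i + len(pat)] == pat
        ]
        if candidates:
            length, _, braille = max(candidates, key=lambda c: (c[0], c[1]))
            hits.append((i, i + length - 1, braille))
            i += length  # jump past the hit so matches never overlap
        else:
            i += 1
    return hits
```

A caller would splice the returned braille over the token span of each hit and send the remaining text through the normal pipeline.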

Changed

  • en-nemeth: BANA Nemeth Code 2022 §1 numeric indicator emit (numeric_indicator: "⠼"). MathTranslator now emits the indicator at the start of an expression and after spaces (the only positions BANA requires), via a new cleanup_pass rule that walks the post-emit braille and inserts ⠼ before any Nemeth lower-cell digit cell at byte position 0 or after a braille space ⠀. Inside operators / fractions / sub-/super-scripts no insertion. Output changed for every digit-leading expression: e.g. <math><mn>42</mn></math> now emits ⠼⠲⠆ (was ⠲⠆), <math><mn>2</mn><mo>=</mo><mn>22</mn></math> now emits ⠼⠆⠀⠨⠅⠀⠼⠆⠆ (lesson-correct, was ⠆⠀⠨⠅⠀⠆⠆). Other math tables are unaffected — Rule 5 only fires when the loaded table declares a non-empty numeric_indicator. Updated 28 fixtures + 28 verified gold entries + 5 unit-test expectations.

  • en-nemeth gold corpus: 69 → 84 verified entries at 100% match. 15 newly-merged lesson-derived Nemeth pairs from reconstruct_nfb_mathml.py. The remaining 538 NFB candidates await a richer reconstructor (currency, comma-lists, multi-operand), tracked in docs/NFB_LESSON_RECON.md.

  • Status counts: 25 stable / 45 beta / 23 alpha → 36 stable / 42 beta / 15 alpha. Promotions in this cycle (largely 2026-04-28): af-g2, ca-g1, eo-g1, fi-g1, fil-g1, fr-g1, ga-g1, ga-g2, hi-g1, hu-g1, id-g1, it-g1, mn-g1, nl-g1, sk-g1, sl-g1, ta-g1, te-g1, en-nemeth.
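
The numeric-indicator cleanup rule described in the en-nemeth entry above can be sketched in a few lines of Python. This is a simplified illustration, not MathTranslator's implementation: the function name is hypothetical, it works on Unicode characters rather than bytes, and it handles only the two positions named above (start of expression and after a braille space).

```python
NUMERIC_INDICATOR = "\u283c"  # the numeric indicator cell from the entry above
BRAILLE_SPACE = "\u2800"      # blank braille cell
# Nemeth lower-cell digits 1-9 followed by 0.
NEMETH_DIGITS = set("\u2802\u2806\u2812\u2832\u2822\u2816\u2836\u2826\u2814\u2834")


def insert_numeric_indicator(cells, indicator=NUMERIC_INDICATOR):
    """Insert the indicator before a digit at position 0 or after a space.

    An empty indicator leaves the input untouched, mirroring the
    numeric_indicator: "" behaviour described in the changelog.
    """
    if not indicator:
        return cells
    out = []
    prev = None
    for cell in cells:
        if cell in NEMETH_DIGITS and (prev is None or prev == BRAILLE_SPACE):
            out.append(indicator)
        out.append(cell)
        prev = cell
    return "".join(out)
```

Fed the old output for 42 and for 2 = 22, the sketch reproduces the new output quoted in the entry.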

Fixed

  • benchmark/golden/_pdf_extractor.BRF_TO_UNICODE BANA chart drift (commit 8d17567f). Three entries disagreed with the NLS spec: 0 was mapped to ⠼ (= the # cell) instead of ⠴; + was mapped to ⠴ (what 0 should have been) instead of ⠬; [ was mapped to ⠨ (= the . cell) instead of ⠪. Fix is byte-identical for any extractor whose gold doesn't exercise the affected characters; verified by full test suite + full gold benchmark.
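
One way to guard against this kind of chart drift is to derive the mapping from the 64-character North American Braille ASCII ordering instead of typing each pair by hand. The Python sketch below is an assumption about how such a table could be built, not the contents of _pdf_extractor; the helper names are hypothetical.

```python
# North American Braille ASCII in dot-pattern order: the character at
# index i corresponds to the Unicode braille cell U+2800 + i.
BRAILLE_ASCII = " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="


def build_brf_to_unicode():
    """Build the ASCII-braille to Unicode-braille mapping from the ordering."""
    if len(BRAILLE_ASCII) != 64:
        raise ValueError("expected exactly 64 braille ASCII characters")
    return {ch: chr(0x2800 + i) for i, ch in enumerate(BRAILLE_ASCII)}


def drifted_entries(table):
    """Return the entries named in the fix above that disagree with the chart."""
    expected = {
        "0": "\u2834",
        "+": "\u282c",
        "[": "\u282a",
        "#": "\u283c",
        ".": "\u2828",
    }
    return {ch: table.get(ch) for ch, cell in expected.items() if table.get(ch) != cell}
```

Running drifted_entries over a hand-maintained table would flag the three swapped cells described above.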

Documentation

  • New per-table plan: docs/tables/fr-g2/PLAN.md splits P5 fr-g2 contraction completion into three independent sub-projects. docs/tables/fr-g2/PHRASES.md lists the 34 candidate AVH §IV Locutions that block on extending the AVH BRF codelocal decoder.
  • docs/NFB_LESSON_RECON.md Phase 2-4 outcome section explains the extractor pipeline + the 538-skipped breakdown by shape.
  • docs/MULTIDAY_PROJECTS.md STATUS blocks updated for P1, P2, P3, P4. P5 fr-g2 has its own plan; es-g2 / pt-g2 plans deferred to follow the AVH decoder pattern.
  • docs/IMPROVEMENT_ROADMAP.md W3 row promoted to "Resolved"; S2 row updated with the P4 evaluation outcome.

Notes for downstream consumers

  • Math output stability: Tables with numeric_indicator: "" (empty) are byte-identical with prior behaviour. Tables that declare numeric_indicator (currently only en-nemeth ships with "⠼") emit the new BANA-correct output. If you have callers locking on bare-digit Nemeth output and want the old behaviour, set the table's numeric_indicator back to "".

  • API additions: TranslationOptions::disable_phrase_dispatch (default false) matches the pre-existing disable_math_dispatch shape.
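
The per-call opt-out shape mentioned in these notes can be pictured with the math dispatch from this release. The Python sketch below is illustrative only: Options is a hypothetical stand-in for TranslationOptions, prose_fn and math_fn stand in for the real translators, and the regex-based segmentation is an assumption, not the logic in text_math_segmenter.cpp. The switch indicator cells are the ones listed in the Added section.

```python
import re
from dataclasses import dataclass

OPEN_NEMETH = "\u2838\u2829"   # opening switch indicator
CLOSE_NEMETH = "\u2838\u2831"  # closing switch indicator

_MATH_RE = re.compile(r"<math\b.*?</math>", re.DOTALL)


@dataclass
class Options:
    """Hypothetical stand-in for TranslationOptions."""
    disable_math_dispatch: bool = False


def segment(text):
    """Split text into ("prose" | "math", fragment) pairs in input order."""
    parts, pos = [], 0
    for m in _MATH_RE.finditer(text):
        if m.start() > pos:
            parts.append(("prose", text[pos:m.start()]))
        parts.append(("math", m.group(0)))
        pos = m.end()
    if pos < len(text):
        parts.append(("prose", text[pos:]))
    return parts


def translate(text, prose_fn, math_fn, options=None):
    """Route math fragments to math_fn unless the caller opted out."""
    options = options or Options()
    if options.disable_math_dispatch:
        return prose_fn(text)
    out = []
    for kind, fragment in segment(text):
        if kind == "math":
            out.append(OPEN_NEMETH + math_fn(fragment) + CLOSE_NEMETH)
        else:
            out.append(prose_fn(fragment))
    return "".join(out)
```

With the flag set, the whole input takes the prose path unchanged, which is the byte-identical behaviour the notes describe for callers that opt out.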

[0.6.2] — 2026-04-21

Added

  • Portable DOCX output with embedded Braille font. New DocxWriterOptions.embed_braille_font flag ships an OOXML-obfuscated copy of Braille CC0 (GGBotNet, CC0 1.0 public domain, full U+2800..U+28FF) inside the .docx so the document renders correctly on machines that don't have SimBraille/Tiger/Swell installed. Uses the spec-compliant XOR-twice obfuscation in OOXML §17.8.1 — verified byte-identical round-trip with the embedded GUID as the fontKey. Adds ~50 KB to the binary .docx size (compressed ~7 KB in the zip).

  • RTF writer (write_rtf, write_rtf_to_memory, RtfWriterOptions). Emits Microsoft-compatible RTF with the same embed_braille_font option, using RTF 1.9 \*\fontemb + \*\fontfile hex encoding. Unicode braille goes through \uN? escapes with \uc1 so the fallback "?" is consumed by strict readers. Documented compat note: macOS Cocoa RTF parser (textutil, TextEdit) doesn't support \*\fontemb — use DOCX instead on those readers.

  • Document tab (GUI) gains:

    • A fifth radio button for Rich Text (.rtf) output parallel to DOCX.
    • A fifth font picker entry "Braille CC0 (embedded, portable)" that flips on embed_braille_font for whichever container is selected.
    • Save-dialog filters for .rtf; persistent settings extended; tab order + EN/DE accessibility labels added.
  • Performance documentation (docs/PERFORMANCE.md): measured throughput, startup cost, and memory ceiling for en-ueb-g2 and de-g2 on 12 MB prose input.

Engine

  • Allow capitalized prefix per-contraction flag (allow_capitalized_prefix). Enables shortforms like blind, braille, first, friend, good, great, letter to fire as a bare prefix of a capitalised word (Blindcraft → ⠠⠃⠇⠉⠗⠁⠋⠞, Greatford, Goodge, …). Approximates UEB §10.11 compound-word shortform expansion.

  • Proper-noun not_words lists for 11 groupsigns / initial-letter contractions in en-ueb-g2 (§10.11 bridging rule). 21 specific proper nouns (Boone, Chisholm, Esther, Jamestown, Hades, Hadrian, Dayan, Bighorn, Airedale, Newhaven, Sontheim, …) no longer get incorrect groupsigns applied mid-word.

Fixes

  • CLI --version / --help: both flags were previously consumed as input text and translated to braille. They now short-circuit to print version / usage before the translate dispatch.

  • RTF \uc changed from \uc0 to \uc1 so strict readers don't render the ? fallback character verbatim.
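
The \uc1 plus \uN? convention behind this fix can be shown in a short Python sketch. It is not the write_rtf implementation: rtf_escape is a hypothetical helper that covers only the text-escaping step and ignores fonts, paragraphs and control characters.

```python
import struct


def rtf_escape(text):
    """Escape text for an RTF body using \\uN? with a one-character fallback.

    \\uc1 tells the reader that exactly one fallback character ("?")
    follows each \\uN, so a strict reader skips it instead of printing it.
    """
    out = ["\\uc1 "]  # the trailing space delimits the control word
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch.isascii():
            out.append(ch)
        else:
            raw = ch.encode("utf-16-le")
            # RTF takes each UTF-16 code unit as a signed 16-bit decimal.
            for unit in struct.unpack("<%dh" % (len(raw) // 2), raw):
                out.append("\\u%d?" % unit)
    return "".join(out)
```

Braille cells sit in U+2800..U+28FF, so they always come out as positive values; the signed conversion only matters for code units above U+7FFF.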

[0.6.1] — 2026-04-20

Engine features

  • Grade-2 number-mode persistence: punctuation listed in a table's number_mode_chars (by default , and .) now keeps the following digit run inside the same number segment. Grade 1 already honored this; Grade 2 was incorrectly resetting number mode after every non-word segment. Affects Norwegian, German, French, Italian etc.
  • LOWERCASE_WORD shortform extensions: the extension-match path in apply_contractions now fires for LOWERCASE_WORD contexts too, not only WHOLE_WORD. Unlocks UEB in/enough followed by apostrophe clitics ('s, 't, 'd, 'll, 've, 're).
  • Per-entry flag allow_adjacent_to_hyphen: escape hatch for suppress_lower_wordsigns_at_hyphens. UEB §10.6 allows in and enough adjacent to hyphens while other lower wordsigns stay suppressed.
  • Per-entry flag not_as_whole_word: UEB §10.3.2 forbids the digraph strong-groupsigns (sh, th, ch, st, wh, gh, ar, ou) from standing as the entire word, because they presume surrounding letter context. Enables sh → ⠎⠓ (not ⠩) for standalone input.
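
The number-mode persistence rule in the first entry above comes down to how digit runs are grouped. A minimal Python sketch of that grouping follows, assuming a regex-based splitter; number_segments is a hypothetical name, and the sketch shows only segmentation, not the braille that the engine emits.

```python
import re


def number_segments(text, number_mode_chars=",."):
    """Return digit runs, keeping number_mode_chars that sit between digits.

    A joiner only extends the segment when a digit follows it, so a
    trailing comma or full stop still ends the number.
    """
    if number_mode_chars:
        joiners = re.escape(number_mode_chars)
        pattern = r"[0-9]+(?:[%s][0-9]+)*" % joiners
    else:
        pattern = r"[0-9]+"
    return re.findall(pattern, text)
```

With an empty number_mode_chars the same input splits at every punctuation mark, which corresponds to the Grade 2 behaviour before this fix.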

Table-level fixes

  • en-ueb-g2: apostrophe-clitic extensions on all 29 alphabetic wordsigns and all 286 shortforms; n't on could/would/should/must; dis and con not_words pruned to just the genuine UEB exceptions (coney, coneys); ing context restricted to middle/final so it can't fire at word start post-hyphen (to-ing, fro-ing); several bogus indicator/math-symbol shortforms removed (terminator, division, proportion, acknowledge, o'clock — each was a PDF-extraction artifact that truncated the real translation).
  • de-g2: resolved 6 of 9 BSKDL A1 residual mismatches via targeted dictionary entries + a context change (ihrige, lässt's, möcht's, Man, mrs/drs rejected as Luxembourgish).
  • no-g1: 10 missing ASCII symbol mappings added (&, §, ©, π, °, |, =, +, @, _, $); the double-quote mapping was fixed ().

[0.6.0] — 2026-04-15

Added

  • Desktop UI overhaul

    • Wizard flow (WelcomeStage → InputStage → ConfigStage → ResultStage) for guided text/document conversion
    • eBraille export from DocumentTab
    • EPUB and Markdown import, drag-and-drop for single files and batch
    • Live preview, async-feel conversion pipeline
    • System-aware dark mode
    • Table maturity indicator in the picker (stable / beta / alpha)
  • Table maturity classification — new status field in table JSON (stable / beta / alpha), surfaced in the UI picker

  • Engine features

    • Recursive compound boundary detection (German compound handling, +77 words)
    • check_remainder morphology engine infrastructure
    • not_words morphology engine (blocks contractions in specific word forms)
    • Grade 2 back-translation (BRF → text via inverse contraction lookup)
    • Capital passage indicator (⠠⠠⠠) support
    • UEB be- / con- / dis- prefix shortforms with proper overrides
  • Windows ARM64 cross-compilation support including ARM64 wxWidgets build and MSIX bundle auto-detection

  • Language coverage

    • 591 Norwegian (no-g1/no-g2) gold entries + Scandinavian BRF decoder
    • 76 Swedish (sv-g2) entries from MTM Kortskrift PDF
    • 215 English (en-ueb-g2) entries bulk-imported from UEB Rules 2024
    • 453 French (fr-g2) AVH abbreviations
    • pt-g2, cy-g2 promoted to stable
  • C API expansion with documented error model, new entry points for table enumeration and status queries; new C API README

  • Documentation

    • docs/GETTING_STARTED.md onboarding guide
    • C API README with usage examples
    • All diagrams converted from Graphviz to inline PlantUML (light + dark)
    • Accessible Doxygen HTML with auto dark mode (new in this release)

Changed

  • Engine refactors

    • Extracted capital_indicators.cpp, compound_boundary.cpp, grade1_translator.cpp from the monolithic translator
    • ContractionGuards struct replaces 9 ad-hoc lambdas
    • pass2_processor.cpp — named pass2 phase classes, documented responsibilities
    • Removed 222-line dead fallback contraction path
    • table_manager.cpp broken into focused modules
  • Table accuracy — continued quality push on major tables:

    • de-g2: BSKDL A1 coverage 90% → 98% (multiple whole-word batches, Fugen-s compound boundary fix, proper-noun/loanword additions)
    • en-ueb-g2: 14 broken shortforms removed, new entries from UEB Rules 2024
    • fr-g2: +453 AVH Manuel abbreviations (+585 words correctly translated)
    • Scandinavian tables promoted to stable
  • Bindings: Python / C# / Flutter versions synced from 0.1.0 → 0.6.0 to match the core library

Fixed

  • Korean Hangul decomposition: MSVC silently re-encoded "\u3131" in CP1252; all tables now use explicit "\xe3\x84\xb1" UTF-8 byte escapes
  • UTF-8 bounds checks missing on several translator paths; silent try/except handlers replaced with typed error propagation (review fixes)
  • de-g2 compound boundary incorrectly blocked st contraction via Fugen-s guard
  • Capital indicator is now suppressed correctly after hyphens in compounds (per BSKDL convention) and before the letter-prefix indicator
  • en-ueb-g2 con- prefix false positives in words like console, contest
  • en-ueb-g2: added Beatrice, Beatrix, Belinda, acknowledge shortforms previously missing
  • CLI stdin handling for piped input in Grade 2 mode
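
The Hangul fix above swaps a universal character name for explicit UTF-8 byte escapes. A small Python helper along these lines can generate such escapes; utf8_c_escape is a hypothetical name and not part of the BrailleKit tooling.

```python
def utf8_c_escape(text):
    """Render text as \\xNN escapes of its UTF-8 bytes for a C/C++ literal.

    Note for C/C++ use: a hex escape absorbs any hex digits that follow
    it, so split the literal if ordinary hex-digit characters come next.
    """
    return "".join("\\x%02x" % b for b in text.encode("utf-8"))


def survives_cp1252(text):
    """Report whether text can be represented in CP1252 without loss."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False
```

For U+3131 the helper yields the same three bytes quoted in the entry, and survives_cp1252 returns False for that character.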

Technical

  • New central VERSION file (0.6.0) — single source for all version stamping
  • scons docs now produces accessible HTML with system-aware dark mode toggle, tree navigation, proper ARIA, and respects prefers-reduced-motion
  • site_scons/build_tools.py SDK packager now bundles CHANGELOG.md, LICENSE, and the generated Doxygen documentation
  • ~5 GB of old benchmark data removed (word lists, HTML reports, per-table corpus outputs)

[0.5.0] — 2026-03-24

Added

  • Accessible HTML benchmark report (scons benchmark → benchmark/results/benchmark_report.html)

    • CDN-based Pico CSS, dark/light mode toggle
    • Full screenreader support (ARIA labels, semantic HTML)
    • Sorted by language, expandable per grade/table
    • Per-table: Braille system, standard reference, multi-comparison results
    • SDK disclaimer about independent implementation and verification status
  • Chinese Braille engine: Hanzi→Pinyin pre-processing via embedded Pinyin dictionary

    • 44,348 CJK→Pinyin mappings from Unicode Unihan (Unicode License, permissive)
    • New pinyin_dictionary table field in table_impl.h
    • zh-g1 accuracy: 0% → 91.8% (vs pypinyin reference)
  • Latin letter indicator engine feature

    • New latin_letter_indicator table field: emits indicator once before Latin text in non-Latin scripts
    • ru-g1 accuracy: 94.0% → 98.8%
  • Japanese Tenji table overhaul

    • Added all Hiragana + Katakana to character map (152→237 characters)
    • Fixed わ/を swap, added 33 youon combinations, 6 foreign sound combos
    • ja-g1 accuracy: 0% → 97.3% (vs uhyo/tenji reference)
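
The "indicator once before Latin text" behaviour of latin_letter_indicator can be pictured with a short Python sketch. It is an illustration under stated assumptions: mark_latin_runs is a hypothetical helper, it works on source text rather than braille cells, it treats consecutive whitespace-separated ASCII-letter words as a single run, and the indicator is whatever the table defines.

```python
import re

# One run = one or more ASCII-letter words separated only by whitespace.
_LATIN_RUN = re.compile(r"[A-Za-z]+(?:\s+[A-Za-z]+)*")


def mark_latin_runs(text, indicator):
    """Prefix each run of Latin words with the indicator, once per run."""
    return _LATIN_RUN.sub(lambda m: indicator + m.group(0), text)
```

The indicator is emitted once for a two-word Latin phrase, not once per word or per letter.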

Changed

  • Benchmark accuracy improvements across 20+ tables:

    Language     Table       Before   After    Change
    Spanish      es-g2       15.0%    70.2%    +55.2pp
    Hindi        hi-g1       77.8%    98.0%    +20.2pp
    Portuguese   pt-g2       40.3%    52.1%    +11.8pp
    French       fr-g2       28.9%    40.7%    +11.8pp
    Arabic       ar-g1       89.5%    95.9%    +6.4pp
    Dutch        nl-g1       91.3%    97.7%    +6.4pp
    Russian      ru-g1       94.0%    98.8%    +4.8pp
    Czech        cs-g1       95.6%    99.8%    +4.2pp
    Polish       pl-g1       93.6%    98.1%    +4.5pp
    Slovenian    sl-g1       96.0%    98.3%    +2.3pp
    Spanish      es-g1       96.9%    99.3%    +2.4pp
    Slovak       sk-g1       98.6%    99.3%    +0.7pp
    English      en-ueb-g1   98.4%    99.3%    +0.9pp
    English      en-ueb-g2   96.5%    97.4%    +0.9pp
    Finnish      fi-g1       99.7%    99.9%    +0.2pp
    French       fr-g1       99.4%    99.8%    +0.4pp
    Italian      it-g1       99.5%    99.7%    +0.2pp
    Portuguese   pt-g1       99.4%    99.6%    +0.2pp
    Norwegian    no-g1       99.4%    99.6%    +0.2pp
    German       de-g2       95.6%    95.7%    +0.1pp

Fixed

  • _batch_translate() now passes BRAILLEKIT_TABLES_DIR environment variable correctly
  • Chinese CCB initials/finals sections now loaded into character_map by table_manager

Technical

  • New files: benchmark/report_generator.py, benchmark/references/, benchmark/config/, core/data/pinyin_dict.json
  • Engine changes: table_impl.h (2 new fields), translator.cpp (Latin indicator + Pinyin pre-processing), table_manager.cpp (CCB/Pinyin loading)
  • All changes are table-configurable — no hardcoded language-specific code in engine