Changelog | BrailleKit | Stefan Lohmaier

All notable changes to BrailleKit will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.8.0] — 2026-06-03

Added

Auto-update on Windows via WinSparkle 0.9.3. BrailleKit now checks for updates daily in the background and offers signed installer updates through a standard "Software Update" dialog. A "Check for Updates…" entry is also available in the Help menu. Update appcast hosted at dl.slohmaier.com/appcast/braillekit-win.xml with EdDSA-signed release artifacts. WinSparkle.dll bundled with the Inno installer.
Spanish G2 (es-g2): full coverage of CBE Documento Técnico B-16-1 (V3 2025) — the authoritative standard for Spanish contracted braille (Estenografía española). Discovered and bundled the previously-missing standard PDF from the official ONCE website (CC BY-NC-ND). Programmatically extracted every contraction:
- Tables 1, 2, 3 (51 single-cell abbreviations)
- Tabla 5 (15 gender/number variants)
- Tabla 6 (5 number-plural variants)
- Tabla 8 (159 two-cell abbreviations)
- Tabla 9 (48 three-cell abbreviations)
- Tabla 10 (21 Iberoamerican country names — including Costa Rica + República Dominicana via the multi-word phrase_contractions engine path)
- §2.3 (10 standard Spanish abbreviation variants — doctora, señoras, ustedes, etc.)
- Apéndice C (full alphabetical listing including ~700 derivative gender/number/verbal/-al/-mente/-ble/-dad forms)
es-g2 grew from 128 word_contractions to 749 word_contractions + 2 phrase_contractions (5.85× larger), with 733/733 verified gold entries at 100.00% match rate. Promoted beta → STABLE.
Portuguese G2 (pt-g2): full coverage of IBC Estenografia Braille 2006 Section I — the authoritative Brazilian standard (ISBN 978-85-60331-05-5, 70-page PDF bundled). Programmatically extracted all 140 entries from Sections I.1 (Sinais Simples / Duplos / Triplos / Quádruplos by total and partial syllabic representation), I.2 (Contração Apoiada / Pura / Emergência), I.3 (Suspensão), and I.4 (Convenção Relativa).

pt-g2 grew from 81 word_contractions to 149, with 140/140 verified gold entries at 100.00% match rate. Promoted beta → STABLE.

Fixed

es-g2: 9 broken/noise entries removed (foreign-language paste error 'tak', 3 demonstrative collisions that silently truncated the 't' character, 5 literal-only entries with no compression value). 16 collisions against B-16-1 corrected (alguno/alguna/algunos/algunas, ninguno-family, otra-family, aquel-family, según, qué/cuál/dónde/quién question-form @ prefix, mediante, mientras).
pt-g2: 5 legacy mismatches against IBC standard corrected (jamais ⠚⠁⠍⠁⠮ → ⠚⠍, você ⠧⠕⠉⠣ → ⠧⠉, qual ⠟⠅ → ⠟⠇, quem ⠟⠲ → ⠟⠍, tudo ⠞⠥⠙⠕ → ⠞⠕).
Benchmark runner (scons benchmark): now sets BRAILLEKIT_CLI_ALLOW_NONPRO=1 so the Pro-tier CLI gate no longer returns empty CLI output. Previously every table showed 0% raw match in the summary even though translation itself worked.
Braille music from MusicXML (alpha): selecting a MusicXML score now produces international braille music notation following the New International Manual of Braille Music Notation (World Blind Union, 1996). The note/rhythm/interval signs are language-independent; the score's text (title, composer, part names, lyrics, directions) is still translated through your selected language table. New MusicTranslator API and batch-musicxml CLI command. v1 covers notes, rests, octave marks, dotted notes, accidentals, key/time signatures, bar lines and chords (via intervals); precise lyric-to-note alignment for vocal scores is planned for a later release.

[0.7.0] — 2026-04-29

Added

Inline-MathML auto-dispatch from prose tables (P1, commit aa869015). When a prose table declares mappings.math_table (e.g. en-ueb-g2 → en-nemeth) and the caller has not opted out, Translator::translate segments the input on <math>…</math> fragments and routes each math fragment through MathTranslator::translate_mathml. Switch indicators (BANA UEB-with-Nemeth 2014: open ⠸⠩, close ⠸⠱) are inserted at each prose↔math boundary. Opt out per call via TranslationOptions.disable_math_dispatch. See src/core/src/text_math_segmenter.{h,cpp} and the 8 integration tests in tests/core/test_translator_math_dispatch.cpp.
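The segment-and-route idea can be sketched in a few lines of Python. This is an illustration only, not the C++ implementation: segment, translate_mixed, and the two translator callbacks are hypothetical names, and the spacing rules around the switch indicators are ignored.

```python
import re

# Switch indicators quoted in the entry above (BANA UEB-with-Nemeth 2014).
OPEN_NEMETH = "\u2838\u2829"   # ⠸⠩
CLOSE_NEMETH = "\u2838\u2831"  # ⠸⠱

MATH_FRAGMENT = re.compile(r"<math\b.*?</math>", re.DOTALL | re.IGNORECASE)


def segment(text):
    """Split text into ("prose", s) and ("math", s) pieces, in input order."""
    pieces, pos = [], 0
    for m in MATH_FRAGMENT.finditer(text):
        if m.start() > pos:
            pieces.append(("prose", text[pos:m.start()]))
        pieces.append(("math", m.group(0)))
        pos = m.end()
    if pos < len(text):
        pieces.append(("prose", text[pos:]))
    return pieces


def translate_mixed(text, translate_prose, translate_math,
                    disable_math_dispatch=False):
    """Route math fragments to translate_math and wrap them in indicators.

    translate_prose / translate_math are caller-supplied stand-ins for the
    real prose and MathML translators (hypothetical signatures).
    """
    if disable_math_dispatch:
        return translate_prose(text)
    out = []
    for kind, chunk in segment(text):
        if kind == "math":
            out.append(OPEN_NEMETH + translate_math(chunk) + CLOSE_NEMETH)
        else:
            out.append(translate_prose(chunk))
    return "".join(out)
```

With disable_math_dispatch=True the whole input, markup included, goes to the prose translator, which mirrors the opt-out described in the entry.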
phrase_contractions engine path for cross-word phrase abbreviations (W3 / P2). Per-table top-level array of {pattern, braille, priority?, reference?, group?} mapping a whitespace-separated multi-word pattern to a single braille output (e.g. French il y a → ⠽⠁ per AVH AOÉ). Pre-translation scanner in src/core/src/phrase_scanner.{h,cpp} walks the input once and returns word-boundary-aligned, non-overlapping hits; the translator splices the phrase braille verbatim and translates the surrounding text segments via the normal pipeline. Match rules: longest-token-count wins, ties by descending priority, ASCII case-insensitive, wrapping punctuation stripped. Opt out per call via TranslationOptions.disable_phrase_dispatch. 30 new unit tests. Tables that don't define the array are byte-identical with prior behaviour.
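A minimal Python sketch of the match rules listed above. It assumes hits are reported as token index ranges and that the default priority is 0; scan_phrases and _norm are hypothetical names, and the real scanner in phrase_scanner.{h,cpp} may differ in detail.

```python
import string


def _norm(token):
    # Strip wrapping punctuation, then fold ASCII letters only.
    core = token.strip(string.punctuation)
    return "".join(c.lower() if c.isascii() else c for c in core)


def scan_phrases(text, entries):
    """Return non-overlapping hits as (first_token, end_token, braille)."""
    tokens = [_norm(t) for t in text.split()]
    patterns = []
    for entry in entries:
        pat = [_norm(p) for p in entry["pattern"].split()]
        if pat:  # ignore empty patterns so the scan always advances
            patterns.append((pat, entry))
    # More tokens first; among equal lengths, higher priority first.
    patterns.sort(key=lambda pe: (-len(pe[0]), -pe[1].get("priority", 0)))

    hits, i = [], 0
    while i < len(tokens):
        for pat, entry in patterns:
            if tokens[i:i + len(pat)] == pat:
                hits.append((i, i + len(pat), entry["braille"]))
                i += len(pat)  # consume the phrase: hits never overlap
                break
        else:
            i += 1
    return hits
```

For example, with the entry {"pattern": "il y a", "braille": "⠽⠁"}, the input "(Il y a) longtemps" yields one hit covering tokens 0 to 3.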
settings.lookahead_strategy (P4) — per-table opt-in that controls which overlapping peek matches may suppress the current greedy candidate when lookahead_window > 0. Two values: "suffix-only" (default — preserves canonical S2 behaviour) and "all" (any context-applicable peek with strictly higher priority may suppress). Empty / unknown values fall back to default.
NFB Nemeth lesson extraction infrastructure (P3). New benchmark/golden/extract_nfb_lessons.py (anchor-and-block parser walks all 15 lesson PDFs → 556 candidate Nemeth fragments) and benchmark/golden/reconstruct_nfb_mathml.py (conservative MathML reconstructor + batched MathTranslator verifier). Drove the en-nemeth gold expansion below.

Changed

en-nemeth: BANA Nemeth Code 2022 §1 numeric indicator emit (numeric_indicator: "⠼"). MathTranslator now emits the indicator at the start of an expression and after spaces (the only positions BANA requires), via a new cleanup_pass rule that walks the post-emit braille and inserts ⠼ before any Nemeth lower-cell digit cell at byte position 0 or after a braille space ⠀. Inside operators / fractions / sub-/super-scripts no insertion. Output changed for every digit-leading expression: e.g. <math><mn>42</mn></math> now emits ⠼⠲⠆ (was ⠲⠆), <math><mn>2</mn><mo>=</mo><mn>22</mn></math> now emits ⠼⠆⠀⠨⠅⠀⠼⠆⠆ (lesson-correct, was ⠆⠀⠨⠅⠀⠆⠆). Other math tables are unaffected — Rule 5 only fires when the loaded table declares a non-empty numeric_indicator. Updated 28 fixtures + 28 verified gold entries + 5 unit-test expectations.
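The insertion rule (indicator before a digit cell at the start or after a braille space) can be sketched as follows. This is a simplified stand-in for the cleanup_pass rule: it does not model operators, fractions, or sub-/superscripts, and insert_numeric_indicators is a hypothetical name.

```python
NUMERIC_INDICATOR = "\u283c"  # ⠼
BRAILLE_SPACE = "\u2800"      # ⠀ (blank cell)
# Nemeth lower-cell digits 1-9 then 0: ⠂⠆⠒⠲⠢⠖⠶⠦⠔⠴
NEMETH_DIGITS = set(
    "\u2802\u2806\u2812\u2832\u2822\u2816\u2836\u2826\u2814\u2834"
)


def insert_numeric_indicators(braille, indicator=NUMERIC_INDICATOR):
    """Insert the indicator before a digit at position 0 or after a space."""
    if not indicator:  # empty numeric_indicator: output stays unchanged
        return braille
    out, prev = [], None
    for cell in braille:
        at_boundary = prev is None or prev == BRAILLE_SPACE
        if cell in NEMETH_DIGITS and at_boundary:
            out.append(indicator)
        out.append(cell)
        prev = cell
    return "".join(out)
```

Passing indicator="" returns the input untouched, matching the behaviour described for tables that declare an empty numeric_indicator.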
en-nemeth gold corpus: 69 → 84 verified entries at 100 % match. 15 newly-merged lesson-derived Nemeth pairs from reconstruct_nfb_mathml.py. The remaining 538 NFB candidates await a richer reconstructor (currency, comma-lists, multi-operand), tracked in docs/NFB_LESSON_RECON.md.
Status counts: 25 stable / 45 beta / 23 alpha → 36 stable / 42 beta / 15 alpha. Promotions in this cycle (largely 2026-04-28): af-g2, ca-g1, eo-g1, fi-g1, fil-g1, fr-g1, ga-g1, ga-g2, hi-g1, hu-g1, id-g1, it-g1, mn-g1, nl-g1, sk-g1, sl-g1, ta-g1, te-g1, en-nemeth.

Fixed

benchmark/golden/_pdf_extractor.BRF_TO_UNICODE BANA chart drift (commit 8d17567f). Three entries disagreed with the NLS spec: 0 was mapped to ⠼ (= the # cell) instead of ⠴; + was mapped to ⠴ (what 0 should have been) instead of ⠬; [ was mapped to ⠨ (= the . cell) instead of ⠪. Fix is byte-identical for any extractor whose gold doesn't exercise the affected characters; verified by full test suite + full gold benchmark.
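One way to guard against this kind of chart drift is to derive the whole map from the 64-character North American ASCII braille ordering instead of listing entries by hand. The sketch below is independent of the project's actual _pdf_extractor code; the name BRF_TO_UNICODE is reused only for readability.

```python
# North American ASCII braille: the character at index i stands for the
# Unicode braille cell U+2800 + i.
ASCII_BRAILLE = (
    " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="
)
assert len(ASCII_BRAILLE) == 64

BRF_TO_UNICODE = {ch: chr(0x2800 + i) for i, ch in enumerate(ASCII_BRAILLE)}


def brf_to_unicode(brf):
    """Convert BRF text to Unicode braille; unknown characters pass through."""
    # Lowercase letters are folded to uppercase in this sketch.
    return "".join(BRF_TO_UNICODE.get(ch.upper(), ch) for ch in brf)


# The three entries corrected in this release:
assert BRF_TO_UNICODE["0"] == "\u2834"  # ⠴
assert BRF_TO_UNICODE["+"] == "\u282c"  # ⠬
assert BRF_TO_UNICODE["["] == "\u282a"  # ⠪
```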

Documentation

New per-table plan: docs/tables/fr-g2/PLAN.md splits P5 fr-g2 contraction completion into three independent sub-projects. docs/tables/fr-g2/PHRASES.md lists the 34 candidate AVH §IV Locutions that block on extending the AVH BRF codelocal decoder.
docs/NFB_LESSON_RECON.md Phase 2-4 outcome section explains the extractor pipeline + the 538-skipped breakdown by shape.
docs/MULTIDAY_PROJECTS.md STATUS blocks updated for P1, P2, P3, P4. P5 fr-g2 has its own plan; es-g2 / pt-g2 plans deferred to follow the AVH decoder pattern.
docs/IMPROVEMENT_ROADMAP.md W3 row promoted to "Resolved"; S2 row updated with the P4 evaluation outcome.

Notes for downstream consumers

Math output stability: Tables with numeric_indicator: "" (empty) are byte-identical with prior behaviour. Tables that declare numeric_indicator (currently only en-nemeth ships with "⠼") emit the new BANA-correct output. If you have callers locking on bare-digit Nemeth output and want the old behaviour, set the table's numeric_indicator back to "".
API additions: TranslationOptions::disable_phrase_dispatch (default false) matches the pre-existing disable_math_dispatch shape.

[0.6.2] — 2026-04-21

Added

Portable DOCX output with embedded Braille font. New DocxWriterOptions.embed_braille_font flag ships an OOXML-obfuscated copy of Braille CC0 (GGBotNet, CC0 1.0 public domain, full U+2800..U+28FF) inside the .docx so the document renders correctly on machines that don't have SimBraille/Tiger/Swell installed. Uses the spec-compliant XOR-twice obfuscation in OOXML §17.8.1 — verified byte-identical round-trip with the embedded GUID as the fontKey. Adds ~50 KB to the binary .docx size (compressed ~7 KB in the zip).
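The obfuscation step and its round-trip property can be sketched in Python. The key derivation here (the GUID's 16 bytes in reversed order, XORed over the first 32 bytes of the font) reflects one reading of OOXML §17.8.1 and should be checked against the spec; the function names and the example GUID are hypothetical.

```python
import uuid


def obfuscation_key(font_key):
    """Derive the 16-byte XOR key from a "{GUID}" fontKey string.

    Assumption: the key is the GUID's bytes in reversed order.
    """
    return uuid.UUID(font_key).bytes[::-1]


def xor_font_header(data, font_key):
    """XOR the first 32 bytes of the font with the key, applied twice
    (bytes 0-15 and bytes 16-31). Calling it again restores the input."""
    key = obfuscation_key(font_key)
    head = bytes(b ^ key[i % 16] for i, b in enumerate(data[:32]))
    return head + data[32:]


# Round trip on placeholder data (not a real font file).
example_key = "{1B2C3D4E-5F60-4A7B-8C9D-0E1F2A3B4C5D}"
font_bytes = bytes(range(64))
obfuscated = xor_font_header(font_bytes, example_key)
assert xor_font_header(obfuscated, example_key) == font_bytes
```

Because XOR is its own inverse, the same function both obfuscates and restores, which is what makes a byte-identical round-trip check straightforward.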
RTF writer (write_rtf, write_rtf_to_memory, RtfWriterOptions). Emits Microsoft-compatible RTF with the same embed_braille_font option, using RTF 1.9 \*\fontemb + \*\fontfile hex encoding. Unicode braille goes through \uN? escapes with \uc1 so the fallback "?" is consumed by strict readers. Documented compat note: macOS Cocoa RTF parser (textutil, TextEdit) doesn't support \*\fontemb — use DOCX instead on those readers.
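The \uN? escaping can be illustrated with a short Python sketch. It is not the write_rtf implementation: rtf_escape is a hypothetical helper, line breaks are mapped to \line as a simplification, and font embedding is left out entirely.

```python
def _utf16_units(ch):
    """Return the UTF-16 code units of a single character."""
    data = ch.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def rtf_escape(text):
    """Escape text for an RTF body that has declared \\uc1."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\line ")
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)
        else:
            for unit in _utf16_units(ch):
                # \uN takes a signed 16-bit decimal value.
                n = unit - 0x10000 if unit > 0x7FFF else unit
                # With \uc1, the single "?" after each \uN is the fallback
                # character that a Unicode-aware reader skips.
                out.append("\\u%d?" % n)
    return "".join(out)


document = "{\\rtf1\\ansi\\uc1 " + rtf_escape("\u2801\u2803") + "}"
```

Here the two braille cells U+2801 and U+2803 are written as \u10241? and \u10243?.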
Document tab (GUI) gains:
- A fifth radio button for Rich Text (.rtf) output parallel to DOCX.
- A fifth font picker entry "Braille CC0 (embedded, portable)" that flips on embed_braille_font for whichever container is selected.
- Save-dialog filters for .rtf; persistent settings extended; tab order + EN/DE accessibility labels added.
Performance documentation (docs/PERFORMANCE.md): measured throughput, startup cost, and memory ceiling for en-ueb-g2 and de-g2 on 12 MB prose input.

Engine

Allow capitalized prefix per-contraction flag (allow_capitalized_prefix). Enables shortforms like blind, braille, first, friend, good, great, letter to fire as a bare prefix of a capitalised word (Blindcraft → ⠠⠃⠇⠉⠗⠁⠋⠞, Greatford, Goodge, …). Approximates UEB §10.11 compound-word shortform expansion.
Proper-noun not_words lists for 11 groupsigns / initial-letter contractions in en-ueb-g2 (§10.11 bridging rule). 21 specific proper nouns (Boone, Chisholm, Esther, Jamestown, Hades, Hadrian, Dayan, Bighorn, Airedale, Newhaven, Sontheim, …) no longer get incorrect groupsigns applied mid-word.

Fixes

CLI --version / --help were previously consumed as input text and translated to braille. They now short-circuit to print version / usage before the translate dispatch.
RTF \uc changed from \uc0 to \uc1 so strict readers don't render the ? fallback character verbatim.

[0.6.1] — 2026-04-20

Engine features

Grade-2 number-mode persistence: punctuation listed in a table's number_mode_chars (by default , and .) now keeps the following digit run inside the same number segment. Grade 1 already honored this; Grade 2 was incorrectly resetting number mode after every non-word segment. Affects Norwegian, German, French, Italian etc.
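The intended grouping can be sketched as a small Python segmenter. It only illustrates which characters stay inside one number segment, under the assumption that a number_mode_chars character continues the number only when a digit follows it; number_segments is a hypothetical name and the engine's actual segment model is richer.

```python
ASCII_DIGITS = "0123456789"


def number_segments(text, number_mode_chars=",."):
    """Split text into ("number", s) and ("text", s) segments."""
    segments, i, n = [], 0, len(text)
    while i < n:
        j = i + 1
        if text[i] in ASCII_DIGITS:
            while j < n:
                if text[j] in ASCII_DIGITS:
                    j += 1
                elif (text[j] in number_mode_chars and j + 1 < n
                      and text[j + 1] in ASCII_DIGITS):
                    j += 2  # separator plus the digit that follows it
                else:
                    break
            segments.append(("number", text[i:j]))
        else:
            while j < n and text[j] not in ASCII_DIGITS:
                j += 1
            segments.append(("text", text[i:j]))
        i = j
    return segments
```

Under these assumptions "1.234,5 kr." splits into the number "1.234,5" followed by the text " kr.", while in "3, 4" the comma ends the first number because a space follows it.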
LOWERCASE_WORD shortform extensions: the extension-match path in apply_contractions now fires for LOWERCASE_WORD contexts too, not only WHOLE_WORD. Unlocks UEB in/enough followed by apostrophe clitics ('s, 't, 'd, 'll, 've, 're).
Per-entry flag allow_adjacent_to_hyphen: escape hatch for suppress_lower_wordsigns_at_hyphens. UEB §10.6 allows in and enough adjacent to hyphens while other lower wordsigns stay suppressed.
Per-entry flag not_as_whole_word: UEB §10.3.2 forbids the digraph strong-groupsigns (sh, th, ch, st, wh, gh, ar, ou) from standing as the entire word — they presume surrounding letter context. Enables sh → ⠎⠓ (not ⠩) for standalone input.

Table-level fixes

en-ueb-g2: apostrophe-clitic extensions on all 29 alphabetic wordsigns and all 286 shortforms; n't on could/would/should/must; dis and con not_words pruned to just the genuine UEB exceptions (coney, coneys); ing context restricted to middle/final so it can't fire at word start post-hyphen (to-ing, fro-ing); several bogus indicator/math-symbol shortforms removed (terminator, division, proportion, acknowledge, o'clock — each was a PDF-extraction artifact that truncated the real translation).
de-g2: resolved 6 of 9 BSKDL A1 residual mismatches via targeted dictionary entries + a context change (ihrige, lässt's, möcht's, Man, mrs/drs rejected as Luxembourgish).
no-g1: 11 missing symbol mappings added (&, §, ©, π, °, |, =, +, @, _, $); the double-quote mapping was fixed (⠶ → ⠲).

[0.6.0] — 2026-04-15

Added

Desktop UI overhaul
- Wizard flow (WelcomeStage → InputStage → ConfigStage → ResultStage) for guided text/document conversion
- eBraille export from DocumentTab
- EPUB and Markdown import, drag-and-drop for single files and batch
- Live preview, async-feel conversion pipeline
- System-aware dark mode
- Table maturity indicator in the picker (stable / beta / alpha)
Table maturity classification — new status field in table JSON (stable / beta / alpha), surfaced in the UI picker
Engine features
- Recursive compound boundary detection (German compound handling, +77 words)
- check_remainder morphology engine infrastructure
- not_words morphology engine (blocks contractions in specific word forms)
- Grade 2 back-translation (BRF → text via inverse contraction lookup)
- Capital passage indicator (⠠⠠⠠) support
- UEB be- / con- / dis- prefix shortforms with proper overrides
Windows ARM64 cross-compilation support including ARM64 wxWidgets build and MSIX bundle auto-detection
Language coverage
- 591 Norwegian (no-g1/no-g2) gold entries + Scandinavian BRF decoder
- 76 Swedish (sv-g2) entries from MTM Kortskrift PDF
- 215 English (en-ueb-g2) entries bulk-imported from UEB Rules 2024
- 453 French (fr-g2) AVH abbreviations
- pt-g2, cy-g2 promoted to stable
C API expansion with documented error model, new entry points for table enumeration and status queries; new C API README
Documentation
- docs/GETTING_STARTED.md onboarding guide
- C API README with usage examples
- All diagrams converted from Graphviz to inline PlantUML (light + dark)
- Accessible Doxygen HTML with auto dark mode (new in this release)

Changed

Engine refactors
- Extracted capital_indicators.cpp, compound_boundary.cpp, grade1_translator.cpp from the monolithic translator
- ContractionGuards struct replaces 9 ad-hoc lambdas
- pass2_processor.cpp — named pass2 phase classes, documented responsibilities
- Removed 222-line dead fallback contraction path
- table_manager.cpp broken into focused modules
Table accuracy — continued quality push on major tables:
- de-g2: BSKDL A1 coverage 90% → 98% (multiple whole-word batches, Fugen-s compound boundary fix, proper-noun/loanword additions)
- en-ueb-g2: 14 broken shortforms removed, new entries from UEB Rules 2024
- fr-g2: +453 AVH Manuel abbreviations (+585 words correctly translated)
- Scandinavian tables promoted to stable
Bindings: Python / C# / Flutter versions synced from 0.1.0 → 0.6.0 to match the core library

Fixed

Korean Hangul decomposition: MSVC silently re-encoded "\u3131" in CP1252; all tables now use explicit "\xe3\x84\xb1" UTF-8 byte escapes
UTF-8 bounds checks missing on several translator paths; silent try/except handlers replaced with typed error propagation (review fixes)
de-g2 compound boundary incorrectly blocked st contraction via Fugen-s guard
Capital indicator suppressed correctly after hyphens in compounds (per BSKDL convention) and before letter-prefix indicator ⠠
en-ueb-g2 con- prefix false positives in words like console, contest
en-ueb-g2: added Beatrice, Beatrix, Belinda, acknowledge shortforms previously missing
CLI stdin handling for piped input in Grade 2 mode
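For the Korean Hangul fix listed above, the explicit byte escapes can be checked, or generated for other characters, with a few lines of Python. c_byte_escapes is a hypothetical helper, not part of the project.

```python
HANGUL_KIYEOK = "\u3131"  # ㄱ, HANGUL LETTER KIYEOK

# The UTF-8 encoding of U+3131 is the three bytes quoted in the entry.
assert HANGUL_KIYEOK.encode("utf-8") == b"\xe3\x84\xb1"


def c_byte_escapes(text):
    """Render text as a C/C++ string literal of \\xNN UTF-8 byte escapes.

    Every byte is escaped, so a hex escape is never followed by a literal
    character that the compiler could absorb into it.
    """
    escaped = "".join("\\x%02x" % b for b in text.encode("utf-8"))
    return '"' + escaped + '"'


literal = c_byte_escapes(HANGUL_KIYEOK)
```

A literal built this way does not depend on the compiler's source or execution character set.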

Technical

New central VERSION file (0.6.0) — single source for all version stamping
scons docs now produces accessible HTML with system-aware dark mode toggle, tree navigation, proper ARIA, and respects prefers-reduced-motion
site_scons/build_tools.py SDK packager now bundles CHANGELOG.md, LICENSE, and the generated Doxygen documentation
~5 GB of old benchmark data removed (word lists, HTML reports, per-table corpus outputs)

[0.5.0] — 2026-03-24

Added

Accessible HTML benchmark report (scons benchmark → benchmark/results/benchmark_report.html)
- CDN-based Pico CSS, dark/light mode toggle
- Full screenreader support (ARIA labels, semantic HTML)
- Sorted by language, expandable per grade/table
- Per-table: Braille system, standard reference, multi-comparison results
- SDK disclaimer about independent implementation and verification status
Chinese Braille engine: Hanzi→Pinyin pre-processing via embedded Pinyin dictionary
- 44,348 CJK→Pinyin mappings from Unicode Unihan (Unicode License, permissive)
- New pinyin_dictionary table field in table_impl.h
- zh-g1 accuracy: 0% → 91.8% (vs pypinyin reference)
Latin letter indicator engine feature
- New latin_letter_indicator table field: emits indicator once before Latin text in non-Latin scripts
- ru-g1 accuracy: 94.0% → 98.8%
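The "emit once" behaviour can be sketched in Python on source text. The run rule used here is an assumption: a Latin run starts at the first ASCII letter and ends only when a non-Latin letter appears, so spaces, digits, and punctuation do not restart it. mark_latin_runs and the "^" marker are hypothetical; the real indicator comes from the table's latin_letter_indicator field.

```python
def mark_latin_runs(text, indicator):
    """Insert the indicator once before each run of Latin-script text."""
    out, in_latin = [], False
    for ch in text:
        if ch.isalpha():
            if ch.isascii():
                if not in_latin:
                    out.append(indicator)  # first Latin letter of a run
                    in_latin = True
            else:
                in_latin = False  # a non-Latin letter ends the run
        out.append(ch)
    return "".join(out)


marked = mark_latin_runs("это Hello World тест OK", "^")
```

In this example the marker lands before "Hello" and before "OK", but not before "World", because no Cyrillic letter separates it from "Hello".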
Japanese Tenji table overhaul
- Added all Hiragana + Katakana to character map (152→237 characters)
- Fixed わ/を swap, added 33 youon combinations, 6 foreign sound combos
- ja-g1 accuracy: 0% → 97.3% (vs uhyo/tenji reference)

Changed

Benchmark accuracy improvements across 20+ tables:

Language	Table	Before	After	Change
Spanish	es-g2	15.0%	70.2%	+55.2pp
Hindi	hi-g1	77.8%	98.0%	+20.2pp
Portuguese	pt-g2	40.3%	52.1%	+11.8pp
French	fr-g2	28.9%	40.7%	+11.8pp
Arabic	ar-g1	89.5%	95.9%	+6.4pp
Dutch	nl-g1	91.3%	97.7%	+6.4pp
Russian	ru-g1	94.0%	98.8%	+4.8pp
Czech	cs-g1	95.6%	99.8%	+4.2pp
Polish	pl-g1	93.6%	98.1%	+4.5pp
Slovenian	sl-g1	96.0%	98.3%	+2.3pp
Spanish	es-g1	96.9%	99.3%	+2.4pp
Slovak	sk-g1	98.6%	99.3%	+0.7pp
English	en-ueb-g1	98.4%	99.3%	+0.9pp
English	en-ueb-g2	96.5%	97.4%	+0.9pp
Finnish	fi-g1	99.7%	99.9%	+0.2pp
French	fr-g1	99.4%	99.8%	+0.4pp
Italian	it-g1	99.5%	99.7%	+0.2pp
Portuguese	pt-g1	99.4%	99.6%	+0.2pp
Norwegian	no-g1	99.4%	99.6%	+0.2pp
German	de-g2	95.6%	95.7%	+0.1pp

Fixed

_batch_translate() now passes BRAILLEKIT_TABLES_DIR environment variable correctly
Chinese CCB initials/finals sections now loaded into character_map by table_manager

Technical

New files: benchmark/report_generator.py, benchmark/references/, benchmark/config/, core/data/pinyin_dict.json
Engine changes: table_impl.h (2 new fields), translator.cpp (Latin indicator + Pinyin pre-processing), table_manager.cpp (CCB/Pinyin loading)
All changes are table-configurable — no hardcoded language-specific code in engine