[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82796":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":24,"discoverSource":25},82796,"mathtypejx","a917470154\u002Fmathtypejx","a917470154","mathtypejx is a Python package for converting MathType and legacy Equation Editor OLE formulas in Word .docx files into native OMML equations.",null,"Python",42,4,2,0,42.1,"MIT License",false,"main",true,[],"2026-06-12 04:01:39","# mathtypejx\n\n`mathtypejx` is a Python package for converting MathType and legacy Equation Editor OLE formulas in Word `.docx` files into native OMML equations.\n\nThe core MathType Equation Format (MTEF) parser is implemented in Python. The final MathML-to-OMML step uses Microsoft Office's `MML2OMML.XSL`, which is normally installed with Office.\n\n## Features\n\n- Scan Word `.docx` files for embedded MathType OLE formula objects.\n- Extract MTEF bytes from OLE compound files.\n- Parse MTEF v5 and legacy v3 equation streams in Python.\n- Convert parsed equations to MathML, then to OMML.\n- Replace OLE objects in the `.docx` XML with native Word math.\n- Provide per-formula validation and conversion reports.\n\n## Development and Validation Background\n\n`mathtypejx` was developed primarily against Chinese Gaokao physics exam documents, where old Word files often contain MathType OLE formulas rather than native OMML. 
The private validation corpus is not redistributed in this repository, but the development run that drove the parser covered a 344-document Chinese physics exam corpus with 7,888 MathType OLE formulas. A normalized comparison against the Ruby-based pipeline recorded 7,886\u002F7,888 MathML and OMML matches; the two differences were overbar formulas where the Python path preserved the base character that the Ruby path dropped.\n\nThe formulas exercised in that corpus include common high-school physics notation such as fractions, roots, superscripts\u002Fsubscripts, isotope-style prescripts, vectors, overbars\u002Funderbars, hats\u002Ftildes\u002Farcs, large operators, limits, bracketed expressions, matrices, stacked equations, boxed or crossed-out terms, long division, color\u002Ffont records, and text-mode physical units such as `kg*m\u002Fs`.\n\nThe code also includes defensive handling for failures that appeared while processing those documents:\n\n- Missing or unreadable OLE binaries are marked failed and left in place.\n- Multiple MathType stream names are tried: `Equation Native`, `EquationNative`, and `Equation`.\n- MTEF v3 and v5 records are parsed separately, with future\u002Fcomment records skipped safely when possible.\n- `EQN_PREFS` nibble-packed data is bounded and includes recovery logic for over-consumption that can otherwise hide the equation body.\n- Top-level `PILE` and `MATRIX` records are accepted, and their internal `LINE` records are converted into the slot shape expected by the bundled XSLT.\n- Subscript\u002Fsuperscript movement handles parenthesized bases, isotope-style preceding scripts, invisible MathType spacing, and per-formula mover state isolation.\n- Text-mode characters are wrapped as MathML tokens so overbars and other embellishments keep their base character.\n- MathML normalization fixes missing namespaces, bare text inside containers, empty root-degree cases, and `mtext` baseline issues before OMML conversion.\n- OMML quality 
checks block replacements with empty critical slots, token loss, structure loss, matrix row\u002Fcell loss, delimiter loss, accent\u002Fbar loss, n-ary limit loss, or malformed XML.\n- Unsupported-character noise from `MML2OMML.XSL` is stripped from generated OMML where possible.\n- Failed conversions keep the original OLE object in the output document, and the report preserves per-formula status, risk level, warnings, and errors.\n\n## Install\n\n```powershell\npython -m pip install .\n```\n\nFor development:\n\n```powershell\npython -m pip install -e \".[dev]\"\npython -m pytest -q\n```\n\n## Requirements\n\n- Python 3.10 or newer\n- `lxml`\n- `olefile`\n- `python-docx`\n- Microsoft Office `MML2OMML.XSL` for OMML output\n\nTypical Windows Office paths:\n\n```text\nC:\\Program Files\\Microsoft Office\\root\\Office16\\MML2OMML.XSL\nC:\\Program Files (x86)\\Microsoft Office\\root\\Office16\\MML2OMML.XSL\n```\n\nYou can pass a custom XSL path with `--xsl`.\n\n## CLI\n\n```powershell\nmathtypejx health\nmathtypejx convert input.docx -o output.docx\nmathtypejx convert input.docx -o output.docx --xsl \"C:\\path\\to\\MML2OMML.XSL\"\n```\n\n## Python API\n\n```python\nfrom mathtypejx import convert_mathtype_to_omml\n\nreport = convert_mathtype_to_omml(\n    \"input.docx\",\n    \"output.docx\",\n    remove_edit_info=True,\n    parallel=True,\n    max_workers=8,\n)\n\nprint(report.succeeded, report.failed)\n```\n\n## Scope and Limitations\n\n- Supports `.docx` files, not legacy binary `.doc` files.\n- Supports MathType OLE (`Equation.DSMT4`) and older Equation Editor OLE streams.\n- Does not convert formulas embedded only as WMF images.\n- OMML output requires an available `MML2OMML.XSL`.\n- Public tests use tiny binary OLE fixtures and synthetic documents; large private exam corpora are intentionally not included.\n\n## Acknowledgements\n\nThis project builds on the public MathType-to-MathML work that came before it:\n\n- 
[`jure\u002Fmathtype`](https:\u002F\u002Fgithub.com\u002Fjure\u002Fmathtype) and [`sbulka\u002Fmathtype`](https:\u002F\u002Fgithub.com\u002Fsbulka\u002Fmathtype), Ruby implementations for reading MathType binaries and representing MTEF as XML.\n- [`jure\u002Fmathtype_to_mathml`](https:\u002F\u002Fgithub.com\u002Fjure\u002Fmathtype_to_mathml), which provides the original XSLT-based MTEF XML to MathML conversion approach.\n- [`transpect\u002Fmathtype-extension`](https:\u002F\u002Fgithub.com\u002Ftranspect\u002Fmathtype-extension), whose public documentation and bundled fontmaps\u002FXSLT ecosystem helped clarify the conversion pipeline.\n- The `mathtype_to_mathml_plus` Ruby gem, which combines the `mathtype` gem and XSLTs into a MathType binary to MathML conversion flow.\n\n`mathtypejx` is a Python implementation and packaging of this conversion path for `.docx` to OMML workflows, not an original discovery of the MTEF conversion model.\n\n## License\n\nProject code is released under the MIT License. Bundled font map assets under `src\u002Fmathtypejx\u002Fmtef\u002Fxslt\u002Fxsl\u002Ffontmaps` retain their upstream BSD-style notice; see `NOTICE` and the upstream `LICENSE` file in that directory.\n","`mathtypejx` is a Python package that converts MathType and legacy Equation Editor OLE formulas in Word .docx files into native OMML equations. Its core functionality includes scanning .docx files and extracting MTEF bytes, parsing MTEF v5 and v3 equation streams, converting them to OMML via MathML, and finally replacing the OLE objects to produce native Word math. The project implements the MTEF parser in Python and uses Microsoft Office's `MML2OMML.XSL` for the final OMML conversion. The tool is particularly suited to older Word documents containing MathType OLE formulas, such as Chinese Gaokao physics exam documents, and can effectively improve document compatibility and readability.","2026-06-11 04:09:16","CREATED_QUERY"]