[codex] Prevent Hancom from rejecting HWPX roundtrips #40
Merged
Keep section/header XML roots and archive metadata aligned with Hancom-authored packages so that simple read-modify-save operations do not produce files that look damaged or tampered with.

Constraint: Hancom Office is stricter than generic XML parsers about HWPML root declarations, standalone XML declarations, and OPC ZIP entry metadata.
Rejected: Relying on XML well-formedness alone | it allowed files that validate in Python but can be rejected by Hancom.
Confidence: high
Scope-risk: moderate
Directive: Preserve Hancom-compatible HWPML root metadata when adding new serializers or pack/unpack paths.
Tested: python -m pytest tests/test_gap_closure_tools.py tests/test_opc_package.py -q; python -m pytest -q; pyright; real HWPX add_paragraph roundtrip validator/root namespace audit
Not-tested: Manual opening in Hancom Office GUI
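Because well-formedness alone was rejected as a check, a validator has to compare each rewritten part against what Hancom expects. The sketch below is a minimal illustration, not the project's `package_validator` API: the hypothetical `check_hwpml_part(original, rewritten)` treats the Hancom-authored original as the reference, flags a missing `standalone="yes"`, and reports any root-level namespace declaration that the original had but the rewrite dropped.

```python
import io
import re
import xml.etree.ElementTree as ET

# XML declaration (optionally after a UTF-8 BOM) that carries standalone="yes".
_STANDALONE_YES = re.compile(
    rb"""^(?:\xef\xbb\xbf)?<\?xml[^?]*\bstandalone\s*=\s*(["'])yes\1"""
)


def root_namespace_decls(xml_bytes: bytes) -> dict[str, str]:
    """Return {prefix: uri} for namespaces declared on the root element only."""
    decls: dict[str, str] = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, item in events:
        if event == "start-ns":
            prefix, uri = item  # prefix is "" for a default namespace
            decls[prefix] = uri
        else:
            # The root's start-ns events arrive before its "start" event, so
            # stopping here ignores declarations made on descendants.
            break
    return decls


def check_hwpml_part(original: bytes, rewritten: bytes) -> list[str]:
    """Hypothetical helper: list regressions in `rewritten` relative to `original`.

    A rewritten part can be perfectly well-formed and still fail these checks.
    """
    problems: list[str] = []
    if not _STANDALONE_YES.match(rewritten):
        problems.append('XML declaration is missing standalone="yes"')
    kept = root_namespace_decls(rewritten)
    for prefix, uri in sorted(root_namespace_decls(original).items()):
        if kept.get(prefix) != uri:
            attr = f"xmlns:{prefix}" if prefix else "xmlns"
            problems.append(f"root lost {attr}={uri}")
    return problems
```

In a test suite, a check like this could run on each section and header entry (assumed here to be `Contents/section*.xml` and `Contents/header.xml`) after a roundtrip such as `add_paragraph`. Comparing against the original part avoids hard-coding a list of HWPML namespace URIs.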
Summary
- … `standalone="yes"` on section/header HWPML parts during document serialization.
- … `package_validator` fail on the exact regression that produced Hancom “damaged/tampered” behavior, and normalize section/header roots when packing directories back to HWPX.
Root cause
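Normalizing section/header roots at pack time can be pictured as a small byte-level pass over each part. This is a sketch under stated assumptions, not the project's code: the hypothetical `normalize_hwpml_part` assumes a UTF-8 part whose root start tag directly follows the XML declaration, and it takes the required `{prefix: uri}` map from the caller (for example, read from the Hancom-authored original) instead of hard-coding HWPML namespace URIs.

```python
import re

# Existing XML declaration, optionally after a UTF-8 BOM.
_DECL = re.compile(rb"^(?:\xef\xbb\xbf)?<\?xml[^?]*\?>")
# Root start tag: element name, quoted attributes, then ">" or "/>".
_ROOT_TAG = re.compile(
    rb"""<([A-Za-z_][\w.:\-]*)((?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)(\s*/?>)"""
)
# xmlns / xmlns:prefix attributes inside that start tag.
_XMLNS_ATTR = re.compile(
    rb"""(?<![\w.:\-])xmlns(?::([^\s=]+))?\s*=\s*(?:"([^"]*)"|'([^']*)')"""
)

_STANDALONE_DECL = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def normalize_hwpml_part(xml_bytes: bytes, required: dict[str, str]) -> bytes:
    """Hypothetical helper: emit standalone="yes" and re-add any namespace
    declarations from `required` ({prefix: uri}, "" = default) that the root lacks.

    Assumes UTF-8 content and no comments/PIs between the declaration and root.
    """
    body = _DECL.sub(b"", xml_bytes, count=1).lstrip()
    m = _ROOT_TAG.match(body)
    if m is None:
        raise ValueError("expected the root start tag right after the XML declaration")
    name, attrs, close = m.groups()
    present = {
        (p or b"").decode(): (dq if dq is not None else sq).decode()
        for p, dq, sq in (x.groups() for x in _XMLNS_ATTR.finditer(attrs))
    }
    # Append only prefixes that are absent; existing bindings are left untouched.
    extra = b"".join(
        (f' xmlns:{prefix}="{uri}"' if prefix else f' xmlns="{uri}"').encode()
        for prefix, uri in required.items()
        if prefix not in present
    )
    return _STANDALONE_DECL + b"<" + name + attrs + extra + close + body[m.end():]
```

Working on bytes rather than re-serializing through `xml.etree.ElementTree` is a deliberate choice here: ElementTree writes only the namespaces that are actually used and cannot emit `standalone`, which are exactly the two properties this pass needs to keep. The function is idempotent, so running it on an already-normalized part leaves it unchanged.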
The previous save path produced XML that generic parsers accepted but that Hancom Office can reject: section/header roots lost broad HWPML namespace declarations, the XML declaration could omit `standalone="yes"`, and ZIP entries were rewritten without preserving the original archive ordering/metadata.
Validation
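Preserving archive ordering and metadata on save could look like the following minimal sketch, which uses only the standard `zipfile` module. The helper name `repack_preserving_metadata` is hypothetical; it assumes the whole package fits in memory and that the fields worth keeping are entry order, timestamps, compression method, attributes, and comments (so, for example, a leading uncompressed `mimetype` entry, as in OCF-style containers, stays first and stored).

```python
import io
import zipfile


def repack_preserving_metadata(original: bytes, replacements: dict[str, bytes]) -> bytes:
    """Hypothetical helper: rewrite a ZIP package, swapping in new bytes for some
    entries while keeping each entry's order, timestamp, compression method,
    and attributes from the original archive.

    Names in `replacements` that are not already in the archive are ignored.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(original)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():  # same order as the entries in the original file
            data = replacements.get(info.filename)
            if data is None:
                data = src.read(info)
            # Fresh ZipInfo that copies the original entry's metadata instead of
            # zipfile's defaults (current time, ZIP_STORED, default permissions).
            new = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new.compress_type = info.compress_type
            new.external_attr = info.external_attr
            new.create_system = info.create_system
            new.comment = info.comment
            dst.writestr(new, data)
    return out.getvalue()
```

Building a new `ZipInfo` per entry, rather than calling `writestr(name, data)`, is what keeps the rewritten archive from picking up fresh timestamps and a uniform compression method; sizes, CRCs, and offsets are still recomputed by `zipfile` for the new contents.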
- `python -m pytest tests/test_gap_closure_tools.py tests/test_opc_package.py -q`: 26 passed
- `python -m pytest -q`: 256 passed, 2 skipped
- `pyright`: 0 errors
- `validate_package` passed; section/header roots retained `standalone="yes"`, and no required namespace declarations were missing
Notes
Manual opening in the Hancom Office GUI was not performed from this environment.