Initial requirements for the tool #1

johndgiese · 2022-06-28T13:53:23Z

We need a way to insert a document's name, revision, and id into a docx file's header and footer.

When pandoc creates a Word file from a markdown file, it inserts any top-level keys with string values in the markdown YAML into the docx's "Custom Properties". Originally, I tried to use a docx template that would grab these Custom Properties and insert them in the right places using "fields" in the docx template. This works, but Microsoft Word won't update the fields without prompting the user, which isn't acceptable.

Instead, we'd like to add a separate python script that can stick the name, revision, and id into the docx file's header and footer automatically. This script should be somewhat generic, as we'll re-use it on other projects.

Terms:

Text substitution: process of replacing a delimited bit of text, e.g., "{{somevariable}}", with another bit of text, e.g., "some value".
Associated markdown file: The markdown file that was used to generate the docx file.
Custom properties: Metadata in a docx file; viewable from "File > Properties > Custom".
Generated docx file: The docx file that is generated by the software after performing the text substitutions.
Source docx file: The docx file that contains the text to be substituted.
Main content: All parts of the docx file other than the headers and footers.
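
To make the "text substitution" term concrete, here is a minimal sketch of the replacement step on a plain string. It assumes the "{{somevariable}}" delimiter style from the definition above; the names `PLACEHOLDER` and `substitute` are hypothetical, and leaving unknown keys untouched is an assumption, not a decided behavior.

```python
import re

# Matches "{{name}}", allowing optional whitespace inside the braces.
# The exact delimiter syntax is an assumption based on the example above.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute(text, values):
    """Replace each {{key}} in text with values[key].

    Placeholders whose key is not in values are left unchanged
    (an assumed policy; raising an error is another option).
    """
    def replace(match):
        key = match.group(1)
        if key in values:
            return str(values[key])
        return match.group(0)

    return PLACEHOLDER.sub(replace, text)


print(substitute("{{name}} rev {{revision}}", {"name": "SOP-001", "revision": "B"}))
```

This only covers the string-level operation. Applying it inside a docx header or footer while keeping the styling intact is a separate problem, since a placeholder may be split across several runs.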

User Needs:

To substitute text in docx files generated from pandoc for regulatory submissions.
For the generated docx files to NOT prompt users about field substitutions when they are opened.
To do so with minimal configuration / set up.
To be able to do so with a variety of docx files.

Requirements:

Software shall support text substitutions in headers and footers.
Software shall not allow text substitutions in the main content.
Software shall support docx files with multiple different headers and footers.
Software may retrieve the values to be substituted in the text substitutions from the associated markdown file's YAML front matter OR from the docx file's custom properties.
Software shall support text substitutions in tables.
Software shall preserve the substituted text's styling (e.g., bold, italics, font, etc.).
Software shall not modify the main content.
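
For the requirement about retrieving values from the docx file's custom properties, the following is a rough sketch that reads them with only the standard library. It assumes the properties live in the conventional `docProps/custom.xml` part and are stored as string (`vt:lpwstr`) values; whether pandoc's output always matches this should be confirmed against a real file. `read_custom_properties` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional location of the custom properties part (assumed here).
CUSTOM_PROPS_PART = "docProps/custom.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties",
    "vt": "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes",
}


def read_custom_properties(docx_path):
    """Return {name: value} for string-valued custom properties in a docx file.

    The file is opened read-only, so the source docx is never modified.
    """
    with zipfile.ZipFile(docx_path) as archive:
        if CUSTOM_PROPS_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CUSTOM_PROPS_PART))

    properties = {}
    for prop in root.findall("cp:property", NS):
        value = prop.find("vt:lpwstr", NS)
        if value is not None:
            properties[prop.get("name")] = value.text or ""
    return properties
```

The returned dictionary could then be used as the substitution values for the headers and footers, without needing the associated markdown file as a second input.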

Hazards:

The docx file's content changes in places or ways the user doesn't expect.
The source docx file is deleted or corrupted.
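
One possible mitigation for the second hazard, sketched under assumptions: never write to the source path, do all work on a temporary copy, and only move the copy into place once the transformation succeeds. `write_output_safely` and its `transform` callback are hypothetical names, not part of any agreed design.

```python
import os
import shutil
import tempfile


def write_output_safely(source_path, output_path, transform):
    """Apply transform(tmp_path) to a copy of source_path, then move it to output_path.

    The source file is only ever read. If transform raises, the temporary
    copy is removed and no output file is produced.
    """
    if os.path.realpath(source_path) == os.path.realpath(output_path):
        raise ValueError("output path must differ from the source docx file")

    # Create the temporary file next to the output so os.replace stays on one filesystem.
    out_dir = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".docx", dir=out_dir)
    os.close(fd)
    try:
        shutil.copyfile(source_path, tmp_path)
        transform(tmp_path)
        os.replace(tmp_path, output_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Here `transform` stands in for whatever performs the header and footer substitutions on the copied file.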

Design Notes:

I suspect https://python-docx.readthedocs.io/en/latest/ is the only dependency we'll need for the project. Please let David know if others will be needed and why. Note that Reece has tried using this, but there are some limitations with editing headers. Possibly see https://python-docx.readthedocs.io/en/latest/api/section.html#docx.section._Header.paragraphs

This functionality should be de-risked before going too far forward in this direction.

See python-openxml/python-docx#276 (comment) for a good example.

It probably makes sense to create a python script that has these arguments:

subdocx -i ./input.docx -o ./output.docx

If you grab the text substitution values from the markdown file, you'll need to add one more input file to this.
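
A possible shape for that command-line interface, as a sketch using `argparse`. Only `-i` and `-o` come from the example above; the long option names and the optional `-m/--markdown` argument for the associated markdown file are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build the argument parser for the proposed subdocx script."""
    parser = argparse.ArgumentParser(
        prog="subdocx",
        description="Substitute text in the headers and footers of a docx file.",
    )
    parser.add_argument("-i", "--input", required=True, help="source docx file")
    parser.add_argument("-o", "--output", required=True, help="generated docx file")
    # Hypothetical extra input: only needed if values come from the markdown YAML.
    parser.add_argument(
        "-m",
        "--markdown",
        default=None,
        help="associated markdown file to read substitution values from",
    )
    return parser


args = build_parser().parse_args(["-i", "./input.docx", "-o", "./output.docx"])
print(args.input, args.output, args.markdown)
```

If the values are instead read from the docx file's custom properties, the third argument can be dropped entirely.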

Verification Notes:

Write automated tests that cover each of these cases. You'll need to create a handful of docx files for this purpose. I think this is unavoidable.
