Initial requirements for the tool #1

Open
johndgiese opened this issue Jun 28, 2022 · 0 comments

johndgiese commented Jun 28, 2022

We need a way to insert a document's name, revision, and id into a docx file's header and footer.

When pandoc creates a Word file from a markdown file, it inserts any top-level keys with string values in the markdown YAML into the docx's "Custom Properties". Originally, I tried to use a docx template that would grab these Custom Properties and insert them in the right places using "fields" in the template. This works, but Word won't update the fields without prompting the user, which isn't acceptable.
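For reference, here is a minimal sketch of reading those Custom Properties back out of a docx without opening Word. It assumes the properties live in the package's `docProps/custom.xml` part, as the Open XML packaging conventions specify. The function name `read_custom_properties` is hypothetical, and only the text of each property's first child element is read, which is enough for the string values pandoc writes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the custom-properties part in an Open XML package.
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def read_custom_properties(docx_path):
    """Return {name: value} for the custom properties stored in a docx file.

    `docx_path` may be a filesystem path or a binary file-like object.
    Returns an empty dict when the file has no custom properties part.
    """
    with zipfile.ZipFile(docx_path) as package:
        if "docProps/custom.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/custom.xml"))

    properties = {}
    for prop in root.iter(CUSTOM_NS + "property"):
        name = prop.get("name")
        value = next(iter(prop), None)  # first child, e.g. <vt:lpwstr>
        if name is not None and value is not None:
            properties[name] = value.text or ""
    return properties
```

This uses only the standard library, so it would not add a dependency beyond python-docx.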

Instead, we'd like to add a separate Python script that inserts the name, revision, and id into the docx file's header and footer automatically. This script should be somewhat generic, as we'll reuse it on other projects.

Terms:

  • Text substitution: The process of replacing a delimited bit of text, e.g., "{{somevariable}}", with another bit of text, e.g., "some value".
  • Associated markdown file: The markdown file that was used to generate the docx file.
  • Custom properties: Metadata in a docx file; viewable from "File > Properties > Custom".
  • Generated docx file: The docx file that is generated by the software after performing the text substitutions.
  • Source docx file: The docx file that contains the text to be substituted.
  • Main content: All parts of the docx file other than the headers and footers.

User Needs:

  • To substitute text in docx files generated from pandoc for regulatory submissions.
  • For the generated docx files to NOT prompt users about field substitutions when they are opened.
  • To do so with minimal configuration / set up.
  • To be able to do so with a variety of docx files.

Requirements:

  • Software shall support text substitutions in headers and footers.
  • Software shall not allow text substitutions in the main content.
  • Software shall support docx files with multiple different headers and footers.
  • Software may retrieve the values to be substituted in the text substitutions from the associated markdown file's YAML front matter OR from the docx file's custom properties.
  • Software shall support text substitutions in tables.
  • Software shall preserve the substituted text's styling (e.g., bold, italics, font, etc.).
  • Software shall not modify the main content.
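As a starting point for de-risking, the requirements above might be sketched as follows. This is an untested-against-Word sketch, not a definitive implementation. It assumes the `{{name}}` delimiter style from the Terms section and python-docx's `sections`, `header`/`footer`, `is_linked_to_previous`, `paragraphs`, `tables`, and `runs` attributes. All function names are hypothetical. Placeholders that Word has split across several runs are collapsed into the first run, so the replacement inherits that run's styling. Unknown keys are left untouched, and the main content is never visited.

```python
import re

# Assumed delimiter style, taken from the "{{somevariable}}" example.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute_paragraph(paragraph, values):
    """Replace placeholders in one paragraph, run by run; return the count."""
    runs = list(paragraph.runs)
    full_text = "".join(run.text for run in runs)
    starts, offset = [], 0
    for run in runs:  # character offset at which each run begins
        starts.append(offset)
        offset += len(run.text)

    count = 0
    # Work right to left so earlier offsets stay valid after each edit.
    for match in reversed(list(PLACEHOLDER.finditer(full_text))):
        key = match.group(1)
        if key not in values:
            continue  # leave unknown placeholders as they are
        first = max(i for i, s in enumerate(starts) if s <= match.start())
        last = max(i for i, s in enumerate(starts) if s < match.end())
        prefix = runs[first].text[: match.start() - starts[first]]
        suffix = runs[last].text[match.end() - starts[last]:]
        if first == last:
            runs[first].text = prefix + values[key] + suffix
        else:
            runs[first].text = prefix + values[key]  # keeps first run's style
            for middle in runs[first + 1:last]:
                middle.text = ""
            runs[last].text = suffix
        count += 1
    return count


def iter_paragraphs(container):
    """Yield paragraphs of a header, footer, or cell, including nested tables."""
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)


def substitute_headers_footers(document, values):
    """Substitute in every header/footer definition; never touch the body."""
    count = 0
    for section in document.sections:
        parts = (
            section.header, section.first_page_header, section.even_page_header,
            section.footer, section.first_page_footer, section.even_page_footer,
        )
        for part in parts:
            if part.is_linked_to_previous:
                continue  # inherited: already handled, or no definition exists
            for paragraph in iter_paragraphs(part):
                count += substitute_paragraph(paragraph, values)
    return count


def substitute_file(input_path, output_path, values):
    """Load a docx, substitute, and save to a different path."""
    from docx import Document  # python-docx, imported lazily

    document = Document(input_path)
    count = substitute_headers_footers(document, values)
    document.save(output_path)
    return count
```

Skipping parts where `is_linked_to_previous` is true avoids processing the same header twice and avoids having python-docx create an empty header definition in a document that has none. Text inside fields, hyperlinks, or text boxes is not covered by `paragraph.runs` and would need separate handling.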

Hazards:

  • The docx file's content changes in places or ways the user doesn't expect.
  • The source docx file is deleted or corrupted.

Design Notes:

I suspect https://python-docx.readthedocs.io/en/latest/ is the only dependency we'll need for the project. Please let David know if others will be needed and why. Note that Reece has tried using this, but there are some limitations with editing headers. Possibly see https://python-docx.readthedocs.io/en/latest/api/section.html#docx.section._Header.paragraphs

This functionality should be de-risked before going too far forward in this direction.

See python-openxml/python-docx#276 (comment) for a good example.

It probably makes sense to create a python script that has these arguments:

subdocx -i ./input.docx -o ./output.docx

If you grab the text substitution values from the markdown file, you'll need to add one more input file to this.
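A minimal sketch of that command-line interface, using `argparse` from the standard library. Only `-i` and `-o` come from the example above. The long option names and the `-m`/`--markdown` option for the associated markdown file are assumptions for illustration. The parser also refuses an output path equal to the input path, which addresses the hazard of corrupting the source docx file.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build the argument parser for the (hypothetical) subdocx script."""
    parser = argparse.ArgumentParser(
        prog="subdocx",
        description="Substitute text in the headers and footers of a docx file.",
    )
    parser.add_argument("-i", "--input", required=True, type=Path,
                        help="source docx file")
    parser.add_argument("-o", "--output", required=True, type=Path,
                        help="generated docx file")
    # Assumed extra option: only needed if values come from YAML front matter.
    parser.add_argument("-m", "--markdown", type=Path, default=None,
                        help="associated markdown file (optional)")
    return parser


def parse_args(argv=None):
    """Parse arguments and reject an output path that would overwrite the input."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.input.resolve() == args.output.resolve():
        parser.error("output must differ from input so the source is never overwritten")
    return args
```

With this sketch, `subdocx -i ./input.docx -o ./output.docx` parses as in the example, and passing the same file for both options exits with a usage error.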

Verification Notes:

Write automated tests that cover each of these cases. You'll need to create a handful of docx files for this purpose. I think this is unavoidable.
