Most development is done within a single dtx, so most testing involves repeatedly running the specific test file for the dtx the dev is working on - which is why having those per-dtx tests is helpful. But eventually, once everything looks like it is working, we want to run a comprehensive test with the new code/changes.
Bart suggested doing this with an automated diff, which sounds great to me. To that end I suggest the following (which is, I think, largely what Bart had said):
We need a mechanism in place (maybe a GitHub Action? Or a button in the codespace under the extra tab? I think the Action makes the most sense, but this is more @wiobber's area, so he should probably chime in on this part) that allows us to run a truly comprehensive check on test code once we have debugged it to the point where we are ready to submit it for absorption into the core code.
The specifics may need some discussion, but I suspect we want to compare (if possible) the HTML output of the core-code version against that of the current-testing version. We also need something similar for the PDF output, since the PDF is a core product of the ximera package. That could be tricky: settings like margins can obviously swamp a diff, but hard-coding such settings risks diffs that pass for the hard-coded case while being unstable under other settings. For example, maybe the problem environment works great with 1 inch margins but breaks spectacularly at 1.5 inch margins - and if we had hard-coded 1 inch margins for the comprehensive runs, we wouldn't discover the breakage until the update hit production.
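As a rough sketch of what the per-file comparison could look like (assuming we already have `stable.html`/`test.html` and `stable.pdf`/`test.pdf` for one test file; `pdftotext` is the poppler-utils tool, and diffing extracted text is just one way to dodge some of the PDF layout noise):

```bash
#!/usr/bin/env bash
# Compare one test file's stable vs. dev output.
# Assumes stable.html/test.html and stable.pdf/test.pdf already exist.
set -euo pipefail

# HTML: a plain textual diff. A real run may first want to strip
# volatile bits (timestamps, build hashes) from the HTML.
diff -u stable.html test.html > html.diff || echo "HTML differs (see html.diff)"

# PDF: diff the extracted text instead of the binary, which sidesteps
# some (not all) of the formatting noise mentioned above.
pdftotext -layout stable.pdf stable.txt
pdftotext -layout test.pdf test.txt
diff -u stable.txt test.txt > pdf.diff || echo "PDF text differs (see pdf.diff)"
```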
Once we have a good comprehensive test mechanism that covers the following, we can mark this issue as completed:
- Runs the tests for every test file in the Examples repository, using both the production code and the current test code.
  - Preferably by compiling the xourse documents rather than the individual ximera test files, so that each output is immediately traceable to the dtx file associated with it.
  - Preferably the mechanism is capable of finding every xourse in the repo (see the discovery sketch after this list), so we don't have to update it manually if/when xourses are added or removed. Since the xourses load the test files, this also automatically picks up added/removed test files.
- Automated process via a single interaction.
  - For example: a button in a codespace, or (preferably) a GitHub Action that launches and runs the comprehensive test, so devs don't need to test a lot of files manually - just hit a button and get a report once it is done.
- Tests run a diff (either literally or something similar, if there is a reason a diff isn't ideal - I think a diff is the right way to go here, though?) between something considered the "stable system" code (presumably the current production code?) and whatever code the dev is currently testing.
  - The differences should be reported on a per-xourse or per-test-file basis, so we can see if/when an unexpected change occurs, and the dev can be pointed to the local xourse/test file to use for further testing (so they don't have to keep rerunning the full comprehensive test to see whether a change fixed an unexpected issue).
  - Ideally the "stable system" code would be pulled dynamically from the production release of ximera (see the fetch sketch after this list), so it doesn't need manual updating when future releases are made, with an option to instead flag a different release (like a "dev release" version) as the "stable release" to use for comparisons.
- Tests both the webpage output and the PDF output for differences, since these are the two main products of the ximera package.
  - The obvious choice for the webpage is to run a diff on the HTML. A PDF seems less viable to diff directly due to all kinds of weird formatting and other issues (diffing the extracted text, as sketched above, is one workaround), but I don't have a better suggestion. This isn't my area, so maybe direct comparison is fine and I'm being paranoid.
  - Again, any differences found should make clear whether they occurred in the HTML, the PDF, or both, for follow-up testing.
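On the "stable system" point: a minimal sketch of pulling the production code dynamically, assuming the production sources live in the XimeraProject/ximeraLatex GitHub repo and that a tag/branch name is enough to pin a release (the `STABLE_REF` override and the default branch name are my assumptions):

```bash
#!/usr/bin/env bash
# Fetch the code tree to treat as "stable". STABLE_REF lets a dev flag
# a different tag/branch (e.g. a dev release) as the comparison baseline:
#   STABLE_REF=v1.2.3 ./fetch-stable.sh
set -euo pipefail

STABLE_REF="${STABLE_REF:-master}"   # assumption: default branch is the production release
rm -rf stable-ximera
git clone --depth 1 --branch "$STABLE_REF" \
  https://github.com/XimeraProject/ximeraLatex stable-ximera
```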
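And a hedged sketch of the runner itself - `build_with` is a pure placeholder for whatever actually compiles a xourse to `.html`/`.pdf` against a given ximera code tree (xake, xmlatex, or whatever we settle on), and the discovery step assumes xourse files can be found by grepping for `\documentclass{xourse}`:

```bash
#!/usr/bin/env bash
# Comprehensive runner sketch. Assumes stable-ximera/ and dev-ximera/
# code trees exist (see the fetch sketch above for the stable one).
set -euo pipefail

build_with() {  # usage: build_with <code-tree> <xourse.tex> <outdir>
  mkdir -p "$3"
  echo "TODO: compile $2 against $1 into $3"   # PLACEHOLDER: substitute the real build
}

report=report.txt
: > "$report"

# Find every xourse automatically, so adding/removing one needs no manual edit.
while read -r xourse; do
  name=$(basename "$xourse" .tex)
  build_with stable-ximera "$xourse" "out/stable/$name"
  build_with dev-ximera    "$xourse" "out/dev/$name"

  for fmt in html pdf; do
    s="out/stable/$name/$name.$fmt"
    d="out/dev/$name/$name.$fmt"
    if [ "$fmt" = pdf ] && [ -f "$s" ] && [ -f "$d" ]; then
      pdftotext -layout "$s" "${s%.pdf}.txt"; s="${s%.pdf}.txt"
      pdftotext -layout "$d" "${d%.pdf}.txt"; d="${d%.pdf}.txt"
    fi
    # Per-xourse, per-format report lines tell the dev exactly what to rerun.
    if [ -f "$s" ] && [ -f "$d" ] && ! diff -q "$s" "$d" >/dev/null; then
      echo "$name: $fmt differs" >> "$report"
    fi
  done
done < <(grep -rl --include='*.tex' '\\documentclass.*{xourse}' .)

echo "Comprehensive run finished; see $report"
```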
This is everything I can think of that the comprehensive test should cover, but @bartsnapp and @wiobber are much more knowledgeable about this aspect, so they may comment below with more. Once we have decided on the specifics, we should update (or formalize) the list above, so we have a record of what the comprehensive test should do and how it should work, for future development.