Closed
Description
A recent PR from the Papers with Code integration causes the integration tests of the Python library to fail.
This is because the PwC integration recently introduced a string with double whitespaces:
acl-anthology/data/xml/N19.xml
Line 5685 in 170ff97
<pwcdataset url="https://paperswithcode.com/dataset/how-to-fix-quickbooks-error-30159-causes">How to fix QuickBooks Error 30159 – Causes & Fixes</pwcdataset>
Our XML indentation function automatically "cleans" such double whitespaces to single ones, but is never called by ingest_pwc.py. When the integration tests load & re-save the files, the whitespaces are cleaned, causing an error.
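To illustrate the mismatch, here is a minimal sketch of how a load & re-save round trip changes such a string. It is not the library's actual code: `collapse_whitespace` and `round_trip` are hypothetical stand-ins for the cleanup step, and the element content and URL are made up for the example.

```python
import re
import xml.etree.ElementTree as ET


def collapse_whitespace(text):
    """Replace each run of whitespace with a single space.

    Hypothetical stand-in for the cleanup done by the indentation function.
    """
    return re.sub(r"\s+", " ", text)


def round_trip(xml_string):
    """Parse the XML, clean the text of every element, and serialize it again."""
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if elem.text:
            elem.text = collapse_whitespace(elem.text)
    return ET.tostring(root, encoding="unicode")


# Made-up element with a double space, as an ingestion script that skips
# the cleanup step might write it.
original = '<pwcdataset url="https://example.org/d">Some  dataset name</pwcdataset>'
resaved = round_trip(original)

# The re-saved string no longer matches what is on disk, which is the kind of
# difference a "load and re-save leaves the file unchanged" test would flag.
unchanged = resaved == original
```

Running the cleaned output through `round_trip` a second time leaves it unchanged, so the difference only shows up for data that was written without the cleanup step.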
IMHO this is an example of the drawbacks of directly manipulating XML without encapsulation, and my suggested fix is to refactor ingest_pwc.py to use the Python library instead.