This repository has been archived by the owner on Jul 14, 2021. It is now read-only.

Input sanitization for pyYAML #5

Open
aliaras opened this issue Aug 12, 2012 · 1 comment

Comments

@aliaras
Member

aliaras commented Aug 12, 2012

For certain tables, the outputs do not play nicely with pyYAML. In particular, there are frequently double quotes ( " ) within strings (I'm encountering that in the itemTypes table). Escaping these at this stage is hard, because plenty of legitimate double quotes are used to delimit strings. When pyYAML encounters a double quote inside a string, it chokes and produces the following error (stack trace available if interested):

ParserError: while parsing a flow mapping
  in "invTypes.yaml", line 2, column 2139
expected ',' or '}', but got '<scalar>'
  in "invTypes.yaml", line 2, column 2256

This issue could be avoided by replacing internal quotes with their ASCII or Unicode escape sequences, as suggested here: http://pyyaml.org/wiki/PyYAMLDocumentation#Scalars. Likewise, the string ":" fails for similar reasons.
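A minimal sketch of the escaping idea, assuming the dump tool builds each flow mapping line itself. The helper name `yaml_double_quoted` and the sample row are made up for illustration and are not taken from the tool. It leans on the fact that a JSON string literal uses the same `\"` and `\\` escapes that YAML double-quoted scalars accept, so only the standard library is needed:

```python
import json


def yaml_double_quoted(value: str) -> str:
    """Return value wrapped in double quotes, with inner quotes and backslashes escaped.

    Hypothetical helper: json.dumps emits \" and \\ escapes, which are also
    valid inside a YAML double-quoted scalar. ensure_ascii=False leaves
    non-ASCII characters untouched instead of turning them into \\u escapes.
    """
    return json.dumps(value, ensure_ascii=False)


# Made-up sample row; the real table columns may differ.
row = {"typeName": 'A "quoted" name', "note": ":"}

# Build one flow mapping line, quoting every value.
line = "{" + ", ".join(f"{key}: {yaml_double_quoted(val)}" for key, val in row.items()) + "}"
print(line)
```

Quoting every string value this way also covers the ":" case, since a colon inside a double-quoted scalar is plain text.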

There's also some HTML in these documents, which should probably be stripped out as well (although that might be harder).

@swsnider
Member

I've uploaded a new full version of the database dump with double quotes replaced. Please let me know if this fixes your problem (it certainly seemed to on my end). According to my testing, this was entirely the result of non-quoted strings; the string ":" seems to parse fine in PyYAML.

As for HTML, I'm deliberately leaving it in. Per the design doc (https://docs.google.com/document/d/111_teE7hjzhwD1Mum1ML-wUt1LNGBOoHYxFOvlIJAvg/edit), this tool's only purpose is to convert the SQL tables to YAML. I'll be developing a toolchain shortly for dealing with the YAML files this tool produces in a clean way, but until then, you're going to have to deal with the HTML yourself. I suggest BeautifulSoup as a first pass, but it might be sufficient to just replace all HTML tags with whitespace.
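For the "replace all tags with whitespace" route, a rough standard-library sketch might look like the following. The function name `strip_tags` and the sample string are hypothetical, and a regex like this is only a first pass: it will not cope with a stray "<" in ordinary text or with comments and script blocks, which is where a real parser such as BeautifulSoup would be the safer choice.

```python
import html
import re

# Matches anything that looks like a tag: "<", then non-">" characters, then ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Replace each HTML tag with a space, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    decoded = html.unescape(without_tags)  # e.g. "&amp;" becomes "&"
    return " ".join(decoded.split())


# Made-up description text for illustration.
print(strip_tags("<b>Fast</b> ship.<br>Cargo &amp; drones"))
# prints: Fast ship. Cargo & drones
```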
