Improved string quoting #41

jojoelfe · 2023-12-02T17:38:16Z

This addresses #35 and follows the official starfile specs a bit more.

During parsing, strings can now be quoted with single or double quotes. For simple data blocks this is done with the built-in shlex module; for loops, by replacing single quotes with double quotes. The shlex solution is probably better, as it allows a string containing one type of quote to be quoted by the other, such as "This ain't perfect", which should work according to the starfile specs but will not work with the current solution for loop blocks. But I feel this might not be important enough to give up the simplicity of parsing directly with pandas.
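
To illustrate the difference described above, here is a minimal sketch (not the parser's actual code) that splits simple-block lines with `shlex.split` and applies the quote-replacement trick to loop rows. The `_note` label and the row contents are made up for the example.

```python
import shlex

# Simple-block lines: shlex understands both quote styles, and a string
# quoted with one style may contain the other.
single = shlex.split("_note 'quote string'")
double = shlex.split('_note "quote string"')
mixed = shlex.split('_note "This ain\'t perfect"')
empty = shlex.split('_note ""')

# Loop rows: replacing single quotes with double quotes is fine for a
# plainly quoted value...
simple_row = "1 'quote string'".replace("'", '"')
# ...but it corrupts a double-quoted value that contains an apostrophe,
# because the apostrophe is turned into a stray double quote.
broken_row = '1 "This ain\'t perfect"'.replace("'", '"')

print(mixed)
print(broken_row)
```

The last case is the limitation mentioned for loop blocks: after replacement the value no longer has balanced quotes.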

During writing, the user can now choose which quote character to use and whether to quote all strings or only empty ones and those containing whitespace. Quoting for loops is done by disabling the pandas quoting logic and instead "manually" adding the quotes where appropriate. This is because pandas will only add quotes if the string contains the delimiter, which is '\t'.
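
The "manual" quoting rule can be sketched roughly as below. The helper name `quote_value` is hypothetical and the code is an assumption about the rule, not the writer's implementation; the parameter names follow the ones suggested later in the review (`quote_character`, `quote_all_strings`).

```python
def quote_value(value, quote_character='"', quote_all_strings=False):
    """Return value wrapped in quotes if it is a string that needs them.

    Hypothetical helper: non-string values pass through unchanged.
    """
    if not isinstance(value, str):
        return value
    needs_quotes = (
        quote_all_strings
        or value == ""
        or any(ch.isspace() for ch in value)
    )
    if needs_quotes:
        return f"{quote_character}{value}{quote_character}"
    return value


for example in ["noquote", "quote string", " ", "", 1.5]:
    print(repr(quote_value(example)))
```

In a loop block, a function like this would be applied to every value before handing the table to pandas with its own quoting turned off.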

I think all of this behavior is covered by tests, but there are some caveats:

In loops, an empty string is parsed as NaN rather than as an empty string. This is what to_numeric does, and I could find no way of changing it. I don't know if this is important enough to give up the simplicity of that approach.
For very large loops, writing might be slower because the quoting logic iterates over all values with applymap. I have not measured the general performance impact, but I added a test that checks the write time for the million_row_starfile and it seems to be fine.
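
For a rough feel of the per-value cost, the sketch below times an elementwise quoting pass over one million values in plain Python. It is a stand-in under stated assumptions, not a pandas benchmark and not the test added in this PR; `quote_if_needed` is a hypothetical helper.

```python
import time


def quote_if_needed(value, quote_character='"'):
    # Hypothetical rule: quote strings that are empty or contain whitespace.
    if isinstance(value, str) and (value == "" or any(c.isspace() for c in value)):
        return f"{quote_character}{value}{quote_character}"
    return value


# One million values, mixing strings that do and do not need quotes.
values = ["noquote", "quote string", "", 1.0] * 250_000

start = time.perf_counter()
quoted = [quote_if_needed(v) for v in values]
elapsed = time.perf_counter() - start

print(f"quoted {len(quoted)} values in {elapsed:.3f} s")
```

The absolute number depends on the machine, so a test built on this idea would need a generous threshold.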

alisterburt · 2023-12-02T18:34:29Z

Happy Saturday morning @jojoelfe ! This is great - thanks for getting back to it. I'll take a look this afternoon ☺️

alisterburt

Awesome @jojoelfe, appreciate the effort here! Minor suggestions on naming and a quick q about what to do about nans for empty strings, should be able to get this in right away once those are addressed!

alisterburt · 2023-12-03T03:55:44Z

starfile/functions.py

@@ -35,6 +36,8 @@ def write(
    float_format: str = '%.6f',
    sep: str = '\t',
    na_rep: str = '<NA>',
+    quotechar: str = '"',


Suggested change

quotechar: str = '"',

quote_character: str = '"',

would prefer something more explicit and snake case, could you update in all relevant places?

alisterburt · 2023-12-03T03:56:50Z

starfile/functions.py

@@ -35,6 +36,8 @@ def write(
    float_format: str = '%.6f',
    sep: str = '\t',
    na_rep: str = '<NA>',
+    quotechar: str = '"',
+    quote_always: bool = False,


Suggested change

quote_always: bool = False,

quote_all_strings: bool = False,

simple name preference, could you refactor to this too?

alisterburt · 2023-12-03T03:57:15Z

starfile/functions.py

+        quotechar=quotechar,
+        quote_always=quote_always,


to be updated wrt previous comments

alisterburt · 2023-12-03T03:57:53Z

starfile/writer.py

+        quotechar: str = '"',
+        quote_always: bool = False,


alisterburt · 2023-12-03T03:57:59Z

starfile/writer.py

+        self.quotechar = quotechar
+        self.quote_always = quote_always


alisterburt · 2023-12-03T03:58:07Z

starfile/writer.py

+                    quotechar=self.quotechar,
+                    quote_always=self.quote_always


alisterburt · 2023-12-03T03:58:13Z

starfile/writer.py

+                    quotechar=self.quotechar,
+                    quote_always=self.quote_always


alisterburt · 2023-12-03T03:58:28Z

starfile/writer.py

+    quotechar: str = '"',
+    quote_always: bool = False


alisterburt · 2023-12-03T04:00:01Z

tests/test_parsing.py

+@pytest.mark.parametrize("quotechar, filename", [("'",basic_single_quote), 
+                                                 ('"',basic_double_quote), 
+                                                 ])
+def test_quote_basic(quotechar,filename):
+    import math
+    parser = StarParser(filename)
+    assert len(parser.data_blocks) == 1
+    assert parser.data_blocks['']['no_quote_string'] == "noquote"
+    assert parser.data_blocks['']['quote_string'] == "quote string"
+    assert parser.data_blocks['']['whitespace_string'] == " "
+    assert parser.data_blocks['']['empty_string'] == ""


this is sick 🙂

alisterburt

Excellent! I'll push a release too, thanks again @jojoelfe 🙂

alisterburt · 2023-12-05T01:11:52Z

release pending https://github.com/teamtomo/starfile/actions/runs/7094776248

jojoelfe added 9 commits July 8, 2023 20:07

Initial tests and support for other quotechars

56e6c0f

Quote parsing passes tests

6f85fa4

Merge remote-tracking branch 'upstream/main'

a7d7407

Test files

2ab7a6f

Writer tests

722099a

First iteration passes tests

a3834c1

Added option to always quote strings

7fba398

remove unneeded option

30bebdc

Add performance test for writing

ce192b1

alisterburt reviewed Dec 3, 2023

Renamed options and better empty string handeling

c96dbf1

alisterburt approved these changes Dec 5, 2023

alisterburt merged commit 9f3fe21 into teamtomo:main Dec 5, 2023
5 checks passed
