-
-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preservation of comments and whitespace #28
Comments
Damn, that's high praise. Thanks! Glad you like it. It's the first general-purpose C++ library I've released for public consumption so it's been nice to see it well-received.
Yeah, this is a piece of functionality I've been asked about a couple of times now, though you're the first to actually provide a compelling use-case. It's tentatively on my mental road-map for |
Also, reciprocal praise: your work on profiling the STL headers (e.g. here) is something I referred to on a regular basis in optimizing compile times for the library, so thanks for that! |
You're very kind. I think if you're a certain kind of C++ dev, stuff like not including headers and, generally speaking, not having the compiler generate opcodes in the first place, leads to a certain shared philosophy. One which I would prefer to see more of on WG21, but that's by the by. Am I right that your iterators don't implement |
Quite likely. Most library functionality has been implemented in a very YAGNI way, i.e. I've only added something when I've personally needed it myself. I added the iterators to make ranged-for work and haven't really used them directly via begin()/end() at all. Which of course means it wasn't an active design decision not to implement the operator, it just hasn't been done. I can add that to my to-do list, too, though all PR's welcome! :D |
The need to retain comments across modification is becoming more urgent within our use case here, as a wider set of engineers use the tooling. We currently have TOML config files where the original source is commented with documentation describing each option, but once it passes through tomlplusplus all those comments get lost. This is quite unhelpful. If you even just added the ability for there to be a custom void * per toml table, into which we could store an unordered map of toml identifier to comment, that would be wonderful. |
Yeah, that'd be doable, though I'd likely make it opt-in to avoid ABI breaks for the masses. What say you to the idea of me exposing it as an "injectable", not unlike ImGui's IM_VEC2_CLASS_EXTRA and friends? |
One possible ABI neutral way is an "invisible" node type which doesn't appear in the normal enumerations. It only appears if you explicitly ask for it by name. Said custom node type could carry user defined payload. I'm not sure if you're already using it, but |
That value is not presently used in that manner, but I'd rather not go down that path. To be clear, I'm not at all concerned about ABI breaks for those opting in to this feature; it would be "buyer beware" advanced use case. I'm only worried about avoiding them for those not opting in. |
I assume |
If I were to implement this fully I'd likely treat the comments as metadata associated with the tables/arrays/values they relate to, similar to how the source information works now, rather than add them as nodes. But, er, unless I misunderstood what you were asking for as an interim solution, you suggested adding a public |
I know that toml11 handles comments by attaching them to nodes, but that falls down if there are blocks of comment without any obvious node to attach them to. What most parsers do in this situation is to have comment nodes which act like whitespace i.e. ignored by default. But separately to all that, what would be ideal for me right now is to be able to store custom data with a secret named key. Then I can do a poor man's regex parse of the incoming TOML, and say for a key named I know this probably sounds like I'm just going ahead and implementing comments, but because our TOML config files are super simple, I can hack a nasty solution which will work for our simple TOML. It wouldn't likely scale out well. Failing that, the custom void * per toml table can also work. Just be aware that we may see mixed I originally thought to simply subclass |
Yeah, it's not a perfect solution, but it's along the lines of what I'd do, too. Another problematic case is comments at the end of a TOML document, though in that case I'd probably just associate them with the most recently seen node. The rest can be glommed together and associated with the subsequent element al a doxygen markup and similar: # this comment would be assoicated with "key"
key = "value"
# this comment would be associated with "table"
[table]
# this comment would be associated with "table.subtable"
[table.subtable]
# ...and maybe this one? Or discard it? Obviously it's not perfect, but it would do the job for the majority of cases I'd think. It also has the practical upshot of maintaining semantic relationships between comments and toml data, whereas if they were two discrete node types, that relationship would be lost, as @levicki noted above. The alternative is a full DOM-style rendition of the TOML with whitespace and comments all treated as separate nodes in the tree, but that seems wholly overwrought for what is fundamentally just a set of key-value pairs, and is just not a direction I want to go in with this library. Regardless of how it's implemented, I won't be able to attack it for a while still, but I'm happy to patch in a simpler hook for you to add a tag pointer or something in the interim.
Yeah that's one way of doing it. See my example above for another. There's lots of wiggle room here, and no single approach is perfect.
I meant injecting stuff into the
If I went down this route I'd certainly make them mutable. It wouldn't be much of a stretch.
In the absence of concrete data I can only speculate, but I suspect the considerable majority of people who go looking for a TOML library do so because they want their application to read in a set of key-value pairs from a config, with serializing coming second and needing to mess with comments being a distant third. The library will always be structred with this in mind, so I won't ever make stuff like that the default. |
Well I disagree with that outright. You might like to think of comments as structures in the TOML tree, with a "proper" place, but thats something you've invented. They are not data. They're comments. Arbitrary bits of effectively-whitespace. That mental model you're using for them is as arbitrary and valid as any other, including one that ignores them entirely. In my experience a comment in a configuration file is typically documentation, and documentation typically precedes the thing it relates to. Sure there's exceptions to this, but I bet they're the minority. The TOML spec is irrelevant here.
There's nothing to disable. It's just a side-effect of using std::map. In fact, the TOML spec explicitly states that order doesn't matter, so the library is written with this in mind (i.e. things are serialised in implementation-defined order, because order has no meaning). As for the rest of it, you've basically just described a DOM tree. I'm not going to take the library in that direction. |
For any future readers of this discussion wondering why I'm talking to myself so much: it's because this part was actually a dialogue with @levicki, who appears to have since deleted his comments. I don't know why. The end result is that I look insane ¯\_(ツ)_/¯ I just want to take a moment to re-frame the discussion, because the potential for scope-creep is high, and this is a project I work on for free in my spare time. Ultimately for me to implement this in an order-preserving, whitespace-preserving, comment-preserving, DOM-like fashion, such that you could exactly round-trip a TOML file, would require effectively re-writing the whole library, adding a bunch of bloat, and making it much, much slower, for very little gain in the general case. Not to mention the time and energy! So, suffice it to say that I will never be doing that. That's not what I want for this library. That's a different library altogether. Anything I implement in aid of closing this feature request will be a "best-fit" compromise. I can also tell you outright I'm not ever going to bother preserving whitespace. Comments, sure, maybe, to some extent, but not whitespace. I might consider preserving order at some point, but that would have pretty bloaty knock-on effects. So, with that in mind, let's focus on just documentation comments to begin with, since that seems the most generally useful. |
For me personally:
In my opinion, comment2 is a comment about table2, and does not belong to table1. It is absolutely the case that one would want to comment on a whole table, as well as on individual keys in a table. Re @marzer's wish to not go full DOM, there are actually ways of implementing primarily a key-value store which is very fast and noncopying, and which preserves comments and whitespace, and avoids dynamic memory allocation and is never throwing. https://github.com/chadaustin/sajson would be the classic example, it mutates the original text save on memory allocations and memory copying. But he's right it's a complete rewrite, total rearchitecture. Ultimately I can live with per node comments if I also have the ability to keep a void* per table. |
@ned14 Yeah it seems your model for relating comments to content is the same as mine (and most coders more generally, I suspect). Should be easy to come up with something that works well for those cases. It seems there's some urgency here so I'll try to come around to this in a reasonable timeframe, but this year has taken a toll on my productivity in a lot of unexpected ways... |
Hey all, thanks for this library (which I'm working with from the python side) and this discussion! :) I agree with many of the points that were said already, but I want to bring in #62 more specifically again - I think it's a fool's game to try to guess which line a comment belongs to, as people have different conventions whether a comment makes more sense before or after (e.g. depending on field vs. array), sometimes even in the same file.
The larger issue is that TOML is specifically designed to be both human- and machine-readable (only that spec cares mostly about the machine side of things, while human interaction patterns are... more complicated). I'd argue that while the spec (and I know that this would make the implementation more complicated, possibly less performant, etc. But for a production-grade TOML library (which might take a while to evolve across the ecosystem), it's IMO a crucial piece. The good thing is that this library already has an abstraction layer over Assuming we can keep the order of nodes, it then becomes pretty arbitrary whether we attach comments & whitespace to the node before or after (just needs a special node at the end for either the first/last comment). |
@marzer I think I owe you an explanation.
I came upon your library and started using it in my projects exactly because it offered serialization unlike the other libraries that were good enough for just reading key-value pairs. It is one thing to use library to read key-value pairs and I agree that most people care about that (but then any other TOML parser will do, not just yours). However, imagine a situation where you have a bunch of services without user interface all using a TOML config file on say a bunch of ATMs scattered in remote locations. Imagine that config options are documented so that people who have to change them when doing field maintenance can have a clue what they are doing. Imagine also that sometimes you need to make an automated change on many config files remotely. You cannot make changes in text editor and just replace all existing config files because each config file has machine-specific configuration. So what are you options?
Same problem if you have a service that reads the config file and a GUI application that writes and modifies said file. If you need to change anything remotely, you need that config file to have comments. However, the next time someone runs that GUI application when doing field maintenance your comments will be gone. I am not aware of any TOML parser that can support those two in my opinion totally valid use scenarios for TOML config files. Yours was the closest until...
So with that in mind...
Well I guess you know why now -- you have made your mind and staying in that discussion seemed pointless. |
So, the above changed recently in our work codebase because the TOML file got big, and the ops team wanted more use of horizontal screen estate, so now we're doing:
Because our TOML is nice and straightforward, I have this awful munge piece of code that serialises the TOML, does some regex matching and patches in the explanatory comments for the keys before dumping to disc. It is slow, buggy, and in every way unsatisfactory. In any case, comment-next-line-node and node-same-line-comment both seem valid orientations. |
Eh. I disagree. The points I've outlined earlier in this thread cover why pretty comprehensively. Some humans care about that a lot, but many others do not. That being said, I would still like to put something in place to allow this for the sake those who would like it (e.g. @ned14), but any efforts to do so are going to necessitate a pretty comprehensive rewrite and I just don't have the time to do that currently. Plus, if I'm being honest, I'm pretty creatively burned out on this project at this point. Motivation comes and goes, but all I've been able to muster over the last 6 months or so is bugfixes and small refactors. I might get a bolt of energy for a big batch of overhaul/feature work at some point, but I don't want to promise anything.
That makes sense, I guess, but deleting all your correspondence (making the other parties look insane in the process) is not the same as simply bowing out of a discussion O_o |
@marzer answered on the issue from the tracker of the python-wrapper that I linked - I'm choosing to answer here:
That's... very unfortunate. Do I understand you correctly that you would not even consider/accept a PR that makes this optional on the From this thread
I read the thread, but I'm not so sure, since it effectively precludes all use cases that involve modifying a toml-file that's also editable by a human. And that's not a fringe use case at all, because readability and meta-information like comments are the key differentiator of TOML vs. JSON. And while most use-cases start out as "I need to read this data", the choice of format already signifies that human interaction is an important requirement, and automated reprocessing (if only for validation of the human edits) is something that almost always comes up soon afterwards. In fact, I'm not aware of a toml library that is able to do this for now, so it becomes a self-fulfilling prophecy that writing comes as an afterthought in use-cases (or rather, everyone has to swallow that they cannot do that automated reprocessing of human-edited files).
Sorry to hear - and very understandable; OSS maintainership is difficult at the best of times, much less the current ones. |
@h-vetinari I'd happily accept a pull request if you want to make it configurable with some drop-in map replacement, but do be aware that it needs to support heterogeneous lookup; last time I tried to find some insertion-order-tracking map with that capability I came up empty handed. Failing that, though, that machinery would need to be added to |
OK - not sure if/when I'll find time myself, but glad to hear that the door is not fully closed
Mostly leaving that as comment for myself so I don't forget - this SO question has some thoughts about this based on abseil's |
Well truthfully I don't consider anything to be fully closed in an absolute sense; I'd love toml++ to be the fastest, most fully-featured TOML library on the planet; the only blocker is this meagre human form and it's time-bound constraints, heh. I have to draw lines and avoid making commitments on things, and that usually ends up favouring the status quo (as you pointed out above). The problem is lessened significantly when it's not me doing the work :D (of course I can't expect the more long-term work pieces that require big rewrites to come in the form of random pull-requests, alas...) |
Let me first say that toml++ looks like a great library - thank you for making it available! But my two cents on this issue is that reasonable preservation of comments is a requirement for me for working with toml. I have the exact same use case as @ned14 described when he opened this issue. I am using toml as a config format for another library, which outputs toml with comments, and I need to tweak its values or add new values without removing the original comments. I look forward to toml++ supporting preservation of comments, but until it does, I am using toml11 because it has support for preservation of comments. |
I've started some refactoring and cleanup work on the v3 branch, and in doing so have removed
Good to hear that toml11 has the feature you need. It's a good library and the author is pretty active in maintaining it. |
Hello, First of all thank you very much for this awesome library. I've been using for quite some time now on top of my own project and I'm really happy about it. As many others here I also now ended up clashing on this feature request and at least I can explain my own use case, hoping it helps to drive this implementation forward. On my own case I have this toml file that contains a simple explanation on how the file will be used, but I'll extend this to have something like this other toml file. If I could preserve this part while parsing/saving it back it would be fantastic. Is there any way I could approach this even with the current status of the library? I'll be happy to implement it on my own side if you can please let me know how to achieve such a thing. Thank you in advance until then. //EDIT: Scratch that, even if this would be implemented unfortunately it won't fix my own use case as apparently this library doesn't support in-memory editing of the parsed toml. Thanks anyway! |
Huh? It absolutely does. There are many examples throughout the documentation. As for the rest of your query, no there's no built-in way to do that. My solution to that would be to keep the comment-based instructions in a separate file and concatenate the formatted toml together with the template before writing them to a file. |
I've seen only this one but it doesn't help my own use-case. Apparently any tentative i do to manipulate data once I parse a file, it doesn't work. The compiler refuses to continue as it doesn't recognize my left side assignment, if this makes any sense. Here is a super simple repro case: https://godbolt.org/z/Ph6MhE7fW
The problem I see with this approach is that it will preserve only a portion of it, but not all. For eg. on this case what would be the template? I have comments at the beginning and for each flag. I'm not sure how Tomlyn manages to handle it, but it does it quite right, as I'm using this library on this project which manipulates my If tomlplusplus could achieve the same result, it would make it the final library for C++! |
Because that's not how you inject additional values using this library. If you want to add new elements to arrays or new KVPs to tables, you need to first cast the node to the actual underlying container type and manipulate it that way - see the Working with TOML data example from the main page (specifically under the comment that says See also:
I'm not going to be implementing any form of comment/whitespace preservation any time soon, unfortunately. It necessitates a huge re-write and I just don't have the time these days. |
I see thanks nevertheless. If I may I found another issue as I'm using the correct method you pointed to work with the TOML data structure. As you can see, the first https://godbolt.org/z/P6c8n99nc Thank you in advance! |
I don't understand what you're trying to accomplish there; If you want to append to the table you created in the first call, get an iterator to it (returned yb the first call to As a general rule I've followed all of the idioms and semantics of the
(the only exception being |
Ok makes sense, thank you :) |
Example: config.insert_or_assign("hello", toml::table{
{ "key1", true }
});
toml::table* child = config["hello"].as_table();
child->insert_or_assign("key2", false); Since the casting stuff is bit tedious it's usually easier to build your configs bottom-up where possible, e.g.: toml::table hello{{ "key1", true }};
// ...
hello.insert_or_assign("key2", false);
// ...
config.insert_or_assign("hello", std::move(hello)); |
Works like a charm, just tested it now, thanks a lot! Looking forward to comment preservation hoping you'll find some time to implement it. Until then, thanks a lot of this amazing library! |
Hello, and thanks for this library which has let me get up and running very quickly! I read a little the previous discussion but couldn't figure out where it landed. Are certain types of comments supported now? Is it considered a feature creep area which will likely never see the day? I'm asking because, like the original poster, I'm looking to parse, edit and dump a config file and I would like to guide users a little on what setting does what using comments. So any comments support would be amazing, and I'm ok with using a given convention laid out by the lib on what gets kept and what doesn't. I have first tried toml11 but I'm facing an issue as that lib breaks the toml spec on dot notation access. This one however has been very straightforward and I feel is more actively supported, so I would like to keep using it. Is there any update on this issue? |
Not currently.
Very likely, yes. I don't really have the time to work on this library much any more, outside of bug fixes. Implementing this necessitates a very large re-write.
Interesting. What does it do? Have you filed a bug report on that repo? IME the maintainer has been pretty active so you might have some luck (cc @ToruNiina) EDIT: never mind, I've just found the discussion: ToruNiina/toml11#174 |
thanks for the reply! A shame but I understand. Maybe if I'm feeling it I'll try to make a PR! |
@marzer Sorry to bother, this isn't exactly related to this issue but since I was there I thought I might as well ask. I have not been able to figure out how to use insert_or_assign with a path (containing dot notations). Nor have I been able to find how I would go about assigning a value to a node retrieved with at_path(). Is there something I'm missing? I'm only trying to assign simple string and float/integer values. I have looked around the examples and couldn't find one where the key wasn't a root key, or retrieved with an iterator. I'm very confused. I would greatly appreciate your help! |
@mchaptel there's no way of doing that, nor will there be. See this discussion for an explanation. If you want to modify the document, you need to work with the hierarchy directly. You can make this easier by working with a toml:: path object alongside your lookups etc. |
thanks for the fast reply! I have found the way you showed above to get the parent as_table then assign afterwards, which might work for my purpose. |
i simply workaround by writing following c++ - if there's no existing config file:
the written file then contains:
sure, that isn't perfect .. but is good enough for me. |
Nope, not currently. Since toml++ does not regard comments and whitespace with any significance, there's nothing to preserve, so I guess your solution is as a good as any, heh. |
Firstly, this is one of those very few libraries I've used where I don't think I'd have done much different myself. Well, possibly I'd have made it always non-throwing. Well done on that, and thank you for making it open source.
Another thing that I would have done is optional comment and whitespace preservation, such that if you read in a TOML file, and then write it out, comments and whitespace are preserved. This is useful because I'm using your library a bit like how clang-tidy or doxygen do things, where you can dump out a default TOML config, hand edit it, or supply TOML delta files to merge against the default TOML config, and so on. Therefore, dumping out the post-merge TOML file, with all comments also merged, would be ideal.
For this to work right, you'd need to retain comments and whitespace as an attribute of the first node following the comment. Then, if the node gets replaced by a merge, one could retain the existing comment and whitespace, or replace it with the new comment and whitespace etc.
I'd personally make this opt-in, as plenty of people don't want to waste CPU nor memory on comments and whitespace. My thanks in advance to you for your consideration!
The text was updated successfully, but these errors were encountered: