feat(recipe): Add support YAML recipe #3027
✅ PR OK, no changes in deprecations or warnings
Total deprecations: 0
Total warnings: 0
Build statistics (-before, +after):
-executable size=5105200 bin/dub
-rough build time=59s
+executable size=5154440 bin/dub
+rough build time=60s
|
Note that I did mention it at a DLF meeting a while back. |
This makes dub recognize `dub.yaml` as a configuration file format. The code is very similar to the JSON one, as Dub has been using the YAML parser for multiple years. It allows us to use a configuration format that is in widespread use and is more human-oriented than JSON.
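For illustration only (this is not taken from the PR): a minimal sketch of what a `dub.yaml` might look like, assuming the YAML recipe uses the same keys as the existing JSON recipe format. The package name, dependency, and source file path are hypothetical, and the one-to-one key mapping is an assumption rather than something the PR description confirms.

```yaml
# Hypothetical dub.yaml, assuming keys mirror those of dub.json.
name: my-app
description: "An example package"   # comments are possible, unlike in JSON
license: MIT
dependencies:
  vibe-d: "~>0.9.0"                 # version specs quoted so they stay strings
configurations:
  - name: application
    targetType: executable
  - name: library
    targetType: library
    excludedSourceFiles:
      - source/app.d                # hypothetical path
libs-posix:                         # platform suffix as used in the JSON format
  - curl
```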
I remember calling you out on this possible plan when you first integrated Configy. Now, the main argument that has been made in the big community outcry back when the JSON/SDL topic came up - namely fragmenting the package format landscape and adding maintenance burden - is still standing. What has led people to change their mind on this? Back then there were very strong feelings associated with this. Now, admittedly, I don't quite agree with the gravity of the fragmentation argument. It's more so that this whole discussion, which has been brought up time and time again, has cost me a lot of motivation and, apart from the objective aspects, it still bothers me a bit to now see someone's favorite color of the shed just being sneaked in. But regarding YAML itself, I really think the language design is flawed in many ways and not a technically good choice. The main argument is that it's frequently used for CI configuration files and in some other more or less related areas. But in terms of usability and simplicity, SDL is just superior for this use case. The fact that something is technically easy to implement alone is a very weak argument when it comes to designing a system - no matter whether Configy just happens to be able to parse YAML or DMD conveniently has C bitfield semantics already implemented. Anyway, apart from all of this, this is a change with a strong long-term impact that IMO warrants a proper community discussion. And if we go ahead with this, there is also quite a bit of documentation to be amended (and maintained, in case of new recipe features). |
Interestingly, things went the other way around for Configy. I ended up abstracting away the YAML dependency, and you can now use it to parse SDL, JSON, TOML, as long as you have a backend for it. I actually tried to remove the YAML dependency in Dub as a result (see the first commit of #3023 for the JSON backend), but I was not getting column information (only lines) so it was a bit of a regression. For further reference, Configy was introduced in #2280 . My stance at the time was against adding YAML support: #2280 (comment)
Additionally, @atilaneves ' post was suggesting we migrate everyone away from one of the formats. The argument I remember at the time of the SDL discussion was that, had SDL not been made the default immediately, things would have gone over a lot smoother. It also didn't help that the discussion was started by Andrei. As you can see from the diff, YAML will not be the default, and it will only take precedence of
Given the amount of discussions we had on this, the feedback from the two BDFL, and the time that has passed, I hope that PR doesn't count as sneaking it in!
I'm not going to argue here. But when I open
I strongly agree with this, and it became obvious to me, as soon as I saw your message, that I should have expanded on the rationale. Ease of implementation is not the reason we should do it. At most, ease of implementation helps build confidence that the feature works, and simplifies code review, but it should never be a justification for the feature itself.
YAML has its flaws, but is not a bad language, and has definitely gained a massive amount of traction. As discussed, everyone on Github is exposed to YAML. It's hard to convey how important the lack of tooling is, because every single example can be discussed, cherry picked, but it's really a death by a thousand cuts. No out-of-the-box support for editors / online code viewers, few parsers / emitters available (especially in other languages), few references, very poor support for LLM generation as the language is just not that common, etc...
I'm okay to revive the discussion, but I don't think the forum is particularly productive there. I know Walter and Atila are on board, or at least don't oppose it, and last time I mentioned it at the DLF meeting, I don't recall any objection. |
I don't want to get into the SDL/JSON discussion again, I'm tired of it, as the same arguments are repeated over and over again, everyone focusing on their own specific favorite point. As far as I see it, the real mistake happened earlier and was keeping the JSON support in the first release version of DUB, the transition plan to SDL had already been there for a while at that point and JSON was never really a good format for the purpose, it just happened to be convenient due to the existing implementation(s).
I remember you pushing for YAML in multiple places and occasions, but I don't think it really matters in the end - I just think the now technically convenient possibility should receive the same scrutiny as the earlier package format topic, because a fundamental change like this has a strong long-term effect and we should carefully weigh the costs and benefits.
Yes, I should have put that in quotation marks. Back when the SDL/JSON discussions happened there simply was a lot of community activity around this, which is not really happening in this case (maybe because the forums were just more active in general, maybe because the topic just isn't as interesting anymore). It would have been unthinkable to just open a PR back then to deprecate JSON support or to add another format without causing huge backlash.
I actually implemented a syntax highlighting package for Sublime Text 10 years ago and I don't think it took me more than an evening and it should be easily adaptable to VSCode: https://github.com/s-ludwig/sublime-sdlang GitHub appears to use third-party syntax highlighting modules, so that might then also be solvable with a simple support ticket: https://github.com/github-linguist/linguist/blob/main/vendor/README.md
Despite JSON being the default when creating new packages, SDL makes up about one third of the packages on code.dlang.org. My question would be though, why is anything other than syntax highlighting support even relevant for our use case (more in-depth SDL reference, parsers in more foreign languages)? Just for fun, querying Gemma3 (12b) to write a package recipe with some platform specific parts, it did hallucinate in the dependency and platform specific parts, but it nailed the SDL syntax. JSON was very similar in that regard. This might differ for larger models, of course.
Definitely, the forum is awful for anything that offers the opportunity for bike-shedding. However, I think the discussion should follow a similar structure and should not merely rely on a more ephemeral medium like a video conference or IRC/Slack. The main points of concern that I see are long term vision (e.g. keep all three, narrow down on one/two?) and playing through different scenarios with an extended package format syntax (e.g. DEP3/DEP4). Also, do you have an example of what a complex YAML recipe would look like with this PR? |
I see no problems here that didn't also exist for the inclusion of SDL support, and since those seem to have been ignored with no consequence, it seems safe to ignore them once again. I would definitely appreciate a common format that supports commenting. |
@Herringway: SDL support has been there from the beginning (before DUB was first released as such), the discussion broke out when SDL was to be made the default format. Apart from that, as already said, the plan was to make it the only format that is human authored, not an additional one. Lastly, it's a fallacy to try to justify a supposed mistake with an analogous earlier supposed mistake. |
My earliest packages (https://github.com/Herringway/natcmp, for example) predate the initial work to support Precedence is not a fallacy. It isn't necessarily proof, either, but I am seeing no reason to disagree with it at this time. |
This was way before it became the official package manager and its first official release in 2016. Anyway, the plan at that time was to replace JSON, being a permanent alternative is what came out of that infamous discussion.
So if you shoot yourself in your left foot, it's okay to also shoot in the right one, because of precedence? |
False equivalence is a fallacy. Shooting yourself in the foot has obvious negative consequences. Supporting a new format does not. |
They may not be as obvious, but there are all kinds of potentially negative impacts that need to be taken into account (maintenance and documentation costs, system complexity and implementation size, user confusion and communication overhead, possibly making it more difficult to make format changes or adding constraints on how format extensions can be represented are a few that come to mind). The fact that removing an existing format has a very high cost means that those costs and the benefits should be weighed very carefully against each other. |
It's been over ten years since SDL support was added. We have ten years of data to measure these "potential" impacts with. |
Note that the statement after the one you quote answers this: Why don't we have out of the box support for SDL in VSCode and Github ? Because:
I know about linguist, and when things broke for syntax highlighting on Github the issue was handled quickly. But it's been 10 years and Github still doesn't support SDL syntax highlighting despite the ease to do so, as you pointed out. Perhaps the problem is not with the difficulty of the task, then.
I'm not sure I follow the question. As pointed out earlier, this is death by a thousand (paper) cuts: unfamiliarity, lack of tooling, lack of libraries.
I think, before said discussion happens, we should define the criteria for rejecting or approving it. We've always followed a BDFL approach, so I think if both Walter and Atila are in agreement, we should go ahead with it, unless you have another suggestion.
I think we should add support for YAML and gradually deprecate JSON and SDL going forward, first by removing it from When it comes to extending the package format, that was not yet in scope. I would not like to bundle too many changes at once, but perhaps it's an area of discussion as well. |
I'm stating that writing a syntax highlighting file is minimal effort and already done and you reply with "We don't have the capacity / people to build the tooling required"? I'm pretty sure that doing the YAML addition/transition requires a lot more capacity and the impact of missing GitHub/GitLab support doesn't seem to be particularly high.
Perhaps the issue is simply that nobody has made a request to add support based on an existing repository? There are lots of obscure formats in the list and I don't expect them to just constantly search the web for new formats to support.
I'm asking why you think that lack of libraries is relevant to DUB and I'm asking why you think that "little references" are relevant. I'm asking that in a concrete way, as I cannot see where you see the "thousand cuts" theory playing out in practice - to me this is just an empty phrase without at least some kind of evidence.
I don't care how the final choice is made, whether this is a decision from the top, a community vote or Herringway who decides this. What I do care about is that this is an informed decision with a well founded long-term vision. Discussing this in the next meeting makes sense, although I don't think that that's a particularly good medium for working out a vision (the forum format is IMO not bad for this, it should just be a smaller group of people involved).
This honestly sounds like an extremely bad trade-off in terms of ecosystem health, even if YAML would ultimately be deemed superior. A change that breaks all packages in existence for zero benefit to those packages, except maybe in cases where the package author is just happy to switch to YAML. It's exactly the kind of change that you'd want to avoid. The only way I can see this being a net positive would be if we assume that the community grows heavily in the coming years, so that eventually the advantages of the new only format outweigh the damage done by the removal. |
Agreed. JSON is deficient (lack of proper comments is a pretty common complaint) and SDL is just too obscure. However, I don't think it's reasonable to remove them, since there are a number of packages that will never be updated... Perhaps if enough of the packages also fail to compile? |
You may be waiting a very long time for someone to volunteer for this. |
I would also like to point out that since dub has been using a YAML parser to parse dub.json files without JSON validation for some time now, it's quite possible that some misnamed dub.yaml files already exist in the wild. |
So what I gather from this thread is that the only compelling reason to make this highly disruptive change is that GitHub/Gitlab support YAML syntax highlighting? I'm struggling to see why that merits such a disruptive change. |
I vehemently disagree with yaml support (my position has not changed). It's a terrible format, and adding more format support isn't going to fix any of dub's real problems. It will just add to the problems. Will bring this page up again: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell My recommendation has been if we want to support a better format to go with json5. But better still, we should focus on the existing problems in dub before adding more formats. Maybe start with making it so dub can edit its known file formats without destroying the formatting? #2575 |
No, the main reason is familiarity. YAML is by far more common, which leads it to being much better supported everywhere, of which syntax highlighting is an example.
All of those problems are fixed by quoting strings. It's not rocket science. |
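As a rough illustration of the quoting point (not part of the original comment): under YAML 1.1 resolution rules, some unquoted scalars are not read as strings, and quoting them avoids the surprise. The key names below are hypothetical and are not dub recipe fields; which rules apply in practice depends on the YAML version the parser implements.

```yaml
# Illustrative only; key names are made up for this example.
# Under YAML 1.1 resolution rules, these unquoted scalars are not strings:
country: no            # resolved as boolean false
version: 1.10          # resolved as the float 1.1 (trailing zero lost)

# Quoting forces both values to be read as strings:
country_quoted: "no"
version_quoted: "1.10"
```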
"Building the tooling" is not solely about writing code. It's also integrating it and maintaining it. We're all experienced programmers here and we have all experienced how much maintenance actually costs compared to the initial development time.
The D ecosystem's weakest point is tooling, I hope we can agree on this. To me this is a consequence of being spread too thin, something exacerbated by our NIH tendencies. Today if I want to build tooling for D that somehow uses the recipe file, not only do I have to build the tooling, but most likely I have to select among a short set of SDL parsing libraries, or build one myself. But then I also have to make sure I provide good error handling, and god forbid if we ever want to update the SDL format.
Appreciated.
If it was up to me, I would let SDL / JSON fade for the foreseeable future because I'm sure YAML is going to be widely adopted. But the opinion expressed by one of the BDFLs already shows that's not going to be a workable path. The timelines for doing so can be quite long though, which would have a similar impact. And as already mentioned, we can put that restriction on projects, which by definition would only affect actively maintained projects. |
SDL is never going away. Anything we add here is adding, SDL will always be there, because nobody is going to go back through all released revisions of all projects and change We will continue to have the problems of SDL, and add all the problems of YAML. This goes for any new format we add. We should only add a format if it's worth adding on its own. If you want a prominently used format, use JSON. There is plenty of tooling for JSON.
So we will have a format where it's easy to put in bizarre corner cases, and our response to users who have those issues is going to be that it's YAML, so you can write bad formats, nothing we can do? This reeks of "you're holding it wrong" scolding. The beauty of JSON is that it is never ambiguous. It's definitive, and there are no "options" when it comes to how to specify things.
I missed this before. I have objected every time I'm at the DLF meeting. I shouldn't have to attend them all to bring this up every time. |
There are options, one of them you've been recommending (JSON5).
We need to continue supporting reading the current SDL format. That's it. We can get rid of SDL formatting code (support as target in Because YAML is a superset of JSON, we get it for (almost) free. SDL has a cost. Going with JSON5 would be less disturbing than YAML, and the upside is IMO pretty limited. |
So when someone looks at a package, and reads the dub.sdl, what are they going to do? How do they look up how to edit that file, or what things mean? The documentation is already there, why would we ever get rid of it? I don't see the point of removing SDL, even if we don't want to encourage it. We literally cannot get rid of it, because of existing packages. And if we have to support it, we should keep documenting the support. |
There can, however, be extreme differences in the ratio here. In the concrete example of the syntax highlighting module, maintenance cost has been more or less zero.
If you want to make a tool that interacts with the recipe file, you should use dub as a library and not reinvent the whole package parsing and higher level logic - that just begs for problems as the package format evolves and multiplies maintenance costs. But the more important point is, having a third format that needs to be supported doesn't make for a good argument in terms of facilitating easier writing of tooling. Even if everything but YAML would be dropped 10 years from now, you would still have made it harder to write tooling until then. And just not supporting the existing formats will not improve the tooling landscape, but will just make it appear more broken. |
I agree that YAML has flaws. I also agree that better formats exist. However, we need to separate our personal feelings, looking for the perfect solution, and making a decision which benefits the project. It is quite clear that YAML won this particular format war - it's used by GitHub, GitLab, all the other big CI providers, Kubernetes, Ansible, Docker... we could revisit this in a few years, but fairly sure the list will have only grown by then.
By the time SDL support in Dub is deprecated and then removed, they would look at the documentation of the Dub version from the time the
Even if we delete it from a new version of Dub, old versions are still available for download. Maybe we could put old Dub documentation on https://docarchives.dlang.io/ if we need to.
Sure we can. We have a deprecation policy for things like language features and Phobos functions, we can do the same thing here. What's the point of supporting a package definition format from 20 years ago if none of the programs from then will build with the current compiler anyway?
Yes, I haven't needed to update the Emacs package for SDLang since I wrote it 8 years ago. However...
|
My objections to YAML aren't personal. It's objectively a bad format. Yes, I understand that it has "won" when it comes to people being forced to use it. However, the plan here shouldn't be to force people to use YAML. This would be a bad move IMO. The existence of SDL does not discourage people from using D. Removing SDL would definitely alienate existing users. Forcing YAML on people would definitely alienate users. Adding YAML as an option will not gain users (nobody is going to use D now because dub does YAML). The discussion here is literally only about us here, do we want YAML or not? All of us will continue to use D whether YAML is there or not. There are no checkboxes on anyone's list that this helps with. It costs almost nothing to support SDL. Why is SDL such a bad format that we should remove it? Nobody has explained this. The same goes for JSON. Note that neither SDL nor JSON formats have changed. Not including new dub features in JSON or SDL would be mostly spiteful. You would probably have to reject PRs that fix any issues. I have multiple projects that I inherited (and some that I started) that use SDL as the dub format. I don't use it for new projects, because JSON is easier to use for me. But I certainly don't want to switch to JSON on those or YAML, SDL is fine to use for maintenance. If YAML gets added, I will be sad, but move on. If SDL or JSON gets deprecated, I will be mad. Deprecating is a bad move here that gains us nothing, and just adds friction to using D. |
Let's please see these people who have somehow avoided GitHub, GitLab, all the other big CI providers, Kubernetes, Ansible, Docker, and plan on avoiding all future software that will use what is probably the most common configuration file format used today... Sorry, it's difficult for me to empathize with this. I can't see this as anything but a radical personal opinion.
There is a rather long discussion above that explains this. To bring it into context: the merits of the file format itself are almost entirely irrelevant, the popularity is what actually matters.
Understandable, but please also take into account that this is one Steven Schveighoffer versus many people who either don't care or would be happy to just use a format they're already familiar with, that their text editor can syntax-highlight / auto-format / auto-syntax-check etc. |
Note that the time horizon for deprecation is long. Honestly long. The steps would be:
If X = 4, Y = 10, Z = 10, you're looking at 6 years before we issue an error because your dependency uses |
I'm not talking about some religious quest to avoid YAML at all costs. I'm talking about people who have built software with D for years being forced to switch for a preference here that isn't really justifiable. These are people who would probably not even notice if dub continued to support the format they currently use, but are now yelled at by users or tools to change. Is there a good reason for it? It doesn't look like it to me. While I don't like YAML, and would prefer another format if we are to add one (and this is a personal opinion where reasonable people can disagree), removing existing support is a completely different matter.
How does supporting YAML and SDL at the same time make things worse? I understand that YAML is more used. But why would maintaining support for SDL ruin the popularity of YAML?
Sure, but this is avoiding the point that I'm making -- we shouldn't deprecate SDL or JSON if we add YAML.
But why do this? Nobody has explained why it is beneficial to remove support. Not supporting json is even more puzzling since we currently use a yaml parser to parse it. It costs nothing. The cost of removing support means existing dependencies will break. It is not impossible to imagine that a project that builds today will continue to build in the future. It's not impossible to imagine a package whose older release continues to build in the future. What I think we should avoid is a situation where something fails to build solely because we forced a format change. |
We also remove Phobos and language features which we realized are not great and have built better alternatives for. This isn't really very different.
For the same reason we don't keep old things around forever in Phobos and the language. Keeping them around is a disservice to future users and maintainers.
I think we should, because having a single, widely used configuration format for package definitions, used by all actively maintained Dub packages would be an improvement over the status quo, where you need to be prepared to work with two or three formats. Deprecating and removing support for all but one configuration file format removes friction when:
Moving the community to a single file format will also involve friction, yes, but it will be a one-time effort per package, spread over many years, and we already have scaffolding for automatic conversion.
Yes, we technically can't not support it. However, we should (eventually) stop encouraging it either. (I would say that looking for a |
YAML is many config formats in one. You will have to learn more to understand config file formats (Including JSON!) when reading people's configs.
This is a non-issue. There is a button right in the docs:
We don't need to deprecate JSON or SDL to get rid of the question.
Again, this is not an issue. There is a default, most people starting accept the default. Changing the default isn't going to cause friction or remove friction.
I would assume we serialize to structures and not use the config file as parsed? If not, we really should be doing that. YAML shouldn't make that any different. The file conversion can never be removed, unless you want to make it more difficult to convert to YAML. It is clear this is going to happen whether I like it or not. I guess we will see what happens when the deprecations are turned on. |
Technically true, but this doesn't seem to be a problem in practice. Either by convention or due to the prevalence of linters or auto-formatters, almost everyone is using a reasonable subset of YAML that's easy to read and write. I've used YAML for years before seeing or using the references feature or any block modes other than
The existence of the button is part of the problem :) It's one more thing to spend attention on, and one more thing to maintain while working on the docs.
Fair.
There is more to it - e.g. we need to support editing the configuration file when the user runs
Not sure why you say that, my opinion is not any more important than yours. I just strongly disagree with it! |
I was getting the opposite impression. Nearly everything that interests me gets rejected in the end. |
This comment was marked as off-topic.
I suggest the following way forward:
|
I think we should, at the very least, figure out a solution for dlang-community/configy#66 before this is added.
I am aware. I am not literally telling people to go use C++. I am making the point that "everyone else does it" is not a reason to do something, otherwise we should be using the most popular systems language instead of D. Read between the lines a little.
The people that have been using D for decades plural are already familiar with JSON/SDL. The main argument people have been giving is that YAML is familiar to newbies (or more directly saying "everyone else does it"), but the difficulty of learning SDL is not worth mentioning when compared with the difficulty of learning D and the exact keys/values DUB expects. Most newbies will glance at it and know how it basically works, then copy chunks from the docs; it will not break if their editor is configured in a 'problematic' way either.
There is the argument for tooling, like generating/parsing DUB recipes, but if you get to choose your format, JSON has the best tooling, and if you can't, fragmenting the ecosystem makes this worse.
There is nothing to be gained from adding YAML, and so deprecating SDL/JSON can do nothing but harm. |
It's not a reason to not do something either. It's just irrelevant noise. The inverse of a fallacy is also a fallacy.
I really don't see how the difficulty of learning D justifies the existence of barriers to entry.
I don't understand. Is fragmentation good or bad? Because combatting fragmentation is going to require deprecating at least one format. Ignoring the mentioned benefits of YAML doesn't make them go away, either. |
Adding yet another format and letting SDL languish would be a poor choice imho. A large majority of my (~38) packages use the SDL format because it's easy to read and write; and it doesn't use indentation for scoping which is a big pain point for yaml. I have no desire or intention of switching away from SDL; so if that languishes that would be the point I look into switching away from hosting my D code on dub. |
An additional note; yaml has shown no tangible benefits to the Inochi2D Project. I already use it to write workflows for github CI and yaml makes that process very error prone. New contributors are easily onboarded to SDL, wouldn't be able to say the same for yaml given its footguns. SDL works fine; while I can't decide whether or not it should be a default format: it is the format which is used throughout all of the Inochi2D Project and my business and the format I instruct any contributor working on new components/libraries to use. In my own experience SDL has not been a hindrance for adoption of DLang. |
Such PRs should be prepared only after some joint agreement of the majority of the community. |