Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed errors in the AST instead of raising #10

Closed
panglesd opened this issue Mar 7, 2023 · 6 comments
Closed

Embed errors in the AST instead of raising #10

panglesd opened this issue Mar 7, 2023 · 6 comments
Labels
forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system.

Comments

@panglesd
Copy link

panglesd commented Mar 7, 2023

Currently, when ppx_custom_printf encounters an error, it uses the raise_errorf function to raise a located error.

The exception is caught by ppxlib, which in this case:

  • Catch the error,
  • stops the rewriting process
  • add the error (as a [%%%ocaml.error ...] extension node) to the last valid ast
  • Use the resulting AST

The interruption of the rewriting is quite bad for the user experience! The implication for the users are:

  • Since ppx_custom_printf runs at the "context-free" phase, the "last valid AST" is before the context-free phase. So, no other derivers/extenders get run, which generates a lot of noise in the errors (such as "uninterpreted extensions" or "unbound identifiers")
  • Only one (meaningful) error from your PPX is reported at a time.
Example

For instance:

let time = 1

let invalid1 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is." time

let invalid2 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is." time

let valid = Format.printf !"The time is %{Valid} and the timezone is." time

would report several errors:

  • string "invalid#invalid" should be of the form [...] for invalid1: the right error
  • this expression has type string but an expression was expected of type 'a ref for invalid2: the wrong error, it should be as for invalid1
  • this expression has type string but an expression was expected of type 'a ref for valid: an error when there should not be one

You can find more information about error reporting in PPXs in this section of the ppxlib manual.

❓ Would you be willing to accept contributions to this issue? I'm considering assigning its resolution as part of an outreachy internship: see more information here.

@github-iron github-iron added the forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system. label Mar 7, 2023
@v-gb
Copy link

v-gb commented Mar 7, 2023

Such a change seems reasonable, but why not change ppxlib instead to use the mechanism you refer to when an exception is raised instead of replacing the whole file? That'd seem much better.

@panglesd
Copy link
Author

panglesd commented Mar 8, 2023

Thanks for your answer!

This is a valid question! First, let me argue that even if we changed the above mechanism in ppxlib, this issue is still relevant, as it is still better to embed errors instead of raising: embedding errors allows to report multiple errors at once!

For instance, in

let invalid1 =
  Format.printf !"The time is %{invalid#invalid} and the timezone is %{invalid#too}." time timezone

you ideally would like both invalid syntax to be reported at once. Also you would like to have that the format string has two "hole", for merlin to know the right arity of the function.

Now, why not change ppxlib's behaviour?

The behaviour I explained was introduced a bit more than one year ago (before, it was even worse: when an exception was caught, the whole ast was replaced by the error node; this means that only one error was reported at all, but more importantly, no Merlin feature (such as jump-to-definition) could work...)
When deciding the right behaviour, we had two options:

  1. Catching an exception stops the rewriting process, and adds an error at the beginning of the last valid AST
  2. Catching an error adds an error to the last valid AST, and continues the rewriting.

We went for solution 1. I think the main reason were that we wanted to give a way for rewriters to stop the process, and exceptions seemed well suited for this. Another reason were that in any case it is better to embed errors. Finally, it was a smaller change in the behaviour... (But I confess I am not sure to recall or understand the reasons that lead to this choice!) I would be open to change the behaviour, but in any case, it would be even better to turn all rewriters into embedding errors instead of raising!

@v-gb
Copy link

v-gb commented Mar 8, 2023

Yeah, the change could still be worthwhile but my claim is that changing ppxlib gives you 95% of the value for 10% of the work. And once you have that, maybe the work to report multiple errors is still worthwhile, or maybe it complicates the code and doesn't seem worthwhile.

As for ppxlib's behavior, I have worked on a number of ppx for many years, and I cannot recall a single time where I've wanted to stop all processing. What kind of situation do you have in mind?

If your ppx claims "I'm rewriting [%foo]", it seems pretty clear to me that ppxlib can replace [%foo ...] by [%ocaml.error ...] if the ppx fails, and proceed. The rewrite is local, so the failures are also local. It's certainly the case for the errors from Located.raise_error, and most likely just true for every exception. I mean, you can exclude Break and Out_of_memory and Stack_overflow to behave marginally nicer, but that doesn't seem necessary to me, just nicer.

Similarly if your ppx is [@@deriving something], then ppxlib can add a [%%%ocaml.error ...] on failure to process something.

If your ppx is defined as a whole ast transformation, in that case, maybe it's not possible to do better than what ppxlib does, and avoiding raising would be better. But that's relatively few preprocessors.

@panglesd
Copy link
Author

panglesd commented Mar 9, 2023

I agree with you, and (if the agreement in ppxlib's maintainer is confirmed) I will redirect the work to changing ppxlib's behaviour.

(As I tried to understand the reasons leading to this choice (which I now don't agree with), I remembered that I even opened a PR implementing your suggestion! (the PR would need a complex rebase though now).)

As I mentioned, even with the change in ppxlib's behaviour, embedding can still be worthwhile in order to report multiple errors.
However, I agree that it can be quite a change in the codebase, for a smaller benefit.

So, I would like to now if you would be willing to accept such contributions? If yes, I'll close this issue and open another one, focused on "multiple error reporting". If you don't have the time resource to review, or don't think the risk of introducing new bugs is worth it, I totally understand it, and will simply close this issue.

In any case, thanks for your input!

@v-gb
Copy link

v-gb commented Mar 9, 2023

I think changing the code to embed errors when it leads to better behavior (not systematically replacing all raises) is fine, if the change is not too complicated. Here, I think it's only the code that parses the inside of %{...} that would need to be tweaked (and the tweak seems nice and local), not the code that looks for the %{.

@panglesd
Copy link
Author

Replaced by #11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system.
Projects
None yet
Development

No branches or pull requests

3 participants