Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for returning parsed AST with potential errors #219

Open
zbraniecki opened this issue Feb 6, 2021 · 11 comments
Open

API for returning parsed AST with potential errors #219

zbraniecki opened this issue Feb 6, 2021 · 11 comments
Labels
crate:fluent-syntax Issues related to fluent-syntax crate design-decision Issues pending design decision, usually Rust specific question

Comments

@zbraniecki
Copy link
Collaborator

Since the inception of fluent-rs we've been facing a dilemma of fitting the concept of parser that recovers from errors into Rust Result model.
In all those years there has been very little progress or consolation in the approach to such paradigm, and as we advance to 1.0, I'd like to make a last effort to gather arguments and decide on the API approach we're going to use.
Of course we can revisit such decision in 2.0, but I'd like to minimize the chance that we'll want to.

Current Design

fn parse(input: &str) -> Result<AST, (AST, Vec<Error>)>

This API is clean, fits into the Result model and in my opinion strikes the right balance of communicating the necessity to plan for errors to the consumer, while returning the AST in both cases.

Unfortunately, it also has some limitations, that are either real or not:

  • The "error" is not really an error. It's a Vec of errors. See Replace Vec<ParserError> with opaque Error type. #176 for the impact on API ergonomics
  • It requires a new Vec per resource, while fairly often a bunch of resources are parsed together.
  • For some users it leads to the following consumption snippet:
let (ast, errors) = match parse(input) {
    Ok(ast) => (ast, None),
    Err((ast, errors)) => (ast, Some(errors))
}

Design ideas

When scooping around, I encountered several possible alternative designs:

a) fn parse(input: &str) -> (AST, Vec<Error>);
b) fn parse(input: &str, errors: &mut Vec<Error>) -> AST;
c) fn parse(input: &str, errors: Option<&mut Vec<Error>>) -> AST;
d) fn parse<T: ErrorHandler>(input: &str, errors: T) -> AST;

I see pros and cons to each of them.
a) This one hides whether parsing was successful. You need to check output.1.is_empty(). I think it's a bit unnatural and doesn't allow for bubbling up errors.
b) This one feels C-like and forces the user to construct a mutable Vec to pass it and than again is_empty(). The value is that you need only one Vec for a loop of parses.
c) This one allows the user to ignore errors, which I'm concerned about. Parsing errors are errors, and should almost never be ignored. Designing API around an easy way to hide them, and not even report them, feels like it may cascade into lowering the quality of the ecosystem.
d) This is some variant of (b) to me, were we allow custom ways to handle error handling. I use that model in multiple higher-level Fluent APIs where the errors are closer to the user and the ErrorHandler is mostly some form of console.error from JS. I'm a bit reluctant to do it here because syntax errors feel like they should be explicitly handled, but it is arbitrary (compared to Fluent resolver errors, missing files when looking for fallbacks etc.) and I can see an argument that we should handle it similarly.

Use of traits such as FromStr or TryFrom

Finally currently we have a function parse, but depending on the decisions in this thread, we could instead have impl FromStr for ast::Resource or impl TryFrom<&str> for ast::Resource.

This would fit nicely into Rust API design, but wouldn't work for signatures that take error handlers or mutable vec of errors, and only work if we return a Result.

Conclusion

I think that the choice is mostly arbitrary and I can't see any argument that would heavily favor one over others. There will be customers who would work well with (d) (Firefox runtime being one! If we encounter parser error, we just want to report it in console!), and ones that cater to (c) (user has no way of handling errors, so they just want to get whatever AST was built), and users who'd naturally prefer the current model of (a), where the current model feels more "Rusty", and (a) is what users often end up doing with it.

When thinking about if Fluent parser fits into any category, I keep coming back to CSS parsers, which also are lenient, accumulate errors, but return AST. There are probably other classes, but I also keep thinking that this is not a canonical case of "perform an operation, give me the result, and here's a reporter for any errors" like (d) suggests. I think parsers are different in that they are meant to convert input to output, rather than perform some side-effect operations which may fail.

Ultimately tho, the choice is arbitrary, and I think whatever we chose here will propagate to analogous scenarios, so I'd like to ask the wider Rust community for feedback. There may be arguments I haven't consider that would make one of the choices (or yet another one!) a clear fit for Rust and Fluent Parser.

@zbraniecki zbraniecki added question crate:fluent-syntax Issues related to fluent-syntax crate design-decision Issues pending design decision, usually Rust specific labels Feb 6, 2021
@exrook
Copy link

exrook commented Feb 6, 2021

How about:

impl ErrorHandler for Vec<Error> {
    // ... 
}
impl ErrorHandler for &mut Vec<Error> {
    // ... 
}
impl ErrorHandler for () {
    // do nothing
}

fn parse<T: ErrorHandler>(input: &str, errors: T) -> Result<AST, (AST, T)>;

This way if you don't care about errors you can just do:

let ast = parse(mystr, ()).unwrap_or_else(|e|e.0);

It's still obvious that you are ignoring the presence of errors while still providing flexibility on how to handle multiple errors.

let ast = parse(mystr, vec![]).unwrap(); // panics with the errors in the panic message
let buf = vec![];
let ast = parse(mystr, &mut buf).unwrap_or_else(|e|e.0);
let ast2 = parse(mystr2, &mut buf).unwrap_or_else(|e|e.0);

// handle errors (if any) at end, also reuses buffer

@zbraniecki
Copy link
Collaborator Author

@exrook I think I like the diretion!

Do you know of other APIs where a return value contains the same value as an input (T here)?

let ast = parse(mystr, vec![]).unwrap(); // panics with the errors in the panic message

I'll need to prototype that, but that seems like the readability of the panic message would depend on the implementation of the ErrorHandler, right?

@tanriol
Copy link

tanriol commented Feb 7, 2021

Do the consumers always need the AST and errors? May it make sense to split the interface into two separate functions?

struct Errors(Vec<Error>);
impl std::error::Error for Errors { ... }

fn parse(input: &str) -> Result<AST, Errors>; // The default for consumers who expect the resource to be correct
fn parse_despite_errors(input: &str, errors: &mut Errors) -> AST; // A separate parser for consumers who need the best-guess AST and error list

@Pike
Copy link
Contributor

Pike commented Feb 7, 2021

From my perspective, Fluent parsing is always successful. That's also how the spec is built, and both js parsers and the python parser implement it like that. Even if one shoveled a jpeg file into the Fluent parser, it should successfully parse that content.

In the same vein, the consumers of the parsing API that I worked on all ignored error entries, aside of linter functionalities.

Thus, I'd just return the resource, with Junk productions being part of the body.

@kornelski
Copy link

kornelski commented Feb 7, 2021

I've used the approach b in the past, but called them Warnings. It's not too bad, and works if you need to combine errors from multiple calls. OTOH it is borrowed exclusively, so for parallel processing it needs extra gymnastics with temporary vecs.

A trait for a error handling delegate is also nice, because then you can ask the delegate whether to abort or continue, and leave details of collecting errors to it.

@exrook The trait is a neat idea, but you can't have impl ErrorHandler for Vec<Error> specifically, because if the caller passes ownership of the vec, they won't get it back.

@zbraniecki
Copy link
Collaborator Author

Do the consumers always need the AST and errors?

From the perspective of designing an API that encourages proper use - in most cases they do. Errors should not be ignored and most apps/libraries that would silence all errors would likely result in a growing body of erroneous FTL resources and code flows that contain silenced errors.

A good model is for tooling to throw errors diligently, and for runtime to throw in automation, report in debug mode and silence in production.

May it make sense to split the interface into two separate functions?

My concern here is that this would creep throughout the whole Fluent System. Almost every method may have errors but the output is meaningful even with them. If I were to double the number of methods and have parse and parse_ignore_errors, I'd also need add_resource and add_resource_ignore_errors and add_function and add_function_ignore_errors and format_pattern and format_pattern_ignore_errors - I think it doesn't scale and is pretty inelegant.
I'd love to find a better solution.

I'm exploring the idea of shifting everything in Fluent to the ErrorHandler model, which I agree is very flexible and allows for many different scenarios to be handled as first-class citizens.

One limitation I see out of that model is that it takes a bit more effort to handle errors per-call. Here's what I mean by it:

let error_handler = ErrorHandler::new();

let resources = sources.iter().map(|s| {
    let res = FluentResource::from_str(s, &error_handler);
    if error_handler.has_errors() {
        error_handler.do_something();
    }
    error_handler.reset();
    res
});

let mut bundle = FluentBundle::new();

for res in resources {
    bundle.add_resource(res, &error_handler);
    if error_handler.has_errors() {
        error_handler.do_something();
    }
    error_handler.reset();
}

for func in functions {
    bundle.add_function(func, &error_handler);
    if error_handler.has_errors() {
        error_handler.do_something();
    }
    error_handler.reset();
}

for key in l10n_keys {
    let msg = bundle.get_message(key);
    if let Some(msg) = bundle.get_message(key) {
        if let Some(value) = msg.value {
            let v = bundle.format_pattern(value, None, &error_handler);
            if error_handler.has_errors() {
                error_handler.do_something();
            }
            error_handler.reset();
        }
        
         for attr in msg.attributes {
            let v = bundle.format_pattern(attr.value, None, &error_handler);
            if error_handler.has_errors() {
                error_handler.do_something();
            }
            error_handler.reset();
         }
    } else {
        // handle missing message
    }
}

That's more/less core flow of the mid-level API. The flexibility that comes with the error_handler is awesome and it clearly would allow for many different scenarios including silencing, handling all accumulated errors together, or per loop, or per call.
I also know one can factor out handling to their liking, but I also am concerned about coalescing all errors from different parts of the flow into a single handler and it feels like a pretty untested API design in the Rust land - I have not seen anything similar.

How does it look to everyone here? Does it feel well fit for Rust and a good compromise?

@zbraniecki
Copy link
Collaborator Author

zbraniecki commented Feb 7, 2021

Another thing that I believe this model hides is the structure of errors. In the current model I have ParseError, ResolveError, BundleError and they contain variants for types of errors that can be encountered in a particular operation.
I do have FluentError which can From for each subtype, but it seems that such ErrorReporter would have to either take dyn std::Error or have separate implementations for each type, or take FluentError.

If it was lost, my worry is that then you end up with:

let v = bundle.format_pattern(attr.value, None, &error_handler);
if error_handler.has_errors() {
    for err in error_handler.errors() {
        match err {
            FluentError::ResolveError(err) => {},
            FluentError::ParseError(_) => unreachable!(),
            FluentError::BundleError(_) => unreachable!(),
        }
    }
}

and the idea that you have an operation that can only return parse errors, or resolve errors, but you need to match on a higher level enum and then filter out the variants that this operation cannot return, feels quirky and inelegant.

Thoughts?

@kornelski
Copy link

You can have the handler receive enum with only cases that can happen. Basically, a duplicate of FluentError, but without irrelevant variants.

@zbraniecki
Copy link
Collaborator Author

Then I'd have ParserErrorHandler, ResolverErrorHandler traits. And one could implement them on a single struct that handles all types it wants to. Does that sound good to everyone?

@zbraniecki zbraniecki added this to the 0.16 milestone Feb 9, 2021
@SillyFreak
Copy link

I just saw the mention on Matrix - most of the perspective I could give has already been added by @exrook in the first reply - that is exactly as I imagined (d) to be used: instead of having Vec<Error> or None as the two possible handling strategies, it could be freely chosen by the consumer, including a direct console logging approach or streaming somewhere else. But accumulating in a Vec or ignoring are the two main cases I'd say. So I see (d) as a more flexible variant of (c).

I see conflicting views here on whether the errors are optional, but generally I'd lean they should be handled. In that case @exrook's variant of (d) (fn parse<T: ErrorHandler>(input: &str, errors: T) -> Result<AST, (AST, T)>;) looks very promising. For example, take part of your example:

let error_handler = ErrorHandler::new();

let resources = sources.iter().map(|s| {
    let res = FluentResource::from_str(s, &error_handler);
    if error_handler.has_errors() {
        error_handler.do_something();
    }
    error_handler.reset();
    res
});

which becomes:

let errors = Vec::new();

let resources = sources.iter().map(|s| {
    FluentResource::from_str(s, &mut errors).unwrap_or_else(|(res, errors)| {
        // no need to check, there were indeed errors
        do_something(errors);
        errors.clear();
        res
    })
});

or, if you don't care about allocating new vectors for every resource (that is, if there are errors, because an empty vec doesn't allocate), you could work similar to the current design:

let resources = sources.iter().map(|s| {
    FluentResource::from_str(s, Vec::new()).unwrap_or_else(|(res, errors)| {
        // no need to check, there were indeed errors
        do_something(&mut errors);
        res
    })
});

@alerque alerque removed this from the 0.16 milestone May 6, 2024
@alerque
Copy link
Collaborator

alerque commented May 7, 2024

Just a cross reference link: there is definitely a little bit of overlap with #270 here. Not the same problem, but bound to have some overlap with wanting to show source locations of various error conditions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate:fluent-syntax Issues related to fluent-syntax crate design-decision Issues pending design decision, usually Rust specific question
Projects
None yet
Development

No branches or pull requests

7 participants