Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IRJ files parser as an Mlang sublibrary #222

Merged
merged 19 commits into from
Nov 29, 2023
Merged

Conversation

mdurero
Copy link
Collaborator

@mdurero mdurero commented Jul 5, 2023

Parser for DGFiP test files, AKA "fip files", ".irj", "creneau files", "autotest files"… to load data in a straightforward OCaml record.

Compatible with both actually used file formats right now, primary and corrective computation, and cleaned from unidentified format variations.

Parser is based on original Mlang interpreter parser.

To be used by Mlang interpreter, Mlang OCaml backend, or any external OCaml application.

@mdurero mdurero force-pushed the IRJ-parser-library branch 4 times, most recently from 76d5262 to 13e7160 Compare July 5, 2023 14:02
@mdurero mdurero marked this pull request as ready for review November 9, 2023 16:53
@mdurero mdurero marked this pull request as draft November 10, 2023 17:28
@denismerigoux denismerigoux mentioned this pull request Nov 21, 2023
3 tasks
@denismerigoux denismerigoux marked this pull request as ready for review November 21, 2023 12:57
@denismerigoux
Copy link
Contributor

@noeensarguet can you push your changes here so that I can review them?

@noeensarguet
Copy link
Collaborator

@noeensarguet can you push your changes here so that I can review them?

Done
This was mostly reorganizing the directories and including the appropriate dune files to have an independent irj-parser opam package - however I'm not sure that this is very satisfactory as is, we're definitely reaching the limits of flattening the directory structure with the stanza (include_subdirs unqualified)

@denismerigoux
Copy link
Contributor

denismerigoux commented Nov 22, 2023

So what you did here is pull all the dependencies of parser_utils.ml into irj_parser but I don't think we can merge this as it destructures the architecture of Mlang completely. The files :

  • src/mlang/m_frontend/mast.ml
  • src/mlang/utils/cli.ml
  • src/mlang/utils/cli.mli
  • src/mlang/utils/pos.ml
  • src/mlang/utils/pos.mli

Should remain where they are, hence irj_parser should not depend on them. Hence you should refactor the code of irj_parser/{main.ml, irj_ast.ml, irj_file.ml, irj_parser.mly} to not depend on anything in the list of files above, so that irj_parser can really be a standalone library that does not depend on Mlang. For instance, instead of throwing error messages with Cli.raise_error, the irj_parser library should throw an OCaml exception it defines. It should also embark its mini-AST of the test files.

@Keryan-dev
Copy link
Collaborator

This is a bit of a tangent.

I agree that depending on MAst is too much, and that errors should be handled in a more library-friendly way (i.e. exceptions or result types). Pos type and functions could be reused but a different minimal version should be enough for the current purpose.

This aside, I would strongly advocate for a much more librarified architecture for Mlang as we may use some of it to build other helpful tools.

@denismerigoux
Copy link
Contributor

I agree with you Keryan, we should move to an Mlang architecture where basically each intermediate representation is its own separate library with clear dependence flow between them. That being said I think this PR is a first step toward this : if when this PR is merged irj_parser becomes its own separate library that doesn't depend on the rest of Mlang, we're progressing toward a more modular architecture no?

@Keryan-dev
Copy link
Collaborator

Yes of course. What I meant is that I would not be shocked to see some adjacent libraries depend on part of the compiler, this one included.

@denismerigoux
Copy link
Contributor

Oh yes of course. For instance we'll surely have a utils library that all the intermediate representations, and maybe irj_parser if we want, could depend on.

@noeensarguet
Copy link
Collaborator

Should be better. Still quite unsatisfied with the (wrapped false) stanza in the irj_parser library, but this was necessary to open Irj_ast in module Test_interpreter. Also opened the definition of type Pos.t to be able to make it compatible with the type t in Irj_ast

Copy link
Contributor

@denismerigoux denismerigoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @noeensarguet, thanks very much for the corrections! We're getting there, but there are a few cleaning items that should be done before merging this. This is mostly nitpicking but let's try to make the code as beautiful as possible to set the tone for your beginnings on Mlang :)

irj_parser.opam Outdated
synopsis: "Parser for the IRJ tests"
description:
"This parser is aimed for the tests used by la DGFiP to test the calculation of the French income tax"
maintainer: ["Mathieu Durero" " Noé Ensarguet"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha ha why did you remove yourself and Mathieu and put me as the maintainer ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha
I had a separate dune-project for irj_parser in the first commit, until I realised (thanks to @Keryan-dev) that you could declare both packages in the same dune-project file (and so with the same maintainers etc.)

(env
(static
(ocamlopt_flags
(-O3 -ccopt -static))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want -O3 ? That is pretty agressive

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stanza is there in a lot of places and only affect static profile builds (only uses for releases IIRC).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking, are we leaving this as is? I changed the rest following your remarks (thanks for those!) and it's ready to be pushed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you can leave this as is :) Please push!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there this empty file here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought we might want an executable at first and didn't delete it afterwards ^^'

(env
(static
(ocamlopt_flags
(-O3 -ccopt -static))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto -O3

let mk_position sloc =
{ pos_filename = (fst sloc).Lexing.pos_fname; pos_loc = sloc }

exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to duplicate the full type here as you won't use all its components in irj_parser. You can simply declare something like and later do raise TestParsingError(...).

exception TestParsingError of string * pos

in
let f =
try Irj_parser.irj_file Irj_lexer.token filebuf with
| StructuredError e ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can catch TestParsingError raised in the parser

| Irj_parser.Error ->
close_in input;
raise
(StructuredError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here raise a TestParsingError

%{ open Irj_ast

let error (sp, ep) msg =
raise (StructuredError ("Parse error : " ^ msg, [ (None, mk_position (sp, ep)) ], None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can raise a TestParsingError ...

in
close_in input;
f
let convert_pos (sloc : Irj_ast.t) : Pos.t =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes perfect!

@@ -18,7 +18,9 @@

(** {2 Source code position} *)

type t
type t = { pos_filename : string; pos_loc : Lexing.position * Lexing.position }
(* opened this type. Not very satisfactory but necessary to enforce the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of opening the type you can just use the make_position function below in test_interpreter.ml/convert_pos :)

@Keryan-dev
Copy link
Collaborator

I don't think you should need a (wrapped false) to be able to use those module elsewhere. Without it you'd have an additional indirection through the top module Irj_parser : open Irj_parser.Irj_ast didn't do the trick ?

@noeensarguet
Copy link
Collaborator

@Keryan-dev tried what you said and a couple other things to address the (wrapped false) - maybe I didn't quite wrap my head around this. If I take the stanza out, the only module I can open is Irj_file since it's the top module of the library Irj_parser (and open Irj_parser yields an Unbound module error). Neither open Irj_parser.Irj_ast or open Irj_file.Irj_ast are recognised as bound modules

Besides, if it is possible to add an indirection with open Irj_parser.Irj_ast, I don't quite see what the purpose of the stanza (wrapped _) would be in the first place - so if adding some indirection is an option, I would be down to have some clarifications about that :)

@denismerigoux denismerigoux merged commit d7fa9bb into master Nov 29, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants