Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a little compile for generating cached module parsers #1816

Merged
merged 6 commits into from
Jun 23, 2023

Conversation

jurgenvinju
Copy link
Member

@jurgenvinju jurgenvinju commented Jun 22, 2023

  • for every module file M.rsc that requires a concrete syntax parser, this PR checks if M.parsers is right there next to it. If so then it uses ParseTree::loadParsers instead of generating a parser (and loading a parser generator!).
  • to generate a bunch of .parsers file, this PR contains lang::rascal::grammars::storage::ModuleParserStorage which provides void storeParsersForModules(PathConfig pcfg). This will produce the required .parser file in the bin parameter location of the pathConfig, based on the relative file path of the rsc file in the srcs parameter of the pathConfig.
  • so to use this:
    • call storeParsersForModules once, i.e. during mvn package or just before
    • make sure the .rsc files that are loaded from the jar are copied right next to the .parsers file
    • wait and see
  • success is visible in the absence of log messages that are "loading the parser generator" modules. Also the first module loads 1000x faster and subsequent modules 20 to 30 times faster than before.
  • please give it a spin. I'm curious if this is enough to solve Prevent "bootstrapping" a full parser generator for deployed DSL implementations rascal-language-servers#267

@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

Merging #1816 (b6c0b93) into main (329130b) will increase coverage by 0%.
The diff coverage is 68%.

@@           Coverage Diff           @@
##              main   #1816   +/-   ##
=======================================
  Coverage       48%     48%           
- Complexity    6045    6062   +17     
=======================================
  Files          670     670           
  Lines        58660   58668    +8     
  Branches      8546    8547    +1     
=======================================
+ Hits         28658   28674   +16     
+ Misses       27835   27823   -12     
- Partials      2167    2171    +4     
Impacted Files Coverage Δ
src/org/rascalmpl/uri/URIUtil.java 42% <62%> (+3%) ⬆️
src/org/rascalmpl/semantics/dynamic/Import.java 55% <72%> (-1%) ⬇️
...almpl/interpreter/result/SourceLocationResult.java 50% <100%> (-1%) ⬇️

... and 8 files with indirect coverage changes

@jurgenvinju jurgenvinju marked this pull request as ready for review June 22, 2023 18:02
Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in general looks good, I have some questions about the function in URIUtil.

@@ -433,4 +433,32 @@ public static ISourceLocation removeOffset(ISourceLocation prev) {
public static ISourceLocation createFromURI(String value) throws URISyntaxException {
return vf.sourceLocation(createFromEncoded(value));
}
public static ISourceLocation changeExtension(ISourceLocation location, String ext) throws URISyntaxException {
String path = location.getPath();
boolean endsWithSlash = path.endsWith(URIUtil.URI_PATH_SEPARATOR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: strange indent? mixed tab/spaces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a clone from SourceLocationResult; I'll see what I can do about that.

String path = location.getPath();
boolean endsWithSlash = path.endsWith(URIUtil.URI_PATH_SEPARATOR);
if (endsWithSlash) {
path = path.substring(0, path.length() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused why changing the extension of an entry that ends with a path seperator would then suddenly contiue doing a string operation on directory name? What's the semantics of that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a semantics for "filenames" that includes directory names. A filename is the last segment of the path list. If a URI ends with a / we interpret the characters between that slash and the previous as the "filename", because that is the name of a directory in that case. Filenames can have extensions, and so for the sake of consistency, directory names can also have extensions. From there on you get to write this kind of code to make it all consistent.

All of this was discovered some years ago while writing simple random tests for the field update feature of Rascal. If you update a file name that does not have an extension, and then ask back for the extension, (or the new filename), then you want to see the extension back (or the filename with the extension) and not "hello/.txt" or the ".txt" for the new filename, or even "", just because the URI accidentally was a directory.


if (path.length() > 1) {
int slashIndex = path.lastIndexOf(URIUtil.URI_PATH_SEPARATOR);
int index = path.substring(slashIndex).lastIndexOf('.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastIndexOf takes an second argument int fromIndex, that would make the downstream code (and this as well) a bit simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving this as-is. That code has worked for ages and is used daily. All I did was factor it out of the Result hierarchy. Later it will move to ISourceLocation implementations in vallang. Then we can have another look at optimality IMHO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I reviewed this as new code, since I didn't see a subsequent removal of the same code from somewhere else. I didn't realise it's a clone, could we add some comments to help someone that is fixing a bug do it at the two places where this code exists? Or otherwise rewrite the result side to use this function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can factor this out of SourceLocationResult now. Then there is only one copy. Next we'll move it to ISourceLocation implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nasty part of this is that URIUtil's change methods do not carry over offset, lengths and such. We have to fix that when we move them to ISourceLocation.

path = path + (!ext.startsWith(".") ? "." : "") + ext;
}
else if (!ext.isEmpty()) {
path = path.substring(0, slashIndex + index) + (!ext.startsWith(".") ? "." : "") + ext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 4 different cases handled in this function, where for my intuition, only one of them matches the name of changeExtension. I think this function either needs a "broader" name, or some javadoc in front of it to explain what it does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure I'll document it. It is the exact semantics of field updates in locations i n Rascal.

  • The update with the empty string is extension removal indeed,
  • if an extension was present it is replaced
  • if an extension was not present the new one is added
  • and that times the difference between not-ending or ending with a slash. So there are 6 cases I believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

o, and then there is the two cases of not having a "." and having a "." in the replacement part. So that makes 12 cases.

This caching feature is _static_. There is no automated cache clearance.
If your grammars change, any saved `.parsers` files do not change with it.
It is advised that you programmatically execute this compiler at deployment time
to store the `.parsers` file _only_ in deployed `jar` files. That way, you can not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed :) maybe we should print a warning if the src path is the same as the binpath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a big issue with that. Only you'd have to use a resource copier to get everything into the jar. it would help with testing the feature if you print next to the source files, since the current interpreter would pick them up immediately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I might have thought too quickly that we could detect this happening.

@DavyLandman
Copy link
Member

I'll give this a spin next week, looks good.

Is this something we could add to the rascal-maven plugin?

@jurgenvinju
Copy link
Member Author

I'll give this a spin next week, looks good.

Is this something we could add to the rascal-maven plugin?

  1. yes we could do that
  2. or you could call the function directly from the generate-sourced maven-plugin already, using some simple maven configuration kung-fu.

I'll try 2. tomorrow and see if we can include that XML in the docs. If it is too hard, I'll write another MOJO.

@jurgenvinju jurgenvinju merged commit cb35d4b into main Jun 23, 2023
7 checks passed
@jurgenvinju jurgenvinju deleted the rascal-module-parser-storage branch June 23, 2023 10:57
@jurgenvinju
Copy link
Member Author

I merged this and making a 0.32.0-RC1 release of Rascal. This is required to be able to run the tutor on the new standard library, and also helpful for experimenting with rascal-language-servers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants