a little compile for generating cached module parsers #1816

Merged
merged 6 commits into from
Jun 23, 2023
25 changes: 1 addition & 24 deletions src/org/rascalmpl/interpreter/result/SourceLocationResult.java
@@ -509,31 +509,8 @@ else if (name.equals("extension")) {
if (!replType.isString()) {
throw new UnexpectedType(getTypeFactory().stringType(), replType, ctx.getCurrentAST());
}
String ext = newStringValue;

boolean endsWithSlash = path.endsWith(URIUtil.URI_PATH_SEPARATOR);
if (endsWithSlash) {
path = path.substring(0, path.length() - 1);
}

if (path.length() > 1) {
int slashIndex = path.lastIndexOf(URIUtil.URI_PATH_SEPARATOR);
int index = path.substring(slashIndex).lastIndexOf('.');

if (index == -1 && !ext.isEmpty()) {
path = path + (!ext.startsWith(".") ? "." : "") + ext;
}
else if (!ext.isEmpty()) {
path = path.substring(0, slashIndex + index) + (!ext.startsWith(".") ? "." : "") + ext;
}
else if (index != -1) {
path = path.substring(0, slashIndex + index);
}

if (endsWithSlash) {
path = path + URIUtil.URI_PATH_SEPARATOR;
}
}
path = URIUtil.changeExtension(loc, newStringValue).getPath();
uriPartChanged = true;
}
else if (name.equals("top")) {
@@ -0,0 +1,133 @@
@synopsis{Functionality for caching module parsers}
@description{
The Rascal interpreter can take a lot of time while loading modules.
In particular in deployed situations (Eclipse and VScode plugins), the
time it takes to load the parser generator for generating the parsers
which are required for analyzing concrete syntax fragments is prohibitive (20s).
This means that the first syntax highlighting can sometimes appear only
more than 20s after loading an extension (VScode) or plugin (Eclipse).

This "compiler" takes any number of Rascal modules and extracts a grammar
for each of them. It then uses the ((storeParsers)) function of the
((Library::ParseTree)) module to store each parser in a `.parsers` file.

After that, the Rascal interpreter has a special mode that uses ((loadParsers))
while importing a new module if a cached `.parsers` file is present next to
the respective `.rsc` file.
}
@benefits{
* loading modules without having to first load and use a parser generator can be up to 1000 times faster.
}
@pitfalls{
:::warning
This caching feature is _static_. There is no automated cache clearance.
If your grammars change, any saved `.parsers` files do not change with it.
It is advised that you programmatically execute this compiler at deployment time
to store the `.parsers` file _only_ in deployed `jar` files. That way, you can not
Member:
agreed :) maybe we should print a warning if the src path is the same as the binpath?

Member Author:
I don't see a big issue with that. You'd only have to use a resource copier to get everything into the jar. It would help with testing the feature if you print next to the source files, since the current interpreter would pick them up immediately.

Member:
ah, I might have thought too quickly that we could detect this happening.

be bitten by a concrete syntax parser that is out of date at development time.
:::
}
@license{
Copyright (c) 2009-2023 NWO-I CWI
All rights reserved. This program and the accompanying materials
are made available under the terms of the Eclipse Public License v1.0
which accompanies this distribution, and is available at
http://www.eclipse.org/legal/epl-v10.html
}
@contributor{Jurgen J. Vinju - [email protected] - CWI}
@bootstrapParser
module lang::rascal::grammar::storage::ModuleParserStorage

import lang::rascal::grammar::definition::Modules;
import lang::rascal::\syntax::Rascal;
import util::Reflective;
import util::FileSystem;
import Location;
import ParseTree;
import Grammar;
import IO;

@synopsis{For all modules in pcfg.srcs this will produce a `.parsers` stored parser capable of parsing the concrete syntax fragments in said module.}
@description{
Use ((loadParsers)) to retrieve the parsers stored by this function. In particular the
Rascal interpreter will use this instead of spinning up its own parser generator.
}
@benefits{
* the single pathConfig parameter makes it easy to wire this function into Maven scripts (see generate-sources maven plugin)
* time spent here generating parsers, once, does not have to be spent while running IDE plugins, many times.
}
@pitfalls{
* this compiler has very weak error reporting. it just crashes with stacktraces in case of trouble.
* for large projects running this can take a few minutes; it is slower than importing the same modules in the interpreter.
* this compiler assumes the grammars are all correct and can be used to parse the concrete syntax fragments in each respective module.
* this compiler may have slight differences in semantics with the way the interpreter composes grammars for modules, since
it is implemented differently. However, no such issues are currently known.
}
@examples{
Typically you would call the generate-sources MOJO from the rascal-maven-plugin, in `pom.xml`, like so:

```xml
<plugin>
    <groupId>org.rascalmpl</groupId>
    <artifactId>rascal-maven-plugin</artifactId>
    <version>0.14.6</version>
    <configuration>
        <mainModule>YourMainModule</mainModule>
    </configuration>
    <executions>
        <execution>
            <id>it-compile</id>
            <phase>generate-test-sources</phase>
            <goals>
                <goal>generate-sources</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

And you'd write this module to make it work:

```rascal
module YourMainModule

import util::Reflective;
import lang::rascal::grammar::storage::ModuleParserStorage;

int main(list[str] args) {
    pcfg = getProjectPathConfig(|project://yourProject|);
    storeParsersForModules(pcfg);
    return 0; // an int function has to return a value
}
```
}
void storeParsersForModules(PathConfig pcfg) {
    storeParsersForModules({*find(src, "rsc") | src <- pcfg.srcs, bprintln("Crawling <src>")}, pcfg);
}

void storeParsersForModules(set[loc] moduleFiles, PathConfig pcfg) {
    storeParsersForModules({parseModule(m) | m <- moduleFiles, bprintln("Loading <m>")}, pcfg);
}

void storeParsersForModules(set[Module] modules, PathConfig pcfg) {
    for (m <- modules) {
        storeParserForModule("<m.header.name>", m@\loc, modules, pcfg);
    }
}

void storeParserForModule(str main, loc file, set[Module] modules, PathConfig pcfg) {
    // this has to be done from scratch due to different ways combining layout definitions
    // with import and extend. Each main module has a different grammar because of this.
    def = modules2definition(main, modules);

    // here the layout semantics comes really into action
    gr = fuse(def);

    // find a file in the target folder to write to
    target = pcfg.bin + relativize(pcfg.srcs, file)[extension="parsers"].path;

    println("Generating parser for <main> at <target>");
    if (type[Tree] rt := type(sort("Tree"), gr.rules)) {
        storeParsers(rt, target);
    }
}
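The target computation in `storeParserForModule` (relativize the module file against the source roots, swap the extension for `parsers`, and append the result to the bin folder) can be illustrated in Java with `java.nio.file.Path`. This is only a sketch of the idea, not the Rascal implementation: the class `ParserTargetSketch` and the method `targetFor` are hypothetical names, and it assumes that the first source root that is a prefix of the module file is the one to relativize against.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in the Rascal code base.
final class ParserTargetSketch {

    // Maps <src>/some/Module.rsc to <bin>/some/Module.parsers.
    static Path targetFor(Path bin, List<Path> srcs, Path moduleFile) {
        for (Path src : srcs) {
            if (moduleFile.startsWith(src)) {
                // path of the module relative to its source root
                Path relative = src.relativize(moduleFile);

                // replace the extension of the last segment only
                String name = relative.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String base = dot == -1 ? name : name.substring(0, dot);

                return bin.resolve(relative.resolveSibling(base + ".parsers"));
            }
        }
        throw new IllegalArgumentException(moduleFile + " is not below any source root");
    }

    public static void main(String[] args) {
        Path target = targetFor(
            Paths.get("/proj/bin"),
            Arrays.asList(Paths.get("/proj/src")),
            Paths.get("/proj/src/lang/Foo.rsc"));
        System.out.println(target);
    }
}
```

Because the relative path is preserved, a build step that copies the bin folder into a jar would place each `.parsers` file where the corresponding module ends up, which is what the import-time lookup relies on.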
51 changes: 33 additions & 18 deletions src/org/rascalmpl/semantics/dynamic/Import.java
@@ -86,6 +86,7 @@
import io.usethesource.vallang.IValue;
import io.usethesource.vallang.IValueFactory;
import io.usethesource.vallang.type.Type;
import io.usethesource.vallang.type.TypeFactory;

public abstract class Import {

@@ -394,7 +395,7 @@
current.setSyntaxDefined(current.definesSyntax() || module.definesSyntax());
}

public static ITree parseModuleAndFragments(char[] data, ISourceLocation location, IEvaluator<Result<IValue>> eval){
public static ITree parseModuleAndFragments(char[] data, ISourceLocation location, IEvaluator<Result<IValue>> eval) {
eval.__setInterrupt(false);
IActionExecutor<ITree> actions = new NoActionExecutor();

@@ -454,24 +455,38 @@

// parse the embedded concrete syntax fragments of the current module
ITree result = tree;
if (!eval.getHeap().isBootstrapper() && (needBootstrapParser(data) || (env.definesSyntax() && containsBackTick(data, 0)))) {
RascalFunctionValueFactory vf = eval.getFunctionValueFactory();
IFunction parsers = null;

if (env.getBootstrap()) {
parsers = vf.bootstrapParsers();
}
else {
IConstructor dummy = TreeAdapter.getType(tree); // I just need _any_ ok non-terminal
IMap syntaxDefinition = env.getSyntaxDefinition();
IMap grammar = (IMap) eval.getParserGenerator().getGrammarFromModules(eval.getMonitor(),env.getName(), syntaxDefinition).get("rules");
IConstructor reifiedType = vf.reifiedType(dummy, grammar);
parsers = vf.parsers(reifiedType, vf.bool(false), vf.bool(false), vf.bool(false), vf.set());
}

result = parseFragments(vf, eval.getMonitor(), parsers, tree, location, env);
}
try {
if (!eval.getHeap().isBootstrapper() && (needBootstrapParser(data) || (env.definesSyntax() && containsBackTick(data, 0)))) {
RascalFunctionValueFactory vf = eval.getFunctionValueFactory();
URIResolverRegistry reg = URIResolverRegistry.getInstance();
ISourceLocation parserCacheFile = URIUtil.changeExtension(env.getLocation(), "parsers");

IFunction parsers = null;

if (env.getBootstrap()) {
// no need to generate a parser for the Rascal language itself
parsers = vf.bootstrapParsers();
}
else if (reg.exists(parserCacheFile)) {
// if we cached a ModuleFile.parsers file, we will use the parser from that (typically after deployment time)
parsers = vf.loadParsers(parserCacheFile, vf.bool(false),vf.bool(false),vf.bool(false), vf.set());

[Codecov: added line #L472 was not covered by tests]
}
else {
// otherwise we have to generate a fresh parser for this module now
IConstructor dummy = TreeAdapter.getType(tree); // I just need _any_ ok non-terminal
IMap syntaxDefinition = env.getSyntaxDefinition();
IMap grammar = (IMap) eval.getParserGenerator().getGrammarFromModules(eval.getMonitor(),env.getName(), syntaxDefinition).get("rules");
IConstructor reifiedType = vf.reifiedType(dummy, grammar);
parsers = vf.parsers(reifiedType, vf.bool(false), vf.bool(false), vf.bool(false), vf.set());
}

result = parseFragments(vf, eval.getMonitor(), parsers, tree, location, env);
}
}
catch (URISyntaxException | ClassNotFoundException | IOException e) {
eval.warning("reusing parsers failed during module import: " + e.getMessage(), env.getLocation());

[Codecov: added lines #L486 - L487 were not covered by tests]
}

return result;
}
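The new branch order in `parseModuleAndFragments` boils down to a three-way choice: bootstrap parsers first, then a cached `.parsers` file if one exists next to the module, and only then a freshly generated parser. A minimal sketch of that decision follows, with the file system replaced by a predicate so it can run stand-alone. `ParserSelectionSketch`, `ParserSource`, `cacheFileFor` and `choose` are hypothetical names and not part of the interpreter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of the selection order; not the interpreter's code.
final class ParserSelectionSketch {

    enum ParserSource { BOOTSTRAP, CACHED, GENERATED }

    // Assumption: the cache file sits next to the module, with ".rsc" swapped for ".parsers".
    static String cacheFileFor(String modulePath) {
        String base = modulePath.endsWith(".rsc")
            ? modulePath.substring(0, modulePath.length() - ".rsc".length())
            : modulePath;
        return base + ".parsers";
    }

    static ParserSource choose(boolean isBootstrapModule, String modulePath, Predicate<String> exists) {
        if (isBootstrapModule) {
            // the Rascal language itself never needs a generated parser
            return ParserSource.BOOTSTRAP;
        }
        if (exists.test(cacheFileFor(modulePath))) {
            // a stored parser is present, typically after deployment
            return ParserSource.CACHED;
        }
        // otherwise a parser has to be generated on the spot
        return ParserSource.GENERATED;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(Arrays.asList("/src/A.parsers"));
        System.out.println(choose(false, "/src/A.rsc", files::contains));
        System.out.println(choose(false, "/src/B.rsc", files::contains));
        System.out.println(choose(true, "/src/A.rsc", files::contains));
    }
}
```

Note that the cache check never compares timestamps, which matches the pitfall stated in the module documentation: a stale `.parsers` file wins over the current grammar.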

28 changes: 28 additions & 0 deletions src/org/rascalmpl/uri/URIUtil.java
@@ -433,4 +433,32 @@
public static ISourceLocation createFromURI(String value) throws URISyntaxException {
return vf.sourceLocation(createFromEncoded(value));
}
public static ISourceLocation changeExtension(ISourceLocation location, String ext) throws URISyntaxException {
String path = location.getPath();
boolean endsWithSlash = path.endsWith(URIUtil.URI_PATH_SEPARATOR);
Member:
nitpick: strange indent? mixed tab/spaces?

Member Author:
it's a clone from SourceLocationResult; I'll see what I can do about that.

if (endsWithSlash) {
path = path.substring(0, path.length() - 1);
Member:
I'm a bit confused why changing the extension of an entry that ends with a path separator would then suddenly continue doing a string operation on the directory name. What are the semantics of that?

Member Author:
We have a semantics for "filenames" that includes directory names. A filename is the last segment of the path list. If a URI ends with a / we interpret the characters between that slash and the previous as the "filename", because that is the name of a directory in that case. Filenames can have extensions, and so for the sake of consistency, directory names can also have extensions. From there on you get to write this kind of code to make it all consistent.

All of this was discovered some years ago while writing simple random tests for the field update feature of Rascal. If you update a file name that does not have an extension, and then ask back for the extension, (or the new filename), then you want to see the extension back (or the filename with the extension) and not "hello/.txt" or the ".txt" for the new filename, or even "", just because the URI accidentally was a directory.

}

if (path.length() > 1) {
int slashIndex = path.lastIndexOf(URIUtil.URI_PATH_SEPARATOR);
int index = path.substring(slashIndex).lastIndexOf('.');
Member:
lastIndexOf takes a second argument, int fromIndex, that would make the downstream code (and this as well) a bit simpler.

Member Author:
I'm leaving this as-is. That code has worked for ages and is used daily. All I did was factor it out of the Result hierarchy. Later it will move to ISourceLocation implementations in vallang. Then we can have another look at optimality IMHO.

Member:
okay, I reviewed this as new code, since I didn't see a subsequent removal of the same code from somewhere else. I didn't realise it's a clone, could we add some comments to help someone that is fixing a bug do it at the two places where this code exists? Or otherwise rewrite the result side to use this function?

Member Author:
I think I can factor this out of SourceLocationResult now. Then there is only one copy. Next we'll move it to ISourceLocation implementations.

Member Author:
The nasty part of this is that URIUtil's change methods do not carry over offset, lengths and such. We have to fix that when we move them to ISourceLocation.


if (index == -1 && !ext.isEmpty()) {
path = path + (!ext.startsWith(".") ? "." : "") + ext;
}
else if (!ext.isEmpty()) {
path = path.substring(0, slashIndex + index) + (!ext.startsWith(".") ? "." : "") + ext;
Member:
There are 4 different cases handled in this function, where for my intuition, only one of them matches the name of changeExtension. I think this function either needs a "broader" name, or some javadoc in front of it to explain what it does.

Member Author:
Sure, I'll document it. It is the exact semantics of field updates on locations in Rascal.

  • The update with the empty string is extension removal indeed,
  • if an extension was present it is replaced
  • if an extension was not present the new one is added
  • and that times the difference between not-ending or ending with a slash. So there are 6 cases I believe.

Member Author:
Oh, and then there are the two cases of not having a "." and having a "." in the replacement part. So that makes 12 cases.

}
else if (index != -1) {
path = path.substring(0, slashIndex + index);

[Codecov: added line #L454 was not covered by tests]
}

if (endsWithSlash) {
path = path + URIUtil.URI_PATH_SEPARATOR;
}
}

return URIUtil.changePath(location, path);
}
}
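The cases discussed in the review thread (add, replace or remove an extension, with or without a trailing slash, with or without a leading "." in the new extension) can be tried out on plain strings. The following is a minimal sketch that mirrors the branch structure of `changeExtension` above but is not the `URIUtil` code itself: the class name `ChangeExtensionSketch` and the constant `SEP` are made up for illustration, the separator is assumed to be "/", and paths are assumed to start with a slash.

```java
// Hypothetical string-only mirror of the logic shown in the diff above.
final class ChangeExtensionSketch {

    // Assumption: stands in for URIUtil.URI_PATH_SEPARATOR.
    private static final String SEP = "/";

    static String changeExtension(String path, String ext) {
        // a trailing slash is set aside, so a directory name is treated as a file name
        boolean endsWithSlash = path.endsWith(SEP);
        if (endsWithSlash) {
            path = path.substring(0, path.length() - 1);
        }

        if (path.length() > 1) {
            // only look for a dot inside the last path segment
            int slashIndex = path.lastIndexOf(SEP);
            int index = path.substring(slashIndex).lastIndexOf('.');
            String dot = ext.startsWith(".") ? "" : ".";

            if (index == -1 && !ext.isEmpty()) {
                path = path + dot + ext;                                  // no extension yet: add
            }
            else if (!ext.isEmpty()) {
                path = path.substring(0, slashIndex + index) + dot + ext; // replace
            }
            else if (index != -1) {
                path = path.substring(0, slashIndex + index);             // empty string: remove
            }

            if (endsWithSlash) {
                path = path + SEP;
            }
        }

        return path;
    }

    public static void main(String[] args) {
        System.out.println(changeExtension("/a/b.txt", "md"));  // /a/b.md
        System.out.println(changeExtension("/a/b", ".txt"));    // /a/b.txt
        System.out.println(changeExtension("/a/b.txt", ""));    // /a/b
        System.out.println(changeExtension("/a/dir/", "txt"));  // /a/dir.txt/
        System.out.println(changeExtension("/a.b/c", "x"));     // /a.b/c.x
    }
}
```

The last call shows why the dot search is restricted to the final segment: a dot in a parent folder name must not be mistaken for the start of an extension.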