a little compile for generating cached module parsers #1816
Changes from 4 commits
278ac1e
9d39d8c
8bc2aad
9ddbda8
5c06b14
b6c0b93
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
@synopsis{Functionality for caching module parsers} | ||
@description{ | ||
The Rascal interpreter can take a long time to load modules. | ||
In deployed situations in particular (Eclipse and VS Code plugins), the | ||
time it takes to load the parser generator, which generates the parsers | ||
required for analyzing concrete syntax fragments, is prohibitive (20s). | ||
This means that the first syntax highlighting sometimes appears only | ||
more than 20s after loading an extension (VS Code) or plugin (Eclipse). | ||
|
||
This "compiler" takes any number of Rascal modules and extracts a grammar | ||
for each of them. It then applies the ((storeParsers)) function of the | ||
((Library::ParseTree)) module to each grammar, storing each parser | ||
in a `.parsers` file. | ||
|
||
After that, the Rascal interpreter has a special mode that uses ((loadParsers)) | ||
while importing a new module if a cached `.parsers` file is present next to | ||
the respective `.rsc` file. | ||
} | ||
@benefits{ | ||
* loading modules without having to first load and use a parser generator can be up to 1000 times faster. | ||
} | ||
@pitfalls{ | ||
:::warning | ||
This caching feature is _static_. There is no automated cache clearance. | ||
If your grammars change, any saved `.parsers` files do not change with them. | ||
It is advised that you programmatically execute this compiler at deployment time | ||
to store the `.parsers` files _only_ in deployed `jar` files. That way, you cannot | ||
be bitten by a concrete syntax parser that is out of date at development time. | ||
::: | ||
} | ||
@license{ | ||
Copyright (c) 2009-2023 NWO-I CWI | ||
All rights reserved. This program and the accompanying materials | ||
are made available under the terms of the Eclipse Public License v1.0 | ||
which accompanies this distribution, and is available at | ||
http://www.eclipse.org/legal/epl-v10.html | ||
} | ||
@contributor{Jurgen J. Vinju - [email protected] - CWI} | ||
@bootstrapParser | ||
module lang::rascal::grammar::storage::ModuleParserStorage | ||
|
||
import lang::rascal::grammar::definition::Modules; | ||
import lang::rascal::\syntax::Rascal; | ||
import util::Reflective; | ||
import util::FileSystem; | ||
import Location; | ||
import ParseTree; | ||
import Grammar; | ||
import IO; | ||
|
||
@synopsis{For each module in `pcfg.srcs`, this produces a `.parsers` file with a stored parser capable of parsing the concrete syntax fragments in that module.} | ||
@description{ | ||
Use ((loadParsers)) to retrieve the parsers stored by this function. In particular the | ||
Rascal interpreter will use this instead of spinning up its own parser generator. | ||
} | ||
@benefits{ | ||
* the single pathConfig parameter makes it easy to wire this function into Maven scripts (see generate-sources maven plugin) | ||
* time spent here generating parsers, once, does not have to be spent while running IDE plugins, many times. | ||
} | ||
@pitfalls{ | ||
* this compiler has very weak error reporting; it just crashes with stack traces in case of trouble. | ||
* for large projects, running this can take a few minutes; it is slower than importing the same modules in the interpreter. | ||
* this compiler assumes the grammars are all correct and can be used to parse the concrete syntax fragments in each respective module. | ||
* this compiler may differ slightly in semantics from the way the interpreter composes grammars for modules, since | ||
it is implemented differently. However, no such issues are currently known. | ||
} | ||
void storeParsersForModules(PathConfig pcfg) { | ||
storeParsersForModules({*find(src, "rsc") | src <- pcfg.srcs, bprintln("Crawling <src>")}, pcfg); | ||
} | ||
|
||
void storeParsersForModules(set[loc] moduleFiles, PathConfig pcfg) { | ||
storeParsersForModules({parseModule(m) | m <- moduleFiles, bprintln("Loading <m>")}, pcfg); | ||
} | ||
|
||
void storeParsersForModules(set[Module] modules, PathConfig pcfg) { | ||
for (m <- modules) { | ||
storeParserForModule("<m.header.name>", m@\loc, modules, pcfg); | ||
} | ||
} | ||
|
||
void storeParserForModule(str main, loc file, set[Module] modules, PathConfig pcfg) { | ||
// this has to be done from scratch because layout definitions combine differently | ||
// with import and extend. Each main module has a different grammar because of this. | ||
def = modules2definition(main, modules); | ||
|
||
// this is where the layout semantics really come into play | ||
gr = fuse(def); | ||
|
||
// find a file in the target folder to write to | ||
target = pcfg.bin + relativize(pcfg.srcs, file)[extension="parsers"].path; | ||
|
||
println("Generating parser for <main> at <target>"); | ||
if (type[Tree] rt := type(sort("Tree"), gr.rules)) { | ||
storeParsers(rt, target); | ||
} | ||
} |
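The target computation in `storeParserForModule` (make the module file relative to a source root, swap its extension for `parsers`, and resolve the result under the bin folder) can be illustrated with a small Java sketch. This is not the Rascal implementation: it uses `java.nio.file.Path` instead of Rascal `loc` values, and the class name `ParserTargetSketch`, its helper methods, and the example paths are all hypothetical.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; models Rascal locations with java.nio.file.Path.
final class ParserTargetSketch {

    // Return the file's path relative to the first source root that contains it.
    static Optional<Path> relativize(List<Path> srcs, Path file) {
        for (Path src : srcs) {
            if (file.startsWith(src)) {
                return Optional.of(src.relativize(file));
            }
        }
        return Optional.empty();
    }

    // Replace (or add) the extension of the last path segment.
    static Path withExtension(Path relative, String ext) {
        String name = relative.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot == -1 ? name : name.substring(0, dot);
        return relative.resolveSibling(base + "." + ext);
    }

    // Mirror the source layout under the bin folder, with a .parsers extension.
    static Path targetFor(List<Path> srcs, Path bin, Path file) {
        Path relative = relativize(srcs, file).orElseThrow(
            () -> new IllegalArgumentException(file + " is not under any source root"));
        return bin.resolve(withExtension(relative, "parsers"));
    }

    public static void main(String[] args) {
        Path src = Path.of("project", "src");
        Path bin = Path.of("project", "bin");
        Path module = Path.of("project", "src", "lang", "Demo.rsc");
        System.out.println(targetFor(List.of(src), bin, module));
    }
}
```

In this sketch a module outside every source root is rejected explicitly; how the Rascal `relativize` function behaves in that situation is not shown in the diff.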
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -433,4 +433,32 @@ | |
public static ISourceLocation createFromURI(String value) throws URISyntaxException { | ||
return vf.sourceLocation(createFromEncoded(value)); | ||
} | ||
public static ISourceLocation changeExtension(ISourceLocation location, String ext) throws URISyntaxException { | ||
String path = location.getPath(); | ||
boolean endsWithSlash = path.endsWith(URIUtil.URI_PATH_SEPARATOR); | ||
Review comment: nitpick: strange indent? mixed tab/spaces?
Review comment: it's a clone from SourceLocationResult; I'll see what I can do about that.
||
if (endsWithSlash) { | ||
path = path.substring(0, path.length() - 1); | ||
Review comment: I'm a bit confused why changing the extension of an entry that ends with a path separator would then suddenly continue doing a string operation on the directory name? What's the semantics of that?
Review comment: We have a semantics for "filenames" that includes directory names. A filename is the last segment of the path list. If a URI ends with a All of this was discovered some years ago while writing simple random tests for the field update feature of Rascal. If you update a file name that does not have an extension, and then ask back for the extension (or the new filename), then you want to see the extension back (or the filename with the extension) and not "hello/.txt" or the ".txt" for the new filename, or even "", just because the URI accidentally was a directory.
||
} | ||
|
||
if (path.length() > 1) { | ||
int slashIndex = path.lastIndexOf(URIUtil.URI_PATH_SEPARATOR); | ||
int index = path.substring(slashIndex).lastIndexOf('.'); | ||
Review comment: I'm leaving this as-is. That code has worked for ages and is used daily. All I did was factor it out of the Result hierarchy. Later it will move to ISourceLocation implementations in vallang. Then we can have another look at optimality IMHO.
Review comment: okay, I reviewed this as new code, since I didn't see a subsequent removal of the same code from somewhere else. I didn't realise it's a clone, could we add some comments to help someone that is fixing a bug do it at the two places where this code exists? Or otherwise rewrite the result side to use this function?
Review comment: I think I can factor this out of SourceLocationResult now. Then there is only one copy. Next we'll move it to ISourceLocation implementations.
Review comment: The nasty part of this is that URIUtil's
||
|
||
if (index == -1 && !ext.isEmpty()) { | ||
path = path + (!ext.startsWith(".") ? "." : "") + ext; | ||
} | ||
else if (!ext.isEmpty()) { | ||
path = path.substring(0, slashIndex + index) + (!ext.startsWith(".") ? "." : "") + ext; | ||
Review comment: There are 4 different cases handled in this function, where for my intuition, only one of them matches the name of
Review comment: sure I'll document it. It is the exact semantics of field updates in locations in Rascal.
Review comment: oh, and then there are the two cases of not having a "." and having a "." in the replacement part. So that makes 12 cases.
||
} | ||
else if (index != -1) { | ||
path = path.substring(0, slashIndex + index); | ||
} | ||
|
||
if (endsWithSlash) { | ||
path = path + URIUtil.URI_PATH_SEPARATOR; | ||
} | ||
} | ||
|
||
return URIUtil.changePath(location, path); | ||
} | ||
} |
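The cases discussed in the review comments (trailing path separator, missing extension, leading dot in the replacement, empty replacement) are easier to follow on plain strings. The sketch below ports the logic of `changeExtension` to `String` paths so it runs without `ISourceLocation` or `URIUtil`; the class name `ChangeExtensionSketch` and the sample paths are hypothetical. It departs from the diff in two labeled places: it guards against a path without any separator, and it restores the trailing separator even for very short paths.

```java
// Hypothetical string-only port of changeExtension; not the Rascal library code.
final class ChangeExtensionSketch {
    static final String SEP = "/";

    static String changeExtension(String path, String ext) {
        boolean endsWithSlash = path.endsWith(SEP);
        if (endsWithSlash) {
            // A directory name counts as the "filename": drop the slash first.
            path = path.substring(0, path.length() - 1);
        }

        if (path.length() > 1) {
            // Assumption of this sketch: treat a path without a separator as one segment.
            int slashIndex = Math.max(path.lastIndexOf(SEP), 0);
            // Only look for a dot inside the last segment.
            int index = path.substring(slashIndex).lastIndexOf('.');
            String dotted = ext.startsWith(".") ? ext : "." + ext;

            if (index == -1 && !ext.isEmpty()) {
                path = path + dotted;                                   // add an extension
            } else if (!ext.isEmpty()) {
                path = path.substring(0, slashIndex + index) + dotted;  // replace it
            } else if (index != -1) {
                path = path.substring(0, slashIndex + index);           // remove it
            }
        }

        // Assumption of this sketch: always restore the trailing separator.
        return endsWithSlash ? path + SEP : path;
    }

    public static void main(String[] args) {
        System.out.println(changeExtension("/a/b/hello.txt", "md"));  // /a/b/hello.md
        System.out.println(changeExtension("/a/b/hello", "txt"));     // /a/b/hello.txt
        System.out.println(changeExtension("/a/b/hello/", "txt"));    // /a/b/hello.txt/
        System.out.println(changeExtension("/a/b/hello.txt", ""));    // /a/b/hello
        System.out.println(changeExtension("/a.b/hello", ".txt"));    // /a.b/hello.txt
    }
}
```

The third call shows the behavior explained in the review thread: the extension lands on the directory name itself rather than producing `hello/.txt`.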
Review comment: agreed :) maybe we should print a warning if the src path is the same as the bin path?
Review comment: I don't see a big issue with that. You'd only have to use a resource copier to get everything into the jar. It would help with testing the feature if you print next to the source files, since the current interpreter would pick them up immediately.
Review comment: ah, I might have thought too quickly that we could detect this happening.