-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for #12 -- Allowed memory size of 134217728 bytes exhausted. #34
base: dev
Are you sure you want to change the base?
Conversation
…ugh accessor that does bitmap adjustment.
There doesn't seem to be any assert() calls in the PHP runtime--at all. For example, here in C# tests for some sanity in the stream, but it is missing here in PHP. How can I guarantee code safe? |
PHP does have support for assertions: But I'd recommend using exceptions for that case. |
I'm trying to fix the ATNConfigSet configLookup data structure, and realizing that what I thought was a Map isn't a Map like Java and C#. Consequently, when I tried to update Map uses an Equivalence, but this--strangly--inherits from Equatable.
Now, HashMap<> in Java uses equals(); Dictionary<> in the Dotnet runtime also uses Equals(). There is no such thing as the "Equivalence" class in C# or Java. There is no In the Antlr PHP runtime, when entering a value into a Map, it uses equivalent(), not equals() That's explains why my changes to Can someone tell me what in the world |
…and CSharp targets.
PHP does not have native support for logical value comparison. So, hash-based data structures need to know how to compare the value for equality. That's where Equivalence comes in. For example, to reproduce Java’s behavior, one needs to implement an Equivalence that just delegates the check to the equals method. But it also allows to have custom logic in case the equals doesn't meet the requirement. |
Thanks, got it. So, I know what Equivalence.equivalent() is supposed to do, and I also know what Equivalence.hash() is supposed to do. But what is Equivalence.equals() supposed to do? In fact, as a check, I just removed the inheritance in Equivalence to Equatable, removed the implementations for Equivalence.equals() and it all works fine. equals() is never called in a Map or Set. And that's the only places where Equivalence are used. Actually, Equivalence.equals() is referenced in Set here antlr-php-runtime/src/Utils/Set.php Line 173 in ac71eb3
I must say, phpstan and phpcs are really, really essential. Great tools.. |
Equivalence is an object as any other object. The |
PHPStan is one of the most advanced analysis tool for PHP. It provides static-time support for advanced types like generics, conditional types (like TS), and some other capabilities that even the Java compiler doesn't support. Reason why I maintain the PHP extension for Antlr. |
In some places, we're using anonymous classes for implementing custom equivalences. That may be related to the problem. If so, it's just a matter of extracting the class into named classes. The point is that two sets are only equivalent if they have the same elements and the same equivalence relation. That's why the equals method in Equivalence is essential. |
…ys returns an ATNConfig not null.
…es, not Equality methods.
I haven't yet found the bug with the circular pointer chain issue. But, I'm making my way through the g4 grammars. It seems that for many cases, the code is 20% faster, sometimes more. That said, there is some odd behavior in that some inputs for a grammar are very fast, e.g., ~0.3s, and then other inputs for the same grammar are really slow, like 5-100s depending on the grammar. I think it has to do with handling ambiguity. For example, for agc input ASSEMBLY_AND_OPERATION_INFORMATION.agc, the parse is a reasonable 0.4s. But, for PINBALL_GAME_BUTTONS_AND_LIGHTS.agc, it takes a staggering 88s (or 120s for current "dev" branch). The lab.antlr.org tester says "line_instruction" alt 6 is very ambiguous, with "max k"=12, but executed ~2000 times. |
…ot "-1" as in the other targets.
The code is still having some ATN trace differences: css3, bootstrap-theme.min.css. Need to fix these first before continuing with performance. |
I still do not know why, but I know where there's a divergence between C# and PHP. It is in the stack merging of sub-parsers (see Section 5.1 of https://www.antlr.org/papers/allstar-techreport.pdf). The parent chains between the two targets look the same, but are not. In other words, the merge caches differ between targets. The merge cache is implemented in CSharp as |
Maybe compare to a third, battle-tested target? I always take the Java target as the ground truth. |
Java and CSharp ATN traces are in complete agreement (~446 MB output size). |
I decided to just pull out the
|
That's fine. I will come back to this after going through the tree diffing implementation for grammars-v4. We can then worry about improving performance. |
Great! Do you have any ETA? I'm asking because I want to reserve a time slot on my side to work on this |
I'm probably going to revisit this next week. The changes for antlr/grammars-v4#3138 are nearly done, checking in today and tomorrow. The changes for antlr/grammars-v4#2991 is really easy: just remaster all the .tree files and turn off any ports that aren't producing a correct tree. |
@marcospassos Is it possible to get a release for the current 4.12.0 for the PHP runtime? That would in setting up the testing for PHP for grammars-v4 so I don't need to special case the PHP tests to use the older 4.11.x tool. |
Sure! |
Thanks!! It works! |
@kaby76, could you please share the instructions for getting the remaining diffs? You probably didn't have time to return to this, so I'll try to take over. I just need to know what is still wrong and how to reproduce. Also, did you try against the Java implementation that's the reference? We can't just compare PHP and C# and state that PHP is wrong. The Java target is the ground truth. |
Sorry, I haven't had much time to recover where I was because there are so many other issues going on in grammars-v4 (stuck on the scala grammar). The diff was in the trace for css3 as mentioned here: #34 (comment) grammar css3 To test, parse trace must be on (PHP To turn on atn tracing: PHP I expect to find more tree diff errors across targets for grammars-v4, but I haven't tried turning it on yet. |
I just came up with an idea on how to help find the investigation starting point:
I'll test this approach. Provided it works, we can test it against any grammar and quickly identify under which conditions it happens. What do you think, @kaby76? |
Summary
This is a fix for Issue #12. The main problem was due to some confusion over Map, Set, ===, hash(), and equals(), which is easy to get wrong because these data structures are pretty complicated. (In fact, I think I may have the equivalent() method wrong in the anonymous class in ATNConfigSet.)
Checklist
I've tested this with grammars-v4/aql. The parse completes for for.aql now, and the parse tree is the same between PHP and CSharp.
I will be checking the parser ATN trace for all grammars in grammars-v4 and examples.