Fix for #12 -- Allowed memory size of 134217728 bytes exhausted. #34

kaby76 · 2022-12-21T00:56:05Z

Summary

This is a fix for Issue #12. The main problem was due to some confusion over Map, Set, ===, hash(), and equals(), which is easy to get wrong because these data structures are pretty complicated. (In fact, I think I may have the equivalent() method wrong in the anonymous class in ATNConfigSet.)

Checklist

I've tested this with grammars-v4/aql. The parse completes for for.aql now, and the parse tree is the same between PHP and CSharp.

I will be checking the parser ATN trace for all grammars in grammars-v4 and examples.

kaby76 · 2022-12-23T18:15:49Z

There don't seem to be any assert() calls in the PHP runtime at all. For example, the C# runtime here tests for some sanity in the stream, but that check is missing here in PHP. How can I guarantee that the code is safe?

marcospassos · 2022-12-23T20:28:13Z

PHP does have support for assertions:
https://www.php.net/manual/en/function.assert.php

But I'd recommend using exceptions for that case.
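A minimal sketch of the exception-based check recommended above. TinyStream is a hypothetical stand-in written for this example, not a class from the runtime; the point is only that a thrown exception is always evaluated, whereas assert() can be switched off through the zend.assertions ini setting.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a symbol stream; not the runtime's real class.
final class TinyStream
{
    /** @var list<int> */
    private array $symbols;
    private int $index = 0;

    /** @param list<int> $symbols */
    public function __construct(array $symbols)
    {
        $this->symbols = $symbols;
    }

    public function consume(): void
    {
        // Sanity check as an exception: it runs regardless of ini settings.
        if ($this->index >= count($this->symbols)) {
            throw new \LogicException('Cannot consume past the end of the stream');
        }
        $this->index++;
    }

    public function index(): int
    {
        return $this->index;
    }
}
```

Consuming past the last symbol then fails loudly at the call site instead of silently corrupting the position.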

kaby76 · 2022-12-24T02:46:00Z

I'm trying to fix the ATNConfigSet configLookup data structure, and I'm realizing that what I thought was a Map isn't a Map like the ones in Java and C#. Consequently, when I tried to update equivalent() to be something akin to pointer equivalence, nothing worked. So, I read the source code in the Antlr PHP runtime, OpenJDK HashMap, and Dotnet Dictionary.

Map uses an Equivalence, but this, strangely, inherits from Equatable.

interface Equivalence extends Equatable
{
    public function equivalent(Hashable $left, Hashable $right): bool;
    public function hash(Hashable $value): int;
}

interface Equatable
{
    public function equals(object $other): bool;
}

Now, HashMap<> in Java uses equals(); Dictionary<> in the Dotnet runtime also uses Equals(). There is no such thing as an "Equivalence" class in C# or Java. There is no equivalent() method to cloud which comparison method to use or define, or to raise the question of how the two differ.

In the Antlr PHP runtime, when entering a value into a Map, it uses equivalent(), not equals(). That explains why my changes to equivalent() for phpstan broke my fix.

Can someone tell me what in the world equals() is supposed to do in relation to equivalent()? Are they different or the same? Map doesn't even use equals().

marcospassos · 2022-12-24T13:44:59Z

PHP does not have native support for logical value comparison. So, hash-based data structures need to know how to compare the value for equality. That's where Equivalence comes in.

For example, to reproduce Java’s behavior, one needs to implement an Equivalence that just delegates the check to the equals method. But it also allows custom logic in case equals doesn't meet the requirement.
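A minimal sketch of such a delegating Equivalence. The Equatable and Equivalence interfaces are copied from the snippet quoted earlier in the thread; the shape of Hashable (a hashCode() method) is an assumption, since the thread does not show it, and DelegatingEquivalence and Point are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1);

interface Equatable
{
    public function equals(object $other): bool;
}

// Assumed shape: the thread does not show Hashable, so hashCode() is a guess.
interface Hashable extends Equatable
{
    public function hashCode(): int;
}

interface Equivalence extends Equatable
{
    public function equivalent(Hashable $left, Hashable $right): bool;
    public function hash(Hashable $value): int;
}

// Hypothetical class: Java-like behaviour by delegating to the values themselves.
final class DelegatingEquivalence implements Equivalence
{
    public function equivalent(Hashable $left, Hashable $right): bool
    {
        return $left->equals($right);
    }

    public function hash(Hashable $value): int
    {
        return $value->hashCode();
    }

    // Two relations of this final class always impose the same relation.
    public function equals(object $other): bool
    {
        return $other instanceof self;
    }
}

// Hypothetical value type with logical (field-by-field) equality.
final class Point implements Hashable
{
    private int $x;
    private int $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function equals(object $other): bool
    {
        return $other instanceof self && $other->x === $this->x && $other->y === $this->y;
    }

    public function hashCode(): int
    {
        return $this->x * 31 + $this->y;
    }
}
```

With this in place, two distinct Point objects with the same coordinates are equivalent and hash alike, even though === on them is false. A custom equivalence would replace the bodies of equivalent() and hash() with its own logic.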

kaby76 · 2022-12-24T15:24:46Z

> PHP does not have native support for logical value comparison. So, hash-based data structures need to know how to compare the value for equality. That's where Equivalence comes in.
>
> For example, to reproduce Java’s behavior, one needs to implement an Equivalence that just delegates the check to the equals method. But it also allows custom logic in case equals doesn't meet the requirement.

Thanks, got it.

So, I know what Equivalence.equivalent() is supposed to do, and I also know what Equivalence.hash() is supposed to do. But what is Equivalence.equals() supposed to do?

In fact, as a check, I just removed the inheritance of Equivalence from Equatable and removed the implementations of Equivalence.equals(), and it all works fine. equals() is never called in a Map or Set, and those are the only places where Equivalence is used.

Actually, Equivalence.equals() is referenced in Set here (antlr-php-runtime/src/Utils/Set.php, line 173 in ac71eb3):

public function equals(object $other): bool

But it can be removed, and it all still works fine.

I must say, phpstan and phpcs are really, really essential. Great tools.

marcospassos · 2022-12-24T16:21:38Z

Equivalence is an object like any other object. The equals method just tells whether an equivalence relation is equal to another. In other words, a relation R is said to be equal to another relation W if, and only if, they are of the same class and impose the same equivalence relation. In practice, for the implementations shipped in the target, it means being of the same class (since they are final classes).

marcospassos · 2022-12-24T16:25:19Z

> I must say, phpstan and phpcs are really, really essential. Great tools

PHPStan is one of the most advanced analysis tools for PHP. It provides static-time support for advanced types like generics, conditional types (like TS), and some other capabilities that even the Java compiler doesn't support. That's the reason why I maintain the PHP extension for Antlr.

marcospassos · 2022-12-24T16:30:34Z

In some places, we're using anonymous classes to implement custom equivalences. That may be related to the problem. If so, it's just a matter of extracting the anonymous classes into named classes.

The point is that two sets are only equivalent if they have the same elements and the same equivalence relation. That's why the equals method in Equivalence is essential.
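A minimal sketch of that rule, under simplifying assumptions: the elements are plain strings, lookups are a linear scan rather than hashed, and StringEquivalence, ExactMatch, IgnoreCase, and TinySet are hypothetical names, not the runtime's classes.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified equivalence over strings.
interface StringEquivalence
{
    public function equivalent(string $left, string $right): bool;
    public function equals(object $other): bool;
}

final class ExactMatch implements StringEquivalence
{
    public function equivalent(string $left, string $right): bool
    {
        return $left === $right;
    }

    // Final class: same class implies same relation.
    public function equals(object $other): bool
    {
        return $other instanceof self;
    }
}

final class IgnoreCase implements StringEquivalence
{
    public function equivalent(string $left, string $right): bool
    {
        return strtolower($left) === strtolower($right);
    }

    public function equals(object $other): bool
    {
        return $other instanceof self;
    }
}

final class TinySet
{
    private StringEquivalence $equivalence;
    /** @var list<string> */
    private array $items = [];

    public function __construct(StringEquivalence $equivalence)
    {
        $this->equivalence = $equivalence;
    }

    public function contains(string $value): bool
    {
        foreach ($this->items as $item) {
            if ($this->equivalence->equivalent($item, $value)) {
                return true;
            }
        }
        return false;
    }

    public function add(string $value): void
    {
        if (!$this->contains($value)) {
            $this->items[] = $value;
        }
    }

    public function equals(object $other): bool
    {
        // Same relation first, then same elements under that relation.
        if (!$other instanceof self || !$this->equivalence->equals($other->equivalence)) {
            return false;
        }
        if (count($this->items) !== count($other->items)) {
            return false;
        }
        foreach ($this->items as $item) {
            if (!$other->contains($item)) {
                return false;
            }
        }
        return true;
    }
}
```

Two sets holding the same string compare equal only when their equivalences also compare equal, which is the role equals() plays on the relation itself.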

kaby76 · 2022-12-26T13:49:54Z

I haven't yet found the bug with the circular pointer chain issue. But, I'm making my way through the g4 grammars. It seems that for many cases, the code is 20% faster, sometimes more. That said, there is some odd behavior in that some inputs for a grammar are very fast, e.g., ~0.3s, and then other inputs for the same grammar are really slow, like 5-100s depending on the grammar. I think it has to do with handling ambiguity.

For example, for agc input ASSEMBLY_AND_OPERATION_INFORMATION.agc, the parse is a reasonable 0.4s. But, for PINBALL_GAME_BUTTONS_AND_LIGHTS.agc, it takes a staggering 88s (or 120s for current "dev" branch). The lab.antlr.org tester says "line_instruction" alt 6 is very ambiguous, with "max k"=12, but executed ~2000 times.

kaby76 · 2023-01-01T01:10:31Z

The code still has some ATN trace differences: css3, bootstrap-theme.min.css. I need to fix these first before continuing with performance.

kaby76 · 2023-01-02T09:33:46Z

I still do not know why, but I know where there's a divergence between C# and PHP. It is in the stack merging of sub-parsers (see Section 5.1 of https://www.antlr.org/papers/allstar-techreport.pdf). The parent chains between the two targets look the same, but are not. In other words, the merge caches differ between targets. The merge cache is implemented in CSharp as MergeCache, which contains a field Dictionary<PredictionContext, Dictionary<PredictionContext, PredictionContext>>. In PHP, it is implemented as DoubleKeyMap. This code is placed strangely in src/Utils/ as though it were a general data structure, but it only stores PredictionContext objects and is only used for subparser merging (in C#, it is in the atn/ code). I am not yet confident whether the C# or PHP code is working correctly, as this code is pretty complicated.
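To make the shape of such a two-key cache concrete, here is a minimal sketch built on PHP's SplObjectStorage. PairCache is a hypothetical name, and the thread does not say how DoubleKeyMap compares its keys; this version deliberately keys on object identity, which illustrates how two chains that look the same but are distinct objects would miss the cache, unlike a Dictionary that uses Equals() and GetHashCode().

```php
<?php
declare(strict_types=1);

// Hypothetical two-level cache keyed by object identity (not the runtime's DoubleKeyMap).
final class PairCache
{
    private \SplObjectStorage $outer;

    public function __construct()
    {
        $this->outer = new \SplObjectStorage();
    }

    public function set(object $a, object $b, object $value): void
    {
        if (!$this->outer->offsetExists($a)) {
            $this->outer[$a] = new \SplObjectStorage();
        }
        // Objects are handles, so writing to $inner updates the stored map.
        $inner = $this->outer[$a];
        $inner[$b] = $value;
    }

    public function get(object $a, object $b): ?object
    {
        if (!$this->outer->offsetExists($a)) {
            return null;
        }
        $inner = $this->outer[$a];
        return $inner->offsetExists($b) ? $inner[$b] : null;
    }
}
```

A lookup with the same two objects returns the cached value; a lookup with a fresh object that merely has the same contents returns null.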

marcospassos · 2023-01-02T11:11:31Z

Maybe compare to a third, battle-tested target? I always take the Java target as the ground truth.

kaby76 · 2023-01-02T14:25:22Z

> Maybe compare to a third, battle-tested target? I always take the Java target as the ground truth.

Java and CSharp ATN traces are in complete agreement (~446 MB output size).

kaby76 · 2023-01-03T13:46:35Z

I decided to just pull out the -trace option on the parsing test and just compare the parse trees between CSharp, Java, and PHP. Yes, it looks like PHP isn't working correctly sometimes. Java and CSharp are in agreement.

~/issues/issue-3462/kaby76/css3/Generated-PHP ~/issues/issue-3462/kaby76/css3
PHP 0 ../examples/mdn-at-counter-style.css success 14.5657256
Total Time: 14.6478964
dos2unix: converting file o.txt to Unix format...
~/issues/issue-3462/kaby76/css3
~/issues/issue-3462/kaby76/css3/Generated-CSharp ~/issues/issue-3462/kaby76/css3
CSharp 0 ../examples/mdn-at-counter-style.css success 0.1630958
Total Time: 0.2249674
dos2unix: converting file o.txt to Unix format...
~/issues/issue-3462/kaby76/css3
1c1
< (stylesheet ws (nestedStatement (counterStyle @counter-style (ws  ) (ident circled-alpha) (ws  ) { (ws \n  ) (declarationList (declaration (property_ (ident system) ws) : (ws  ) (expr (term (ident fixed) ws))) ws ; (ws \n  ) (declaration (property_ (ident symbols) ws) : (ws  ) (expr (term (ident Ⓐ) (ws  )) (term (ident Ⓑ) (ws  )) (term (ident Ⓒ) (ws  )) (term (ident Ⓓ) (ws  )) (term (ident Ⓔ) (ws  )) (term (ident Ⓕ) (ws  )) (term (ident Ⓖ) (ws  )) (term (ident Ⓗ) (ws  )) (term (ident Ⓘ) (ws  )) (term (ident Ⓙ) (ws  )) (term (ident Ⓚ) (ws  )) (term (ident Ⓛ) (ws  )) (term (ident Ⓜ) (ws  )) (term (ident Ⓝ) (ws  )) (term (ident Ⓞ) (ws  )) (term (ident Ⓟ) (ws  )) (term (ident Ⓠ) (ws  )) (term (ident Ⓡ) (ws  )) (term (ident Ⓢ) (ws  )) (term (ident Ⓣ) (ws  )) (term (ident Ⓤ) (ws  )) (term (ident Ⓥ) (ws  )) (term (ident Ⓦ) (ws  )) (term (ident Ⓧ) (ws  )) (term (ident Ⓨ) (ws  )) (term (ident Ⓩ) ws))) ; (ws \n  ) (declaration (property_ (ident suffix) ws) : (ws  ) (expr (term " " ws))) ; (ws \n)) } (ws \n))) <EOF>)
---
> (stylesheet ws (nestedStatement (counterStyle @counter-style (ws  ) (ident circled-alpha) (ws  ) { (ws \n  ) (declarationList (declaration (property_ (ident system) ws) : (ws  ) (expr (term (ident fixed) ws))) ws ; (ws \n  ) (declaration (property_ (ident symbols) ws) : (ws  ) (expr (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) (ws  )) (term (ident ?) ws))) ; (ws \n  ) (declaration (property_ (ident suffix) ws) : (ws  ) (expr (term " " ws))) ; (ws \n)) } (ws \n))) <EOF>)

marcospassos · 2023-03-06T15:30:10Z

@kaby76, I just didn't invest more time here to refine the implementation because you said there are still some differences.

If you could fix these differences, I'll gladly review, test and finish everything else. I want to make sure there are no differences left before merging this.

kaby76 · 2023-03-06T15:35:58Z

That's fine. I will come back to this after going through the tree diffing implementation for grammars-v4. We can then worry about improving performance.

marcospassos · 2023-03-06T15:46:45Z

Great! Do you have any ETA? I'm asking because I want to reserve a time slot on my side to work on this

kaby76 · 2023-03-06T15:50:00Z

I'm probably going to revisit this next week. The changes for antlr/grammars-v4#3138 are nearly done, checking in today and tomorrow. The changes for antlr/grammars-v4#2991 are really easy: just remaster all the .tree files and turn off any ports that aren't producing a correct tree.

kaby76 · 2023-03-06T18:45:20Z

@marcospassos Is it possible to get a release for the current 4.12.0 for the PHP runtime? That would help in setting up the testing for PHP for grammars-v4, so I don't need to special-case the PHP tests to use the older 4.11.x tool.

marcospassos · 2023-03-06T19:25:48Z

Sure!

marcospassos · 2023-03-06T19:39:03Z

@kaby76 done

kaby76 · 2023-03-06T19:46:12Z

Thanks!! It works!

marcospassos · 2023-04-10T11:59:14Z

@kaby76, could you please share the instructions for getting the remaining diffs? You probably didn't have time to return to this, so I'll try to take over. I just need to know what is still wrong and how to reproduce.

Also, did you try against the Java implementation that's the reference? We can't just compare PHP and C# and state that PHP is wrong. The Java target is the ground truth.

kaby76 · 2023-04-10T13:15:40Z

Sorry, I haven't had much time to recover where I was because there are so many other issues going on in grammars-v4 (stuck on the scala grammar).

The diff was in the trace for css3 as mentioned here: #34 (comment)

grammar css3
input bootstrap-theme.min.css

To test, parse trace must be on (PHP: $parser->setTrace(true); run with php -d memory_limit=1G Test.php; Java: parser.setTrace(true);). I think I also tested using the internal atn tracing as well to check the set computations. I don't have diffs because in general the files would be gigabytes long.

To turn on atn tracing, set Antlr\Antlr4\Runtime\Atn\ParserATNSimulator::$traceAtnSimulation = true; in PHP, or ParserATNSimulator.trace_atn_sim = true; in Java. CSharp and Java always agreed across all the gigabyte-length files. After identifying a computation divergence, I used side-by-side stepping to find data structure computation errors in PHP. I used CSharp because the CSharp tools are far, far superior in almost every way to the various Java debugging tools. For example, there is no "set program counter and go" in Java.

I expect to find more tree diff errors across targets for grammars-v4, but I haven't tried turning it on yet.

marcospassos · 2023-04-10T15:24:24Z

I just came up with an idea to help find the starting point for the investigation:

1. Generate the trace output in the target used as the ground truth (i.e., CSS3 for Java).
2. Use a special logger implementation that, instead of writing, checks whether the line being written matches the corresponding line of that output.
3. If it doesn't, throw an exception. This is precisely the diverging point.
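The idea above could be sketched roughly as follows. TraceChecker is a hypothetical class, and how the runtime's trace output would be routed into write() is not shown here and is left as an assumption. The reference file is read one line at a time, so gigabyte-sized traces never need to fit in memory.

```php
<?php
declare(strict_types=1);

// Hypothetical checker: compares each trace line with a reference trace file.
final class TraceChecker
{
    /** @var resource */
    private $reference;
    private int $lineNumber = 0;

    public function __construct(string $referencePath)
    {
        $handle = fopen($referencePath, 'r');
        if ($handle === false) {
            throw new \RuntimeException("Cannot open reference trace: $referencePath");
        }
        $this->reference = $handle;
    }

    // Called in place of the normal "write a trace line" operation.
    public function write(string $line): void
    {
        $this->lineNumber++;
        $expected = fgets($this->reference);
        if ($expected === false) {
            throw new \RuntimeException("Reference trace ended before line {$this->lineNumber}");
        }
        // Ignore line-ending differences between platforms.
        $expected = rtrim($expected, "\r\n");
        $actual = rtrim($line, "\r\n");
        if ($expected !== $actual) {
            throw new \RuntimeException(sprintf(
                "Divergence at line %d: expected '%s', got '%s'",
                $this->lineNumber,
                $expected,
                $actual
            ));
        }
    }
}
```

The first exception thrown marks the earliest line where the target under test departs from the ground-truth trace, which gives a place to set a breakpoint.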

I'll test this approach. Provided it works, we can test it against any grammar and quickly identify under which conditions it happens.

What do you think, @kaby76?

kaby76 added 7 commits December 20, 2022 19:42

Fix for antlr#12 -- Allowed memory size of 134217728 bytes exhausted.

f408ba3

Fix bug in incorrect access of field reachesIntoOuterContext not thro…

815349f

…ugh accessor that does bitmap adjustment.

Fixing parser ATN tracing to be identical with CSharp. Perfect!!

169a4d2

Fix newline characters.

566bfc5

Let's see if that works on analysis errors.

f62f637

OK, that did. Try this fix for more analysis errors.

59e8eaa

Try this too.

b47d2a5

Break out ConfigHashSet since that's exactly how it's done with Java …

cf51629

…and CSharp targets.

kaby76 added 8 commits December 24, 2022 15:41

Clean up assorted phpstand and phpcs errors.

43eddcf

phpstan and phpcs did not catch this obvious static type error.

15d156c

Remove phpcs error, add in more type checking.

3b1e1bb

Add in more type safety with phpstan and phpcs. Note, getOrAdd() alwa…

6555e81

…ys returns an ATNConfig not null.

Fundtion equivalent() should call equivalent() of child data structur…

477a44f

…es, not Equality methods.

Copying regressions.

62ed357

Simplify

3000a37

There is no equivalent() is Hashable.

583d722

kaby76 added 2 commits December 29, 2022 04:08

Default value for forcing hash computation is "null" in PHP target, n…

947247d

…ot "-1" as in the other targets.

Changes for antlr#36

c2551b4

kaby76 mentioned this pull request Jan 5, 2023

We should always test the parse trees generated antlr/grammars-v4#2991

Open

This was referenced Jan 18, 2023

Warm-up really doesn't help much, if anything at all, in performance #36

Open

ATN cannot be deserialized in PHP-runtime when using example XML-language antlr/antlr4#4075

Open

kaby76 marked this pull request as ready for review March 6, 2023 11:29

This was referenced Nov 25, 2023

Reformatting all grammars antlr/grammars-v4#3843

Merged

[various grammars] Fix targets that fail build antlr/grammars-v4#3847

Closed
