Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split and update Solidity grammar #4257

Merged
merged 10 commits into from
Sep 27, 2024
Merged

Conversation

codeFather2
Copy link
Contributor

This PR updates the Solidity grammar to match the latest version of the official documentation.
Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

/**
* Solidity is a statically typed, contract-oriented, high-level language for implementing smart contracts on the Ethereum platform.
*/
parser grammar SolidityParser;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, it might be best to add an entirely new version of the grammar, e.g., root of repo/solidity/v0.8/SolidityParser.g4 (etc), root of repo/solidity/v0.9/SolidityParser.g4 (etc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now i just updated to the last version, since previous was added in 2017, and most of breaking changes added before 2021 and there is no back compatibility between versions. For the future breaking changes it makes sense to have separate grammars

solidity/SolidityParser.g4 Outdated Show resolved Hide resolved
;

//@doc:inline
variableDeclarationList
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variableDeclarationList is unused, and it is not in the "specification" at https://docs.soliditylang.org/en/latest/grammar.html.

$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/find-unused-parser-symbols.sh
No arguments were provided.
Finding unused parser symbols in grammars...
./desc.xml
SolidityParser.g4:L713: variableDeclarationList
                        ^
09/24-06:54:01 ~/issues/g4-4257/solidity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it, since it is in the original spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create an Issue in the originating repo for this and other issues. There are many useless parentheses, which don't add anything to the clarity of the grammar.

Also, there is ambiguity in expression, etc. Ambiguity is concerning because it means this grammar was not thoroughly debugged.

* Definition of the special receive function.
*/
receiveFunctionDefinition
locals[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these locals and semantic predicates seem to force single usage of some syntax. This doesn't seem to be a large "positive" given that the checks are for a specific target, and we've been saying that semantic checking should be done post parse. Perhaps we can just remove these at this point? That said, one can easily remove all these via a single Trash command that edits the grammar prior to generating a parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also though about it, will remove a semantics related locals and predicates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed local predicates and rewrote parameterized rules to remove parameters

@kaby76
Copy link
Contributor

kaby76 commented Sep 24, 2024

This PR updates the Solidity grammar to match the latest version of the official documentation.
Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

OK, it makes sense to match the grammar here with that at https://github.com/ethereum/solidity/tree/develop/docs/grammar. That is why variableDeclarationList still exists. Perhaps just make a note in the readme.md that this is a copy of the official version?

@codeFather2
Copy link
Contributor Author

This PR updates the Solidity grammar to match the latest version of the official documentation.
Link to official grammar: https://github.com/ethereum/solidity/tree/develop/docs/grammar

OK, it makes sense to match the grammar here with that at https://github.com/ethereum/solidity/tree/develop/docs/grammar. That is why variableDeclarationList still exists. Perhaps just make a note in the readme.md that this is a copy of the official version?

Yeah, i added a comment that grammar is copied from docs repo

@kaby76
Copy link
Contributor

kaby76 commented Sep 24, 2024

As mentioned above, there seems to be ambiguity in expression with this input, which is the first example from https://docs.soliditylang.org/en/v0.8.27/solidity-by-example.html.

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.7.0 <0.9.0;
/// @title Voting with delegation.
contract Ballot {
    // This declares a new complex type which will
    // be used for variables later.
    // It will represent a single voter.
    struct Voter {
        uint weight; // weight is accumulated by delegation
        bool voted;  // if true, that person already voted
        address delegate; // person delegated to
        uint vote;   // index of the voted proposal
    }

    // This is a type for a single proposal.
    struct Proposal {
        bytes32 name;   // short name (up to 32 bytes)
        uint voteCount; // number of accumulated votes
    }

    address public chairperson;

    // This declares a state variable that
    // stores a `Voter` struct for each possible address.
    mapping(address => Voter) public voters;

    // A dynamically-sized array of `Proposal` structs.
    Proposal[] public proposals;

    /// Create a new ballot to choose one of `proposalNames`.
    constructor(bytes32[] memory proposalNames) {
        chairperson = msg.sender;
        voters[chairperson].weight = 1;

        // For each of the provided proposal names,
        // create a new proposal object and add it
        // to the end of the array.
        for (uint i = 0; i < proposalNames.length; i++) {
            // `Proposal({...})` creates a temporary
            // Proposal object and `proposals.push(...)`
            // appends it to the end of `proposals`.
            proposals.push(Proposal({
                name: proposalNames[i],
                voteCount: 0
            }));
        }
    }

    // Give `voter` the right to vote on this ballot.
    // May only be called by `chairperson`.
    function giveRightToVote(address voter) external {
        // If the first argument of `require` evaluates
        // to `false`, execution terminates and all
        // changes to the state and to Ether balances
        // are reverted.
        // This used to consume all gas in old EVM versions, but
        // not anymore.
        // It is often a good idea to use `require` to check if
        // functions are called correctly.
        // As a second argument, you can also provide an
        // explanation about what went wrong.
        require(
            msg.sender == chairperson,
            "Only chairperson can give right to vote."
        );
        require(
            !voters[voter].voted,
            "The voter already voted."
        );
        require(voters[voter].weight == 0);
        voters[voter].weight = 1;
    }

    /// Delegate your vote to the voter `to`.
    function delegate(address to) external {
        // assigns reference
        Voter storage sender = voters[msg.sender];
        require(sender.weight != 0, "You have no right to vote");
        require(!sender.voted, "You already voted.");

        require(to != msg.sender, "Self-delegation is disallowed.");

        // Forward the delegation as long as
        // `to` also delegated.
        // In general, such loops are very dangerous,
        // because if they run too long, they might
        // need more gas than is available in a block.
        // In this case, the delegation will not be executed,
        // but in other situations, such loops might
        // cause a contract to get "stuck" completely.
        while (voters[to].delegate != address(0)) {
            to = voters[to].delegate;

            // We found a loop in the delegation, not allowed.
            require(to != msg.sender, "Found loop in delegation.");
        }

        Voter storage delegate_ = voters[to];

        // Voters cannot delegate to accounts that cannot vote.
        require(delegate_.weight >= 1);

        // Since `sender` is a reference, this
        // modifies `voters[msg.sender]`.
        sender.voted = true;
        sender.delegate = to;

        if (delegate_.voted) {
            // If the delegate already voted,
            // directly add to the number of votes
            proposals[delegate_.vote].voteCount += sender.weight;
        } else {
            // If the delegate did not vote yet,
            // add to her weight.
            delegate_.weight += sender.weight;
        }
    }

    /// Give your vote (including votes delegated to you)
    /// to proposal `proposals[proposal].name`.
    function vote(uint proposal) external {
        Voter storage sender = voters[msg.sender];
        require(sender.weight != 0, "Has no right to vote");
        require(!sender.voted, "Already voted.");
        sender.voted = true;
        sender.vote = proposal;

        // If `proposal` is out of the range of the array,
        // this will throw automatically and revert all
        // changes.
        proposals[proposal].voteCount += sender.weight;
    }

    /// @dev Computes the winning proposal taking all
    /// previous votes into account.
    function winningProposal() public view
            returns (uint winningProposal_)
    {
        uint winningVoteCount = 0;
        for (uint p = 0; p < proposals.length; p++) {
            if (proposals[p].voteCount > winningVoteCount) {
                winningVoteCount = proposals[p].voteCount;
                winningProposal_ = p;
            }
        }
    }

    // Calls winningProposal() function to get the index
    // of the winner contained in the proposals array and then
    // returns the name of the winner
    function winnerName() external view
            returns (bytes32 winnerName_)
    {
        winnerName_ = proposals[winningProposal()].name;
    }
}

On line 91, while (voters[to].delegate != address(0)) {, this is ambiguous because there are two parses, one with the (0) associated with address, the other associated with voters[to].delegate != address. One thing I do notice are the alts | <assoc = right> expression assignOp expression, | expression callArgumentList , | Payable callArgumentList. These are classic mistakes--you cannot "hide" the operators behind a non-terminal! There is some discussion with a PR that fixes this problem here: antlr/antlr4#4480

@codeFather2
Copy link
Contributor Author

These are classic mistakes--you cannot "hide" the operators behind a non-terminal!

Let's leave it like this until antlr/antlr4#4480 will not be merged. Also i want to improve the grammar in next iterations, probably we can avoid non-left recursion in expression rule just by splitting the rule. Added this to known issues in readme

@kaby76
Copy link
Contributor

kaby76 commented Sep 24, 2024

I'll test the grammar as is with #4257 to see if, indeed, it fixes the ambiguity.

@kaby76
Copy link
Contributor

kaby76 commented Sep 24, 2024

The Antlr PR didn't fix the ambiguity. Will try restructuring the grammar slightly.

@kaby76
Copy link
Contributor

kaby76 commented Sep 24, 2024

The Antlr PR didn't fix the ambiguity. Will try restructuring the grammar slightly.

The fix is to simply unfold the symbol callArgumentList in the two alts it is used in expression. (I.e., paste the RHS of the callArgumentList where the symbol occurs in `expression.)

https://github.com/codeFather2/grammars-v4/blob/45afcb5b79380bb5aec4add590328a777b730685/solidity/SolidityParser.g4#L460-L461

With this change, the grammar has no ambiguity with regards to this particular issue.

$ dotnet trperf -- -c a ../examples/ex2.sol | grep -v '^0'
Time to parse: 00:00:00.0795611
1
09/24-19:26:01 ~/issues/g4-4257/solidity/Generated-before
$ cd ../Generated-after/
09/24-19:26:13 ~/issues/g4-4257/solidity/Generated-after
$ dotnet trperf -- -c a ../examples/ex2.sol | grep -v '^0'
Time to parse: 00:00:00.0830984
09/24-19:26:17 ~/issues/g4-4257/solidity/Generated-after

That said, it does not take care of all the ambiguity, which was discovered in examples/test.sol.

$ dotnet trperf -- -c ard ../examples/test.sol | grep -v '^0'
Time to parse: 00:00:00.1058447
1       typeName        71
6       expression      96
1       yulVariableDeclaration  136
09/24-19:39:29 ~/issues/g4-4257/solidity/Generated-after

@kaby76
Copy link
Contributor

kaby76 commented Sep 25, 2024

I added an Issue in this repo to note the ambiguities in the grammar. #4261

This PR looks fine for merging.

@teverett
Copy link
Member

@codeFather2 thanks

@teverett teverett merged commit 58f7d04 into antlr:master Sep 27, 2024
30 checks passed
@codeFather2 codeFather2 deleted the solidity-update branch November 7, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants