feat: create sankey parser and integrate sankey parser into mermaid package #4799
base: develop
✅ Deploy Preview for mermaid-js ready!
The remaining issues: mermaid/packages/mermaid/src/diagrams/sankey/sankeyRenderer.ts Lines 179 to 180 in 7ca02da
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop   #4799      +/-   ##
==========================================
- Coverage     5.73%   5.70%   -0.03%
==========================================
  Files          278     282       +4
  Lines        42019   42165     +146
  Branches       516     520       +4
==========================================
- Hits          2409    2407       -2
- Misses       39610   39758     +148
Fixed |
ae9b5ff to 7ca02da
I think we should also remove this file https://github.com/mermaid-js/mermaid/blob/develop/packages/mermaid/src/diagrams/sankey/parser/energy.csv |
Awesome job with the types!
The additional complexity of langium is a bit concerning. Are you sure there is no simpler way?
export const sankeyLinkSourceRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+)/;

/**
 * Matches sankey link target node
 */
export const sankeyLinkTargetRegex = /,(?:(?:"((?:""|[^"])+)")|([^\n\r,]+))/;

/**
 * Matches sankey link value
 */
export const sankeyLinkValueRegex = /,("(?:0|[1-9]\d*)(?:\.\d+)?"|[\t ]*(?:0|[1-9]\d*)(?:\.\d+)?)/;
It is not clear why this complex regex is necessary. It feels like there should be an easier way.
When comparing with the deleted .jison file, the langium one is a lot more complex, which makes it harder for people to create new diagrams.
Something similar is duplicated inside the .langium file too.
terminal SANKEY_LINK_VALUE returns number: /,("(0|[1-9][0-9]*)(\.[0-9]+)?"|[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?)/;
terminal SANKEY_LINK_TARGET: /,("((?:""|[^"])+)"|[^\n\r,]+)/;
Matchers (most of the time) are just regexes that extract the wanted data from tokens.
So these matcher regexes should match either what is between double quotes, without including the wrapping double quotes, or everything until the next comma (,).
They could be simplified by stripping the first and last characters instead, but then we would need to know which pattern was matched.
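To illustrate the point about capture groups, here is a minimal sketch of how a matcher could pull the node name out of a token. The regex is the one quoted from the PR; the helper name `extractNodeName` is hypothetical, and unescaping doubled quotes is an assumption based on the usual CSV convention rather than something the thread confirms.

```typescript
// Regex quoted from the PR: group 1 is the quoted form, group 2 the bare form.
const linkSourceRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+)/;

// Hypothetical helper: returns the node name without the wrapping quotes.
function extractNodeName(token: string): string | undefined {
  const match = linkSourceRegex.exec(token);
  if (match === null) {
    return undefined;
  }
  const [, quoted, bare] = match;
  if (quoted !== undefined) {
    // Assumption: a doubled quote inside a quoted name stands for one quote.
    return quoted.replace(/""/g, '"');
  }
  // The bare alternative matched, so nothing needs to be stripped.
  return bare;
}

console.log(extractNodeName('"a,b"'));
console.log(extractNodeName('source"'));
```

Because the two alternatives land in different groups, the helper knows which pattern matched without inspecting the first and last characters.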
As for .langium, I'm not sure we could simplify it without allowing weird patterns.
In the value regex, I disallowed some patterns:
, "0" %% no space after "
,0. %% no empty `.`
,00 %% first digit 0 can't have another number after it
We could just use \d+\.?\d*, but I wouldn't recommend it.
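A small sketch of why the looser pattern is less attractive: it accepts the inputs the stricter one rejects. Only the numeric core of the value regex is compared here; the `^` and `$` anchors and the sample list are assumptions added for this illustration and are not part of the PR.

```typescript
// Numeric core of the PR's value regex, anchored for whole-string testing.
const strictNumber = /^(?:0|[1-9]\d*)(?:\.\d+)?$/;
// The simpler alternative mentioned in the comment, anchored the same way.
const looseNumber = /^\d+\.?\d*$/;

const numberSamples = ['0', '10.5', '0.25', '0.', '00', '007'];

// For each sample, record whether each pattern accepts it.
const numberComparison = numberSamples.map((text) => ({
  text,
  strict: strictNumber.test(text),
  loose: looseNumber.test(text),
}));

// Inputs accepted only by the loose pattern.
const looseOnly = numberComparison
  .filter((row) => row.loose && !row.strict)
  .map((row) => row.text);

console.log(looseOnly); // [ '0.', '00', '007' ]
```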
As for the second terminal rule, I'm not sure we could simplify the double-quote part.
I think state in jison is similar to mode in chevrotain, so when the user types " the lexer should enter a state where it consumes everything until it finds another ", but I don't know how to use the same rule to both enter and exit a mode.
But with modes the following code would be invalid (whereas it should be valid, since sankey is supposed to be a CSV-like grammar):
sankey-beta
source",target,0
I think we could change langium rules to this:
terminal SANKEY_LINK_VALUE returns number: /,"(0|[1-9][0-9]*)(\.[0-9]+)?"/ | /,[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?/;
terminal SANKEY_LINK_TARGET: /,"(""|[^"])+"/ | /,[^\n\r,]+/;
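As a sanity check on this kind of split, the sketch below compares the single combined value regex from the grammar with two separate alternatives, written here with balanced parentheses. The anchors, the sample inputs, and the helper `matchesSplitValue` are assumptions for illustration only; this checks the regexes in plain TypeScript and says nothing about how Langium composes terminals.

```typescript
// Combined form, as in the existing terminal rule (anchored for testing).
const combinedValue = /^,("(0|[1-9][0-9]*)(\.[0-9]+)?"|[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?)$/;
// Split form: one regex per alternative.
const quotedValue = /^,"(0|[1-9][0-9]*)(\.[0-9]+)?"$/;
const bareValue = /^,[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?$/;

// Hypothetical helper: true when either split alternative accepts the text.
function matchesSplitValue(text: string): boolean {
  return quotedValue.test(text) || bareValue.test(text);
}

const valueSamples = [',0', ',"1.5"', ', 42', ',\t3.14', ', "0"', ',0.', ',00', ',"01"'];

// Samples on which the combined and split forms disagree.
const disagreements = valueSamples.filter(
  (text) => combinedValue.test(text) !== matchesSplitValue(text)
);
// Samples accepted by the split form.
const acceptedValues = valueSamples.filter(matchesSplitValue);

console.log(disagreements.length); // 0
```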
It appears that it would throw a warning when using this approach:
src/language/sankey/sankey.langium:45:45 - Regular expression flags are only applied if the terminal is not a composition
src/language/sankey/sankey.langium:45:66 - Regular expression flags are only applied if the terminal is not a composition
I have asked the Langium team about this issue we're facing; see: eclipse-langium/langium#1429. They're suggesting to annotate the greedy rule, which in this case is What do you guys think? This approach will allow us to add a comment to the greedy rules showing why we didn't write the regex directly in the grammar and annotate it with So something like this:
|
Hi @Yokozuna59 -- You are exactly right. It is not sorting. (sigh) I will look at the greedy... approach. Thanks for continuing to work with eclipse-langium to try to figure this out! |
@weedySeaDragon I have implemented the greedy approach in this commit (6c7b0f2). Please take a look; we can revert the commit if it's not applicable.
This is a great start.
- needs tests so we can make sure the token order is correct. (I wrote some but can't add commits to your branch. See below)
- buildTerminalTokens() will not work if there are two or more @greedy annotations (see my comments in code). (See below for my suggested way to fix it.)
- some minor suggested changes in the code
Tests (parser/src/tests/sankey-tokenBuilder.test.ts)
import { beforeAll, describe, expect, it } from 'vitest';
import { createSankeyServices, type SankeyServices } from '../src/language/index.js';
import type { TokenType, TokenVocabulary } from 'chevrotain';

const sankeyServices: SankeyServices = createSankeyServices().Sankey;

describe('SankeyTokenBuilder', () => {
  describe('token order', () => {
    let tokenVocab: TokenVocabulary;
    let tokenVocabNames: string[];

    beforeAll(() => {
      // Get the ordered tokens (the vocabulary) from the grammar
      tokenVocab = sankeyServices.parser.TokenBuilder.buildTokens(sankeyServices.Grammar, {
        caseInsensitive: sankeyServices.LanguageMetaData.caseInsensitive,
      });
      // coerce the tokenVocab to a type that can use .map
      tokenVocabNames = (tokenVocab as TokenType[]).map((tokenVocabEntry: TokenType) => {
        return tokenVocabEntry.name;
      });
    });

    it('whitespace is always first', () => {
      expect(tokenVocabNames[0]).toEqual('WHITESPACE');
    });

    it('sankey-beta comes after whitespace', () => {
      expect(tokenVocabNames[1]).toEqual('sankey-beta');
    });

    describe('terminal rules with @greedy in comments are put at the end of the ordered list of tokens', () => {
      const NUM_EXPECTED_GREEDY_RULES = 2;
      let greedyGroupStartIndex = 0;

      beforeAll(() => {
        greedyGroupStartIndex = tokenVocabNames.length - NUM_EXPECTED_GREEDY_RULES - 1;
      });

      it('SANKEY_LINK_NODE rule has @greedy so it is in the last group of all @greedy terminal rules', () => {
        // @ts-ignore TokenVocabulary does not have an index type, so ts complains when we use a numeric as an index type
        expect(tokenVocabNames.indexOf('SANKEY_LINK_NODE')).toBeGreaterThanOrEqual(
          greedyGroupStartIndex
        );
      });

      it('SANKEY_LINK_VALUE rule has @greedy so it is in the last group of all @greedy terminal rules', () => {
        // @ts-ignore TokenVocabulary does not have an index type, so ts complains when we use a numeric as an index type
        expect(tokenVocabNames.indexOf('SANKEY_LINK_VALUE')).toBeGreaterThanOrEqual(
          greedyGroupStartIndex
        );
      });
    });
    // console.log(`tokenVocabNames: ${tokenVocabNames}`); // @ts-ignore show this in the console for helpful debugging info
  });
});
Suggested fix for AbstractMermaidTokenBuilder
1. Add a method to test if a rule has a @greedy annotation
- extracting it into a separate method means it'll be easier to change (or move to different class) later
// TODO: This responsibility might better belong in CommentProvider (e.g. AbstractMermaidCommentProvider that is a subclass of CommentProvider).
public ruleHasGreedyComment(rule: GrammarAST.AbstractRule): boolean {
  const comment = this.commentProvider.getComment(rule);
  return !!comment && /@greedy/.test(comment);
}
2. Change the buildTerminalTokens method:
protected override buildTerminalTokens(rules: Stream<GrammarAST.AbstractRule>): TokenType[] {
  if (rules.some((rule: GrammarAST.AbstractRule) => this.ruleHasGreedyComment(rule))) {
    const notGreedyRules: GrammarAST.AbstractRule[] = [];
    const lastRules: GrammarAST.AbstractRule[] = [];
    // put terminal rules with @greedy in their comment at the end of the array
    rules.forEach((rule) => {
      if (this.ruleHasGreedyComment(rule)) {
        lastRules.push(rule);
      } else {
        notGreedyRules.push(rule);
      }
    });
    return super.buildTerminalTokens(stream([...notGreedyRules, ...lastRules]));
  } else {
    return super.buildTerminalTokens(rules);
  }
}
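The reordering idea in the suggestion can be tried in isolation, without Langium. The sketch below is a standalone approximation: the `SketchRule` interface, the function names, and the `EXAMPLE_ID` rule are hypothetical stand-ins, and plain arrays replace Langium's `Stream` and `GrammarAST.AbstractRule`. It shows that a stable partition keeps the original relative order within each group and handles two or more @greedy rules.

```typescript
// Hypothetical stand-in for a terminal rule and its leading comment.
interface SketchRule {
  ruleName: string;
  comment?: string;
}

// Mirrors the idea of ruleHasGreedyComment: look for @greedy in the comment.
function hasGreedyComment(rule: SketchRule): boolean {
  return rule.comment !== undefined && /@greedy/.test(rule.comment);
}

// Stable partition: non-greedy rules first, greedy rules last,
// each group keeping its original relative order.
function moveGreedyRulesLast(rules: readonly SketchRule[]): SketchRule[] {
  const notGreedy: SketchRule[] = [];
  const greedy: SketchRule[] = [];
  for (const rule of rules) {
    if (hasGreedyComment(rule)) {
      greedy.push(rule);
    } else {
      notGreedy.push(rule);
    }
  }
  return [...notGreedy, ...greedy];
}

const sketchRules: SketchRule[] = [
  { ruleName: 'SANKEY_LINK_NODE', comment: '/** @greedy */' },
  { ruleName: 'EXAMPLE_ID' }, // hypothetical rule, not from the PR
  { ruleName: 'SANKEY_LINK_VALUE', comment: '/** @greedy */' },
  { ruleName: 'WHITESPACE' },
];

const orderedRuleNames = moveGreedyRulesLast(sketchRules).map((rule) => rule.ruleName);
console.log(orderedRuleNames);
```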
Co-authored-by: Ashley Engelund (weedySeaDragon @ github) <[email protected]> Signed-off-by: Reda Al Sulais <[email protected]>
@weedySeaDragon Thanks for your fast review! I have applied your suggestions locally, but one of the test cases is failing:
Anyway, I'm not entirely sure I have followed your suggestion correctly for this comment. |
@Yokozuna59 - it's failing because it's expecting the |
@weedySeaDragon Sorry to ask, but why should we annotate it with |
You're right -- it's not totally necessary for the grammar in If you add the |
I think the only way is to create sample grammar using
If both rules have the |
@weedySeaDragon I have created a separate test case for the |
@sidharthv96 @nirname @weedySeaDragon Can you please provide me with feedback regarding the current unresolved comments: |
@Yokozuna59 sorry for the long response, I'll skim through the questions ASAP |
43a6176 to 6aaa3ae
6aaa3ae to 681fbd0
📑 Summary
Brief description about the content of your PR.
langium instead of jison #4401
📏 Design Decisions
Describe the way your implementation works or what design decisions you made if applicable.
📋 Tasks
Make sure you