Conformance tests with JavaScript reference implementation #60

Open
skalee opened this issue Apr 8, 2021 · 4 comments

Comments

@skalee

skalee commented Apr 8, 2021

I am thinking of adding some conformance checks to ensure that converting a given formula with the AsciiMath gem gives the same result as converting it with AsciiMath's original JavaScript implementation. Unless there are differences which have been introduced on purpose, of course…

The idea is to prepare a list of example AsciiMath formulas, the longer the better, then convert each of them with both implementations and compare the results. The whole process could be written as follows:

for each formula in list_of_example_formulas
  ruby_retval = convert_with_ruby(formula)
  js_retval = convert_with_js(formula)
  assert_equal js_retval, ruby_retval
end

This kind of test requires a JavaScript runner in the development environment (Node.js or mini_racer).

This is easy to implement and I can help with that. Hopefully, these tests will help detect bugs like #58 early.
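The loop above could be sketched in Ruby roughly as follows. This is only an illustration under assumptions: `conformance_mismatches`, `ruby_converter`, `js_converter` and `known_differences` are hypothetical names, and both converters are passed in as callables so that the sketch does not depend on any particular gem API or JavaScript runner.

```ruby
# Hypothetical harness. Both converters are injected as callables (anything
# responding to #call), so no real gem or JavaScript API is assumed here.
def conformance_mismatches(formulas, ruby_converter:, js_converter:, known_differences: [])
  formulas.each_with_object([]) do |formula, mismatches|
    # Formulas where the two implementations differ on purpose are skipped.
    next if known_differences.include?(formula)

    ruby_out = ruby_converter.call(formula)
    js_out = js_converter.call(formula)
    next if ruby_out == js_out

    mismatches << { formula: formula, ruby: ruby_out, js: js_out }
  end
end

# Usage with stand-in converters (real ones would wrap the gem and a JS runner).
stub_ruby = ->(formula) { "<math>#{formula}</math>" }
stub_js = ->(formula) { formula == 'b' ? '<math>B</math>' : "<math>#{formula}</math>" }
report = conformance_mismatches(%w[a b c], ruby_converter: stub_ruby, js_converter: stub_js)
report.each { |m| puts "#{m[:formula]}: ruby=#{m[:ruby]} js=#{m[:js]}" }
```

Collecting mismatches instead of asserting on the first one gives a full report in a single run, and the `known_differences` list leaves room for deviations that were introduced on purpose.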

@pepijnve
Member

pepijnve commented Apr 8, 2021

I tried this in the past, but it turned out to be very difficult to generate exactly what the JS implementation produces. You really need to reimplement all the quirks of that implementation, some of which don't seem to make much sense. There are also places where we intentionally generate different MathML.

Instead, I've made a manual approximation of that in https://github.com/asciidoctor/asciimath/blob/master/spec/parser_spec.rb
The expected MathML strings there were manually compared for equivalence against the output the JS implementation generates. Extra test cases for that test suite are more than welcome of course.

@andrewlock
Copy link

andrewlock commented Apr 2, 2024

I've been looking into AsciiMath recently. Specifically, I was looking for a parser for .NET, so I threw together a basic port of the Asciidoctor code. I also tried running it against the asciimath.js unit tests from here: https://github.com/asciimath/asciimathml/blob/master/test/unittests.js

About 50% of the tests pass, and 50% fail. Trouble is, like you point out, it's quite a lot of effort to work out which of the differences are expected/desirable and which aren't 😅

I've done some basic tests, and I think you can split them into a few broad categories. Not suggesting these need to be addressed, just documenting here for posterity 🙂

Grouping differences

There are many cases where Asciidoctor adds extra <mrow> elements compared to the reference, for example inside parentheses, e.g. (2+3). Many of these seem benign and don't significantly change the rendering as far as I can tell.

However, there's also 2^(f_3(x)/5), which Asciidoctor renders as

[image: Asciidoctor rendering of 2^(f_3(x)/5)]

whereas the reference code renders it as

[image: reference rendering of 2^(f_3(x)/5)]

Whether that's strictly a bug or a feature is likely more nuanced 😄

Issues with negative numbers

One good example is (-2)/-3. Asciidoctor renders it as

[image: Asciidoctor rendering of (-2)/-3]

whereas the reference code renders it as

[image: reference rendering of (-2)/-3]

I did wonder whether the intention was for -3 to be parsed as a single number by the tokenizer, given the when '-' branch below, even though AFAICT read_number always returns nil for '-':

when '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
  read_number || read_symbol
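To make the question concrete, here is a minimal sketch of what a `read_number` that accepts a leading '-' could look like. It is a hypothetical fragment built on `StringScanner` from Ruby's standard library, not the gem's actual tokenizer; the method names mirror the snippet above only for readability, and the token hashes are an assumption made for illustration.

```ruby
require 'strscan'

# Hypothetical tokenizer fragment, NOT the gem's actual code.
# Returns a number token if the input at the current position is an
# optionally negative integer or decimal; otherwise returns nil and
# leaves the scanner position untouched.
def read_number(scanner)
  text = scanner.scan(/-?\d+(?:\.\d+)?/)
  text && { type: :number, value: text }
end

# Fallback: consume a single character as a symbol token (nil at end of input).
def read_symbol(scanner)
  char = scanner.getch
  char && { type: :symbol, value: char }
end

def next_token(scanner)
  read_number(scanner) || read_symbol(scanner)
end

p next_token(StringScanner.new('-3')) # a single number token with value "-3"
p next_token(StringScanner.new('-x')) # a symbol token with value "-"
```

One caveat with this approach: an input such as 2-3 would tokenize as the numbers 2 and -3 rather than as a subtraction, so a real tokenizer would probably need to look at the previous token before treating '-' as a sign.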

Cases where the reference is lenient to spelling mistakes

There are some cases where the reference code "knows" additional symbols as synonyms for others

  • geq renders the same as ge
  • gte renders the same as ge
  • leq renders the same as le
  • lambda renders the same as lambda

These are the most clear cut "will not fix" candidates

Differences in symbol rendering

Some of these symbol differences seem reasonable, but for others I wonder which is more correct 🤔

| Ascii | Asciidoctor | Reference |
|---|---|---|
| - | MINUS SIGN | HYPHEN-MINUS |
| ** | ASTERISK | ASTERISK OPERATOR |
| .../ldots | HORIZONTAL ELLIPSIS | ... |
| / | (Empty) | SOLIDUS |
| sin | <mi>sin</mi> | <mo>sin</mo> |
| @ | MEDIUM SMALL WHITE CIRCLE | RING OPERATOR |

The difference in <mi>/<mo> exists for all the trig functions, plus functions such as log and ln. Not sure which is "correct" though.
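When comparing outputs, it can help to separate pure symbol-choice differences from structural ones. A rough sketch of that idea follows, assuming a hypothetical `normalize_symbols` helper that rewrites the Asciidoctor-side characters from the table into their reference-side counterparts before comparison. The ASTERISK row is deliberately left out, since rewriting every `*` could mask unrelated differences, and the empty-versus-SOLIDUS and <mi>/<mo> rows are structural rather than character substitutions.

```ruby
# Hypothetical normalisation table for comparison purposes only:
# Asciidoctor-side character => reference-side text, taken from the table above.
REFERENCE_EQUIVALENTS = {
  "\u2212" => '-',       # MINUS SIGN -> HYPHEN-MINUS
  "\u2026" => '...',     # HORIZONTAL ELLIPSIS -> three full stops
  "\u26AC" => "\u2218"   # MEDIUM SMALL WHITE CIRCLE -> RING OPERATOR
}.freeze

# Replaces every occurrence of a key with its mapped value.
def normalize_symbols(mathml)
  mathml.gsub(Regexp.union(REFERENCE_EQUIVALENTS.keys), REFERENCE_EQUIVALENTS)
end

puts normalize_symbols("<mo>\u2212</mo><mn>3</mn>")
```

Any mismatch that remains after normalisation is then more likely to be a structural difference worth looking at individually.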

Differences in "invalid" rendering

There are some differences in how certain "incorrect" syntaxes are rendered. For example, Asciidoctor renders 2^ as:

<math><mn>2</mn></math>

whereas the reference produces:

<math><msup><mn>2</mn><mo></mo></msup></math>

This doesn't seem like a big deal seeing as the syntax is "invalid" anyway.

@pepijnve
Member

pepijnve commented Apr 2, 2024

In general, the issue is that there isn't really a formal grammar for AsciiMath. For this implementation I relied on the partial BNF grammar. If you follow those rules, then AFAICT this parser parses 2^(f_3(x)/5) correctly. To double-check, I quickly put the rules into ANTLR using this (also partial) grammar:

grammar asciimath;

e : i e | i DIV i | i;
i : s UNDER s | s HAT s | s UNDER s HAT s | s;
s : v | l e r | u s | b s s;
v : ID | INT;
u : SQRT;
b : FRAC;
l : LPAREN;
r : RPAREN;

LPAREN : '(' ;
RPAREN : ')' ;
DIV : '/' ;
HAT : '^' ;
UNDER : '_' ;
SQRT : 'sqrt' ;
FRAC : 'frac' ;

INT : [0-9]+ ;
ID: [a-zA-Z] ;
WS: [ \t\n\r\f]+ -> skip ;

and that parses as

[image: ANTLR parse tree for 2^(f_3(x)/5)]
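The parse under this partial grammar can also be reproduced without ANTLR. The following is a naive sketch that enumerates every parse allowed by the rules above; it is neither ANTLR's algorithm nor the gem's parser, and the function names and the nested-array tree format are assumptions made for illustration.

```ruby
# Naive all-parses recogniser for the partial grammar above (illustration only).
# Each parse_* method returns an array of [tree, next_position] pairs.
TOKEN = /sqrt|frac|[0-9]+|[a-zA-Z]|[()\/^_]/

def tokenize(src)
  src.scan(TOKEN) # whitespace (and anything unrecognised) is skipped
end

# v : ID | INT
def parse_v(t, i)
  t[i].to_s.match?(/\A(?:[0-9]+|[a-zA-Z])\z/) ? [[t[i], i + 1]] : []
end

# s : v | l e r | u s | b s s
def parse_s(t, i)
  results = parse_v(t, i)
  case t[i]
  when '('
    parse_e(t, i + 1).each { |inner, j| results << [[:group, inner], j + 1] if t[j] == ')' }
  when 'sqrt'
    parse_s(t, i + 1).each { |arg, j| results << [[:sqrt, arg], j] }
  when 'frac'
    parse_s(t, i + 1).each do |num, j|
      parse_s(t, j).each { |den, k| results << [[:frac, num, den], k] }
    end
  end
  results
end

# i : s UNDER s | s HAT s | s UNDER s HAT s | s
def parse_i(t, i)
  parse_s(t, i).flat_map do |base, j|
    results = [[base, j]]
    if t[j] == '_'
      parse_s(t, j + 1).each do |sub, k|
        results << [[:sub, base, sub], k]
        next unless t[k] == '^'
        parse_s(t, k + 1).each { |sup, m| results << [[:subsup, base, sub, sup], m] }
      end
    elsif t[j] == '^'
      parse_s(t, j + 1).each { |sup, k| results << [[:sup, base, sup], k] }
    end
    results
  end
end

# e : i e | i DIV i | i
def parse_e(t, i)
  parse_i(t, i).flat_map do |first, j|
    results = [[[:seq, first], j]]
    if t[j] == '/'
      parse_i(t, j + 1).each { |den, k| results << [[:seq, [:div, first, den]], k] }
    end
    parse_e(t, j).each { |rest, k| results << [[:seq, first, *rest.drop(1)], k] }
    results
  end
end

# Keeps only the parses that consume the whole input.
def parses(src)
  tokens = tokenize(src)
  parse_e(tokens, 0).select { |_, pos| pos == tokens.size }.map(&:first)
end

parses('2^(f_3(x)/5)').each { |tree| p tree }
```

For 2^(f_3(x)/5) this yields a single tree in which f_3 stands on its own and (x)/5 forms the division. `i DIV i` cannot start at f, because no single `i` covers f_3(x), so the only remaining derivation is `i e` with f_3 as the `i`.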

@pepijnve pepijnve closed this as completed Apr 2, 2024
@pepijnve pepijnve reopened this Apr 2, 2024
@pepijnve
Member

pepijnve commented Apr 2, 2024

As for the other issues, most of them are clear bugs or improvements. It would be useful to split them up into distinct issues to make the changes a bit easier to track.
