Unable to escape dollar signs #8

Open
eugenefischer opened this issue Jul 14, 2024 · 15 comments

@eugenefischer

I really appreciate this extension. I was wondering if it was possible to support escaping dollar signs within LaTeX strings? Currently, neither the default nor the brackets delimiter options seem to support \$ to render a dollar sign. I've been able to use \text{\textdollar} as a workaround, but I'm trying to create an interface where relatively nontechnical people can author math problems, many of which will talk about amounts of money. Being able to simply escape dollar signs with \$ would be very helpful.
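
As a minimal sketch of what "escaping" would mean here: a `$` counts as escaped when an odd number of backslashes sits directly before it. The helper names `isEscaped` and `delimiterPositions` below are hypothetical and are not part of the extension's API.

```typescript
// Hypothetical helper (not part of the extension's API): a character is
// escaped when an odd number of backslashes immediately precedes it.
function isEscaped(text: string, index: number): boolean {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && text.charAt(i) === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

// Positions of every `$` that may still act as a math delimiter.
function delimiterPositions(text: string): number[] {
  const positions: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i) === "$" && !isEscaped(text, i)) {
      positions.push(i);
    }
  }
  return positions;
}

// The source text here is: $5 + \$3$
// The escaped `$` at index 6 is skipped and stays part of the LaTeX body.
console.log(delimiterPositions("$5 + \\$3$")); // [0, 8]
```

Counting backslashes, rather than checking only the single preceding character, keeps `\\$` (an escaped backslash followed by a real delimiter) working.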

@aarkue
Owner

aarkue commented Jul 15, 2024

Thanks @eugenefischer !
I experimented with updating the regex, also mitigating some other potential future issues.

Could you check whether version 1.3.3-beta.1 from npm works well for you? The demo at https://aarkue.github.io/tiptap-math-extension/ is also updated.

I will need to do some more tests in the future before releasing this. Let me know if you run into any issues with that version!

@eugenefischer
Author

Thanks for the quick beta! On initial testing, this seems to do what I need. I'll let you know if I run into any edge cases.

@eugenefischer
Author

eugenefischer commented Jul 15, 2024

Actually, I did just hit one thing. I have my interface set to use bracket delimiters (since sometimes question authors might not wish to use LaTeX markup for questions involving money), but just as a test I switched back to dollar delimiters to look at the behavior. I think, with your new regex, it isn't possible to use LaTeX markup on decimal values when using dollar delimiters. So, someone using the default delimiter couldn't do something like:

Jordan walked $2.7$ miles to school, $1.1$ miles to the park after school, and then $1.5$ miles home. How many total miles did Jordan walk on these three trips?

I think a better way to check for spurious dollar delimiters is to require that the closing delimiter be preceded by a non-space character that isn't /, (, [, or {. You should probably also require that inline delimiters not be separated by line breaks. Those conditions should take care of all the standard English uses of $ that I can think of, anyway. (There would probably still be some edge cases... em-dashes in prose come to mind. But at that point, someone should just use bracket delimiters anyway.)
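
A rough sketch of that closing-delimiter rule, written as a small scanner rather than a regex. It is an assumption about how the rule could be applied, not the extension's implementation; `findInlineMath` and `FORBIDDEN_BEFORE_CLOSE` are made-up names, and the ticket sentence is an invented example.

```typescript
// Characters that may not sit directly before a closing `$`, per the
// suggestion above. Whitespace is checked separately.
const FORBIDDEN_BEFORE_CLOSE = "/([{";

// Hypothetical scanner, not the extension's implementation. Returns the
// bodies of all `$...$` spans that pass the closing-delimiter check.
function findInlineMath(text: string): string[] {
  const found: string[] = [];
  let open = -1; // index of the current candidate opening `$`, or -1
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "\\") {
      i++; // skip the escaped character, so `\$` never acts as a delimiter
    } else if (ch === "\n") {
      open = -1; // inline math may not span a line break
    } else if (ch === "$") {
      const prev = text.charAt(i - 1);
      const closes =
        open >= 0 &&
        i > open + 1 &&
        !/\s/.test(prev) &&
        FORBIDDEN_BEFORE_CLOSE.indexOf(prev) === -1;
      if (closes) {
        found.push(text.slice(open + 1, i));
        open = -1;
      } else {
        open = i; // treat this `$` as a new candidate opening instead
      }
    }
  }
  return found;
}

console.log(findInlineMath("Jordan walked $2.7$ miles to school, $1.1$ miles to the park."));
// ["2.7", "1.1"]
console.log(findInlineMath("A ticket costs $5 and a snack costs $3."));
// []
```

Because the decision depends only on what precedes the closing `$`, the first character after the opening `$` does not matter, so a body that starts with a digit is treated like any other.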

@aarkue
Owner

aarkue commented Jul 15, 2024

Good catch, thanks for testing!
I fixed the issues with the dot in expressions (e.g., $2.7$). Not so sure how this one slipped into the regex.
I also added some test cases which cover some of the examples you mentioned.

Something like "One scoop of ice cream is $2 but two are only $3" should now also work nicely without the (unwanted) conversion to LaTeX.

I released a second beta 1.3.3-beta.2 and updated the demo at https://aarkue.github.io/tiptap-math-extension/ again.

Feel free to check whether it works well for you. I will do some more testing later and eventually create a new release.

@eugenefischer
Author

Sweet, thanks again for the fast turnaround. I'll keep testing the updated beta, and let you know if I turn up anything else.

@eugenefischer
Author

eugenefischer commented Jul 15, 2024

Okay, here's another case: with dollar delimiters, x \times 4 renders, but 5 \times 4 does not.

They both work with bracket delimiters, so I'm guessing the software is still trying to parse when dollar signs should be counted based on the character following the leading delimiter. It seems to me, though, that as long as I can have a legitimate math expression that begins with something that could also be an amount of money, that's going to cause problems. My intuition is still that ensuring that what precedes the trailing delimiter is a non-whitespace, non-open-grouping-symbol character will work better.

@aarkue
Owner

aarkue commented Jul 16, 2024

Thanks, @eugenefischer, you are right!
I revised the regex once more and updated the demo and beta release (1.3.3-beta.5).
If you'd like, of course, feel free to try it out again. I also plan to work on more robust automated test cases in the future.

As I still expect that some edge cases escaped my (limited) tests, I will keep this issue open for now :)

@eugenefischer
Author

Here's another edge case that fails with dollar delimiters:

I have $120 ($40 from my allowance and $80 from the card Grandma sent me on my birthday).

Currently, with dollar delimiters, the extension will try to parse 120 ( in the sentence above as LaTeX. (This is why I suggested checking for an open grouping symbol immediately before the trailing delimiter.)

@aarkue
Owner

aarkue commented Jul 23, 2024

Thanks again for your input.
I addressed the issue and also had to work around some other problems (e.g., wrongly matching I have $120 ($40$) from the $120 part on).

I also added a few more test cases to cover these scenarios.

Version 1.3.3-beta.6 is released with these fixes and the demo is updated. You are welcome to test them as well! I plan to soon create a new non-beta release with these changes.
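
The regex shipped in 1.3.3-beta.6 is not quoted in this thread, so the pattern below is only an illustrative guess at how the cases discussed so far could be expressed as one regex. `INLINE_MATH` and `inlineMathBodies` are hypothetical names. The first alternative consumes an escaped character so that `\$` is never tried as an opening delimiter; the second matches `$body$`, where the body contains no `$` or line break and does not end in whitespace, `/`, `(`, `[`, or `{`.

```typescript
// Illustrative pattern only, not the extension's actual regex.
const INLINE_MATH =
  /\\[^\n]|\$((?:\\[^\n]|[^$\\\n])*?(?:\\[^\n]|[^\s$\\\n\/(\[{]))\$/g;

function inlineMathBodies(text: string): string[] {
  const bodies: string[] = [];
  INLINE_MATH.lastIndex = 0; // the `g` flag makes exec() stateful
  let match: RegExpExecArray | null;
  while ((match = INLINE_MATH.exec(text)) !== null) {
    // Group 1 is undefined when only the escape alternative matched.
    if (match[1] !== undefined) {
      bodies.push(match[1]);
    }
  }
  return bodies;
}

// The `$120 (` candidate is rejected, then matching restarts at `$40$`.
console.log(inlineMathBodies("I have $120 ($40$)")); // ["40"]
console.log(
  inlineMathBodies("I have $120 ($40 from my allowance and $80 from the card).")
); // []
```

Since the body can never contain an unescaped `$`, a rejected opening candidate cannot swallow a later, valid `$...$` pair; the engine simply moves on and tries the next `$` as an opening delimiter.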

@eugenefischer
Author

This is starting to look good! None of my initial round of tests broke anything. It did break on quoted text, though:

I gave Cynthia $5.00 and she said, "$5.00? That's really all you're good for?"

So double quotes (") are probably worth checking for before a closing delimiter at least. I don't think double quotes are used in math expressions as commonly as in prose, so the default should probably be to assume they aren't math. You could still get them in KaTeX with \text{\textquotedblleft} and \text{\textquotedblright}. You would have to watch out for escaped double quotes, because \"{a} is a legitimate LaTeX string to get an ä.

Single quotes are more of a question. British English often uses single quotes where American English uses double quotes, but single quotes are also more likely than double quotes to be in a LaTeX string, where they are used for accents and primes. Again, there's a shorthand for the prime usage. Instead of x' you can use x^{\prime}, but I can see that being annoying to people. You'd also, again, have to watch for escaped single quotes because \'{a} is how you get á in LaTeX.

I'm honestly not sure what the best way to handle single quotes is.

@aarkue
Owner

aarkue commented Jul 29, 2024

I think a reasonable approach might be to only check that math expressions do not end in " (like in the nice example you gave). As an escape hatch, one could use { } around LaTeX math (e.g., ${"x"}$)

See https://regexr.com/843ub for some examples of how that would look. Of course, thorough handling of quotes would also be good, but I think for the majority of use cases this is sufficient.

As you mentioned, single quotes are more common in LaTeX, so I would argue against handling them specially.
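
A small sketch of the combined check on the character before a closing `$`, with the double-quote rule added to the earlier ones. This is an assumption about how the rule could look, not the extension's code; `mayCloseMath` and `BLOCKED_BEFORE_CLOSE` are hypothetical names. Single quotes are deliberately not blocked.

```typescript
// Characters that may not directly precede a closing `$`. Whitespace is
// checked separately. Single quotes are allowed on purpose.
const BLOCKED_BEFORE_CLOSE = ['"', "/", "(", "[", "{"];

// Hypothetical predicate: may the `$` at `closeIndex` close an inline
// math span? Only the single character before it is inspected.
function mayCloseMath(text: string, closeIndex: number): boolean {
  if (closeIndex <= 0 || text.charAt(closeIndex) !== "$") {
    return false;
  }
  const prev = text.charAt(closeIndex - 1);
  if (/\s/.test(prev)) {
    return false;
  }
  return BLOCKED_BEFORE_CLOSE.indexOf(prev) === -1;
}

// Prose: the `$` right after the opening quote must not close `$5.00 and ...`.
const quoted = 'I gave Cynthia $5.00 and she said, "$5.00?';
console.log(mayCloseMath(quoted, quoted.indexOf('"$') + 1)); // false

// Escape hatch: wrapping in braces makes the body end in `}` instead of `"`.
const wrapped = '${"x"}$';
console.log(mayCloseMath(wrapped, wrapped.length - 1)); // true
```

The brace escape hatch works without any special handling, because only an opening `{` is blocked before the closing delimiter, not a closing `}`.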

@aldrinjenson

Hi @aarkue, which version should I install to get the latest changes mentioned here?

@aarkue
Owner

aarkue commented Jan 3, 2025

Hi @aldrinjenson,
the latest changes (i.e., from my comment mentioning 1.3.3-beta.6) should be available from version 1.3.3 onwards.
If you still run into incorrect LaTeX detection, feel free to share the input text and I will try to look into it :)

@aldrinjenson

Great. Thanks!

@aldrinjenson

aldrinjenson commented Jan 4, 2025 via email
