non-ascii tag is not parsed #58
Comments
Thanks for the report.
The regex for tag names is currently It must also include Unicode characters, but JavaScript regexes cannot do that. Only Java has such a character class. If we invert the regex like Other ideas? Add Unicode ranges next to PS: It would be interesting to see how Org mode does this. Maybe they have a special character class for Unicode chars.
Yes. Elisp has [:multibyte:]. Chinese is also not parsed by orgajs, another parser implemented in JavaScript. Maybe we can make org-parser support Java only for now? P.S. Is this one useful?
No. This would be very much against https://github.com/200ok-ch/org-parser/#what-does-this-project-do and https://github.com/200ok-ch/org-parser/#why-is-this-project-useful--rationale.
Having said that, JavaScript has "Unicode property escapes". Maybe we can use it for the
> ":标签:".match(/\p{Letter}+/gu)
[ '标签' ]
Looks like this also works as part of a 'regular' regular expression (pardon the pun):
> ":标签:".match(/[\p{Letter}0-9_@#%]+/gu)
[ '标签' ]
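To make the idea concrete, here is a minimal sketch of how that pattern could be wrapped for tag extraction. The helper name `extractTags` and the constant `TAG_NAME` are hypothetical and not part of org-parser; the character class is the one from the REPL line above, and the `u` flag is what enables `\p{Letter}`.

```javascript
// Hypothetical helper, not org-parser's actual code: splits an Org-style
// tag string such as ":标签:work:" into its individual tag names.
// \p{Letter} is a Unicode property escape and only works with the `u` flag.
const TAG_NAME = /[\p{Letter}0-9_@#%]+/gu;

function extractTags(tagString) {
  // With a global regex, match() returns every match, or null if none.
  return tagString.match(TAG_NAME) ?? [];
}

console.log(extractTags(":标签:work:")); // [ '标签', 'work' ]
console.log(extractTags("::"));          // []
```

Because the colons are not in the character class, they act as separators, so mixed ASCII and non-ASCII tags come out as separate entries.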
@yqu212 Do you want to make your first PR and include Chinese characters by employing the above regexp for CLJS and the equivalent for CLJ?
It's a good idea. However, it will take some time to write the test since I am not familiar with
Looks like it pays off that we are doing tag extraction in the transformation, not in the EBNF ^^
Describe the bug
Non-ascii tag is not parsed.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
[org-parser "0.1.24"]