Supporting buffers that can ignore mark requests #81
On a more technical note, as far as I understand, we would only need to make a small change to
Let us know. That said, it seems peculiar that a regular expression would mark somewhere that you're not allowed to mark. Marking is below the level of the UI after all.
Here's an example of a situation where a regular expression would mark where you don't want it to mark. Suppose you have the regular expression
I don't understand your example. Is the issue that you don't want to match
Yes, that's my issue. I don't want a match to be successful in the middle of a user-perceived character. Most people would consider
Okay, so that's not "supporting buffers that can ignore mark requests", that's noting a possible bug in the way the regexp engine works. Have you read the Unicode Consortium TR 18 document? I haven't looked at it recently, but what does it claim the correct behavior is supposed to be?
(A quick scan of TR18 seems to indicate that what you want is an implementation of 2.2.1 Grapheme Cluster Mode but I'm not sure that's precisely what you're asking for.)
What I'm precisely asking for: I want backtracking to be the only way to take a semantic action. I want sedlex to always mark when it is in a final DFA state, and always backtrack when it can no longer transition to new DFA states. The only case where we don't currently do this is when we are at a final DFA state and there are no transitions to new DFA states. In this case, we directly take a semantic action, instead of marking and then backtracking (see the code that #83 changes). Always using backtrack to perform semantic actions allows custom
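The rule described above can be illustrated with a toy model in plain OCaml. This is a sketch under stated assumptions, not sedlex's generated code: the `buffer` record, `mark`, `backtrack`, the hand-written DFA and `lex_one` are all hypothetical, the input is assumed to be ASCII, and actions are plain integers (`-1` meaning no match). The matcher marks in every final state, including final states with no outgoing transitions, and only picks an action through `backtrack`.

```ocaml
(* Toy model with hypothetical names, not sedlex's generated code:
   mark on every final DFA state, choose the action only via backtrack. *)

type buffer = {
  input : string;               (* ASCII input, one char per code point *)
  mutable pos : int;            (* index of the next char to read *)
  mutable marked_pos : int;     (* end of the longest match marked so far *)
  mutable marked_action : int;  (* action of that match; -1 means none *)
}

let create input = { input; pos = 0; marked_pos = 0; marked_action = -1 }

(* Read the next char, or [None] at end of input. *)
let next buf =
  if buf.pos < String.length buf.input then begin
    let c = buf.input.[buf.pos] in
    buf.pos <- buf.pos + 1;
    Some c
  end else None

(* Record the current position as the end of a match for [action]. *)
let mark buf action =
  buf.marked_pos <- buf.pos;
  buf.marked_action <- action

(* Rewind to the last mark and return its action. *)
let backtrack buf =
  buf.pos <- buf.marked_pos;
  buf.marked_action

(* DFA for: "let" -> 0 | "=" -> 1 | any single char -> 2 *)
let transition state c =
  match state, c with
  | 0, 'l' -> Some 1
  | 0, '=' -> Some 4
  | 0, _ -> Some 5
  | 1, 'e' -> Some 2
  | 2, 't' -> Some 3
  | _ -> None

let accepting = function
  | 1 | 5 -> Some 2   (* a single char matched by [_] *)
  | 3 -> Some 0       (* "let" *)
  | 4 -> Some 1       (* "=" *)
  | _ -> None

let rec run buf state =
  (* Always mark in a final state, even if no transition can follow. *)
  (match accepting state with Some a -> mark buf a | None -> ());
  match next buf with
  | None -> backtrack buf
  | Some c ->
    (match transition state c with
     | Some state' -> run buf state'
     | None -> backtrack buf)

(* Lex one token starting at the current position. *)
let lex_one buf =
  buf.marked_pos <- buf.pos;
  buf.marked_action <- -1;
  run buf 0
```

On the input `lex`, for instance, the matcher marks after `l`, fails to find a transition on `x`, and backtracks to position 1 with the action for `_`.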
My personal use case for this: I have a custom
This does not give us Grapheme Cluster Mode. According to the last paragraph of 2.2.1, in Grapheme Cluster Mode, we should treat
Supporting any of RL2.2 Extended Grapheme Clusters, RL2.3 Default Word Boundaries, or Grapheme Cluster Mode is beyond the scope of
I think always using backtracking for semantic actions is a middle ground where using custom
Okay, given that what you want isn't Grapheme Cluster Mode, I have to confess, and perhaps it's just that I'm thick, that I don't understand what it is that you want to do, even having read the above a few times. Perhaps you could give an example of what the resulting syntax expressing your improvement would be in a real sedlex specification?
Thanks for being so patient with me! I'm going to try to explain myself again. Suppose I have the following code:

```ocaml
(* One token per call; the [_] case reports an error at the current position. *)
let token buf =
  match%sedlex buf with
  | "let" -> Token.LET
  | "=" -> Token.EQ
  | eof -> Token.EOF
  | _ -> raise (Token.error @@ Sedlexing.positions buf)
```

and the input stream
For the above, sedlex will not call
Here, instead of directly taking a semantic action at
In my custom buffer, if we try to call
If we had the input stream
Instead of having sedlex worry about boundaries, I'm trying to move the burden to buffers. Currently, while a custom buffer can control what
It can sometimes be useful for buffers to ignore a mark action. For example, a lexbuffer can decide that it would be inappropriate to mark at the current position because it is not a grapheme cluster boundary.
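If the lexer always marks before acting, a custom buffer can veto a match simply by not recording the mark. The sketch below is a self-contained toy with hypothetical names, not the sedlex API. In particular the boundary test is a deliberately crude assumption (a position counts as a boundary unless the next code point is in U+0300..U+036F, Combining Diacritical Marks), not the UAX #29 grapheme cluster algorithm. State 1 of its DFA is final with no outgoing transitions, which, per the discussion in this thread, is the case where sedlex currently takes the action directly without marking.

```ocaml
(* Hypothetical buffer that ignores [mark] at non-boundary positions.
   Boundary test is a crude stand-in for UAX #29: a position is a
   boundary unless the next code point is in U+0300..U+036F. *)

type buffer = {
  input : int array;            (* code points *)
  mutable pos : int;            (* index of the next code point *)
  mutable marked_pos : int;     (* end of the last accepted mark *)
  mutable marked_action : int;  (* -1 means no match *)
}

let create input = { input; pos = 0; marked_pos = 0; marked_action = -1 }

let is_combining cp = cp >= 0x0300 && cp <= 0x036F

let at_boundary buf =
  buf.pos >= Array.length buf.input
  || not (is_combining buf.input.(buf.pos))

let next buf =
  if buf.pos < Array.length buf.input then begin
    let cp = buf.input.(buf.pos) in
    buf.pos <- buf.pos + 1;
    Some cp
  end else None

(* The buffer, not the lexer, decides whether a mark takes effect. *)
let mark buf action =
  if at_boundary buf then begin
    buf.marked_pos <- buf.pos;
    buf.marked_action <- action
  end

let backtrack buf =
  buf.pos <- buf.marked_pos;
  buf.marked_action

(* DFA for: "e" -> 0 | any single code point -> 1 *)
let transition state cp =
  match state with
  | 0 when cp = Char.code 'e' -> Some 1
  | 0 -> Some 2
  | _ -> None

let accepting = function 1 -> Some 0 | 2 -> Some 1 | _ -> None

let rec run buf state =
  (* Always ask the buffer to mark in a final state. *)
  (match accepting state with Some a -> mark buf a | None -> ());
  match next buf with
  | None -> backtrack buf
  | Some cp ->
    (match transition state cp with
     | Some state' -> run buf state'
     | None -> backtrack buf)

let lex_one buf =
  buf.marked_pos <- buf.pos;
  buf.marked_action <- -1;
  run buf 0
```

With the input `e` followed by U+0301 (a decomposed `é`), the mark after `e` is ignored and `lex_one` returns `-1` instead of matching `e` in the middle of the cluster; a real lexer would also need a rule able to consume the whole cluster.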
Is this a use case that the maintainers of this project would be willing to support?