Enhancement? Option to clamp to words, sentences or other tokens #86

Open
JorenSix opened this issue Apr 17, 2024 · 2 comments

Comments

@JorenSix
Contributor

Hi Simon,

First of all, thanks for developing this in the open! I am evaluating the text-annotator-js component to see if it fits a project I am working on. It looks very promising.

One of the requirements for my project is that annotations should clamp to word boundaries. I have added the code below to the trimRange() method (also mentioned in #66). Conceptually, that is perhaps not a bad place for such code. In relation to this, I have a few questions:

  • Is there already a way to clamp to words/sentences I perhaps overlooked?
  • Is clamping functionality common enough to include in the annotator?
  • Is there a better place to modify a range based on application rules?
  • Would it be a good idea to add an optional callback to the annotator component which would then be used to clamp a range according to application rules?

Thanks in advance for considering these questions.

// Characters treated as word boundaries
const word_boundaries = ['.', ',', ';', ':', '?', '!', ' ', '\n', '-'];

// Assumes range, startContainer and endContainer (text nodes) are in scope, as inside trimRange().
// Move the start left while the character before it is not a word boundary;
// stopping at 0 avoids passing a negative offset to setStart().
let newStartOffset = range.startOffset;
while (newStartOffset > 0 && !word_boundaries.includes(startContainer.nodeValue.charAt(newStartOffset - 1))) {
  newStartOffset = newStartOffset - 1;
}
range.setStart(startContainer, newStartOffset);

// Move the end right while the character at the end offset is not a word boundary.
let newEndOffset = range.endOffset;
while (newEndOffset < endContainer.nodeValue.length && !word_boundaries.includes(endContainer.nodeValue.charAt(newEndOffset))) {
  newEndOffset = newEndOffset + 1;
}
range.setEnd(endContainer, newEndOffset);
@rsimon
Member

rsimon commented Apr 17, 2024

Hi,

No, clamping to words is not (properly) implemented anywhere right now. (The .trimRange() method was supposed to do it, but it was buggy, as @oleksandr-danylchenko writes above.)

Nonetheless, I'd personally welcome such a feature! It should IMO be made a config option, and possibly off by default. But I think it would be useful.

But just to clarify: if I read your code correctly, your goal is to clamp outwards, right? I.e., given the following text:

Lorem ipsum dolor sit amet

Your use case is that if I select "orem ipsu", the selection would clamp to "Lorem ipsum", right? On the other hand, if I select " ipsum" (with a leading whitespace), the selection would not be trimmed to "ipsum"? (Which is what the original trimRange() method did.)

I think both would be useful. But, as I said, definitely as a config option.
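To make the two behaviours concrete, here is a minimal sketch that works on plain strings with [start, end) offsets instead of DOM Range objects. The helper names clampOutward and trimInward are hypothetical (not part of text-annotator-js), and the boundary list is the one from the snippet in the first comment.

```javascript
// Hypothetical helpers operating on a string and [start, end) offsets,
// not on DOM Ranges, so the logic is easy to test in isolation.
const WORD_BOUNDARIES = new Set(['.', ',', ';', ':', '?', '!', ' ', '\n', '-']);

// Outward clamping: grow the selection until both ends touch a word boundary.
function clampOutward(text, start, end, boundaries = WORD_BOUNDARIES) {
  // Walk the start left while the preceding character belongs to a word
  while (start > 0 && !boundaries.has(text[start - 1])) start--;
  // Walk the end right while the next character belongs to a word
  while (end < text.length && !boundaries.has(text[end])) end++;
  return [start, end];
}

// Inward trimming: shrink the selection to drop leading/trailing whitespace.
function trimInward(text, start, end) {
  while (start < end && /\s/.test(text[start])) start++;
  while (end > start && /\s/.test(text[end - 1])) end--;
  return [start, end];
}

const text = 'Lorem ipsum dolor sit amet';
console.log(text.slice(...clampOutward(text, 1, 10))); // prints: Lorem ipsum
console.log(text.slice(...trimInward(text, 5, 11)));   // prints: ipsum
console.log(text.slice(...clampOutward(text, 5, 11))); // prints: Lorem ipsum
```

Outward clamping grows " ipsum" to "Lorem ipsum", whereas inward trimming shrinks it to "ipsum", which is exactly the difference between the two modes.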

@JorenSix
Contributor Author

Thanks for the quick response!

The implementation is indeed outward clamping, and it does indeed behave incorrectly if the selection starts with a space. Thanks for pointing that out. There may be other edge cases it does not cover either, e.g. non-annotatable elements.

I do think clamping (inward, outward, word boundary definitions) may be culture- or application-dependent, e.g. whether to count a "-" as a word boundary, or whether to clamp annotations at the sentence level instead. So I would suggest making it configurable as a callback: trimRange either does nothing (the default?), applies a best-guess word-level clamping, or delegates to a user-defined function. I think that would be the most flexible.
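One way to sketch that callback idea, assuming a hypothetical clamp option that accepts 'none', 'word' or a user-defined function (text, start, end) => [start, end]. None of these names exist in text-annotator-js, and the sketch works on plain strings rather than DOM ranges. Instead of a hand-written boundary list it swaps in the built-in Intl.Segmenter, whose word and sentence boundaries are locale-aware, which speaks to the culture-dependence concern.

```javascript
// Hypothetical option values for a clamp setting; not a text-annotator-js API.
// Snap [start, end) to the segments (words or sentences) it overlaps,
// using the locale-aware Intl.Segmenter built into modern JS engines.
function clampToSegments(text, start, end, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  let newStart = null;
  let newEnd = null;
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // At word granularity, ignore whitespace and punctuation segments
    if (granularity === 'word' && !isWordLike) continue;
    const segEnd = index + segment.length;
    if (index < end && segEnd > start) {
      // Segments arrive in order: the first overlap fixes the start,
      // the last overlap fixes the end
      if (newStart === null) newStart = index;
      newEnd = segEnd;
    }
  }
  // No overlapping segment (e.g. only whitespace selected): leave unchanged
  return newStart === null ? [start, end] : [newStart, newEnd];
}

// Resolve the hypothetical option into one (text, start, end) => [start, end] function
function resolveClamp(option = 'none') {
  if (typeof option === 'function') return option; // user-defined rule
  if (option === 'word') {
    return (text, start, end) => clampToSegments(text, start, end, 'word');
  }
  return (text, start, end) => [start, end]; // 'none': no clamping
}

const sample = 'Lorem ipsum dolor sit amet';
console.log(sample.slice(...resolveClamp('word')(sample, 1, 10))); // prints: Lorem ipsum
console.log(sample.slice(...resolveClamp('word')(sample, 5, 11))); // prints: ipsum
console.log(sample.slice(...resolveClamp()(sample, 1, 10)));       // prints: orem ipsu
```

In this sketch the 'word' strategy snaps to the words the selection overlaps, so it both extends "orem ipsu" to "Lorem ipsum" and drops the leading space from " ipsum"; that is one design choice among several. Sentence-level clamping is then just another user-defined callback, e.g. resolveClamp((t, s, e) => clampToSegments(t, s, e, 'sentence')).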

Thanks again! J
