Improve performance for large documents with many annotations #130
Hello and thank you for this wonderful project. It's provided some excellent shoulders to stand on.
Context
I'm extracting footnotes embedded in markdown and converting them to annotations. Some of these markdown files are over 500k characters long and contain more than 100 footnotes. After a rather circuitous route, I'm using mdast/hast/remark to convert the markdown into HTML and then loading the HTML into a jsdom Document.
The basic flow is like this:
describeTextQuote to determine the selector for that node
The Problem
I found that extracting footnotes from some of the larger files was taking 7 to 10 minutes. According to a profiler, about 70% of that time was spent determining whether the node intersected the document/scope.
That call happens when the node is converted to a chunk, which happens many times per annotation. As far as I can tell, it is only used to ensure that the node is a part of the document.
The Solution
This PR removes that check. It improved the performance on my machine by 75% for the large files.
Behaviorally, I think it is the same. The two things which invoke nodeToChunk appear to already be checking whether those nodes are a part of the scope.
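To illustrate why the removed check looks redundant, here is a minimal, DOM-free sketch of the pattern. It is not the library's code: `ScopeLike`, `CountingScope`, `toChunk`, and `chunksInScope` are hypothetical names, and nodes are modelled as plain numbers so the example runs without jsdom. It only assumes that the caller has already filtered nodes by scope before the inner conversion runs.

```typescript
// Hypothetical stand-in for a scope that can answer "does this node intersect me?".
interface ScopeLike<N> {
  intersectsNode(node: N): boolean;
}

// Models a scope as a numeric interval and counts how often it is queried.
class CountingScope implements ScopeLike<number> {
  calls = 0;
  start: number;
  end: number;

  constructor(start: number, end: number) {
    this.start = start;
    this.end = end;
  }

  intersectsNode(node: number): boolean {
    this.calls += 1;
    return node >= this.start && node <= this.end;
  }
}

// Inner conversion (hypothetical): optionally re-validates the node against the scope.
function toChunk(node: number, scope: ScopeLike<number>, recheck: boolean): string {
  if (recheck && !scope.intersectsNode(node)) {
    throw new Error("node falls outside the scope");
  }
  return `chunk-${node}`;
}

// Caller (hypothetical): filters by scope first, then converts each remaining node.
function chunksInScope(
  nodes: number[],
  scope: ScopeLike<number>,
  recheck: boolean
): string[] {
  return nodes
    .filter((n) => scope.intersectsNode(n))
    .map((n) => toChunk(n, scope, recheck));
}
```

In this sketch, both variants return the same chunks; with six nodes and a scope covering three of them, the re-checking variant makes nine intersection calls and the other makes six. In a real document the cost per call is what matters, so the saving depends on how expensive the intersection test is.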