Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement boolean preambles #204

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

yantar92
Copy link
Contributor

@yantar92 yantar92 commented Apr 5, 2021

I have been recently playing with idea of automatic preamble calculation for (or ...), (and ...) and similar queries.

If I understand correctly, preamble is dropped on master for any boolean expression, except (and ...). However, (or ...) could also be optimised if all the sexps inside or have non-nil predicates.

This pull request is a not-yet-ready-to-merge draft where I want to ask your opinion about this idea. I implemented boolean expressions (and ...), (or ...), (when ...), (unless ...), and (not ...) as actual predicates, so that preambles can be transparently defined.

Some benchmarks with elp on my setup with complex agenda queries:

;;Function number_of_invocations total_time average_time

;; Old (agenda n)
;;org-agenda 1 45.655149553 45.655149553
;;org-ql--select 2255 37.698226351 0.0167176170
;;re-search-forward 336625 6.2799195780 1.865...e-05
;;org-ql--query-preamble 2265 0.0108973820 4.811...e-06
;;org-ql--normalize-query 2265 0.0088714750 3.916...e-06

;; New (agenda n)
;;org-agenda 1 22.979330424 22.979330424
;;org-ql--select 1261 15.829318511 0.0125529885
;;re-search-forward 256703 4.8654712150 1.895...e-05
;;org-ql--normalize-query 1274 0.0073856340 5.797...e-06
;;org-ql--query-preamble 1274 0.0045016170 3.533...e-06

Note the difference in number of re-search-forward calls: 256703 (new) vs. 336625 (old). This is mostly because (or ...) queries can now have preamble. I have seen up to several times less calls depending on what kind of query I am using.

The caveat is (sometimes) increased time spend on each re-search-forward. Combined preamble for (or ...) consists of (rx (| individual-preambles))` increasing the complexity of preamble regexp. For very small queries involving many timestamps, regexp search sometimes ate up all the speed improvement as the time for each individual regexp search increased several times.

@yantar92
Copy link
Contributor Author

yantar92 commented Apr 5, 2021

Note: While looking into this subject, I noticed that tags-local predicate is inefficient if its preamble is unused (i.e. (or (tags-local "TAG") ...). tags-local body calls slow org-ql--tags-at.

@alphapapa
Copy link
Owner

This is very interesting, thanks. I hadn't thought of using the new defpred macro to define normalizer- and preamble-only versions of the booleans.

Assuming that everything works as intended, and we don't cause any edge cases to regress in performance, this is certainly an improvement I'd like to make. Here are a few thoughts from a quick look:

  1. I see you're calling a function, rec, which is defined with cl-labels in another function. Although it may work, I think it's at least an unintended result of the implementation (since cl-labels temporarily rebinds symbol functions), if not a violation of lexical scope (again, perhaps in spirit if not technically, in Elisp). In any case, I think it shouldn't be done that way, and it makes the code hard to understand (i.e. where the rec being called is defined).
  2. In that vein, I see some lexical binding-related warnings in the tests. Sometimes those do indicate actual problems, especially possible in light of the previous point.
  3. The tests also are not passing, but I'm not sure if it's due to this PR, or if that dash-functional obsolescence warning is causing it (it shouldn't, but I haven't checked).
  4. You mentioned optimizing timestamp regexps. Not that this would obviate this PR, but FYI, I have a branch that does that explicitly, and it works pretty well. See https://github.com/alphapapa/org-ql/tree/wip/ts-optimized-regexps I haven't had time to work on it for a little while, but I think it just needs a little more polishing and testing (I think I have some notes about it in the meta/notes branch).

Note: While looking into this subject, I noticed that tags-local predicate is inefficient if its preamble is unused (i.e. (or (tags-local "TAG") ...). tags-local body calls slow org-ql--tags-at.

I'm not sure what you mean. If its preamble is unused, what else would it do? And org-ql--tags-at is a caching function, which speeds up future local-tags queries on headings.

Thanks for your work on this.

@yantar92
Copy link
Contributor Author

yantar92 commented Apr 5, 2021

I see you're calling a function, rec, which is defined with cl-labels in another function.

I agree that it is difficult to read. But what would be an alternative to get preamble/normalization recursively? The best way I can see is documenting it as you did for ,predicate-names. Also, existing org-ql code calls rec directly in ancestors and parent predicates.

In that vein, I see some lexical binding-related warnings in the tests.

Those are generated by expansion of org-ql-defpred macro. Without :body, predicates generate (cl-defun ,fn-name ,args ,docstring nil). Of course, arguments are unused there.

The tests also are not passing

It is related to another idea I have not mentioned here.

I wanted to make use of predicates :body code moving point. For now, if predicate code moves the point, org-ql continues the search from the heading next to the new position. This might be useful for additional speedup. Say, outline-path predicate could search forward for regexp (rx '(or ,@strings)) if there is no match at point and otherwise let org-ql continue from next heading.

With current org-ql code the idea would not work though. If outline-path moves point, query like (not (outline-path...)) would give wrong result. So, I added save-excursion around predicates that must not affect point position. The result is failing tests because normalized queries do not match the expected result in tests.

Anyway, It is not reletad to this particular pull request and I will remove save-excursion code later.

See https://github.com/alphapapa/org-ql/tree/wip/ts-optimized-regexps

Intresting. Thanks for the pointer.

I'm not sure what you mean. If its preamble is unused, what else would it do? And org-ql--tags-at is a caching function, which speeds up future local-tags queries on headings.

org-ql--tags-at calculates inherited tags unconditionally. Inherited tags are not needed for tags-local preamble and, in fact, created bottleneck in one of my queries where I explicitly wanted to use tags-local for performance reasons.

@alphapapa
Copy link
Owner

I see you're calling a function, rec, which is defined with cl-labels in another function.

I agree that it is difficult to read. But what would be an alternative to get preamble/normalization recursively? The best way I can see is documenting it as you did for ,predicate-names. Also, existing org-ql code calls rec directly in ancestors and parent predicates.

Ah, you're right, I think I copied that from the old, giant pcase form, which was within the rec definition, when I split it into org-ql-defpred calls.

I'm not sure of the best way to handle that. It might be an acceptable transgression for us to keep using it this way, for now, anyway, but I should probably at least add relevant comments in the code. In the long run, it should probably be cleaned up.

In that vein, I see some lexical binding-related warnings in the tests.

Those are generated by expansion of org-ql-defpred macro. Without :body, predicates generate (cl-defun ,fn-name ,args ,docstring nil). Of course, arguments are unused there.

Hmm, I may be mistaken, because I did that work a while ago, but I don't think the existing code on master generates such warnings for existing predicates. So if this new approach of defining boolean predicates causes these warnings anew, we will have to find a way to fix them.

The tests also are not passing

It is related to another idea I have not mentioned here.

I wanted to make use of predicates :body code moving point. For now, if predicate code moves the point, org-ql continues the search from the heading next to the new position. This might be useful for additional speedup. Say, outline-path predicate could search forward for regexp (rx '(or ,@strings)) if there is no match at point and otherwise let org-ql continue from next heading.

With current org-ql code the idea would not work though. If outline-path moves point, query like (not (outline-path...)) would give wrong result. So, I added save-excursion around predicates that must not affect point position. The result is failing tests because normalized queries do not match the expected result in tests.

Anyway, It is not reletad to this particular pull request and I will remove save-excursion code later.

I would certainly appreciate keeping those changes separate. (Fair warning, I doubt I would approve of that idea. It's interesting, but it would significantly increase complexity and make mistakes much more likely. I don't think it would be worth it. Rather, I think optimization efforts should focus on using preambles better.)

See https://github.com/alphapapa/org-ql/tree/wip/ts-optimized-regexps

Intresting. Thanks for the pointer.

I'm not sure what you mean. If its preamble is unused, what else would it do? And org-ql--tags-at is a caching function, which speeds up future local-tags queries on headings.

org-ql--tags-at calculates inherited tags unconditionally. Inherited tags are not needed for tags-local preamble and, in fact, created bottleneck in one of my queries where I explicitly wanted to use tags-local for performance reasons.

I see. Well, 1. why are you disabling preambles? =) 2. Feel free to open an issue suggesting that that function be improved to only get inherited tags when necessary. That might be doable without too much complexity. OTOH, that would probably mean more branching in the common case, which might reduce performance generally, so it would need to be carefully benchmarked.

@yantar92
Copy link
Contributor Author

yantar92 commented Apr 6, 2021

I'm not sure of the best way to handle that. It might be an acceptable transgression for us to keep using it this way, for now, anyway, but I should probably at least add relevant comments in the code. In the long run, it should probably be cleaned up.

As tentative solution, I changed rec to more comprehensive org-ql-normalize-query and org-ql-query-preamble and updated org-ql-defpred docstring. See 59ce0f4.

So if this new approach of defining boolean predicates causes these warnings anew, we will have to find a way to fix them.

Yep. The warning were related to the new approach. I fixed the warnings now. Defpred (...) staff is just relevant when there is :body. Can as well use (&rest _) stub. See e875db0.

I would certainly appreciate keeping those changes separate.

Sure. Done.

Fair warning, I doubt I would approve of that idea. It's interesting, but it would significantly increase complexity and make mistakes much more likely. I don't think it would be worth it. Rather, I think optimization efforts should focus on using preambles better.

That's the only idea I came up with to speed up outline-path queries. I use those frequently and they are unacceptably slow now. Is there even a way to construct preamble for outline-path predicate?

I see. Well, 1. why are you disabling preambles? =)

In this particular case, it was the result of changes I made in this pull request. However, preambles would not be used on master in queries like (or (todo) (local-tags "blah")).

  1. Feel free to open an issue suggesting that that function be improved to only get inherited tags when necessary. That might be doable without too much complexity. OTOH, that would probably mean more branching in the common case, which might reduce performance generally, so it would need to be carefully benchmarked.

What about simply using normalizer:

(org-ql-defpred (tags-local local-tags tags-l ltags) (&rest tags)
  "Return non-nil if current heading's local tags include one or more of TAGS (a list of strings).
If TAGS is nil, return non-nil if heading has any local tags."
  :normalizers ((`(,predicate-names)
		 `(regexp ,(rx-to-string `(seq bol (1+ "*") (1+ space) (1+ not-newline)
					       (1+ space) ":" (1+ (any alnum "_@#%:")) ":"
                                               (0+ space)
                                               line-end)
                                         t)))
		(`(,predicate-names . ,tags)
                 `(regexp ,(rx-to-string `(seq bol (1+ "*") (1+ space) (1+ not-newline)
					       ":" (or ,@tags) ":"
                                               (optional (1+ space))
                                               line-end)
                                         t)))))

@yantar92 yantar92 marked this pull request as ready for review April 17, 2021 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants