Support async updating of search index #269
valentijnscholten opened this issue on 15 Mar 2020:

I'm using watson in a Django app that has, as one of its most important features, the importing of files to turn them into database rows, i.e. Django ORM model instances.

Using bulk_create with Django is problematic, especially in combination with MySQL, because the ids of the created objects are unknown afterwards. So I am thinking about ways to make the import faster, and one way would be to make the watson search index updates asynchronous.

One issue is that some model instances are updated (saved) multiple times within one transaction, triggering multiple watson updates.

My thoughts so far:

- Make the post_save signal optional and allow the Django app itself to update the index in the best way possible, e.g. via some Celery task already used by my app. This would need a (documented/supported) way to update one or more model instances. It would support deduplication of updates and could be asynchronous. Something similar could be achieved by wrapping the code in the skip_index_update decorator.
- Then I found the (undocumented?) SearchContextMiddleware, which already seems to deduplicate model updates within the same request and batches the index updates all together at the end of the request. This achieves deduplication, but is not yet asynchronous.

What possible solutions could be implemented? Could there be some support in django-watson for this scenario? Or would it make more sense for a Django app to just subclass the middleware and wrap search_context_manager.end() in a Celery task? (See the sketch below.) Just thinking out loud here, and maybe helping others trying to achieve the same.
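One caveat with wrapping search_context_manager.end() in a Celery task: the search context appears to hold its set of dirty objects as in-process (thread-local) state, so it can't simply be handed to a worker in another process. A workable variant is to pass primary keys to the task instead. Below is a minimal, illustrative sketch of that idea; it assumes watson's skip_index_update context manager (mentioned above) plus a module-level default_search_engine with an update_obj_index method, and the task name and import_rows helper are made up, so verify the watson calls against your installed version.

```python
# Illustrative sketch, not official django-watson API beyond the names
# discussed above. Verify skip_index_update, default_search_engine and
# update_obj_index against your installed watson version.
from celery import shared_task
from django.apps import apps

from watson import search as watson


@shared_task
def update_watson_index(app_label, model_name, pks):
    """Rebuild the watson search entries for the given objects."""
    model = apps.get_model(app_label, model_name)
    engine = watson.default_search_engine
    for obj in model._default_manager.filter(pk__in=pks):
        engine.update_obj_index(obj)


def import_rows(model, rows):
    """Create rows without per-save index updates, then queue a single
    asynchronous index update for the whole batch."""
    with watson.skip_index_update():
        objs = [model._default_manager.create(**row) for row in rows]
    update_watson_index.delay(
        model._meta.app_label,
        model._meta.model_name,
        [obj.pk for obj in objs],
    )
```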
Comments

It's an interesting idea. However, async updates feel a bit niche, and there are so many possible frameworks to choose from that any one integration would likely be little-used.

I wonder if there's much performance advantage to performing async index updates. Given it's all in the same DB, batching it all in the same transaction using SearchContextMiddleware is going to be pretty close to optimal for most cases. If you need async updates, it's probably better to save the primary models AND the watson models together in the background task.
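A minimal sketch of that last suggestion, using watson's documented update_index() context manager inside the task so the primary saves and the index writes land together; MyModel, parse_rows, and the task name are placeholders for the app's own pieces, not part of django-watson.

```python
# Illustrative sketch: save the primary models AND the watson entries
# together in the background task. watson.update_index() collects every
# registered object saved inside the block and flushes one deduplicated
# batch of index updates when the block exits.
from celery import shared_task
from django.db import transaction

from watson import search as watson

from myapp.importers import parse_rows  # placeholder: app-specific parser
from myapp.models import MyModel        # placeholder: app-specific model


@shared_task
def import_file(path):
    # One transaction covers the primary rows and their search entries.
    with transaction.atomic(), watson.update_index():
        for row in parse_rows(path):
            MyModel._default_manager.update_or_create(
                external_id=row["id"],
                defaults=row["fields"],
            )
```

Within one worker invocation this gets the same deduplication the middleware provides per request, while keeping the rows and their search entries consistent in a single transaction.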