
Support async updating of search index #269

Open
valentijnscholten opened this issue Mar 15, 2020 · 1 comment

Comments

valentijnscholten commented Mar 15, 2020

I'm using watson in a Django app whose most important feature is importing files and turning them into database rows, i.e. Django ORM model instances.

Using bulk_create with Django is problematic, especially in combination with MySQL, because the ids of the created objects are unknown after the insert. So I am thinking about ways to make the import faster, and one way would be to make the watson search index updates asynchronous.
A complication is that some model instances are updated (saved) multiple times within one transaction, triggering multiple watson index updates.
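For illustration, collapsing those repeated saves before touching the index could look like this. This is a minimal pure-Python sketch; the `(model_label, pk)` key and the `dedupe_updates` name are my own, not django-watson API:

```python
# Sketch: deduplicate index updates queued during one transaction.
# The (model_label, pk) keys and function name are illustrative only.

def dedupe_updates(queued):
    """Collapse repeated (model_label, pk) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for key in queued:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# An instance saved three times in one transaction yields one index update.
queued = [("app.Finding", 1), ("app.Finding", 1), ("app.Test", 2), ("app.Finding", 1)]
print(dedupe_updates(queued))  # [('app.Finding', 1), ('app.Test', 2)]
```

With this in place, an async worker would only see one update per object per transaction, however many times the importer saved it.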

My thoughts so far:

  • Make the post_save signal optional and let the Django app itself update the index in whatever way fits best, e.g. via a Celery task the app already uses. This would need a (documented/supported) way to update the index for one or more model instances. It would allow deduplication of updates and could run asynchronously.
    Something similar could be achieved by wrapping the import code in the skip_index_update decorator.

  • Then I found the (undocumented?) SearchContextMiddleware, which already seems to deduplicate model updates within the same request and applies all index updates together at the end of the request. This achieves deduplication, but is not asynchronous.

What possible solutions could be implemented?

Could django-watson support this scenario directly? Or would it make more sense for a Django app to subclass the middleware and wrap search_context_manager.end() in a Celery task?
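To sketch that second option in the abstract: a request-scoped context collects dirty objects and hands the deduplicated batch to an asynchronous dispatcher at the end of the request, instead of updating the index inline. The class below is a hypothetical stand-in in plain Python, not django-watson code; in a real app the `dispatch` callable would be something like `update_index_task.delay(batch)` and the context would be driven from a SearchContextMiddleware subclass:

```python
# Hypothetical stand-in for a middleware-driven search context that batches
# index updates and dispatches them asynchronously at the end of a request.

class AsyncSearchContext:
    def __init__(self, dispatch):
        # dispatch: callable taking a list of (model_label, pk) pairs;
        # in a real app this would be a Celery task's .delay(...).
        self._dispatch = dispatch
        self._dirty = []

    def add(self, model_label, pk):
        """Record a saved object (where watson's post_save hook would fire)."""
        self._dirty.append((model_label, pk))

    def end(self):
        """End of request: dedupe and hand the batch off asynchronously."""
        seen, batch = set(), []
        for key in self._dirty:
            if key not in seen:
                seen.add(key)
                batch.append(key)
        self._dirty = []
        if batch:
            self._dispatch(batch)

dispatched = []
ctx = AsyncSearchContext(dispatched.append)
ctx.add("app.Finding", 1)
ctx.add("app.Finding", 1)  # saved twice in the same request
ctx.add("app.Test", 2)
ctx.end()
print(dispatched)  # [[('app.Finding', 1), ('app.Test', 2)]]
```

The worker receiving the batch would then look up the live instances and refresh their index entries, so a crash before dispatch loses at most one request's worth of updates rather than corrupting the index mid-write.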

Just thinking out loud here; maybe this helps others trying to achieve the same thing.

etianen (Owner) commented Mar 23, 2020 via email
