save_stories_from_feed performance improvement #22

Open

opme opened this issue Mar 1, 2023 · 0 comments

Labels

enhancement prio-low

opme commented Mar 1, 2023 (edited)

While reading save_stories_from_feed in tasks.py, I noticed that it appears to make one database call per feed entry to check for duplicates.

The per-entry normalized_url_exists check could be replaced by a single database call that checks all feed entries at once.

There could be a function called getValidFeedEntries that applies the logic already in save_stories_from_feed for skipping invalid entries.

After that, a single database call would identify which entries are duplicates, followed by one bulk insert and commit.
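
A minimal sketch of that flow, using the standard-library sqlite3 module as a stand-in for the project's actual database layer. The table name stories, its columns, and the helpers get_valid_feed_entries, find_existing_urls, and save_new_stories are assumptions for illustration only, not the code in tasks.py. The validity rule shown (non-empty URL, no repeats within the feed) is also just a placeholder for whatever save_stories_from_feed really checks.

```python
import sqlite3
from typing import Dict, Iterable, List, Set


def get_valid_feed_entries(entries: Iterable[Dict]) -> List[Dict]:
    """Hypothetical stand-in for the skip-invalid logic in save_stories_from_feed."""
    valid: List[Dict] = []
    seen: Set[str] = set()
    for entry in entries:
        url = (entry.get("normalized_url") or "").strip()
        # Assumed rule: skip entries without a URL and repeats inside the same feed.
        if not url or url in seen:
            continue
        seen.add(url)
        valid.append({**entry, "normalized_url": url})
    return valid


def find_existing_urls(conn: sqlite3.Connection, urls: List[str]) -> Set[str]:
    """One query for the whole batch instead of one query per entry."""
    if not urls:
        return set()
    placeholders = ",".join("?" for _ in urls)
    rows = conn.execute(
        f"SELECT normalized_url FROM stories WHERE normalized_url IN ({placeholders})",
        urls,
    ).fetchall()
    return {row[0] for row in rows}


def save_new_stories(conn: sqlite3.Connection, entries: Iterable[Dict]) -> int:
    """Filter, detect duplicates in one call, then bulk insert and commit once."""
    valid = get_valid_feed_entries(entries)
    existing = find_existing_urls(conn, [e["normalized_url"] for e in valid])
    new_rows = [
        (e["normalized_url"], e.get("title", ""))
        for e in valid
        if e["normalized_url"] not in existing
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO stories (normalized_url, title) VALUES (?, ?)", new_rows
        )
    return len(new_rows)
```

With this shape a feed of N entries costs two statements (one SELECT, one batched INSERT) rather than N lookups. Very large feeds might need the IN list split into chunks, since databases limit the number of bound parameters per statement.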

If this sounds reasonable, I can give it a try. Would this eventually become the bottleneck of this implementation?

rahulbot added enhancement prio-low labels
