save_stories_from_feed performance improvement #22

Open
opme opened this issue Mar 1, 2023 · 0 comments
Labels
enhancement New feature or request prio-low


opme commented Mar 1, 2023

I was reading save_stories_from_feed in tasks.py, and it appears to make one database call per feed entry to check for duplicates.

normalized_url_exists could be replaced by a single call to the database to check all feed entries at once.

There could be a function, called something like getValidFeedEntries, that applies the logic already in save_stories_from_feed for skipping invalid entries.

After that, a single database call would identify which entries are duplicates, followed by a bulk insert and a single commit.
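
A rough sketch of the idea, not the project's actual code. It uses the standard-library sqlite3 module only to stay self-contained; the `stories` table, its `normalized_url` and `title` columns, the dict-shaped entries, and the helper names `get_valid_feed_entries`, `find_existing_urls` and `save_new_entries` are all assumptions made for illustration. The validity check here is a placeholder for whatever save_stories_from_feed really skips.

```python
import sqlite3


def get_valid_feed_entries(entries):
    """Hypothetical filter: keep entries that carry a non-empty normalized URL."""
    return [e for e in entries if e.get("normalized_url")]


def find_existing_urls(conn, urls, chunk_size=500):
    """Return the subset of `urls` already stored, one query per chunk."""
    urls = list(urls)
    existing = set()
    # Chunking keeps the number of bound parameters per statement bounded.
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            "SELECT normalized_url FROM stories "
            f"WHERE normalized_url IN ({placeholders})",
            chunk,
        )
        existing.update(row[0] for row in rows)
    return existing


def save_new_entries(conn, entries):
    """Filter, de-duplicate against the DB in bulk, then bulk insert."""
    valid = get_valid_feed_entries(entries)
    seen = find_existing_urls(conn, {e["normalized_url"] for e in valid})
    new_rows = []
    for e in valid:
        url = e["normalized_url"]
        if url in seen:
            continue  # already in the DB, or repeated within this feed
        seen.add(url)
        new_rows.append((url, e.get("title", "")))
    with conn:  # one transaction, committed once on success
        conn.executemany(
            "INSERT INTO stories (normalized_url, title) VALUES (?, ?)",
            new_rows,
        )
    return len(new_rows)
```

For a feed with N entries this issues roughly N / chunk_size SELECTs plus one bulk insert, instead of N separate existence checks. The `seen` set also catches URLs that repeat within the same feed.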

If this sounds reasonable, I can give it a try. Does this look like the eventual bottleneck of this implementation?

@rahulbot rahulbot added enhancement New feature or request prio-low labels Feb 14, 2024