Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mongo can handle larger databases now #174

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Halk0
Copy link

@Halk0 Halk0 commented Jan 13, 2023

Made changes to mongodb backend logic so that it no longer creates multiple queries to backend. Now mongodb backend constructs one filter from the request and queries data from mongodb just once. Previously multiple queries were made and filtering was done based on document IDs returned from those queries, this was bad practice since mongodb would have to go trough all documents multiple times filtering them using a huge list of IDs returned from previous queries. Using just one filter allows mongodb to handle the single query internally without having to match document IDs to a sometimes very large array of IDs.

Closes #173

Made changes to mongodb backend logic so that it no longer creates multiple
queries to backend. Now mongodb backend constructs one filter from the request
and queries data from mongodb just once. Previously multiple queries were made
and filtering was done based on document IDs returned from those queries, this
was bad practice since mongodb would have to go trough all documents multiple
times filtering them using a huge list of IDs returned from previous queries.
Using just one filter allows mongodb to handle the single query internally
without having to match document IDs to a sometimes very large array of IDs.
@CLAassistant
Copy link

CLAassistant commented Jan 13, 2023

CLA assistant check
All committers have signed the CLA.

@Halk0
Copy link
Author

Halk0 commented Jan 13, 2023

This pull request doesn't contain a test, since triggering the bug mentioned in Issue #173
requires generating and writing huge amounts of documents to mongodb, that would make
testing much slower and require alot of resources. If this test is seen as a necessary addition
I can write one.

@chisholm
Copy link
Contributor

Here is a sample of an alternative approach I came up with which uses windowing (this is for finding max versions):

[
    {
        "$setWindowFields": {
            "partitionBy": "$id",
            "output": {
                "_max_version": {
                    "$max": "$_manifest.version"
                }
            }
        }
    },
    {
        "$match": {
            "$expr": {
                "$eq": [
                    "$_manifest.version",
                    "$_max_version"
                ]
            }
        }
    },
    {
        "$unset": [
            "_max_version"
        ]
    }
]

I don't know how it would compare in terms of efficiency. Perhaps it is more efficient since it doesn't require sorting all of the documents first?

@scottpendlebury
Copy link

scottpendlebury commented Jan 19, 2023

I've been facing this exact same problem recently as I've had to account for a worst case configuration in Sentinel of having a connector set to pull all indicators once a minute, I've tested your change and the difference is like night and day. I'm able to pull ~11k indicators in 1.3 seconds now. Many thanks for this @Halk0

Changes to matching versions based on aproach @chisholm had taken.
Sorting documents while filtering versions was removed, this should
have a improving effect on performance. I also removed unnecessary loop
that I added, since it does the same thing as the line below it,
only difference being it was more complicated. This also should have
a performance increasing effect
@Halk0
Copy link
Author

Halk0 commented Jan 20, 2023

@chisholm thats nice, I made some changes based your approach!
I also noticed that in the first commit loop that was on lines 95-115 was completely
unnecessary, since lines below that (116-117) did the exact same thing, but it was much
more simple, and doesn't require loops

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mongodb backend can't handle collections with alot of documents
4 participants