performance bottleneck in line 975 of operations.py #1108

Open
virophagesp opened this issue Apr 4, 2024 · 6 comments

@virophagesp

virophagesp commented Apr 4, 2024

elif spamListCombinedRegex.search(combinedStringNormalized.lower()):

I am using auto-smart mode, and I am unsure whether this depends on my machine or on the video I scanned. I benchmarked two copies of the function: the first unaltered, the second without this elif statement. On average, reply scanning was 30% faster without it.
I then changed the elif statement to print a debug message to test how often the condition is true. It was true only once in all 3 hours of scanning.
For both tests, the video I scanned was the entire Digital Circus pilot: https://www.youtube.com/watch?v=HwAPLk_sQ3w

What does this condition check?
Can this be removed?
Is there a faster way to check this condition?
Are there any conditions that guarantee this will come out false and that can be checked beforehand, to avoid running it in the first place?
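
The comparison described above can be reproduced in miniature. This is only a sketch under assumptions: `spam_regex`, `comments`, and both scan functions are hypothetical stand-ins, not the code from operations.py, and the real pattern is far larger than the one here.

```python
import re
import timeit

# Hypothetical stand-ins: a tiny pattern and synthetic comments.
spam_regex = re.compile(r"spam_account_1|spam_account_2|free_gift_cards")
comments = ["Great video, loved the animation!"] * 1000 + ["visit FREE_GIFT_CARDS now"]


def scan_with_check(texts):
    """Count how many texts the regex flags after lowercasing."""
    hits = 0
    for text in texts:
        if spam_regex.search(text.lower()):
            hits += 1
    return hits


def scan_without_check(texts):
    """Baseline: same loop and lowercasing, but no regex search."""
    hits = 0
    for text in texts:
        text.lower()
    return hits


if __name__ == "__main__":
    # Time both variants on identical input and report the hit count.
    with_check = timeit.timeit(lambda: scan_with_check(comments), number=20)
    without_check = timeit.timeit(lambda: scan_without_check(comments), number=20)
    print(f"with check:    {with_check:.4f}s")
    print(f"without check: {without_check:.4f}s")
    print(f"hits: {scan_with_check(comments)} of {len(comments)}")
```

Timing both variants on the same input, and counting hits separately, keeps the cost of the check apart from how often it actually fires.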

@virophagesp
Author

virophagesp commented Apr 4, 2024

If anyone other than ThioJoe knows what it does, information would be greatly appreciated.

@ThioJoe
Owner

ThioJoe commented Apr 5, 2024

Well, that particular filter is by far the largest, I believe. It basically combines all the hard-coded individual spam accounts and other entries from this repo: https://github.com/ThioJoe/YT-Spam-Lists
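
To illustrate what combining many entries into one filter can look like, here is a minimal sketch. The entries and the `build_combined_regex` helper are hypothetical, and this is not necessarily how the project builds `spamListCombinedRegex`.

```python
import re

# Hypothetical sample entries; the real lists live in the YT-Spam-Lists repo.
spam_entries = ["scam-channel", "fake.giveaway", "crypto_doubler"]


def build_combined_regex(entries):
    """Join escaped entries into one alternation so each text is searched once."""
    # Longest first, so a longer entry is tried before a shorter prefix of it.
    ordered = sorted(set(entries), key=len, reverse=True)
    return re.compile("|".join(re.escape(entry.lower()) for entry in ordered))


combined = build_combined_regex(spam_entries)
```

Python's `re` module is a backtracking engine, so the cost of searching a large alternation like this tends to grow with the number of entries, which is consistent with this filter being the slow one.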

I've considered pruning the old entries from that list but haven't gotten around to it.

On a related note, are you using the latest 2.18.0-Beta3 version? That one saves the compiled regex filters after loading them the first time, so at least it will be faster to load the filters before the scan starts.
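
How the beta stores its filters is not shown in this thread. As a rough sketch of one possible approach, assuming the slow step is generating the pattern text, the text could be cached on disk and recompiled on later runs. `build_patterns`, `load_patterns`, and the cache file layout are all hypothetical.

```python
import json
import os
import re


def build_patterns():
    """Hypothetical stand-in for the slow step that generates pattern text."""
    return {"spam_accounts": r"scam-channel|fake\.giveaway"}


def load_patterns(cache_path):
    """Return compiled patterns, reusing cached pattern text when it exists."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            sources = json.load(f)
    else:
        sources = build_patterns()
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(sources, f)
    # Compiling from cached text is still required on every run.
    return {name: re.compile(source) for name, source in sources.items()}
```

A cache like this only shortens startup; it does not change how long each search takes during the scan.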

@virophagesp
Author

Yes, I am using the beta build. After testing so many modifications and waiting for the filters to load each time, I wish I had used it sooner.

@virophagesp
Author

virophagesp commented Apr 5, 2024

Wait a minute: line 408 of prepare_modes.py adds the names from the list and converts them to uppercase, but this slow code searches in a lowercased string.

@ThioJoe
Owner

ThioJoe commented Apr 6, 2024

> Wait a minute: line 408 of prepare_modes.py adds the names from the list and converts them to uppercase, but this slow code searches in a lowercased string.

Yeah, there is a reason for that, but it's a bit hard to explain. The confusables library basically builds a regular expression that searches for a string of characters, including anything that even looks like those letters. So even though the source string is made all upper case, the pattern will still match lower case characters. I found that, for whatever reason, starting from the upper case string covered the confusable characters better than starting from the lower case one.

It's not like it's trying to search for upper-case-only patterns in the lowercased comments.
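
A toy version of the idea, for illustration only: the `LOOKALIKES` table and `lookalike_regex` helper below are hypothetical and hand-rolled, and they do not use or reproduce the confusables library, which covers far more characters.

```python
import re

# Hypothetical, tiny lookalike table keyed by upper case letters.
LOOKALIKES = {
    "A": "Aa4\u0391\u0430",   # Latin A/a, digit 4, Greek Alpha, Cyrillic a
    "O": "Oo0\u039f\u043e",   # Latin O/o, digit 0, Greek Omicron, Cyrillic o
    "S": "Ss5$",
    "P": "Pp\u0420\u0440",    # Latin P/p, Cyrillic Er (upper and lower)
}


def lookalike_regex(word):
    """Build one character class per letter, starting from the upper case form."""
    parts = []
    for ch in word.upper():
        # Fall back to the letter in both cases when no lookalikes are listed.
        chars = LOOKALIKES.get(ch, ch + ch.lower())
        parts.append("[" + re.escape(chars) + "]")
    return re.compile("".join(parts))


pattern = lookalike_regex("SPAM")
```

Because each character class contains both cases plus lookalikes, a pattern built from `"SPAM"` still matches `"spam"` or `"$p4m"` in a lowercased comment.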

@virophagesp
Author

I see
