performance bottleneck in line 975 of operations.py #1108

virophagesp · 2024-04-04T21:46:05Z

Line 975 in 91816f3

elif spamListCombinedRegex.search(combinedStringNormalized.lower()):

I am using auto-smart mode; I am unsure whether this is machine-dependent or specific to the video I scanned. I benchmarked two copies of the function: the first unaltered, the second without this elif statement. On average, reply scanning was 30% faster without it.
I also changed the elif statement to print a debug message, to test how often the condition is true: it was true only once in the whole 3 hours of scanning.
For both tests, the video I scanned was the entire Digital Circus pilot: https://www.youtube.com/watch?v=HwAPLk_sQ3w

What does this condition check?
Can this be removed?
Is there a faster way to check this condition?
Are there any conditions that guarantee it will come out false, which could be checked beforehand to avoid running it in the first place?
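On the "faster way" question, one general technique (not something the project is known to use) is to build the alternation from a trie, so that entries sharing a prefix are tried together instead of one by one. A minimal sketch, assuming the entries are plain literal strings: the real filter is generated with confusable character classes, so this would only apply after some normalization step, and any speedup should be measured rather than assumed. `build_trie`, `trie_to_pattern` and `compile_literal_filter` are hypothetical names.

```python
import re
from typing import Dict, Iterable

Trie = Dict[str, "Trie"]


def build_trie(words: Iterable[str]) -> Trie:
    """Insert each word character by character; the key "" marks a word end."""
    root: Trie = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_pattern(node: Trie) -> str:
    """Turn the trie into a regex in which each shared prefix appears once."""
    branches = [re.escape(ch) + trie_to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
    word_ends_here = "" in node
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not word_ends_here:
        return branches[0]  # single continuation: plain concatenation
    group = "(?:" + "|".join(branches) + ")"
    # If a word ends here, the longer continuations are optional.
    return group + "?" if word_ends_here else group


def compile_literal_filter(words: Iterable[str]) -> "re.Pattern[str]":
    cleaned = [w for w in words if w]  # an empty word would match everything
    if not cleaned:
        raise ValueError("need at least one non-empty word")
    return re.compile(trie_to_pattern(build_trie(cleaned)))
```

For example, `["cat", "car", "dog"]` becomes `(?:ca(?:r|t)|dog)`, which finds the same matches as the flat `cat|car|dog` but only tests the shared `ca` once.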

virophagesp · 2024-04-04T23:22:22Z

If anyone other than ThioJoe knows what it does, information would be greatly appreciated.

ThioJoe · 2024-04-05T17:02:19Z

Well, that particular filter is by far the largest, I believe. It basically combines all the hard-coded individual spam accounts and other entries from this repo: https://github.com/ThioJoe/YT-Spam-Lists

I've considered pruning the old entries from that list but haven't gotten around to it.

On a related note, are you using the latest 2.18.0-Beta3 version? That one saves the compiled regex filters after loading them the first time, so at least it will be faster to load the filters before the scan starts.
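One way such caching could work is sketched below; this is an assumption, not how 2.18.0-Beta3 actually does it. In CPython, pickling a compiled `re.Pattern` stores only its source string and flags and recompiles on load, so a cache mostly pays off when building the pattern string is the slow part. `expensive_build_pattern`, `load_filter` and the JSON cache layout are hypothetical.

```python
import hashlib
import json
import re
from pathlib import Path
from typing import Iterable


def expensive_build_pattern(entries: Iterable[str]) -> str:
    """Hypothetical placeholder for the slow step (e.g. lookalike expansion)."""
    unique = sorted(set(entries), key=lambda e: (-len(e), e))
    return "|".join(re.escape(e) for e in unique)


def load_filter(entries: Iterable[str], cache_path: Path) -> "re.Pattern[str]":
    """Reuse the cached pattern string if the entry list has not changed."""
    entries = list(entries)
    # Hash the sorted, de-duplicated entries so order and repeats don't matter.
    digest = hashlib.sha256(
        "\n".join(sorted(set(entries))).encode("utf-8")
    ).hexdigest()
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if cached.get("digest") == digest:
            return re.compile(cached["pattern"])  # cache hit: skip the build
    pattern = expensive_build_pattern(entries)
    cache_path.write_text(
        json.dumps({"digest": digest, "pattern": pattern}), encoding="utf-8"
    )
    return re.compile(pattern)
```

Keying the cache on a digest of the entries means an updated spam list invalidates the cache automatically.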

virophagesp · 2024-04-05T19:37:40Z

Yes, I am using the beta build. After testing so many modifications and waiting for the filters to load, I wish I had used it sooner.

virophagesp · 2024-04-05T20:17:41Z

Wait a minute: in line 408 of prepare_modes.py, it adds the names from the list and converts them to uppercase, but this slow code searches a lowercase string.

ThioJoe · 2024-04-06T21:12:55Z

> wait a minute, in line 408 of prepare_modes.py it adds the names from the list and converts them to uppercase but in this slow code it searches for lowercase string

Yeah, there is a reason for that, but it's a bit hard to explain. The confusables library basically builds a regex that searches for a string of characters, including anything that even looks like those letters. So even though everything is made uppercase, it will still match lowercase characters. But for whatever reason, I found that it better covered the confusable characters if I started with the string in uppercase instead of lowercase.

It's not as if it's searching for uppercase-only patterns in the lowercased comments.
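ThioJoe's explanation can be illustrated with a toy version of lookalike expansion. The sketch below is an assumption for illustration only: `LOOKALIKES` is a tiny hand-written table (not the confusables library's data or API) and `lookalike_regex` is a hypothetical helper. Because each uppercase letter expands to a character class that also contains its lowercase and Cyrillic/Greek lookalikes, a pattern built from an uppercased seed still matches lowercased comments. It does not reproduce why uppercase seeds gave better coverage in practice.

```python
import re

# Hypothetical, tiny lookalike table keyed by uppercase letters. Each entry
# lists characters that resemble that letter, including lowercase and
# Cyrillic/Greek forms. Real confusables data is far larger.
LOOKALIKES = {
    "S": "Ss\u0405\u0455$5",
    "P": "Pp\u0420\u0440\u03a1\u03c1",
    "A": "Aa\u0410\u0430\u0391\u03b1@4",
    "M": "Mm\u041c\u043c\u039c",
}


def lookalike_regex(seed: str) -> str:
    """Build one character class per letter of the uppercased seed."""
    parts = []
    for ch in seed.upper():
        variants = sorted(set(LOOKALIKES.get(ch, ch)))
        parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return "".join(parts)


pattern = re.compile(lookalike_regex("SPAM"))

# Comments are lowercased before searching, as in the elif in operations.py.
for comment in ["Buy SPAM now", "free \u0455\u0440\u0430\u043c here", "visit spain"]:
    print(comment, "->", bool(pattern.search(comment.lower())))
```

The second comment spells the word with Cyrillic letters, and it is still caught because those letters are in the generated character classes.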

virophagesp · 2024-04-06T23:16:36Z

I see
