string distance / fuzzy matching instead of hard substring keyword searching #9
Labels: enhancement, later, wontfix
It could be worthwhile to also implement a simple edit-distance-based allowance for typos, and fuzzy keyword matching could be enabled as well. Also, if too many characters in a message do not belong to valid words of the sentence, that would be a red flag.
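To make the idea concrete, here is a minimal sketch of edit-distance keyword matching in Python. It is not taken from this project: the function names `edit_distance` and `fuzzy_keyword_hits`, the `\w+` tokenization, and the default threshold of one edit are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def fuzzy_keyword_hits(
    message: str, keywords: List[str], max_distance: int = 1
) -> List[Tuple[str, str]]:
    """Return (token, keyword) pairs within max_distance edits of each other.

    max_distance is a hypothetical tuning knob, not an existing setting.
    """
    tokens = re.findall(r"\w+", message.lower())
    hits = []
    for token in tokens:
        for keyword in keywords:
            if edit_distance(token, keyword.lower()) <= max_distance:
                hits.append((token, keyword))
    return hits


print(fuzzy_keyword_hits("Buy cheap bitcoln now", ["bitcoin"]))
# [('bitcoln', 'bitcoin')]
```

A fixed threshold is probably too loose for short keywords (one edit turns "cat" into "car"), so scaling the allowance with keyword length might be a sensible refinement.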
Each room is limited to a single language in 99% of cases, so spam posted in a foreign language is already a red flag. This is important in the dozens of local-language rooms that the indiscriminate English spammer sometimes joins as well. Dictionaries also exist (see your package manager, Wiktionary, Wikipedia, etc.). Or you could just go through the chat log and collect the words and sentences used by non-troll members in the past (ham) to help tell them apart from unusual content (spam).
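A rough sketch of the chat-log approach, again in Python and again only an illustration: it builds a vocabulary from past ham messages and measures what fraction of a new message's characters fall outside known words. The names `build_vocabulary` and `unknown_char_ratio` are hypothetical, and where to put the cutoff for a "red flag" would have to be tuned per room.

```python
import re
from typing import Iterable, Set

WORD_RE = re.compile(r"\w+")


def build_vocabulary(ham_messages: Iterable[str]) -> Set[str]:
    """Collect every lowercased word seen in known-good (ham) messages."""
    vocabulary: Set[str] = set()
    for message in ham_messages:
        vocabulary.update(WORD_RE.findall(message.lower()))
    return vocabulary


def unknown_char_ratio(message: str, vocabulary: Set[str]) -> float:
    """Fraction of non-whitespace characters not covered by a known word."""
    text = message.lower()
    total = sum(1 for ch in text if not ch.isspace())
    if total == 0:
        return 0.0  # an empty message carries no signal either way
    known = sum(len(tok) for tok in WORD_RE.findall(text) if tok in vocabulary)
    return 1.0 - known / total


vocabulary = build_vocabulary(["hello world", "how are you"])
print(unknown_char_ratio("hello world", vocabulary))  # 0.0
print(unknown_char_ratio("hello xqzt!", vocabulary))  # 0.5
```

The same function would work with a word list loaded from a system dictionary instead of the chat log. It also doubles as a crude language check, since a message in a language the room never uses would score close to 1.0.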