string distance / fuzzy matching instead of hard substring keyword searching #9

Open
bkil opened this issue Sep 9, 2022 · 2 comments
Labels
enhancement (New feature or request), later (something that isn't the priority but it will be dealt with someday), wontfix (This will not be worked on)

Comments


bkil commented Sep 9, 2022

It could be worthwhile to also implement some simple edit-distance-based typo tolerance, and fuzzy keyword matching could be made configurable as well. Also, if a message contains too many characters that are not part of valid words in the sentence, that would be a red flag.
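A minimal sketch of what edit-distance keyword matching could look like, assuming single-word keywords and a plain Levenshtein distance. The names `edit_distance`, `fuzzy_contains`, and `max_distance` are hypothetical and are not taken from this project's code:

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_contains(message: str, keyword: str, max_distance: int = 1) -> bool:
    """True if any word of the message is within max_distance edits of keyword.

    Hypothetical helper: it compares whole words only, so multi-word
    keywords would need a sliding window over adjacent words instead.
    """
    keyword = keyword.casefold()
    words = re.findall(r"\w+", message.casefold())
    return any(edit_distance(word, keyword) <= max_distance for word in words)
```

A fixed `max_distance` is probably too generous for short keywords (with a threshold of 1, "bit" would match "but"), so scaling the threshold with keyword length may be a safer default.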

Each room is limited to a single language in 99% of cases, so posting spam in a foreign language is already a red flag. This matters in the dozens of local-language rooms that the indiscriminate English spammer sometimes joins as well. Dictionaries also exist (see your package manager, Wiktionary, Wikipedia, etc.). Or you could just go through the chat log and collect the words and sentences used by non-troll members in the past (= ham) to help distinguish them from unusual content (spam).
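As a rough sketch of the chat-log idea, assuming past ham messages are available as plain strings: build a vocabulary from words that recur in ham, then score a new message by the fraction of its non-whitespace characters that fall outside known words. The function names and the `min_count` cutoff are hypothetical:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split case-folded text into runs of letters (digits and symbols excluded)."""
    return re.findall(r"[^\W\d_]+", text.casefold())


def build_vocabulary(ham_messages: list[str], min_count: int = 2) -> set[str]:
    """Collect words seen at least min_count times in past ham messages."""
    counts = Counter(word for msg in ham_messages for word in tokenize(msg))
    return {word for word, count in counts.items() if count >= min_count}


def unknown_char_ratio(message: str, vocabulary: set[str]) -> float:
    """Fraction of non-whitespace characters that are not part of a known word."""
    folded = message.casefold()
    total = sum(1 for ch in folded if not ch.isspace())
    if total == 0:
        return 0.0  # an empty message carries no signal either way
    known = sum(len(word) for word in tokenize(folded) if word in vocabulary)
    return (total - known) / total
```

A message scoring close to 1.0 is mostly made of material the room has never seen, which fits both the foreign-language case and the "too many characters outside valid words" case. Any cutoff for flagging would need tuning per room.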

Owner

jjj333-p commented Sep 9, 2022

This is an interesting issue, but it is far beyond my skill set. I would, however, love to see something like this come through, and if someone else knows how to do this, I would love for them to contribute.

@jjj333-p added the enhancement, help wanted, and later labels on Sep 9, 2022
@jjj333-p added the wontfix label and removed the help wanted and later labels on Jan 7, 2024
@jjj333-p
Owner

Update: this might be doable in some manner, perhaps using string distance. It is still on the back burner, but this might be the solution to something.

@jjj333-p changed the title from "Consider how we could protect against homograph attacks" to "string distance / fuzzy matching instead of hard substring keyword searching" on May 31, 2024
@jjj333-p added the later and wontfix labels and removed the wontfix label on May 31, 2024