New users should have a short limit on their user profile descriptions #5098

Open
Firefishy opened this issue Aug 20, 2024 · 5 comments

@Firefishy
Member

Firefishy commented Aug 20, 2024

Problem

Spammers commonly use user profile descriptions to post spam.

Spammers often post extremely long descriptions within a few minutes of the user account being created.

Problems:

  • Long descriptions take longer for admins to review.
  • Bad for OSM's search engine ranking, as the garbage text negatively affects OSM.org's keyword ranking / authority.
  • Waste of database resources.

Description

New users (account age < 24 hours?) should only be allowed to post short (512 characters?) user profile descriptions.
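
The proposed rule could be sketched roughly as follows. This is a language-neutral illustration in Python, not code from the project; the function name `description_allowed` and both constants are hypothetical, and the 24 hour and 512 character values are only the tentative numbers suggested above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tentative values from the proposal above; both are open questions, not settled limits.
NEW_ACCOUNT_WINDOW = timedelta(hours=24)
NEW_ACCOUNT_MAX_CHARS = 512


def description_allowed(description: str,
                        created_at: datetime,
                        now: Optional[datetime] = None) -> bool:
    """Return True if the description length is acceptable for an account of this age."""
    if now is None:
        now = datetime.now(timezone.utc)
    is_new_account = now - created_at < NEW_ACCOUNT_WINDOW
    if is_new_account:
        return len(description) <= NEW_ACCOUNT_MAX_CHARS
    # Assumption: accounts older than the window are not restricted by this rule.
    return True
```

Under this sketch a new account's description is rejected outright rather than truncated; which behaviour is preferable is left open in the proposal.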

Screenshots

Screenshot 2024-08-20 at 09 15 23
@mmd-osm
Contributor

mmd-osm commented Aug 20, 2024

Related: #4694 (avoiding huge texts in general)

@tomhughes
Member

There are plenty of spammers who post short texts as well, and any attempt to roll our own spam filtering is pretty much doomed to failure. The only thing likely to achieve anything is something like #4314 or a shared service like Akismet.

@Firefishy
Member Author

> There are plenty of spammers who post short texts

I don't disagree. But reducing the amount of text they can post helps with the reasons above and will likely help Bayes or other text qualifiers.

@AntonKhorev
Collaborator

How would reducing the amount of text "help Bayes or other text qualifiers"? You've already made a classification here: long text && new account = likely spam.

@Firefishy
Member Author

Firefishy commented Aug 20, 2024

> How would reducing the amount of text "help Bayes or other text qualifiers"?

Spammers normally want to get particular words into their spam message, e.g. "Buy Viagra Here".
The rest of the text is often filler, AI generated or just garbage. A Bayesian model would likely score "Buy", "Viagra" & "Here" much higher than the rest of the text. The additional garbage text might even cause the Bayesian model to miss the spam text. Hypothetical until tested ;-)
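
A toy illustration of that hypothesis, as a minimal naive Bayes sketch in Python. Every probability in `WORD_PROBS` is made up for the example, the function name `spam_log_odds` is hypothetical, and the filler is assumed to consist of words that look legitimate on a mapping site; real filler text may behave differently, which is exactly the untested part.

```python
import math

# Made-up (P(word | spam), P(word | ham)) pairs, purely for illustration.
WORD_PROBS = {
    "buy": (0.05, 0.005),
    "viagra": (0.04, 0.0004),
    "here": (0.03, 0.01),
    "mapping": (0.001, 0.02),
    "survey": (0.001, 0.01),
    "roads": (0.001, 0.015),
}
# Assumption: words the model has never seen carry no evidence either way.
UNKNOWN = (0.001, 0.001)


def spam_log_odds(text: str, prior_spam: float = 0.5) -> float:
    """Naive Bayes log-odds of spam vs. ham; a positive value means 'more likely spam'."""
    score = math.log(prior_spam / (1 - prior_spam))
    for word in text.lower().split():
        p_spam, p_ham = WORD_PROBS.get(word, UNKNOWN)
        # Each word adds its own log-likelihood ratio, independent of the others.
        score += math.log(p_spam / p_ham)
    return score


short = "buy viagra here"
padded = short + " mapping survey roads" * 3

print(spam_log_odds(short))   # positive with these made-up numbers
print(spam_log_odds(padded))  # negative: the ham-like filler outweighs the spam words
```

With these invented numbers the padded text scores as ham even though it contains the same three spam words, so a length limit would remove the room for that padding.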

Regardless, shorter descriptions help admins (like me) sort through accounts and identify spam.

Discourse uses how quickly words were "typed" as a likely spam qualifier.

> You've already made a classification here: long text && new account = likely spam.

There are MANY other classifications, e.g. [email protected] is a very good qualifier.
