XMPP Service Operators - 2021-03-19


  1. tom

    https://www.pyzor.org/en/latest/about.html

  2. tom

    » Discards all message headers.
    » If the message is greater than 4 lines in length:
    »   Discards the first 20% of the message.
    »   Uses the next 3 lines.
    »   Discards the next 40% of the message.
    »   Uses the next 3 lines.
    »   Discards the remainder of the message.
    » Removes any ‘words’ (sequences of characters separated by whitespace) that are 10 or more characters long.
    » Removes anything that looks like an email address (X@Y).
    » Removes anything that looks like a URL.
    » Removes anything that looks like HTML tags.
    » Removes any whitespace.
    » Discards any lines that are fewer than 8 characters in length.
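The quoted steps can be sketched roughly as follows for a plain-text message body. This is a minimal illustration and not Pyzor's actual code: the regular expressions, the literal reading of the 20%/40% offsets, the choice of SHA-1, and the function names `select_lines`, `normalize`, and `digest` are all assumptions made for this example. Header stripping is skipped on the assumption that the input is already just the body.

```python
import hashlib
import re

# Rough approximations chosen for this sketch; not Pyzor's own patterns.
LONG_WORD = re.compile(r"\S{10,}")      # 'words' of 10 or more characters
EMAIL = re.compile(r"\S+@\S+")          # anything shaped like X@Y
URL = re.compile(r"[a-z]+://\S+", re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]*>")
WHITESPACE = re.compile(r"\s+")


def select_lines(lines):
    """Pick the sampled lines for messages longer than 4 lines."""
    if len(lines) <= 4:
        return lines
    first = int(len(lines) * 0.20)               # discard the first 20%
    second = first + 3 + int(len(lines) * 0.40)  # then skip 40% more
    return lines[first:first + 3] + lines[second:second + 3]


def normalize(line):
    """Remove the tokens that tend to vary between copies of one spam."""
    for pattern in (LONG_WORD, EMAIL, URL, HTML_TAG, WHITESPACE):
        line = pattern.sub("", line)
    return line


def digest(body):
    """Return a hex digest of the normalized, sampled message body."""
    kept = []
    for line in select_lines(body.splitlines()):
        cleaned = normalize(line)
        if len(cleaned) >= 8:                    # drop very short lines
            kept.append(cleaned)
    return hashlib.sha1("".join(kept).encode("utf-8")).hexdigest()
```

Under these assumptions, two messages that differ only in an injected URL produce the same digest, while a bare "hello" normalizes to nothing at all, so its digest carries no information about the sender's intent.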

  3. tom

    Would this work for XMPP as well?

  4. tom

    This wouldn't trigger on 'hello' either

  5. tom

    » The central premise of Pyzor is that it converts an email message to a short digest that uniquely identifies the message. Simply hashing the entire message is an ineffective method of generating a digest, because message headers will differ when the content does not, and because spammers will often try to make a message unique by injecting random/unrelated text into their messages.

  6. tom

    Not whether this exact piece of software would work for XMPP as well, but would a similar implementation specifically designed for XMPP work for fighting XMPP spam?

  7. MattJ

    My feeling is that advanced content analysis (whether manually-coded heuristics or machine-learning/"AI") won't work with IM, because there simply isn't enough content per message

  8. jonas’

    I tend to agree

  9. Holger

    Well actual spam often does contain quite a bit of content.

  10. Holger

    But yes relying on just that won't do the trick for all kinds of spam.

  11. tom

    Of course not

  12. tom

    But as part of a larger weighted system, like how SpamAssassin works to score with a bunch of milters maybe

  13. Holger

    Yup.

  14. Holger

    IMO we should classify based on as much data as possible, and while content alone won't be enough, it can certainly contribute to a score. The score of a "hello" message obviously won't be based on content at all but IMO that's no reason to ignore content altogether. E2EE can be a problem though.
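A weighted score of the kind discussed here might look like the sketch below. Everything in it is hypothetical: the `Message` fields, the rule names, the weights, and the threshold are placeholders invented for illustration, not values from SpamAssassin or any XMPP server. Content rules are skipped for encrypted messages, which is one way to model the E2EE limitation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    body: str               # plaintext body; unusable when encrypted
    sender_in_roster: bool  # is the sender a known contact?
    encrypted: bool         # True if the payload is end-to-end encrypted


# (name, weight, predicate) triples; all values are illustrative guesses.
RULES: List[Tuple[str, float, Callable[[Message], bool]]] = [
    ("sender_not_in_roster", 2.0, lambda m: not m.sender_in_roster),
    ("contains_link", 1.5, lambda m: not m.encrypted and "://" in m.body),
    ("long_body", 1.0, lambda m: not m.encrypted and len(m.body) > 200),
]

SPAM_THRESHOLD = 3.0  # hypothetical cut-off


def score(msg):
    """Return the total score and the names of the rules that fired."""
    hits = [(name, weight) for name, weight, matches in RULES if matches(msg)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def is_spam(msg):
    return score(msg)[0] >= SPAM_THRESHOLD
```

With these placeholder weights, a "hello" from a stranger scores only on the non-content rule and stays below the threshold, while the same stranger sending a link crosses it.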

  15. Holger

    (But it might also be an additional data point if the spammer's E2EE somehow differs from the user's.)

  16. moparisthebest

    tom, my spam dropped 99% when I started dropping any messages containing cyrillic

  17. tom

    lol

  18. Licaon_Kter

    moparisthebest: but what about your russian buddies?

  19. tom

    that's not really practical though

  20. Ge0rG

    moparisthebest: drop us-ascii and your spam will go down by 100%

  21. moparisthebest

    obviously who your server's users talk to decides whether you can do that or not, but maybe it's a good heuristic

  22. moparisthebest

    none of my family can read them anyway so no loss :)

  23. tom

    drop all stanzas

  24. tom

    trust no one, not even yourself

  25. Ge0rG

    my "family" consists of ~2000 active users, of which a significant minority is Russian.

  26. moparisthebest

    right I'm not proposing a XEP to forbid cyrillic from XMPP network wide, just a maybe-helpful heuristic depending on who your users are, useless in your case obviously :)
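As a rough illustration of the script heuristic, the sketch below measures what fraction of a message's letters are Cyrillic, using Unicode character names from the standard library. The function name `cyrillic_ratio` is made up for this example, and how the result is used (hard drop or one input to a score) is left to the operator, since it only makes sense for servers whose users do not expect Cyrillic messages.

```python
import unicodedata


def cyrillic_ratio(text):
    """Return the fraction of alphabetic characters that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters
        # Unnamed code points fall back to "" and are never counted.
        if unicodedata.name(ch, "").startswith("CYRILLIC")
    )
    return cyrillic / len(letters)
```

A ratio instead of a yes/no answer lets a single Cyrillic word in an otherwise Latin-script message count for less than a message written entirely in Cyrillic.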

  27. tom

    well

  28. tom

    One of the things I did when setting up spamass was assign a higher spam possibility score for text that isn't English, Spanish, or German

  29. Holger

    That's where user-trained Bayesian filters (with per-user DB) become useful ...
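To make the per-user idea concrete, here is a minimal naive Bayes sketch in which every user gets an independent token database. The class name `UserBayes`, the whitespace tokenizer, the add-one smoothing, and the `filters` mapping keyed by a bare JID are assumptions for this example, not how any existing filter is implemented.

```python
import math
from collections import Counter, defaultdict


class UserBayes:
    """A tiny naive Bayes classifier trained by a single user."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, body, label):
        """Record one message the user marked as 'spam' or 'ham'."""
        self.tokens[label].update(body.lower().split())
        self.messages[label] += 1

    def spam_probability(self, body):
        """Estimate P(spam | body); 0.5 until both classes have examples."""
        if not all(self.messages.values()):
            return 0.5
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        spam_total = sum(self.tokens["spam"].values())
        ham_total = sum(self.tokens["ham"].values())
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for token in body.lower().split():
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            p_spam = (self.tokens["spam"][token] + 1) / (spam_total + vocab)
            p_ham = (self.tokens["ham"][token] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


# One independent database per user, keyed by a (hypothetical) bare JID.
filters = defaultdict(UserBayes)
```

Because each database is separate, a user who marks Russian messages as wanted and a user who marks them as spam end up with opposite verdicts for the same text, which is the case a server-wide language rule cannot handle.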

  30. frog

    Ge0rG: a bit offtopic, but I remember talking with you about DNS challenge for LE certs ages ago. What software did you use for this?

  31. Licaon_Kter

    frog: aren't all doing it? Eg. acme.sh

  32. frog

    I don't do it yet, still on http challenge. I wanted to see how others do it to migrate

  33. frog

    acme.sh was the most promising from my search, need to try it

  34. moparisthebest

    acme.sh is excellent

  35. moparisthebest

    I do DNS challenges with it

  36. tom

    moparisthebest: quark.c is a great companion to acme.sh if you don't already have a webserve