Mailing List Archive

DCC/pyzor questions
Hi,

I'm seeing a lot of DCC/pyzor mail being marked as spam that shouldn't
be, and want to see what can be done to prevent that.

For example, many emails with just an image attachment and an empty
body are hitting DCC. I thought I recalled a way to create a checksum
of these empty messages and add them to an allow list, but it seems it
is specific to the sender, based on /var/lib/dcc/testmsg-whitelist:

# empty Exchange
ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90

How do I generate that signature? I've been unable to find any
instructions on how to do it. Same with pyzor?

Another example is an email I received from Pizza Hut. Their marketing
emails hit DCC and pyzor and sendgrid, making it very difficult for
that email to be delivered unless it also hits some negative bayes or
is allowlisted. Do people add them to the welcomelist? Do you train
marketing emails for bayes?

* 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
* 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
* 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
* 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
* huge http urls
* 1.4 PYZOR_CHECK Listed in Pyzor
* 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
* [score: 0.9668]
* 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
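
For a sense of scale, the positive hits in the report above can be totalled with a short script. This is only a sketch: the report text is copied from this message, while the regular expression, the helper name `parse_hits`, and the threshold of 5.0 (SpamAssassin's stock `required_score`, which this thread does not state) are assumptions made for illustration.

```python
import re

# Report lines copied verbatim from the message above.
REPORT = """\
* 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
* 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
* 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
* 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
* huge http urls
* 1.4 PYZOR_CHECK Listed in Pyzor
* 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
* [score: 0.9668]
* 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
"""

# A hit line is "* <score> <RULE_NAME> <description>".
# Continuation lines ("* huge http urls", "* [score: ...]") do not match.
HIT_RE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")

# Assumption: stock required_score; the thread does not give the local value.
ASSUMED_THRESHOLD = 5.0


def parse_hits(report):
    """Return a list of (rule_name, score) tuples found in a report."""
    hits = []
    for line in report.splitlines():
        match = HIT_RE.match(line.strip())
        if match:
            hits.append((match.group(2), float(match.group(1))))
    return hits


hits = parse_hits(REPORT)
total = round(sum(score for _, score in hits), 1)
digest_part = round(
    sum(s for name, s in hits
        if name.startswith(("DCC_", "PYZOR_", "DIGEST_"))), 1)

print(f"{len(hits)} hits, total {total}, digest checks contribute {digest_part}")
print("over assumed threshold:", total > ASSUMED_THRESHOLD)
```

The negative rules mentioned below (TXREP, DKIMWL_WL, RCVD_IN_SENDERSCORE_90_100) are not in the pasted report, so they are not part of this total.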

Is sendgrid still as big of a problem as it was a year ago?

There are a few negative rules, like TXREP and DKIMWL_WL and
RCVD_IN_SENDERSCORE_90_100, but someone really doesn't want Pizza Hut
email to be delivered.

Separately, is ExtractText broken? I have legitimate invoices that are
hitting multiple money rules. Is this the expected behavior? Any
advice on how to deal with it?
Re: DCC/pyzor questions
On Mon, Mar 14, 2022 at 08:15:49PM -0400, Alex wrote:
>
> How do I generate that signature? I've been unable to find any
> instructions on how to do it.

https://www.dcc-servers.net/dcc/dcc-tree/dcc.html

dccproc -CQ < message

Add to /var/dcc/whiteclnt

"Hex ctype cksum
starts with the string Hex followed a checksum type, and
a string of four hexadecimal numbers obtained from a DCC
log file or the dccproc(8) command using -CQ. The check-
sum type is body, Fuz1, or Fuz2 or one of the preceding
checksum types such as env_From."
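
Once dccproc has produced the four hexadecimal numbers, the whiteclnt line can be assembled by hand or with a small helper. The sketch below is hypothetical: the function name `whiteclnt_entry` is made up, the `ok` count value is simply copied from the example in the original question, and the check that each number is eight hex digits is an assumption based on that same example. It does not run dccproc or parse its output.

```python
import re

# Assumption: each checksum word is 8 hex digits, as in the example
# "e038b933 6003e07e 8e990536 110cfa90" from the original question.
HEX_WORD = re.compile(r"^[0-9a-fA-F]{8}$")


def whiteclnt_entry(ctype, words, count="ok", comment=None):
    """Build a 'count hex ctype cksum' line for a DCC whitelist file.

    ctype   -- checksum type, e.g. "fuz1", "fuz2" or "body"
    words   -- the four hexadecimal numbers reported for that checksum
    count   -- value in the first column ("ok" in the thread's example)
    comment -- optional text emitted as a '#' comment line above the entry
    """
    words = list(words)
    if len(words) != 4 or not all(HEX_WORD.match(w) for w in words):
        raise ValueError("expected four 8-digit hexadecimal words")
    line = f"{count} hex {ctype} {' '.join(w.lower() for w in words)}"
    return f"# {comment}\n{line}" if comment else line


print(whiteclnt_entry(
    "fuz1",
    "e038b933 6003e07e 8e990536 110cfa90".split(),
    comment="empty Exchange",
))
```

The printed lines have the same shape as the entry quoted from testmsg-whitelist and would be appended to the whiteclnt file.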

> Same with pyzor?

pyzor local_whitelist < message
(which updates .pyzor/whitelist)

> Do you train marketing emails for bayes?

You teach Bayes either ham or spam. It makes no difference if it's
"marketing" or not. Just feed it.

> Separately, is ExtractText broken? I have legitimate invoices that are
> hitting multiple money rules. Is this the expected behavior? Any
> advice on how to deal with it?

Invoices contain money. ExtractText feeds the content to body rules. What
are you expecting to happen? Don't use it if it doesn't fit your profile.
Personally I don't think the concept of the plugin is good - body rules are
written with the expectation of hitting stuff from email body, not some
random attachments (which might even decode to garbage). But it's put out
there for you to decide.
Re: DCC/pyzor questions
On 14.03.22 20:15, Alex wrote:
>I'm seeing a lot of DCC/pyzor mail being marked as spam that shouldn't
>be, and want to see what can be done to prevent that.

DCC contains fuzzy checksums of bulk messages, which means they have been
seen on the internet multiple times. This includes common notifications
from big sites such as social networks.
It is also possible to report a message to DCC as bulk.

Pyzor contains fuzzy checksums of messages that have been reported multiple
times.

Neither of these means a message is spam, but both indicate it might be.

Unfortunately, short messages often hit, since the fuzzy checksums for short
messages may often match.

>For example, many emails with just an image attachment and an empty
>body are hitting DCC. I thought I recalled a way to create a checksum
>of these empty messages and add them to an allow list, but it seems it
>is specific to the sender, based on /var/lib/dcc/testmsg-whitelist:

># empty Exchange
>ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90
>
>How do I generate that signature? I've been unable to find any
>instructions on how to do it. Same with pyzor?
>
>Another example is an email I received from Pizza Hut. Their marketing
>emails hit DCC and pyzor and sendgrid, making it very difficult for
>that email to be delivered unless it also hits some negative bayes or
>is allowlisted. Do people add them to the welcomelist? Do you train
>marketing emails for bayes?

I usually train many kinds of marketing messages so they don't hit BAYES_00
(BAYES_50 is usually OK) - marketing messages are very similar to typical
spam, and letting them hit BAYES_00 may lower the score of real spam.

> * 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
> * 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
> * 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
> * 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
> * huge http urls
> * 1.4 PYZOR_CHECK Listed in Pyzor
> * 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
> * [score: 0.9668]
> * 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
>
>Is sendgrid still as big of a problem as it was a year ago?

If marketing messages you want hit BAYES_[89]*, simply train them as ham.
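
To make the BAYES_[89]* pattern concrete, here is a rough sketch that maps a Bayes probability such as the 0.9668 in the report to a rule name and checks it against that pattern. Only the BAYES_95 range (95 to 99%) comes from this thread; the other bucket boundaries are how I remember the stock rules and should be treated as assumptions, as should the function names and the handling of values that fall exactly on a boundary. The separate BAYES_999 rule is left out.

```python
# (upper bound, rule name) pairs; a probability p falls in the first
# bucket whose upper bound is greater than p.
# Assumption: boundaries recalled from the stock rules, except BAYES_95,
# which the report describes as "95 to 99%".
BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
]


def bayes_rule(probability):
    """Return the BAYES_* rule name for a probability between 0 and 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name in BUCKETS:
        if probability < upper:
            return name
    return "BAYES_99"


def matches_bayes_89(probability):
    """True if the rule name matches the BAYES_[89]* pattern."""
    return bayes_rule(probability)[len("BAYES_")] in "89"


score = 0.9668  # the "[score: 0.9668]" value from the report
print(bayes_rule(score), matches_bayes_89(score))
```

A wanted message for which `matches_bayes_89` is true is a candidate for training as ham; the training itself is still done with the usual Bayes learning tools.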

>There are a few negative rules, like TXREP and DKIMWL_WL and
>RCVD_IN_SENDERSCORE_90_100, but someone really doesn't want Pizza Hut
>email to be delivered.

BTW, I configured DKIMWL to be ignored when training, because these rules
hit a lot of outlook/gmail spam.

>Separately, is ExtractText broken? I have legitimate invoices that are
>hitting multiple money rules. Is this the expected behavior? Any
>advice on how to deal with it?

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fucking windows! Bring Bill Gates! (Southpark the movie)