ANNOUNCE: Apache SpamAssassin 3.4.0 available

On behalf of the project, I am pleased to announce the availability of
Apache SpamAssassin version 3.4.0.

The press release is available on the ASF Blog at http://s.apache.org/G6b
and the release notes follow. Downloads are available at
http://spamassassin.apache.org/downloads.cgi, although some mirrors may
lag while they update for the new release.

Sincerely,

Kevin A. McGrail aka KAM
VP & Chair, ASF SpamAssassin Project

Release Notes -- Apache SpamAssassin -- Version 3.4.0

Introduction
------------

This is a major release. It introduces over two years of bug fixes and
features since the release of SpamAssassin 3.3.2 on June 16, 2011.
3.4.0 includes the Bayes Redis (http://redis.io/) back-end (bug 6879),
EDNS0 changes (bug 6910), native IPv6 support, numerous URIBL.pm changes
or features and a small API change in libspamc (bug 6562) with many other
subtle changes.

SpamAssassin was tested on perl 5.18.2, and (out of curiosity) also
on a Raspberry Pi (ARM6, Raspbian / Debian 7.2 Wheezy, perl 5.14.2)
... yes, it is 20 times slower than an i7-960 CPU, but all tests
pass!

Overall, this release has been tested on many production-level
environments for nearly a year, including testing on an IPv6-only host.
It is highly recommended and stable.

NOTE: Complete changes are available at
http://svn.apache.org/repos/asf/spamassassin/branches/3.4/Changes

Notable Sendmail Bug
--------------------

Sendmail 8.14.5 and below contain a canonicalization misfeature / bug
that can cause DKIM failures.
See https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6462.

Compatibility with version 3.3.2
--------------------------------

* DNS queries generated by SpamAssassin now enable option EDNS0 in query
packets and specify a buffer size of 4096 bytes by default. This allows
DNS replies larger than 512 bytes to be returned in one UDP datagram,
avoiding the need to re-issue a failed query over TCP. This
default setting is well suited if a DNS resolver (i.e. a recursive DNS
server) is located on the same LAN as a host running SpamAssassin, which
is the usual setup for all but perhaps some home uses of SpamAssassin.

The option should be disabled (by 'dns_options noedns') when a recursive
DNS server is only reachable through some old-fashioned firewall or through
some picky router with deep packet inspection which bans DNS UDP messages
larger than 512 bytes, or blocks fragmented UDP datagrams.

The 'dns_options' setting is documented in the Mail::SpamAssassin::Conf POD
or man page; more details are in bug 6910 and bug 6862.

* A default setting for option 'dns_available' was changed from 'test' to
'yes' (bug 6770, bug 6769), so SpamAssassin now assumes by default that
it is running on a host with an internet connection and a working DNS
resolver. If this is not the case, please configure this option explicitly.

The change avoids surprises on an otherwise well connected host which may
experience a temporary DNS unavailability at the system startup time or a
temporary network outage when spamd was starting, and the initial failed
test would disable DNS queries permanently. The option is documented in
the Mail::SpamAssassin::Conf POD or man page.

* When Bayes classification is in use and messages are 'learned' as spam
or ham and stored in a database, the Bayes plugin generates internal
message IDs of learned messages and stores them in a 'seen' database to
avoid re-learning duplicates and accidental un-learning of messages that
were not previously learned. With changes in bug 5185, the calculation
of message IDs in a bayes 'seen' database has changed, so new code can
no longer associate new messages with those learned before the change.

Note that this change does not affect recognition of old tokens or the
classification algorithm; only duplicate detection and unlearning of old
messages are affected.

Because of this change, if you use Bayes and you are upgrading from a
version prior to 3.4.0, you may consider wiping your Bayes database
and starting fresh.

However, this is not mandatory. If you choose to keep your current
database tokens, these are the ramifications:

1 - If you re-process emails that have already been learned before,
it will create duplicate entries because of the new msg_id format.
The duplicates will expire, eventually, and should cause minimal
impact unless it occurs frequently.

2 - If you try to unlearn or reclassify an email processed prior to the
upgrade, the system will be unable to do so because of the new msg_id
format. If unlearning a message (that was learned before the change)
is important, consider just clearing your Bayes store and starting
from scratch.
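
As a rough illustration of the two DNS-related items above, a site that
needs to depart from the new defaults might put something like the
following in its local configuration file (commonly local.cf; the file name
is an assumption about the installation). The negated sub-option is written
'noedns' here, following the 'edns' sub-option name listed under New
configuration options, and the buffer size 1220 is only an illustrative
value. Treat this as a sketch, not a recommendation:

```
# Resolver is only reachable through a firewall that drops large or
# fragmented UDP replies: turn EDNS0 off ...
dns_options noedns
# ... or keep EDNS0 but advertise a smaller buffer (illustrative value)
# dns_options edns=1220

# Host without a working resolver: say so explicitly instead of relying
# on the new default of 'yes'
dns_available no
```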

Dependency changes since version 3.3.2
--------------------------------------

Dependencies on the following Perl modules were dropped: Net::Ident,
IP::Country::Fast and IP::Country.

The dependency on the Perl module LWP::UserAgent, as used by sa-update, is
now optional if any one of the programs curl, wget, or fetch is available.

New optional dependencies on the following Perl modules were introduced:

- new optional dependency on Geo::IP in a RelayCountry plugin (bug 6599);
for backward compatibility IP::Country::Fast is used if Geo::IP is
not installed

- new optional dependency on IO::Socket::IP for a cleaner IP support
regardless of a protocol family (IPv4 and IPv6)

- new optional dependency on Net::Patricia to speed up lookups on
internal_networks, trusted_networks or msa_networks when these lists
contain a larger number of entries

- new optional dependency on the programs curl, wget, or the FreeBSD fetch.
sa-update will use any of these external programs to download rule
updates, either over IPv6 or over IPv4. Any one of these three programs
suffices. The installation procedure is currently unclear on this: its
warning may be read as if all three programs were needed, which is not
the case

- minimal required version of NetAddr::IP was bumped to 4.010

Internal changes potentially affecting third party software
using Mail::SpamAssassin library
-----------------------------------------------------------

A caller is now given the choice of calling srand() itself (e.g. before
forking) or letting the SpamAssassin library do it as before. Avoiding
redundant initialization of perl's random number generator can prevent
unnecessary entropy loss. It is controlled by the option
skip_prng_reseeding in a call to Mail::SpamAssassin::new(). The change was
documented in bug 6690.

The Mail::SpamAssassin::parser can now accept a message also as a string
reference, avoiding one copy in memory. Documented in bug 6686.

A caller may pass the original mail body size to Mail::SpamAssassin::parse
through the suppl_attrib argument's field 'body_size'. This mail body size
is accessible to the eval rule check_body_length. It can be useful when a
caller only passes a truncated message to SpamAssassin. Documented in bug
6830.

A new plugin callback "prefork_init" was introduced, which should be called
by a master process (e.g. spamd) before forking multiple child processes.
For compatibility this call is currently optional, but recommended for new
versions. Currently only a Redis backend for Bayes checks will benefit from
being notified before a fork. Documented in bug 6942.

Notable bug fixes
-----------------

The sa-update program now avoids repeatedly downloading same rules if
subsequent unpacking of rules and updating fails. Documented in bug 6655.

Several incompatibilities with newer versions of a perl module Net::DNS
as used by sa-update and by the SpamAssassin library were fixed.
See Net::DNS problem [rt.cpan.org #83451].

A perl module Razor agent clobbers entropy of a random number generator by
re-initializing the generator on every call. The SpamAssassin Razor plugin
now provides a workaround, preserving entropy across calls to Razor2 agent.

A workaround in BayesStore/MySQL.pm was added for a MySQL server bug,
see http://bugs.mysql.com/bug.php?id=46675 .

Documentation was fixed: trailing dots in DNSBL zone names are not required
since version 3.1.0 of Mail::SpamAssassin (September 2005).

Notable features:
=================

Redis database backend for a Bayes database
-------------------------------------------

In addition to the existing backends, version 3.4.0 introduces support for
keeping a Bayes database on a Redis server, either running locally or
accessed over a network. Similar to the SQL backends, the database may be
used concurrently by several hosts running SpamAssassin.

The current implementation only supports a global Bayes database, i.e.
per-recipient sub-databases are not supported. The Redis 2.6.* server
supports access over IPv4 or over a Unix socket; starting with version
2.8.0, IPv6 is also supported. Bear in mind that the Redis server only
offers limited access controls, so it is advisable to let the Redis server
bind to a loopback interface only, or to use other mechanisms to limit
access, such as local firewall rules.
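
A minimal sketch of the corresponding server-side settings, assuming a
stock redis.conf. Both directives are standard Redis settings, but the
values are placeholders ('foo' simply mirrors the password used in the
example configuration in this section):

```
# redis.conf (illustrative values)
# accept connections on the loopback interface only
bind 127.0.0.1
# require clients to authenticate with this password
requirepass foo
```

With a loopback-only bind, only SpamAssassin instances on the same host can
reach the database; sharing it between several hosts needs a routable bind
address combined with firewall rules.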

The Redis backend for Bayes can put the Lua scripting support in a Redis
server to good use, improving performance. Lua support has been available
in the Redis server since version 2.6. In the absence of Lua support, the
Redis backend uses batched (pipelined) traditional Redis commands, so it
should work with a Redis server version 2.4 (untested), although this is
not recommended for busy sites.

Expiration of token and 'seen' message id entries is left to the Redis
server. There is no provision for manually expiring a database, so it is
highly recommended to leave the setting bayes_auto_expire to its default
value 1 (i.e. enabled).

Example configuration:

bayes_store_module Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn server=127.0.0.1:6379;password=foo;database=2
bayes_token_ttl 21d
bayes_seen_ttl 8d
bayes_auto_expire 1

Improved support for IPv6
-------------------------

The rules-updating program sa-update and its infrastructure are now usable
over either IPv4 or IPv6, including from IPv6-only hosts (bug 6654).

SpamAssassin is now usable on an IPv6-only host. This affects installation,
self-tests, rule updates, client, server, and the command-line spamassassin.

Command line options -4 and -6 were added to prefer/choose/force IPv4 or
IPv6 in programs spamassassin, spamd, spamc, and sa-update.

Command line options --listen and --allowed-ips in spamd can now accept
IPv6 addresses.

The perl module IO::Socket::IP is preferred (if it is available) for
network communication regardless of protocol family: for DNS queries,
on the spamd server side, and in the client code in
Mail::SpamAssassin::Client. As a fallback when IO::Socket::IP is
unavailable, the older module IO::Socket::INET6 is used, or
IO::Socket::INET as a last resort.

If spamd fails to start with an 'Address already in use' message, please
install perl module IO::Socket::IP, or deinstall IO::Socket::INET6, or
specify a socket bind address explicitly with a spamd --listen option.
See bug 6953 for details.

The spamd server can now simultaneously listen on multiple sockets,
possibly in different protocol domains (Unix sockets, INET or INET6
protocol families).

DnsResolver was updated, allowing it to work on an IPv6-only host (bug 6653).

A plugin RelayCountry now uses module Geo::IP and its database of IPv6
addresses GEOIP_COUNTRY_EDITION_V6 when available.

The following configuration options were extended to accept IPv6 addresses:
dns_server, trusted_networks, internal_networks and msa_networks (but not
yet whitelist_from_rcvd); their defaults were adjusted accordingly.

The parser code of Received header fields can now deal with IPv6 addresses
in a mail header section.

The AutoWhitelist plugin was updated and can now deal with IPv6 addresses.

Installation unit tests were updated to prevent them from failing on an
IPv6-only host.

New command-line options
------------------------

New command-line option for spamd: added an option --listen (or -i),
which can be specified multiple times and allows spamd to accept requests
over multiple INET (IPv4) or INET6 (IPv6) or UNIX sockets. See bug 6841,
and see also option --port.

New command-line option for spamc: -X (or --unavailable-tempfail) allows
spamc to return EX_TEMPFAIL instead of EX_UNAVAILABLE when using option -x.
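
The difference matters to whatever invokes spamc, typically a delivery
agent that treats EX_TEMPFAIL as "retry later". Below is a small sketch of
how a wrapper script could interpret the two exit statuses. The numeric
values 69 and 75 are the conventional sysexits.h values for EX_UNAVAILABLE
and EX_TEMPFAIL; the function name and the action words it prints are made
up for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: map a spamc exit status to an action for the caller.
# 69 = EX_UNAVAILABLE, 75 = EX_TEMPFAIL (values from sysexits.h).
spamc_action() {
    case "$1" in
        0)  echo deliver ;;   # spamc ran and the message was processed
        75) echo defer ;;     # temporary failure: queue and retry later
        69) echo fail ;;      # service unavailable: treated as a hard failure
        *)  echo unknown ;;   # anything else is left to the caller
    esac
}
```

A caller would run something like "spamc -x -X <message >filtered" and then
pass $? to the helper.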

As already noted in the 'Improved support for IPv6' section, options -4
and -6 were added to programs spamassassin, spamd, spamc, and sa-update.

The sa-update utility can now take multiple -v or --verbose options to
increase verbosity.

The sa-learn command has a new option --max-size.

New configuration options
-------------------------

Plugin/URIDNSBL: new tflags options 'a' and 'ns' were introduced. They are
documented in the Mail::SpamAssassin::Plugin::URIDNSBL POD or man page.

Plugin/AutoLearnThreshold: new option autolearn_force was added. It is
documented in the Mail::SpamAssassin::Plugin::AutoLearnThreshold POD or
man page.

Plugin/ASN: new options asn_prefix and clear_asn_lookups were added.
They are documented in Mail::SpamAssassin::Plugin::ASN POD or man page.

The following new options, as implemented by various plugins or by
other modules, are all documented in the Mail::SpamAssassin::Conf POD
or man page:

- Plugin/WLBLEval: new configuration options were added: enlist_uri_host,
delist_uri_host, with shorthands blacklist_uri_host and whitelist_uri_host
and an associated eval rule check_uri_host_listed.

- Configuration options dns_query_restriction (allow|deny) and
clear_dns_query_restriction were added (bug 6884).

- A 'dns_options' setting accepts new sub-options 'dns0x20' and 'edns'.

- Added option 'dns_server' which specifies an IP address of a recursive
DNS server (i.e. DNS resolver) and optionally its port number.

- Added options dns_local_ports_permit, dns_local_ports_avoid and
dns_local_ports_none to control the local source port ranges available to
DNS queries.

- Added the following sub-options to the tflags setting: autolearn_force,
maxhits=N, ips_only, domains_only, a, ns.

- The option whitelist_from_rcvd can now take an IP address as its second
argument (instead of a domain name), which can be useful for whitelisting
a sending mailer which has no reverse DNS mapping.
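
A short sketch of how two of the DNS options from this list might appear in
a configuration file. The addresses, the port and the domain are
placeholders, and the exact value syntax should be checked against the
Mail::SpamAssassin::Conf documentation:

```
# send queries to a resolver on the loopback interface (default port)
dns_server 127.0.0.1
# an IPv6 resolver with an explicit port is written in brackets
dns_server [::1]:5353

# never send DNS queries for names under this (placeholder) domain
dns_query_restriction deny example.com
```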

ArchiveIterator has new options opt_max_size and opt_from_regex. They are
documented in Mail::SpamAssassin::ArchiveIterator POD or man page.

A new tag (macro) _RULESVERSION_ was added. It expands to a comma-separated
list of rules versions, retrieved from an '# UPDATE version' comment in
rules files and can be used in an 'add_header' configuration setting.
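
For instance, assuming the standard add_header syntax from
Mail::SpamAssassin::Conf and an arbitrary header name chosen for this
sketch:

```
# adds an "X-Spam-Rules-Version: ..." header to every scanned message
add_header all Rules-Version _RULESVERSION_
```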

New plugins
-----------

A new plugin AskDNS was introduced.

Using a DNS query template as specified in a parameter of an askdns rule,
the plugin replaces tag names as found in the template with their values
and launches DNS queries as soon as tag values become available. As DNS
responses trickle in, it filters them according to the requested DNS
resource record type and an optional subrule filtering expression, yielding
a rule hit if a response meets the filtering conditions.
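
A sketch of what such a rule can look like, following the general askdns
form (rule name, query template, record type, filter) described in the
Mail::SpamAssassin::Plugin::AskDNS POD. The zone vouch.example.org, the
rule name and the score are hypothetical, and the _DKIMDOMAIN_ tag is
assumed to be supplied by the DKIM plugin:

```
loadplugin Mail::SpamAssassin::Plugin::AskDNS

# query <signing-domain>.vouch.example.org for TXT records as soon as the
# _DKIMDOMAIN_ tag is available; hit if an answer matches the regexp
askdns    EXAMPLE_VOUCHED  _DKIMDOMAIN_.vouch.example.org  TXT  /\b(list|all)\b/
describe  EXAMPLE_VOUCHED  Signing domain is listed in a hypothetical vouch zone
score     EXAMPLE_VOUCHED  -0.1
```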

Optimizations
-------------

Several smaller performance optimizations were introduced, among others:
bug 6508 (uses Net::Patricia if available), bug 6854 (base64 attachments),
bug 6915 (get_tag speedup).

The DNS client code module now caches queries and replies for the duration
of processing one mail message. Duplicate DNS queries by different rules
which happen to query the same DNS resource are now avoided.

Downloading and availability
----------------------------

Downloads are available from:

http://spamassassin.apache.org/downloads.cgi

md5sum of archive files:

46e99adc0affebbe5f3524b4834e0345 Mail-SpamAssassin-3.4.0.tar.bz2
5d0b50cee3bfa905cca35c33296c8c2a Mail-SpamAssassin-3.4.0.tar.gz
088a9b9bf7f3d93350f8c8920cbd2fe6 Mail-SpamAssassin-3.4.0.zip
9c15df55e9ec2a3c8376f3e15e448a2e Mail-SpamAssassin-rules-3.4.0.r1565117.tgz

sha1sum of archive files:

5bc66cd599cbe6a38a127d7813d4abc8af03b667 Mail-SpamAssassin-3.4.0.tar.bz2
4dac1384282b6201f7d80cea8295933ef08e7e28 Mail-SpamAssassin-3.4.0.tar.gz
3fa7715fb4c8b558b5fbc2e5a1288a751d8d12e3 Mail-SpamAssassin-3.4.0.zip
d71a64cab9f5454d3b164e44d3649bff9cb87f87 Mail-SpamAssassin-rules-3.4.0.r1565117.tgz
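
One way to compare a downloaded archive against the digests listed above is
sketched below. It assumes the coreutils sha1sum program is installed; the
helper name check_sha1 is made up for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE's SHA-1 digest equals EXPECTED.
# Assumes the coreutils 'sha1sum' program is available.
check_sha1() {
    file=$1
    expected=$2
    # sha1sum prints "<digest>  <name>"; keep only the digest field
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

It could be used as
"check_sha1 Mail-SpamAssassin-3.4.0.tar.gz 4dac1384282b6201f7d80cea8295933ef08e7e28 && echo OK".
A matching digest only shows that the download is not corrupted; the GPG
procedure described later in these notes is what ties the file to the
project's signing key.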

Note that the *-rules-*.tgz files are only necessary if you cannot,
or do not wish to, run "sa-update" after install to download the latest
fresh rules.

See the INSTALL and UPGRADE files in the distribution for important
installation notes.

GPG Verification Procedure
--------------------------
The release files also have a .asc accompanying them. The file serves
as an external GPG signature for the given release file. The signing
key is available via the wwwkeys.pgp.net key server, as well as
http://www.apache.org/dist/spamassassin/KEYS

The key information is:

pub 4096R/F7D39814 2009-12-02
Key fingerprint = D809 9BC7 9E17 D7E4 9BC2 1E31 FDE5 2F40 F7D3 9814
uid SpamAssassin Project Management Committee
<private@spamassassin.apache.org>
uid SpamAssassin Signing Key (Code Signing Key,
replacement for 1024D/265FA05B) <dev@spamassassin.apache.org>
sub 4096R/7B3265A5 2009-12-02

To verify a release file, download the file with the accompanying .asc
file and run the following commands:

gpg -v --keyserver wwwkeys.pgp.net --recv-key F7D39814
gpg --verify Mail-SpamAssassin-3.4.0.tar.bz2.asc
gpg --fingerprint F7D39814

Then verify that the key matches the signature.

Note that older versions of gnupg may not be able to complete the steps
above. Specifically, GnuPG v1.0.6, 1.0.7 & 1.2.6 failed while v1.4.11
worked flawlessly.

See http://www.apache.org/info/verification.html for more information
on verifying Apache releases.

About Apache SpamAssassin
-------------------------

Apache SpamAssassin is a mature, widely-deployed open source project
that serves as a mail filter to identify spam. SpamAssassin uses a
variety of mechanisms including mail header and text analysis, Bayesian
filtering, DNS blocklists, and collaborative filtering databases. In
addition, Apache SpamAssassin has a modular architecture that allows
other technologies to be quickly incorporated as an addition or as a
replacement for existing methods.

Apache SpamAssassin typically runs on a server, classifies and labels
spam before it reaches your mailbox, while allowing other components of
a mail system to act on its results.

Most of Apache SpamAssassin is written in Perl, with heavily
traversed code paths carefully optimized. Benefits are portability,
robustness and facilitated maintenance. It can run on a wide variety of
POSIX platforms.

The server and the Perl library feel at home on Unix and Linux
platforms, and reportedly also works on MS Windows systems under ActivePerl.

For more information, visit http://spamassassin.apache.org/

About The Apache Software Foundation
------------------------------------

Established in 1999, The Apache Software Foundation provides
organizational, legal, and financial support for more than 100
freely-available, collaboratively-developed Open Source projects. The
pragmatic Apache License enables individual and commercial users to
easily deploy Apache software; the Foundation's intellectual property
framework limits the legal exposure of its 2,500+ contributors.

For more information, visit http://www.apache.org/
