Mailing List Archive

connbytes & 64bit counters
Hello,

It seems there is something wrong with connbytes and 64bit conters.

The "iptables" manual mention that counters are 64bit, so there should be
no problem with overflows, but it seems it might not be true. My firewall
puts long living ftp & http connections to a different TC class when they
reach 256MB, but aftear they reach 4GB (probably) they go back to the
default class, with no speed limit.

After some researches I found that ip_conntrack_counter structure defined
in nf_conntrack_common.h uses u_int32_t. I always thought that netfilter
has 64bit counters, hasn't it? And I'm quite sure it used to work when I
set up my firewall, about 1 year ago. Stange...

Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
> Hello,
>
> It seems there is something wrong with connbytes and 64bit conters.
>
> The "iptables" manual mention that counters are 64bit, so there should
> be no problem with overflows, but it seems it might not be true. My
> firewall puts long living ftp & http connections to a different TC class
> when they reach 256MB, but aftear they reach 4GB (probably) they go back
> to the default class, with no speed limit.
>
> After some researches I found that ip_conntrack_counter structure
> defined in nf_conntrack_common.h uses u_int32_t. I always thought that
> netfilter has 64bit counters, hasn't it? And I'm quite sure it used to
> work when I set up my firewall, about 1 year ago. Stange...


It was changed to save some memory in struct ip_conntrack.
The idea was mainly that its only used for ctnetlink and
it is possible to send events before overflow. Obviously,
this wasn't true (besides the fact that events are unreliable).
Not sure what we should do about it ..
Re: connbytes & 64bit counters [ In reply to ]
On Thu, 26 Oct 2006, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>> Hello,
>>
>> It seems there is something wrong with connbytes and 64bit conters.
>>
>> The "iptables" manual mention that counters are 64bit, so there should
>> be no problem with overflows, but it seems it might not be true. My
>> firewall puts long living ftp & http connections to a different TC class
>> when they reach 256MB, but aftear they reach 4GB (probably) they go back
>> to the default class, with no speed limit.
>>
>> After some researches I found that ip_conntrack_counter structure
>> defined in nf_conntrack_common.h uses u_int32_t. I always thought that
>> netfilter has 64bit counters, hasn't it? And I'm quite sure it used to
>> work when I set up my firewall, about 1 year ago. Stange...
>
>
> It was changed to save some memory in struct ip_conntrack.
> The idea was mainly that its only used for ctnetlink and
> it is possible to send events before overflow. Obviously,
> this wasn't true (besides the fact that events are unreliable).
> Not sure what we should do about it ..

How about attached (and inlined for easy review) patch?

[NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting

Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. Add possibility to choose
between 32 and 64bits but keep default 32bit counters.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

diff -Nur linux-2.6.19-rc3-orig/include/linux/netfilter/nf_conntrack_common.h linux-2.6.19-rc3/include/linux/netfilter/nf_conntrack_common.h
--- linux-2.6.19-rc3-orig/include/linux/netfilter/nf_conntrack_common.h 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/include/linux/netfilter/nf_conntrack_common.h 2006-10-28 17:42:48.000000000 +0200
@@ -139,8 +139,13 @@
#ifdef __KERNEL__
struct ip_conntrack_counter
{
+#if defined(CONFIG_IP_NF_CT_ACCT_64BIT_COUNTERS) || defined(CONFIG_NF_CT_ACCT_64BIT_COUNTERS)
+ u_int64_t packets;
+ u_int64_t bytes;
+#else
u_int32_t packets;
u_int32_t bytes;
+#endif
};

struct ip_conntrack_stat
diff -Nur linux-2.6.19-rc3-orig/include/linux/netfilter/nfnetlink_conntrack.h linux-2.6.19-rc3/include/linux/netfilter/nfnetlink_conntrack.h
--- linux-2.6.19-rc3-orig/include/linux/netfilter/nfnetlink_conntrack.h 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/include/linux/netfilter/nfnetlink_conntrack.h 2006-10-25 14:55:19.000000000 +0200
@@ -89,8 +89,8 @@

enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
- CTA_COUNTERS_PACKETS, /* old 64bit counters */
- CTA_COUNTERS_BYTES, /* old 64bit counters */
+ CTA_COUNTERS_PACKETS, /* optional 64bit counters */
+ CTA_COUNTERS_BYTES, /* optional 64bit counters */
CTA_COUNTERS32_PACKETS,
CTA_COUNTERS32_BYTES,
__CTA_COUNTERS_MAX
diff -Nur linux-2.6.19-rc3-orig/net/ipv4/netfilter/Kconfig linux-2.6.19-rc3/net/ipv4/netfilter/Kconfig
--- linux-2.6.19-rc3-orig/net/ipv4/netfilter/Kconfig 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/ipv4/netfilter/Kconfig 2006-10-28 18:44:16.000000000 +0200
@@ -46,6 +46,21 @@

If unsure, say `N'.

+config IP_NF_CT_ACCT_64BIT_COUNTERS
+ bool 'Enable 64bit counters for conntrack-based accounting'
+ depends on IP_NF_CT_ACCT
+ default n
+ help
+ Use 64 bit counters for bytes/packets accounting instead of default 32.
+
+ This will make every connection entry larger by 16 bytes, enlarging
+ it by about 5%.
+
+ Say Y if you have plenty of RAM and want to be able to detect
+ long-lived connections, using for example "connbytes" extension.
+
+ If unsure, say N.
+
config IP_NF_CONNTRACK_MARK
bool 'Connection mark tracking support'
depends on IP_NF_CONNTRACK
diff -Nur linux-2.6.19-rc3-orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.6.19-rc3/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.6.19-rc3-orig/net/ipv4/netfilter/ip_conntrack_core.c 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/ipv4/netfilter/ip_conntrack_core.c 2006-10-28 17:44:28.000000000 +0200
@@ -1148,9 +1148,15 @@
ct->counters[CTINFO2DIR(ctinfo)].packets++;
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
ntohs(skb->nh.iph->tot_len);
+#ifdef CONFIG_IP_NF_CT_ACCT_64BIT_COUNTERS
+ if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x8000000000000000LL)
+ || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x8000000000000000LL))
+ event |= IPCT_COUNTER_FILLING;
+#else
if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
|| (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
event |= IPCT_COUNTER_FILLING;
+#endif
}
#endif

diff -Nur linux-2.6.19-rc3-orig/net/ipv4/netfilter/ip_conntrack_netlink.c linux-2.6.19-rc3/net/ipv4/netfilter/ip_conntrack_netlink.c
--- linux-2.6.19-rc3-orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-10-28 18:23:06.000000000 +0200
@@ -185,6 +185,15 @@
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
+#ifdef CONFIG_IP_NF_CT_ACCT_64BIT_COUNTERS
+ __be64 tmp;
+
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);
+
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);
+#else
__be32 tmp;

tmp = htonl(ct->counters[dir].packets);
@@ -192,6 +201,7 @@

tmp = htonl(ct->counters[dir].bytes);
NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(__be32), &tmp);
+#endif

NFA_NEST_END(skb, nest_count);

diff -Nur linux-2.6.19-rc3-orig/net/netfilter/Kconfig linux-2.6.19-rc3/net/netfilter/Kconfig
--- linux-2.6.19-rc3-orig/net/netfilter/Kconfig 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/netfilter/Kconfig 2006-10-28 18:43:54.000000000 +0200
@@ -51,6 +51,21 @@

If unsure, say `N'.

+config NF_CT_ACCT_64BIT_COUNTERS
+ bool 'Enable 64bit counters for conntrack-based accounting'
+ depends on NF_CT_ACCT
+ default n
+ help
+ Use 64 bit counters for bytes/packets accounting instead of default 32.
+
+ This will make every connection entry larger by 16 bytes, enlarging
+ it by about 5%.
+
+ Say Y if you have plenty of RAM and want to be able to detect
+ long-lived connections, using for example "connbytes" extension.
+
+ If unsure, say N.
+
config NF_CONNTRACK_MARK
bool 'Connection mark tracking support'
depends on NF_CONNTRACK
diff -Nur linux-2.6.19-rc3-orig/net/netfilter/nf_conntrack_core.c linux-2.6.19-rc3/net/netfilter/nf_conntrack_core.c
--- linux-2.6.19-rc3-orig/net/netfilter/nf_conntrack_core.c 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/netfilter/nf_conntrack_core.c 2006-10-28 17:44:11.000000000 +0200
@@ -1412,9 +1412,15 @@
ct->counters[CTINFO2DIR(ctinfo)].packets++;
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
skb->len - (unsigned int)(skb->nh.raw - skb->data);
+#ifdef CONFIG_NF_CT_ACCT_64BIT_COUNTERS
+ if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x8000000000000000LL)
+ || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x8000000000000000LL))
+ event |= IPCT_COUNTER_FILLING;
+#else
if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
|| (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
event |= IPCT_COUNTER_FILLING;
+#endif
}
#endif

diff -Nur linux-2.6.19-rc3-orig/net/netfilter/nf_conntrack_netlink.c linux-2.6.19-rc3/net/netfilter/nf_conntrack_netlink.c
--- linux-2.6.19-rc3-orig/net/netfilter/nf_conntrack_netlink.c 2006-10-24 01:02:02.000000000 +0200
+++ linux-2.6.19-rc3/net/netfilter/nf_conntrack_netlink.c 2006-10-28 18:24:12.000000000 +0200
@@ -194,13 +194,23 @@
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
- u_int32_t tmp;
+#ifdef CONFIG_NF_CT_ACCT_64BIT_COUNTERS
+ __be64 tmp;
+
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);
+
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);
+#else
+ __be32 tmp;

tmp = htonl(ct->counters[dir].packets);
- NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(u_int32_t), &tmp);
+ NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(__be32), &tmp);

tmp = htonl(ct->counters[dir].bytes);
- NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(u_int32_t), &tmp);
+ NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(__be32), &tmp);
+#endif

NFA_NEST_END(skb, nest_count);

Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
* Krzysztof Oledzki wrote, On 28/10/06 17:48:
> On Thu, 26 Oct 2006, Patrick McHardy wrote:
>> It was changed to save some memory in struct ip_conntrack.
>> The idea was mainly that its only used for ctnetlink and
>> it is possible to send events before overflow. Obviously,
>> this wasn't true (besides the fact that events are unreliable).
>> Not sure what we should do about it ..
>
> How about attached (and inlined for easy review) patch?

I prefer 64 bit counters and am going to have to put them back; so
thanks for this.

Sam
Re: connbytes & 64bit counters [ In reply to ]
* Krzysztof Oledzki wrote, On 28/10/06 17:48:
> On Thu, 26 Oct 2006, Patrick McHardy wrote:
>> It was changed to save some memory in struct ip_conntrack.
>> The idea was mainly that its only used for ctnetlink and
>> it is possible to send events before overflow. Obviously,
>> this wasn't true (besides the fact that events are unreliable).
>> Not sure what we should do about it ..
>
> How about attached (and inlined for easy review) patch?

I prefer 64 bit counters and am going to have to put them back; so
thanks for this.

Sam
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
> [NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting
>
> Initially netfilter has had 64bit counters for conntrack-based
> accounting, but
> it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit
> counters are
> still required, for example for "connbytes" extension. Add possibility
> to choose
> between 32 and 64bits but keep default 32bit counters.
>
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>
> +++ linux-2.6.19-rc3/net/netfilter/nf_conntrack_netlink.c 2006-10-28
> 18:24:12.000000000 +0200
> @@ -194,13 +194,23 @@
> {
> enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
> struct nfattr *nest_count = NFA_NEST(skb, type);
> - u_int32_t tmp;
> +#ifdef CONFIG_NF_CT_ACCT_64BIT_COUNTERS
> + __be64 tmp;
> +
> + tmp = cpu_to_be64(ct->counters[dir].packets);
> + NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);
> +
> + tmp = cpu_to_be64(ct->counters[dir].bytes);
> + NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);
> +#else


We can't make the API depend on config options (and I don't
like all those ifdefs). For now I would suggest to keep the
ctnetlink interface as it is and use 64 bit counters either
unconditionally or only when the connbytes match is enabled.
Re: connbytes & 64bit counters [ In reply to ]
On Mon, 30 Oct 2006, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>> [NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting
>>
>> Initially netfilter has had 64bit counters for conntrack-based
>> accounting, but
>> it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit
>> counters are
>> still required, for example for "connbytes" extension. Add possibility
>> to choose
>> between 32 and 64bits but keep default 32bit counters.
>>
>> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>>
>> +++ linux-2.6.19-rc3/net/netfilter/nf_conntrack_netlink.c 2006-10-28
>> 18:24:12.000000000 +0200
>> @@ -194,13 +194,23 @@
>> {
>> enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
>> struct nfattr *nest_count = NFA_NEST(skb, type);
>> - u_int32_t tmp;
>> +#ifdef CONFIG_NF_CT_ACCT_64BIT_COUNTERS
>> + __be64 tmp;
>> +
>> + tmp = cpu_to_be64(ct->counters[dir].packets);
>> + NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);
>> +
>> + tmp = cpu_to_be64(ct->counters[dir].bytes);
>> + NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);
>> +#else
>
>
> We can't make the API depend on config options
But this does not change the API imho, and this situation is perfectly
handled by libnetfilter_conntrack:

static void nfct_parse_counters(struct nfattr *attr,
struct nfct_conntrack *ct,
enum ctattr_type parent)
{
(...)
if (tb[CTA_COUNTERS_PACKETS-1])
ct->counters[dir].packets
= __be64_to_cpu(*(u_int64_t *)
NFA_DATA(tb[CTA_COUNTERS_PACKETS-1]));
if (tb[CTA_COUNTERS_BYTES-1])
ct->counters[dir].bytes
= __be64_to_cpu(*(u_int64_t *)
NFA_DATA(tb[CTA_COUNTERS_BYTES-1]));
if (tb[CTA_COUNTERS32_PACKETS-1])
ct->counters[dir].packets
= htonl(*(u_int32_t *)
NFA_DATA(tb[CTA_COUNTERS32_PACKETS-1]));
if (tb[CTA_COUNTERS32_BYTES-1])
ct->counters[dir].bytes
= htonl(*(u_int32_t *)
NFA_DATA(tb[CTA_COUNTERS32_BYTES-1]));
(...)

> (and I don't like all those ifdefs).
Why?

> For now I would suggest to keep the ctnetlink interface as it is and use
> 64 bit counters either unconditionally or only when the connbytes match
> is enabled.

What is wrong in sending 64 bit counters to userspace if we already have
64 bit counters in kernel?

Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
* Patrick McHardy wrote, On 30/10/06 15:54:
> We can't make the API depend on config options (and I don't
> like all those ifdefs). For now I would suggest to keep the
> ctnetlink interface as it is and use 64 bit counters either
> unconditionally or only when the connbytes match is enabled.

I like: unconditionally.
In my experience we can save more kernel space by reducing the default
conntrack timeout than by reducing counter size.

Sam
(who remembered to remove the gmane newsgroup to avoid posting twice)
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
>
>
> On Mon, 30 Oct 2006, Patrick McHardy wrote:
>
>> Krzysztof Oledzki wrote:
>>
>>> [NETFILTER] Reimplementation of 64bit counters for bytes/packets
>>> accounting
>>>
>>> Initially netfilter has had 64bit counters for conntrack-based
>>> accounting, but
>>> it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit
>>> counters are
>>> still required, for example for "connbytes" extension. Add possibility
>>> to choose
>>> between 32 and 64bits but keep default 32bit counters.
>>>
>>> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>>>
>>> +++ linux-2.6.19-rc3/net/netfilter/nf_conntrack_netlink.c 2006-10-28
>>> 18:24:12.000000000 +0200
>>> @@ -194,13 +194,23 @@
>>> {
>>> enum ctattr_type type = dir ? CTA_COUNTERS_REPLY:
>>> CTA_COUNTERS_ORIG;
>>> struct nfattr *nest_count = NFA_NEST(skb, type);
>>> - u_int32_t tmp;
>>> +#ifdef CONFIG_NF_CT_ACCT_64BIT_COUNTERS
>>> + __be64 tmp;
>>> +
>>> + tmp = cpu_to_be64(ct->counters[dir].packets);
>>> + NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);
>>> +
>>> + tmp = cpu_to_be64(ct->counters[dir].bytes);
>>> + NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);
>>> +#else
>>
>>
>> We can't make the API depend on config options
>
> But this does not change the API imho, and this situation is perfectly
> handled by libnetfilter_conntrack:


It does change the API since we currently use u32. But you're
right that it shouldn't break anything (even if something
besides libnetfilter_conntrack uses the raw attributes) since
we use big endian.

>> (and I don't like all those ifdefs).
>
> Why?

Because they make the code ugly and unreadable.

>> For now I would suggest to keep the ctnetlink interface as it is and
>> use 64 bit counters either unconditionally or only when the connbytes
>> match is enabled.
>
>
> What is wrong in sending 64 bit counters to userspace if we already have
> 64 bit counters in kernel?

Nothing - but changing the API based on config options is bad design.
I am fine with sending 64 bit unconditionally. But you need to make
sure you don't send (32 bit) overflow events to userspace anymore.

Mhh .. this hole thing is a mess:

enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
CTA_COUNTERS_PACKETS, /* old 64bit counters */
CTA_COUNTERS_BYTES, /* old 64bit counters */
CTA_COUNTERS32_PACKETS,
CTA_COUNTERS32_BYTES,

__CTA_COUNTERS_MAX
};

So apparently we already broke compatibility. My prefered solution would
be to get rid of this mess and return to 64 bit counters unconditionally
and everywhere.
Re: connbytes & 64bit counters [ In reply to ]
Patrick McHardy wrote:
>>What is wrong in sending 64 bit counters to userspace if we already have
>>64 bit counters in kernel?
>
> Nothing - but changing the API based on config options is bad design.
> I am fine with sending 64 bit unconditionally. But you need to make
> sure you don't send (32 bit) overflow events to userspace anymore.

They could be used to notify 64 bits overflow events. Userspace can
differenciate if the kernel is using 32 or 64 bits and interpret the
overflow event appropiately.

> Mhh .. this hole thing is a mess:
>
> enum ctattr_counters {
> CTA_COUNTERS_UNSPEC,
> CTA_COUNTERS_PACKETS, /* old 64bit counters */
> CTA_COUNTERS_BYTES, /* old 64bit counters */
> CTA_COUNTERS32_PACKETS,
> CTA_COUNTERS32_BYTES,
>
> __CTA_COUNTERS_MAX
> };
>
> So apparently we already broke compatibility. My prefered solution would
> be to get rid of this mess and return to 64 bit counters unconditionally
> and everywhere.

Hm, why not default on 64 bits counters but still give the choice to
select 32 bits at compilation time? Some advanced users could still want
to compile 32 bits to save some memory (perhaps embedded stuff guys), so
I have the impression that this issue will be revisited sooner or later.
I can give a hand to Krzysztof and improve the patch to make it look
cleaner, although we will still need some ifdef's. Still not convinced?

--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
Re: connbytes & 64bit counters [ In reply to ]
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>
>>> What is wrong in sending 64 bit counters to userspace if we already have
>>> 64 bit counters in kernel?
>>
>>
>> Nothing - but changing the API based on config options is bad design.
>> I am fine with sending 64 bit unconditionally. But you need to make
>> sure you don't send (32 bit) overflow events to userspace anymore.
>
>
> They could be used to notify 64 bits overflow events. Userspace can
> differenciate if the kernel is using 32 or 64 bits and interpret the
> overflow event appropiately.


Mhh .. even with 100gbit it will take about 50 years to overflow.
I don't think we really need this.

> Hm, why not default on 64 bits counters but still give the choice to
> select 32 bits at compilation time? Some advanced users could still want
> to compile 32 bits to save some memory (perhaps embedded stuff guys), so
> I have the impression that this issue will be revisited sooner or later.
> I can give a hand to Krzysztof and improve the patch to make it look
> cleaner, although we will still need some ifdef's. Still not convinced?


Config options for things like this are silly in my opinion, if we
really want to save memory these counters should be selectable at
runtime, most people don't need them, but distributions will probably
enable them anyway (if not already then once programs using them
appear). And people using them don't save anything, they have to
keep copies in userspace to handle overflows, which needs even more
memory.

Given that their use of the 32 bit counters in libnetfilter_conntrack
is broken anyway (incorrect byte order conversion) I think we should
just get rid of them.
Re: connbytes & 64bit counters [ In reply to ]
On Fri, 24 Nov 2006, Patrick McHardy wrote:

> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>
>>>> What is wrong in sending 64 bit counters to userspace if we already have
>>>> 64 bit counters in kernel?
>>>
>>>
>>> Nothing - but changing the API based on config options is bad design.
>>> I am fine with sending 64 bit unconditionally. But you need to make
>>> sure you don't send (32 bit) overflow events to userspace anymore.
>>
>>
>> They could be used to notify 64 bits overflow events. Userspace can
>> differenciate if the kernel is using 32 or 64 bits and interpret the
>> overflow event appropiately.
>
>
> Mhh .. even with 100gbit it will take about 50 years to overflow.
> I don't think we really need this.
>
>> Hm, why not default on 64 bits counters but still give the choice to
>> select 32 bits at compilation time? Some advanced users could still want
>> to compile 32 bits to save some memory (perhaps embedded stuff guys), so
>> I have the impression that this issue will be revisited sooner or later.
>> I can give a hand to Krzysztof and improve the patch to make it look
>> cleaner, although we will still need some ifdef's. Still not convinced?
>
>
> Config options for things like this are silly in my opinion, if we
> really want to save memory these counters should be selectable at
> runtime, most people don't need them, but distributions will probably
> enable them anyway (if not already then once programs using them
> appear). And people using them don't save anything, they have to
> keep copies in userspace to handle overflows, which needs even more
> memory.
>
> Given that their use of the 32 bit counters in libnetfilter_conntrack
> is broken anyway (incorrect byte order conversion) I think we should
> just get rid of them.

OK, maybe something like this can be accepted - no ifdefs, 64 bit
counters unconditionally for both ip_conntrack/nf_conntrack. If someone
needs to save memory it is IMHO better to simply disable accounting at
all.

[NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting

Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example with "connbytes" extension.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

diff -Nur linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h
--- linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h 2006-12-22 21:15:10.000000000 +0100
@@ -122,7 +122,7 @@
IPCT_NATINFO_BIT = 10,
IPCT_NATINFO = (1 << IPCT_NATINFO_BIT),

- /* Counter highest bit has been set */
+ /* Counter highest bit has been set - UNUSED */
IPCT_COUNTER_FILLING_BIT = 11,
IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT),

@@ -139,8 +139,8 @@
#ifdef __KERNEL__
struct ip_conntrack_counter
{
- u_int32_t packets;
- u_int32_t bytes;
+ u_int64_t packets;
+ u_int64_t bytes;
};

struct ip_conntrack_stat
diff -Nur linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h
--- linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h 2006-12-22 21:15:26.000000000 +0100
@@ -89,10 +89,10 @@

enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
- CTA_COUNTERS_PACKETS, /* old 64bit counters */
- CTA_COUNTERS_BYTES, /* old 64bit counters */
- CTA_COUNTERS32_PACKETS,
- CTA_COUNTERS32_BYTES,
+ CTA_COUNTERS_PACKETS, /* 64bit counters */
+ CTA_COUNTERS_BYTES, /* 64bit counters */
+ CTA_COUNTERS32_PACKETS, /* unused */
+ CTA_COUNTERS32_BYTES, /* unused */
__CTA_COUNTERS_MAX
};
#define CTA_COUNTERS_MAX (__CTA_COUNTERS_MAX - 1)
diff -Nur linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c 2006-12-22 21:13:35.000000000 +0100
@@ -1148,9 +1148,6 @@
ct->counters[CTINFO2DIR(ctinfo)].packets++;
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
ntohs(skb->nh.iph->tot_len);
- if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
- || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
- event |= IPCT_COUNTER_FILLING;
}
#endif

diff -Nur linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_netlink.c linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_netlink.c
--- linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-12-22 21:12:15.000000000 +0100
@@ -186,13 +186,13 @@
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
- __be32 tmp;
+ __be64 tmp;

- tmp = htonl(ct->counters[dir].packets);
- NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(__be32), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);

- tmp = htonl(ct->counters[dir].bytes);
- NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(__be32), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);

NFA_NEST_END(skb, nest_count);

diff -Nur linux-2.6.19.1-orig/net/netfilter/nf_conntrack_core.c linux-2.6.19.1-64bitconntrack/net/netfilter/nf_conntrack_core.c
--- linux-2.6.19.1-orig/net/netfilter/nf_conntrack_core.c 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/net/netfilter/nf_conntrack_core.c 2006-12-22 21:13:55.000000000 +0100
@@ -1411,9 +1411,6 @@
ct->counters[CTINFO2DIR(ctinfo)].packets++;
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
skb->len - (unsigned int)(skb->nh.raw - skb->data);
- if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
- || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
- event |= IPCT_COUNTER_FILLING;
}
#endif

diff -Nur linux-2.6.19.1-orig/net/netfilter/nf_conntrack_netlink.c linux-2.6.19.1-64bitconntrack/net/netfilter/nf_conntrack_netlink.c
--- linux-2.6.19.1-orig/net/netfilter/nf_conntrack_netlink.c 2006-12-11 20:32:53.000000000 +0100
+++ linux-2.6.19.1-64bitconntrack/net/netfilter/nf_conntrack_netlink.c 2006-12-22 21:13:11.000000000 +0100
@@ -195,13 +195,13 @@
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
- u_int32_t tmp;
+ __be64 tmp;

- tmp = htonl(ct->counters[dir].packets);
- NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);

- tmp = htonl(ct->counters[dir].bytes);
- NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);

NFA_NEST_END(skb, nest_count);



Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
> OK, maybe something like this can be accepted - no ifdefs, 64 bit
> counters unconditionally for both ip_conntrack/nf_conntrack. If someone
> needs to save memory it is IMHO better to simply disable accounting at all.


Or ideally have a runtime option. Something to keep in mind when
redesigning the nf_conntrack allocation scheme.

> [NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting
>
> Initially netfilter has had 64bit counters for conntrack-based
> accounting, but
> it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit
> counters are
> still required, for example with "connbytes" extension.
>
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>
> diff -Nur
> linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h
> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h
> --- linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h
> 2006-12-11 20:32:53.000000000 +0100
> +++
> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h
> 2006-12-22 21:15:10.000000000 +0100
> @@ -122,7 +122,7 @@
> IPCT_NATINFO_BIT = 10,
> IPCT_NATINFO = (1 << IPCT_NATINFO_BIT),
>
> - /* Counter highest bit has been set */
> + /* Counter highest bit has been set - UNUSED */
> IPCT_COUNTER_FILLING_BIT = 11,
> IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT),


No need to keep this.

> @@ -139,8 +139,8 @@
> #ifdef __KERNEL__
> struct ip_conntrack_counter
> {
> - u_int32_t packets;
> - u_int32_t bytes;
> + u_int64_t packets;
> + u_int64_t bytes;
> };
>
> struct ip_conntrack_stat
> diff -Nur
> linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h
> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h
> --- linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h
> 2006-12-11 20:32:53.000000000 +0100
> +++
> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h
> 2006-12-22 21:15:26.000000000 +0100
> @@ -89,10 +89,10 @@
>
> enum ctattr_counters {
> CTA_COUNTERS_UNSPEC,
> - CTA_COUNTERS_PACKETS, /* old 64bit counters */
> - CTA_COUNTERS_BYTES, /* old 64bit counters */
> - CTA_COUNTERS32_PACKETS,
> - CTA_COUNTERS32_BYTES,
> + CTA_COUNTERS_PACKETS, /* 64bit counters */
> + CTA_COUNTERS_BYTES, /* 64bit counters */
> + CTA_COUNTERS32_PACKETS, /* unused */
> + CTA_COUNTERS32_BYTES, /* unused */
> __CTA_COUNTERS_MAX
> };
> #define CTA_COUNTERS_MAX (__CTA_COUNTERS_MAX - 1)
> diff -Nur linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c
> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
> --- linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c
> 2006-12-11 20:32:53.000000000 +0100
> +++
> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
> 2006-12-22 21:13:35.000000000 +0100
> @@ -1148,9 +1148,6 @@
> ct->counters[CTINFO2DIR(ctinfo)].packets++;
> ct->counters[CTINFO2DIR(ctinfo)].bytes +=
> ntohs(skb->nh.iph->tot_len);
> - if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
> - || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
> - event |= IPCT_COUNTER_FILLING;


This was actually broken before, since the counters are not
reset (they just overflow) an event was generated for every
packet until the overflow once they reached 2^31. Anyway,
I'm not sure how ulogd2 uses these counters, Harald, is it
necessary to receive period updates?
Re: connbytes & 64bit counters [ In reply to ]
On Wed, 10 Jan 2007, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>> OK, maybe something like this can be accepted - no ifdefs, 64 bit
>> counters unconditionally for both ip_conntrack/nf_conntrack. If someone
>> needs to save memory it is IMHO better to simply disable accounting at all.
>
> Or ideally have a runtime option. Something to keep in mind when
> redesigning the nf_conntrack allocation scheme.
Indeed.

>> [NETFILTER] Reimplementation of 64bit counters for bytes/packets accounting
>>
>> Initially netfilter has had 64bit counters for conntrack-based
>> accounting, but
>> it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit
>> counters are
>> still required, for example with "connbytes" extension.
>>
>> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>>
>> diff -Nur
>> linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h
>> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h
>> --- linux-2.6.19.1-orig/include/linux/netfilter/nf_conntrack_common.h
>> 2006-12-11 20:32:53.000000000 +0100
>> +++
>> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nf_conntrack_common.h
>> 2006-12-22 21:15:10.000000000 +0100
>> @@ -122,7 +122,7 @@
>> IPCT_NATINFO_BIT = 10,
>> IPCT_NATINFO = (1 << IPCT_NATINFO_BIT),
>>
>> - /* Counter highest bit has been set */
>> + /* Counter highest bit has been set - UNUSED */
>> IPCT_COUNTER_FILLING_BIT = 11,
>> IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT),
>
>
> No need to keep this.

OK, I'll remove both IPCT_COUNTER_FILLING_BIT & IPCT_COUNTER_FILLING.

>> @@ -139,8 +139,8 @@
>> #ifdef __KERNEL__
>> struct ip_conntrack_counter
>> {
>> - u_int32_t packets;
>> - u_int32_t bytes;
>> + u_int64_t packets;
>> + u_int64_t bytes;
>> };
>>
>> struct ip_conntrack_stat
>> diff -Nur
>> linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h
>> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h
>> --- linux-2.6.19.1-orig/include/linux/netfilter/nfnetlink_conntrack.h
>> 2006-12-11 20:32:53.000000000 +0100
>> +++
>> linux-2.6.19.1-64bitconntrack/include/linux/netfilter/nfnetlink_conntrack.h
>> 2006-12-22 21:15:26.000000000 +0100
>> @@ -89,10 +89,10 @@
>>
>> enum ctattr_counters {
>> CTA_COUNTERS_UNSPEC,
>> - CTA_COUNTERS_PACKETS, /* old 64bit counters */
>> - CTA_COUNTERS_BYTES, /* old 64bit counters */
>> - CTA_COUNTERS32_PACKETS,
>> - CTA_COUNTERS32_BYTES,
>> + CTA_COUNTERS_PACKETS, /* 64bit counters */
>> + CTA_COUNTERS_BYTES, /* 64bit counters */
>> + CTA_COUNTERS32_PACKETS, /* unused */
>> + CTA_COUNTERS32_BYTES, /* unused */
>> __CTA_COUNTERS_MAX
>> };
>> #define CTA_COUNTERS_MAX (__CTA_COUNTERS_MAX - 1)
>> diff -Nur linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c
>> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
>> --- linux-2.6.19.1-orig/net/ipv4/netfilter/ip_conntrack_core.c
>> 2006-12-11 20:32:53.000000000 +0100
>> +++
>> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
>> 2006-12-22 21:13:35.000000000 +0100
>> @@ -1148,9 +1148,6 @@
>> ct->counters[CTINFO2DIR(ctinfo)].packets++;
>> ct->counters[CTINFO2DIR(ctinfo)].bytes +=
>> ntohs(skb->nh.iph->tot_len);
>> - if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
>> - || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
>> - event |= IPCT_COUNTER_FILLING;
>
>
> This was actually broken before, since the counters are not
> reset (they just overflow) an event was generated for every
> packet until the overflow once they reached 2^31. Anyway,
> I'm not sure how ulogd2 uses these counters, Harald, is it
> necessary to receive period updates?

If we remove IPCT_COUNTER_FILLING there is no other solution than also
drop this part of the code.

BTW: what about CTA_COUNTERS32_*? Should we also drop that attributes?

Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
> On Wed, 10 Jan 2007, Patrick McHardy wrote:
>
>>> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
>>> 2006-12-22 21:13:35.000000000 +0100
>>> @@ -1148,9 +1148,6 @@
>>> ct->counters[CTINFO2DIR(ctinfo)].packets++;
>>> ct->counters[CTINFO2DIR(ctinfo)].bytes +=
>>> ntohs(skb->nh.iph->tot_len);
>>> - if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
>>> - || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
>>> - event |= IPCT_COUNTER_FILLING;
>>
>>
>>
>> This was actually broken before, since the counters are not
>> reset (they just overflow) an event was generated for every
>> packet until the overflow once they reached 2^31. Anyway,
>> I'm not sure how ulogd2 uses these counters, Harald, is it
>> necessary to receive period updates?
>
>
> If we remove IPCT_COUNTER_FILLING there is no other solution than also
> drop this part of the code.

Let's wait what Harald says about ulogd2, I'm not sure whether we need
to keep this.

> BTW: what about CTA_COUNTERS32_*? Should we also drop that attributes?

They should be kept to make sure the value are not reused. Please just
add a comment stating that they are not used anymore.
Re: connbytes & 64bit counters [ In reply to ]
On Mon, 15 Jan 2007, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>> On Wed, 10 Jan 2007, Patrick McHardy wrote:
>>
>>>> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
>>>> 2006-12-22 21:13:35.000000000 +0100
>>>> @@ -1148,9 +1148,6 @@
>>>> ct->counters[CTINFO2DIR(ctinfo)].packets++;
>>>> ct->counters[CTINFO2DIR(ctinfo)].bytes +=
>>>> ntohs(skb->nh.iph->tot_len);
>>>> - if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
>>>> - || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
>>>> - event |= IPCT_COUNTER_FILLING;
>>>
>>>
>>>
>>> This was actually broken before, since the counters are not
>>> reset (they just overflow) an event was generated for every
>>> packet until the overflow once they reached 2^31. Anyway,
>>> I'm not sure how ulogd2 uses these counters, Harald, is it
>>> necessary to receive period updates?
>>
>>
>> If we remove IPCT_COUNTER_FILLING there is no other solution than also
>> drop this part of the code.
>
> Let's wait what Harald says about ulogd2, I'm not sure whether we need
> to keep this.
>
>> BTW: what about CTA_COUNTERS32_*? Should we also drop that attributes?
>
> They should be kept to make sure the value are not reused. Please just
> add a comment stating that they are not used anymore.
>

Getting back to this old thread... How about attached (and inlined) patch?

- for 2.6.22 and current linux-2.6.git kernels with ip_conntrack removed
- removes IPCT_COUNTER_FILLING
- comments CTA_COUNTERS32_*


[NETFILTER]: Reimplementation of 64bit counters for bytes/packets accounting

Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for the "connbytes" extension.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

diff --git a/include/linux/netfilter/nf_conntrack_common.h b/include/linux/netfilter/nf_conntrack_common.h
index 9e0dae0..7a29936 100644
--- a/include/linux/netfilter/nf_conntrack_common.h
+++ b/include/linux/netfilter/nf_conntrack_common.h
@@ -122,10 +122,6 @@ enum ip_conntrack_events
IPCT_NATINFO_BIT = 10,
IPCT_NATINFO = (1 << IPCT_NATINFO_BIT),

- /* Counter highest bit has been set */
- IPCT_COUNTER_FILLING_BIT = 11,
- IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT),
-
/* Mark is set */
IPCT_MARK_BIT = 12,
IPCT_MARK = (1 << IPCT_MARK_BIT),
@@ -139,8 +135,8 @@ enum ip_conntrack_expect_events {
#ifdef __KERNEL__
struct ip_conntrack_counter
{
- u_int32_t packets;
- u_int32_t bytes;
+ u_int64_t packets;
+ u_int64_t bytes;
};

struct ip_conntrack_stat
diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index d7c3503..75f3df8 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -93,10 +93,10 @@ enum ctattr_protoinfo_tcp {

enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
- CTA_COUNTERS_PACKETS, /* old 64bit counters */
- CTA_COUNTERS_BYTES, /* old 64bit counters */
- CTA_COUNTERS32_PACKETS,
- CTA_COUNTERS32_BYTES,
+ CTA_COUNTERS_PACKETS, /* 64bit counters */
+ CTA_COUNTERS_BYTES, /* 64bit counters */
+ CTA_COUNTERS32_PACKETS, /* 32bit counters, unused */
+ CTA_COUNTERS32_BYTES, /* 32bit counters, unused */
__CTA_COUNTERS_MAX
};
#define CTA_COUNTERS_MAX (__CTA_COUNTERS_MAX - 1)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3d14110..7fd9f54 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -773,9 +773,6 @@ void __nf_ct_refresh_acct(struct nf_conn *ct,
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
skb->len - skb_network_offset(skb);

- if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
- || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
- event |= IPCT_COUNTER_FILLING;
}
#endif

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 6f89b10..1967fed 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -204,13 +204,13 @@ ctnetlink_dump_counters(struct sk_buff *skb, const struct nf_conn *ct,
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
- __be32 tmp;
+ __be64 tmp;

- tmp = htonl(ct->counters[dir].packets);
- NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);

- tmp = htonl(ct->counters[dir].bytes);
- NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);

NFA_NEST_END(skb, nest_count);



Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
On Sat, 21 Jul 2007, Krzysztof Oledzki wrote:

>
>
> On Mon, 15 Jan 2007, Patrick McHardy wrote:
>
>> Krzysztof Oledzki wrote:
>>> On Wed, 10 Jan 2007, Patrick McHardy wrote:
>>>
>>>>> linux-2.6.19.1-64bitconntrack/net/ipv4/netfilter/ip_conntrack_core.c
>>>>> 2006-12-22 21:13:35.000000000 +0100
>>>>> @@ -1148,9 +1148,6 @@
>>>>> ct->counters[CTINFO2DIR(ctinfo)].packets++;
>>>>> ct->counters[CTINFO2DIR(ctinfo)].bytes +=
>>>>> ntohs(skb->nh.iph->tot_len);
>>>>> - if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
>>>>> - || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
>>>>> - event |= IPCT_COUNTER_FILLING;
>>>>
>>>>
>>>>
>>>> This was actually broken before, since the counters are not
>>>> reset (they just overflow) an event was generated for every
>>>> packet until the overflow once they reached 2^31. Anyway,
>>>> I'm not sure how ulogd2 uses these counters, Harald, is it
>>>> necessary to receive period updates?
>>>
>>>
>>> If we remove IPCT_COUNTER_FILLING there is no other solution than also
>>> drop this part of the code.
>>
>> Let's wait what Harald says about ulogd2, I'm not sure whether we need
>> to keep this.
>>
>>> BTW: what about CTA_COUNTERS32_*? Should we also drop that attributes?
>>
>> They should be kept to make sure the value are not reused. Please just
>> add a comment stating that they are not used anymore.
>>
>
> Getting back to this old thread... How about attached (and inlined) patch?
>
> - for 2.6.22 and current linux-2.6.git kernels with ip_conntrack removed
> - removes IPCT_COUNTER_FILLING
> - comments CTA_COUNTERS32_*

Errr, incomplete patch, sorry... This one should work.

[NETFILTER]: Reimplementation of 64bit counters for bytes/packets accounting

Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for the "connbytes" extension.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

diff --git a/include/linux/netfilter/nf_conntrack_common.h b/include/linux/netfilter/nf_conntrack_common.h
index 9e0dae0..7a29936 100644
--- a/include/linux/netfilter/nf_conntrack_common.h
+++ b/include/linux/netfilter/nf_conntrack_common.h
@@ -122,10 +122,6 @@ enum ip_conntrack_events
IPCT_NATINFO_BIT = 10,
IPCT_NATINFO = (1 << IPCT_NATINFO_BIT),

- /* Counter highest bit has been set */
- IPCT_COUNTER_FILLING_BIT = 11,
- IPCT_COUNTER_FILLING = (1 << IPCT_COUNTER_FILLING_BIT),
-
/* Mark is set */
IPCT_MARK_BIT = 12,
IPCT_MARK = (1 << IPCT_MARK_BIT),
@@ -139,8 +135,8 @@ enum ip_conntrack_expect_events {
#ifdef __KERNEL__
struct ip_conntrack_counter
{
- u_int32_t packets;
- u_int32_t bytes;
+ u_int64_t packets;
+ u_int64_t bytes;
};

struct ip_conntrack_stat
diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index d7c3503..75f3df8 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -93,10 +93,10 @@ enum ctattr_protoinfo_tcp {

enum ctattr_counters {
CTA_COUNTERS_UNSPEC,
- CTA_COUNTERS_PACKETS, /* old 64bit counters */
- CTA_COUNTERS_BYTES, /* old 64bit counters */
- CTA_COUNTERS32_PACKETS,
- CTA_COUNTERS32_BYTES,
+ CTA_COUNTERS_PACKETS, /* 64bit counters */
+ CTA_COUNTERS_BYTES, /* 64bit counters */
+ CTA_COUNTERS32_PACKETS, /* 32bit counters, unused */
+ CTA_COUNTERS32_BYTES, /* 32bit counters, unused */
__CTA_COUNTERS_MAX
};
#define CTA_COUNTERS_MAX (__CTA_COUNTERS_MAX - 1)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index aa086c8..25e027f 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -806,9 +806,6 @@ void __nf_ct_refresh_acct(struct nf_conn *ct,
ct->counters[CTINFO2DIR(ctinfo)].bytes +=
skb->len - skb_network_offset(skb);

- if ((ct->counters[CTINFO2DIR(ctinfo)].packets & 0x80000000)
- || (ct->counters[CTINFO2DIR(ctinfo)].bytes & 0x80000000))
- event |= IPCT_COUNTER_FILLING;
}
#endif

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 6f89b10..b53284f 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -204,13 +204,13 @@ ctnetlink_dump_counters(struct sk_buff *skb, const struct nf_conn *ct,
{
enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG;
struct nfattr *nest_count = NFA_NEST(skb, type);
- __be32 tmp;
+ __be64 tmp;

- tmp = htonl(ct->counters[dir].packets);
- NFA_PUT(skb, CTA_COUNTERS32_PACKETS, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].packets);
+ NFA_PUT(skb, CTA_COUNTERS_PACKETS, sizeof(__be64), &tmp);

- tmp = htonl(ct->counters[dir].bytes);
- NFA_PUT(skb, CTA_COUNTERS32_BYTES, sizeof(u_int32_t), &tmp);
+ tmp = cpu_to_be64(ct->counters[dir].bytes);
+ NFA_PUT(skb, CTA_COUNTERS_BYTES, sizeof(__be64), &tmp);

NFA_NEST_END(skb, nest_count);

@@ -397,10 +397,6 @@ static int ctnetlink_conntrack_event(struct notifier_block *this,
goto nfattr_failure;
#endif

- if (events & IPCT_COUNTER_FILLING &&
- (ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL) < 0 ||
- ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY) < 0))
- goto nfattr_failure;
}

nlh->nlmsg_len = skb->tail - b;


Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
Krzysztof Oledzki wrote:
> Errr, incomplete patch, sorry... This one should work.
>
> [NETFILTER]: Reimplementation of 64bit counters for bytes/packets
> accounting


I thought about this today, but came to no conclusion. We have ct_extend
now, allowing to allocate the counters dynamically for new conntracks,
and the only reasonable thing IMO (considering distributors that waste
16 bytes per conntrack for a feature pratically *nobody* uses, with your
patch that actually fixes a regression 32 byte) would be to make
accounting optional and triggered by a target in the raw table. Its so
far used by default and the counters are visible through /proc and
ctnetlink though. So this would break compatibility. OTOH I think its
totally ridiculous to waste this much memory for something pratically
nobody uses. So my current opinion is between just breaking
compatibility (with some grace period perhaps) and trying to behave
half-way compatible when ctnetlink is loaded. I don't think the second
option will work though. And frankly, the current code it totally
broken, it will send a COUNTER_OVERFLOW event *for each packet* when
the counters exceed 2^31. So its really questionable if anyone actually
uses this, without further patches.

Opinions are welcome.
Re: connbytes & 64bit counters [ In reply to ]
On Jul 21 2007 06:18, Patrick McHardy wrote:
>
>(considering distributors that waste 16 bytes per conntrack for a feature
>pratically *nobody* uses,
>
>Opinions are welcome.

Break compatibility -- if there are any users (you assume there are none),
they will surely turn up.


Jan
--
Re: connbytes & 64bit counters [ In reply to ]
From: Patrick McHardy <kaber@trash.net>
Date: Sat, 21 Jul 2007 06:18:46 +0200

> Krzysztof Oledzki wrote:
> > Errr, incomplete patch, sorry... This one should work.
> >
> > [NETFILTER]: Reimplementation of 64bit counters for bytes/packets
> > accounting
>
>
> I thought about this today, but came to no conclusion. We have ct_extend
> now, allowing to allocate the counters dynamically for new conntracks,
> and the only reasonable thing IMO (considering distributors that waste
> 16 bytes per conntrack for a feature pratically *nobody* uses, with your
> patch that actually fixes a regression 32 byte) would be to make
> accounting optional and triggered by a target in the raw table. Its so
> far used by default and the counters are visible through /proc and
> ctnetlink though. So this would break compatibility. OTOH I think its
> totally ridiculous to waste this much memory for something pratically
> nobody uses. So my current opinion is between just breaking
> compatibility (with some grace period perhaps) and trying to behave
> half-way compatible when ctnetlink is loaded. I don't think the second
> option will work though. And frankly, the current code it totally
> broken, it will send a COUNTER_OVERFLOW event *for each packet* when
> the counters exceed 2^31. So its really questionable if anyone actually
> uses this, without further patches.
>
> Opinions are welcome.

Well, we cannot add ct_extend area to confirmed conntrack. Then connbytes
match would not be able to get counters of such conntracks if connbytes is
loaded after loading nf_conntrack.

I begin to feel that the lock issue of ct_extend limits us more strictly
than I thought.

-- Yasuyuki Kozakai
Re: connbytes & 64bit counters [ In reply to ]
On Mon, 23 Jul 2007, Yasuyuki KOZAKAI wrote:

> From: Patrick McHardy <kaber@trash.net>
> Date: Sat, 21 Jul 2007 06:18:46 +0200
>
>> Krzysztof Oledzki wrote:
>>> Errr, incomplete patch, sorry... This one should work.
>>>
>>> [NETFILTER]: Reimplementation of 64bit counters for bytes/packets
>>> accounting
>>
>>
>> I thought about this today, but came to no conclusion. We have ct_extend
>> now, allowing to allocate the counters dynamically for new conntracks,
>> and the only reasonable thing IMO (considering distributors that waste
>> 16 bytes per conntrack for a feature pratically *nobody* uses, with your
>> patch that actually fixes a regression 32 byte) would be to make
>> accounting optional and triggered by a target in the raw table. Its so
>> far used by default and the counters are visible through /proc and
>> ctnetlink though. So this would break compatibility. OTOH I think its
>> totally ridiculous to waste this much memory for something pratically
>> nobody uses. So my current opinion is between just breaking
>> compatibility (with some grace period perhaps) and trying to behave
>> half-way compatible when ctnetlink is loaded. I don't think the second
>> option will work though. And frankly, the current code it totally
>> broken, it will send a COUNTER_OVERFLOW event *for each packet* when
>> the counters exceed 2^31. So its really questionable if anyone actually
>> uses this, without further patches.
>>
>> Opinions are welcome.
>
> Well, we cannot add ct_extend area to confirmed conntrack. Then connbytes
> match would not be able to get counters of such conntracks if connbytes is
> loaded after loading nf_conntrack.
>
> I begin to feel that the lock issue of ct_extend limits us more strictly
> than I thought.

So maybe we can add a sysctl option to enable/disable counters for new
connection? Then distributors may setup a kernel with counters compiled
in, but disabled a sysctl.conf?

Best regards,

Krzysztof Olêdzki
Re: connbytes & 64bit counters [ In reply to ]
On Mon, 6 Aug 2007, Krzysztof Oledzki wrote:

>
>
> On Mon, 23 Jul 2007, Yasuyuki KOZAKAI wrote:
>
>> From: Patrick McHardy <kaber@trash.net>
>> Date: Sat, 21 Jul 2007 06:18:46 +0200
>>
>>> Krzysztof Oledzki wrote:
>>>> Errr, incomplete patch, sorry... This one should work.
>>>>
>>>> [NETFILTER]: Reimplementation of 64bit counters for bytes/packets
>>>> accounting
>>>
>>>
>>> I thought about this today, but came to no conclusion. We have ct_extend
>>> now, allowing to allocate the counters dynamically for new conntracks,
>>> and the only reasonable thing IMO (considering distributors that waste
>>> 16 bytes per conntrack for a feature pratically *nobody* uses, with your
>>> patch that actually fixes a regression 32 byte) would be to make
>>> accounting optional and triggered by a target in the raw table. Its so
>>> far used by default and the counters are visible through /proc and
>>> ctnetlink though. So this would break compatibility. OTOH I think its
>>> totally ridiculous to waste this much memory for something pratically
>>> nobody uses. So my current opinion is between just breaking
>>> compatibility (with some grace period perhaps) and trying to behave
>>> half-way compatible when ctnetlink is loaded. I don't think the second
>>> option will work though. And frankly, the current code it totally
>>> broken, it will send a COUNTER_OVERFLOW event *for each packet* when
>>> the counters exceed 2^31. So its really questionable if anyone actually
>>> uses this, without further patches.
>>>
>>> Opinions are welcome.
>>
>> Well, we cannot add ct_extend area to confirmed conntrack. Then connbytes
>> match would not be able to get counters of such conntracks if connbytes is
>> loaded after loading nf_conntrack.
>>
>> I begin to feel that the lock issue of ct_extend limits us more strictly
>> than I thought.
>
> So maybe we can add a sysctl option to enable/disable counters for new
> connection? Then distributors may setup a kernel with counters compiled in,
> but disabled a sysctl.conf?

Patric,

Do you find this idea acceptable or do you want a different solution?

Best regards,

Krzysztof Olêdzki