Mailing List Archive

ipset performance question
Greetings.

I'm working on a rules compiler that takes advantage of ipset and I'd
like to hear opinions on the following subject:

When firewall rules are expressed in the conventional way (as in
fwbuilder), the source and destination fields may mix single IP
addresses and subnets.

ipset offers several set types, but none of them can express such a mixture.
Work seems to be in progress to allow unions of sets, but for the time
being we need multiple iptables rules to accommodate this
restriction:

iptables -A A_CHAIN -m set --set nethash_src src -j TEST_DST
iptables -A A_CHAIN -m set --set iphash_src src -j TEST_DST
iptables -A TEST_DST -m set --set nethash_dst dst -j A_TARGET
iptables -A TEST_DST -m set --set iphash_dst dst -j A_TARGET


There is, however, another way that matches subnets and IPs in one rule.
The trick is actually simple: using a match module multiple times
yields a logical AND in the rule, while what we need is a logical OR,
so we apply the following transformation:
{A union B} = {{A union C} inter ~{C\B}}
where
{A inter B} is empty
{A inter {C\B}} is empty as well
C is a superset of B
C\B means the complement of B in C
{{X} inter ~{Y}} means all elements of X that are not in {Y}
we choose C of the same type as A so it's easy to compute {A union C}
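
The identity can be checked mechanically on small sets. The following is only an illustrative sketch: the element names are made up, and plain Python sets stand in for the ipset sets. It also shows why the padding elements (C minus B) have to stay outside A, since anything in both would be excluded by the negated match.

```python
def union_via_and_not(a, b, c):
    """Evaluate (A union C) intersected with NOT (C minus B).

    This is the form a single rule can express: one positive
    match on (A union C) and one negated match on (C minus B).
    """
    padding = c - b  # elements added only to make B fit the type of A
    return {x for x in (a | c) if x not in padding}


# Hypothetical elements, for illustration only.
a = {"net1", "net2"}
b = {"ip1", "ip2"}
c = b | {"pad1", "pad2"}  # C is a superset of B

# With A disjoint from C, the single-rule form equals A union B.
print(union_via_and_not(a, b, c) == a | b)

# If a padding element also sits in A, it is lost from the result.
a_bad = a | {"pad1"}
print(union_via_and_not(a_bad, b, c) == a_bad | b)
```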

And it looks like this (let's suppose you want to match source addresses
from {subnet1/mask1,..subnetN/maskN,IP1..ipN}):

ipset -N nh nethash
ipset -A nh <subnet1/mask1>
...
ipset -A nh <subnetN/maskN>
ipset -A nh <ip1/31>
...
ipset -A nh <ipN/31>

ipset -N iph iphash
ipset -A iph <complement_ip_in_slash31_subnet(ip1)>
...
ipset -A iph <complement_ip_in_slash31_subnet(ipN)>

where the function complement_ip_in_slash31_subnet returns the other
address of the /31 subnet that contains its argument:
complement_ip_in_slash31_subnet(10.0.0.1) => 10.0.0.0
complement_ip_in_slash31_subnet(172.31.255.4) => 172.31.255.5
and so on.
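
One possible way to write this helper is to flip the lowest bit of the address. This is a minimal sketch using Python's standard ipaddress module and assuming IPv4 only; the function name is the one used in this message.

```python
import ipaddress


def complement_ip_in_slash31_subnet(ip: str) -> str:
    """Return the other address of the /31 subnet that contains ip."""
    addr = ipaddress.IPv4Address(ip)
    # The two addresses of a /31 differ only in the lowest bit.
    return str(ipaddress.IPv4Address(int(addr) ^ 1))


print(complement_ip_in_slash31_subnet("10.0.0.1"))      # 10.0.0.0
print(complement_ip_in_slash31_subnet("172.31.255.4"))  # 172.31.255.5
```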

Then we could issue the following rule:
iptables -A A_CHAIN -m set --set nh src \! --set iph src -j A_TARGET
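
A compiler step that emits the commands above could look roughly like this. It is a sketch only: build_commands and its parameters are hypothetical names, the command syntax is copied from this message rather than checked against a particular ipset release, and the input is assumed to be IPv4 and already cleaned up so that no IP falls inside a listed subnet.

```python
import ipaddress


def build_commands(subnets, ips, net_set="nh", ip_set="iph"):
    """Return the ipset/iptables command lines for one source match."""
    cmds = [f"ipset -N {net_set} nethash"]
    cmds += [f"ipset -A {net_set} {subnet}" for subnet in subnets]

    neighbours = []
    for ip in ips:
        # Widen each single IP to its /31 so it fits into the nethash ...
        pair = ipaddress.ip_network(f"{ip}/31", strict=False)
        cmds.append(f"ipset -A {net_set} {pair}")
        # ... and remember the unwanted neighbour for the negated set.
        addr = ipaddress.IPv4Address(ip)
        neighbours.append(str(ipaddress.IPv4Address(int(addr) ^ 1)))

    cmds.append(f"ipset -N {ip_set} iphash")
    cmds += [f"ipset -A {ip_set} {n}" for n in neighbours]
    cmds.append(
        f"iptables -A A_CHAIN -m set --set {net_set} src "
        f"\\! --set {ip_set} src -j A_TARGET"
    )
    return cmds


for line in build_commands(["192.168.0.0/24"], ["10.0.0.1"]):
    print(line)
```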

Of course, we first need to aggregate IPs into subnets where possible and
clean up {subnet1/mask1,..subnetN/maskN,IP1..ipN} in such a way that
no single address is included in any of the subnets, i.e.
enforce "{A inter B} empty".
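
This clean-up could be sketched with Python's standard ipaddress module. The function name split_subnets_and_ips is hypothetical, and IPv4 input is assumed. collapse_addresses merges adjacent entries and drops entries already covered by a larger subnet, so whatever remains as a /32 is a single IP outside every subnet.

```python
import ipaddress


def split_subnets_and_ips(entries):
    """Aggregate a mixed list and split it into (subnets, single IPs)."""
    nets = [ipaddress.ip_network(e, strict=False) for e in entries]
    collapsed = list(ipaddress.collapse_addresses(nets))
    # Anything shorter than /32 goes to the nethash side (set A) ...
    subnets = [str(n) for n in collapsed if n.prefixlen < 32]
    # ... and the remaining /32 entries are the single IPs (set B).
    ips = [str(n.network_address) for n in collapsed if n.prefixlen == 32]
    return subnets, ips


subnets, ips = split_subnets_and_ips(
    ["10.0.0.0/24", "10.0.0.7", "192.168.1.2", "192.168.1.3", "172.16.0.9"]
)
print(subnets)
print(ips)
```

In this example 10.0.0.7 disappears because 10.0.0.0/24 already covers it, and the two adjacent 192.168.1.x addresses merge into one /31.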

Now come the questions I have trouble answering because I lack an
understanding of iptables/ipset internals:

- is this worth doing at all ?
we may have one rule with four matches (three if we resort to binding
but it seems obsolete):
iptables -A A_CHAIN -m set --set snh src \! --set siph src --set dnh
dst \! --set diph dst -j A_TARGET
instead of four rules with one match each (see example above).

- wouldn't it be more efficient to use ipmap instead of iphash ? (I think yes)

- what about multiple ipmap sets if {C\B} is too sparse ?

- using similar tricks, it may sometimes be possible to use, in the same
rule, two nethash sets whose total size is smaller than the size of the
original single nethash. Could that be more efficient?

Thanks in advance for your comments

Best regards
Michel
Re: ipset performance question
Hi,

On Tue, 21 Aug 2007, michel banguerski wrote:

> - is this worth doing at all ?
> we may have one rule with four matches (three if we resort to binding
> but it seems obsolete):
> iptables -A A_CHAIN -m set --set snh src \! --set siph src --set dnh
> dst \! --set diph dst -j A_TARGET
> instead of four rules with one match each (see example above).

Hard to say much. From ipset's point of view, iff the number of set lookups
is identical, it does not matter, but ipmap types are faster than
hashes. From iptables' point of view, because of the implicit internal
checks (src/dst IP addresses and interfaces), you should minimize the
number of iptables rules.

> - wouldn't it be more efficient to use ipmap instead of iphash ? (I think yes)

Yes.

> - what about multiple ipmap sets if {C\B} is too sparse ?

That'd involve multiple iptables rules as you cannot express OR matches in
one rule, just AND ones (and there is no "union" set type yet).

> - using similar tricks, it may sometimes be possible to use, in the same
> rule, two nethash sets whose total size is smaller than the size of the
> original single nethash. Could that be more efficient?

Real testing could answer it best.

But please take into account how many resources you spend computing the
"best" set/rule combination. Is it worth it if you cannot gain much,
or if you lose the ability to quickly add or delete IP addresses from
anywhere?

Best regards,
Jozsef
-
E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary
Re: ipset performance question
Thank you for the answer, Jozsef.
[...]
> > - is this worth doing at all ?
> > we may have one rule with four matches (three if we resort to binding
> > but it seems obsolete):
> > iptables -A A_CHAIN -m set --set snh src \! --set siph src --set dnh
> > dst \! --set diph dst -j A_TARGET
> > instead of four rules with one match each (see example above).
>
> Hard to say much. From ipset's point of view, iff the number of set lookups
> is identical, it does not matter, but ipmap types are faster than
> hashes. From iptables' point of view, because of the implicit internal
> checks (src/dst IP addresses and interfaces), you should minimize the
> number of iptables rules.

As I understand it, here's the choice:

a) 4 rules: 2 ipmap and 2 nethash tests, with the hope of skipping the
nethash test(s) and 1 (or 2) rules when the address(es) of the inspected
packet are in the ipmap(s)
b) 1 rule: 2 ipmap and 2 nethash tests, with no hope of skipping any of them

To recap, I'd say it depends on the ratio <number of subnets>/<number of IPs>:
if the ratio is high and most packets will probably go through the
nethash test anyway, choice (b) makes sense. Otherwise stick with (a).


>
> > - wouldn't it be more efficient to use ipmap instead of iphash ? (I think yes)
>
> Yes.
>
> > - what about multiple ipmap sets if {C\B} is too sparse ?
>
> That'd involve multiple iptables rules as you cannot express OR matches in
> one rule, just AND ones (and there is no "union" set type yet).
True, "OR" is our enemy here.
But since what we actually need is ~{C\B}, it becomes an AND:
~{C1\B} AND ... AND ~{Cn\B} = ~({C1\B} OR .. OR {Cn\B})
so it's OK.
The question here is: how does N*ipmap (used in the same rule) compare to
1*iphash? For which values of N is a single iphash faster?
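
To make the N*ipmap idea concrete, here is a rough sketch of how the excluded addresses might be split and then tested. It assumes that one ipmap can cover at most a /16 range, which should be checked against the ipset version in use. The function names are hypothetical, and plain Python sets stand in for the ipmaps. The second function is the AND of negated lookups from the formula above.

```python
import ipaddress
from collections import defaultdict


def group_by_slash16(ips):
    """Group addresses by enclosing /16 block, one candidate ipmap each."""
    groups = defaultdict(set)
    for ip in ips:
        block = ipaddress.ip_network(f"{ip}/16", strict=False)
        groups[str(block)].add(ip)
    return dict(groups)


def passes_negated_maps(ip, groups):
    """~M1 AND ... AND ~Mn: the address must be absent from every map."""
    return all(ip not in members for members in groups.values())


excluded = ["10.0.0.0", "10.0.3.5", "172.31.255.5"]
groups = group_by_slash16(excluded)
print(len(groups))  # N, the number of negated matches the rule would carry
print(passes_negated_maps("10.0.0.1", groups))
print(passes_negated_maps("10.0.3.5", groups))
```

By De Morgan's law the result is the same as a single negated lookup in the union of all the maps, which is what the one iphash would provide. Only measurement can tell which variant is faster for a given N.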

>
> > - using similar tricks, it may sometimes be possible to use, in the same
> > rule, two nethash sets whose total size is smaller than the size of the
> > original single nethash. Could that be more efficient?
>
> Real testing could answer it best.
>
> But please take into account how many resources you spend computing the
> "best" set/rule combination. Is it worth it if you cannot gain much,
> or if you lose the ability to quickly add or delete IP addresses from
> anywhere?
Yes, this is an issue I am aware of.
In my case sets will be treated as read-only.

The fact is that I deal with hierarchical groups, and later I will add
intersections and subtractions between them.
Any change to the original rule-base/groups will lead to the
recompilation of a new iptables/ipset configuration.
This means that I cannot take advantage of some cool features like
changing sets from rules or swapping them for faster configurations.
To make things even worse, I plan to generate, in the future, a BDD
structure of chains that will "refine" the packet before applying only
a subset of the real rules, making the whole mess totally unreadable and
unmodifiable by a human (a single IP change could lead to a whole new
BDD).

To cope with the limitations above, my compiler generates versioned chain
subtrees that can coexist in the same table, allowing instant
roll-out and roll-back, and even letting a packet be processed by multiple
rule-bases.
In addition, this approach leaves room for independent uses of iptables
where ipset can be fully exploited.

From this perspective, the resources needed to generate the required sets
should be judged on a different scale (I'd say that for a complex ruleset
1-2 minutes of compile time is perfectly acceptable).

BTW, I saw that there are plans to replace binding with something else. Is it
possible to get some insight into what it will look like?
>
> Best regards,
> Jozsef
[...]

Thank You
Best regards,
Michel