Mailing List Archive

[RFC] [PATCH] ipset: New set type fullipmap
Hi,

I'd like to get your initial feedback on a new set type called fullipmap.

The fullipmap type uses dynamically allocated bitmaps to represent every
single IPv4 address as a bit. Matching speed has O(1) performance and
worst case memory usage is 512 MiB for all bitmaps, not accounting the
overhead of slab coloring, and a constant amount of memory for the index
table. It represents single IP addresses, but has a way to add and delete
a range of addresses at once. Currently an add or delete operation of a
range of two or more addresses doesn't check if the complete range is
already present or not present in the set.

This set type has been written, because we need to match against lists
that contain both single addresses and networks. With 6 million single
addresses the iphash set type with a low probe value has a quite high
memory usage of a couple of hundred MiB, while the fullipmap needs a
couple of ten MiB. The nethash set type with many different CIDR sized
networks uses quite a high amount of CPU for matching, especially for the
hashing part of it.

I'm responding with two patches, one for the kernel part and one for the
userspace part, containing the code in its current state. The code has
grown over time and probably needs some cleanup and comments.

Sven
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Fri, 17 Aug 2007, Jan Engelhardt wrote:

>> The fullipmap type uses dynamically allocated bitmaps to represent
>> every single IPv4 address as a bit. Matching speed has O(1) performance
>> and worst case memory usage is 512 MiB for all bitmaps, not accounting
>> the overhead of slab coloring, and a constant amount of memory for the
>> index table. It represents single IP addresses, but has a way to add
>> and delete a range of addresses at once. Currently an add or delete
>> operation of a range of two or more addresses doesn't check if the
>> complete range is already present or not present in the set.
>
> 512 MB, I am a little choking on that. What will you do when IPv6 is
> commonplace? I do not think you'd be wanting to try creating a machine
> with 36893488147419103232 EB.

512 MiB is the worst case and this set type, in its current form, is
probably only good for IPv4. The main problem for IPv6 is the size of the
index table for the bitmaps. The code already handles the two special
cases (no bits and all bits set) by reusing two bitmaps that get allocated
on module load. There is also a kind of garbage collector that checks
whether a bitmap can be replaced by one of the two bitmaps. So a set that
only contains bitmaps with all or no bits set does not use any additional
memory for the bitmaps, just the memory for the index table and the dirty
bitmap.

> Here is an improvement: use a dynamically allocated 4-level digital
> trie, that way, unused subnets do not take any memory. Once allocated,
> a lookup is O(1) too.

For IPv4 the default size of the index table is 8 MiB (2097152 slots * 4
byte pointer on x86) with bitmaps of 256 bytes. The dirty bitmap is
2097152 / 8 byte = 256 KiB. We can decrease the size of the index table
and the dirty bitmap by increasing the size of the bitmaps. But for todays
hardware I consider 8 MiB not too much memory to justify much work to
lower it, because for the worst case we will then take more memory than
just a single table to store the 2097152 pointers.

True, for IPv6 all is different, because this one single index table
approach does not scale.

Sven
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Aug 17 2007 14:53, Sven Wegener wrote:
>
>> Here is an improvement: use a dynamically allocated 4-level digital
>> trie, that way, unused subnets do not take any memory. Once allocated,
>> a lookup is O(1) too.
>
> For IPv4 the default size of the index table is 8 MiB (2097152
> slots * 4 byte pointer on x86) with bitmaps of 256 bytes.

Why 2097152 (256*8192) slots? With a 4-level you have 256 slots at each level.

union ipv4addr {
uint32_t fullthing;
struct {
uint8_t octet0, octet1, octet2, octet3;
};
};
lookup(union ipv4addr addr)
{
const char ****level1 = global_l1_map;
const char ***level2;
const char **level3;
const char *level4;

level2 = level1[addr.octet0];
if (level2 == NULL)
return false;
level3 = level2[addr.octet1];
if (level3 == NULL)
return false;
level4 = level3[addr.octet2];
if (level4 == NULL)
return false;

return test_bit(level4, addr.octet3);
}

So in the worst case that only one single IP address is set, you take
up 256 pointers + 256 pointers + 256 pointers + 256 bits = 6176 bytes
on x86_64.

> The dirty bitmap is 2097152 / 8 byte = 256 KiB. We can decrease the
> size of the index table and the dirty bitmap by increasing the size
> of the bitmaps. But for todays hardware I consider 8 MiB not too
> much memory to justify much work to lower it, because for the worst
> case we will then take more memory than just a single table to
> store the 2097152 pointers.

Jan
--
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Fri, 17 Aug 2007, Jan Engelhardt wrote:

>
> On Aug 17 2007 14:53, Sven Wegener wrote:
>>
>>> Here is an improvement: use a dynamically allocated 4-level digital
>>> trie, that way, unused subnets do not take any memory. Once allocated,
>>> a lookup is O(1) too.
>>
>> For IPv4 the default size of the index table is 8 MiB (2097152
>> slots * 4 byte pointer on x86) with bitmaps of 256 bytes.
>
> Why 2097152 (256*8192) slots? With a 4-level you have 256 slots at each level.
>
> [...]
>
> So in the worst case that only one single IP address is set, you take
> up 256 pointers + 256 pointers + 256 pointers + 256 bits = 6176 bytes
> on x86_64.

Uhm, yeah, like that we would slightly increase the worst case memory
usage for the worst case I was thinking of, a lot of single IPs in
different bitmaps, but would lower the memory we need for an empty set.
Worst case would be 576 MiB on x86 and 640 MiB on x86_64 for the whole
thing, compared to 520 MiB and 528 MiB for my approach. Your approach is
the iptree set type moved to bitmaps, something like iptreemap. I'll think
about it.

Sven
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Aug 17 2007 15:36, Sven Wegener wrote:
>> On Aug 17 2007 14:53, Sven Wegener wrote:
>> >
>> > > Here is an improvement: use a dynamically allocated 4-level digital
>> > > trie, that way, unused subnets do not take any memory. Once allocated,
>> > > a lookup is O(1) too.
>> >
>> > For IPv4 the default size of the index table is 8 MiB (2097152
>> > slots * 4 byte pointer on x86) with bitmaps of 256 bytes.
>>
>> Why 2097152 (256*8192) slots? With a 4-level you have 256 slots at each
>> level.
>>
>> [...]
>>
>> So in the worst case that only one single IP address is set, you take
>> up 256 pointers + 256 pointers + 256 pointers + 256 bits = 6176 bytes
>> on x86_64.
>
> Uhm, yeah, like that we would slightly increase the worst case memory usage
> for the worst case I was thinking of, a lot of single IPs in different
> bitmaps, but would lower the memory we need for an empty set. Worst case
> would be 576 MiB on x86 and 640 MiB on x86_64 for the whole thing, compared
> to 520 MiB and 528 MiB for my approach. Your approach is the iptree set type
> moved to bitmaps, something like iptreemap. I'll think about it.

I hardly think you will ever see the worst case. Or will you? Well, no idea,
but I am sure you certainly do not need to mark 1.0.0.0/8 to 17.0.0.0/8 ;-)


Hm, here is another suggestion (does iptree do that too?)

struct treething {
void *point_there;
int marked;
};

and if point_there==NULL, then "marked" will apply to the whole subnet that
would have been pointed to.



Jan
--
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Fri, 17 Aug 2007, Jan Engelhardt wrote:

>
> On Aug 17 2007 15:36, Sven Wegener wrote:
>>> On Aug 17 2007 14:53, Sven Wegener wrote:
>>>>
>>>>> Here is an improvement: use a dynamically allocated 4-level digital
>>>>> trie, that way, unused subnets do not take any memory. Once allocated,
>>>>> a lookup is O(1) too.
>>>>
>>>> For IPv4 the default size of the index table is 8 MiB (2097152
>>>> slots * 4 byte pointer on x86) with bitmaps of 256 bytes.
>>>
>>> Why 2097152 (256*8192) slots? With a 4-level you have 256 slots at each
>>> level.
>>>
>>> [...]
>>>
>>> So in the worst case that only one single IP address is set, you take
>>> up 256 pointers + 256 pointers + 256 pointers + 256 bits = 6176 bytes
>>> on x86_64.
>>
>> Uhm, yeah, like that we would slightly increase the worst case memory usage
>> for the worst case I was thinking of, a lot of single IPs in different
>> bitmaps, but would lower the memory we need for an empty set. Worst case
>> would be 576 MiB on x86 and 640 MiB on x86_64 for the whole thing, compared
>> to 520 MiB and 528 MiB for my approach. Your approach is the iptree set type
>> moved to bitmaps, something like iptreemap. I'll think about it.
>
> I hardly think you will ever see the worst case. Or will you? Well, no idea,
> but I am sure you certainly do not need to mark 1.0.0.0/8 to 17.0.0.0/8 ;-)

Yeah, the worst case is in most cases an uncommon case.

> Hm, here is another suggestion (does iptree do that too?)
>
> struct treething {
> void *point_there;
> int marked;
> };
>
> and if point_there==NULL, then "marked" will apply to the whole subnet that
> would have been pointed to.

No, iptree does not do anything like that.

The question is, are 8 MiB less memory required for an empty set worth the
"overhead" the management of a tree structure creates. The all bits set
case, that the large index table does with the 8 MiB without needing
additional memory, would require a marker like you suggested or we need to
use a global allocated tree element that has the special meaning to mark
the whole subtree as included. But that would only work on octet boundary.
As always, two sides of a coin, but I'll give it a shot. :)

Sven
Re: [RFC] [PATCH] ipset: New set type fullipmap [ In reply to ]
On Aug 17 2007 16:53, Sven Wegener wrote:
>
> The question is, are 8 MiB less memory required for an empty set
> worth the "overhead" the management of a tree structure creates.

Yes! Because you read the bitmap much more often than you modify it
(the only place where (de)allocations are done).

> The all bits set case, that the large index table does with the 8
> MiB without needing additional memory, would require a marker like
> you suggested or we need to use a global allocated tree element
> that has the special meaning to mark the whole subtree as included.
> But that would only work on octet boundary. As always, two sides of
> a coin, but I'll give it a shot. :)

The octet boundary is a good comprimise, I think.



Jan
--