Mailing List Archive

BCP38 on public-facing Ubuntu servers
Not every uplink service implements BCP38. When putting up servers
connected more-or-less directly to the Internet through these uplinks,
it would be nice if the servers themselves were able to implement
ingress and egress filtering according to BCP38. (Sorry about the typo
in the subject lines of my previous message -- not everyone can get a
BGP feed.)

(Or, when using Ubuntu server edition to implement edge routers.)

My earlier query was asking if anyone has encoded the blackhole routes
in YAML for inserting in netplan(5). My prior message contains the
routes to be blackholed. That takes care of egress routing.

(I think I can write a Python program to take my list and convert it to
the YAML that netplan(5) wants to see. That way, the routes are
inserted when the public interface is up, and removed when the public
interface is down.)
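
A minimal sketch of such a converter, for illustration only. It assumes that netplan accepts `type: blackhole` on a route entry that has no `via` key (this may depend on the netplan version) and that the routes hang off the public interface. The function name and the sample prefixes are placeholders, not the list from the prior message.

```python
import ipaddress


def netplan_blackholes(iface, prefixes):
    """Return netplan YAML text with one blackhole route per prefix.

    Assumption: netplan accepts "type: blackhole" without a "via" key.
    """
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {iface}:",
        "      routes:",
    ]
    for prefix in prefixes:
        # ip_network() raises ValueError for malformed prefixes or set host bits.
        net = ipaddress.ip_network(prefix)
        # Quote the destination so IPv6 colons cannot confuse a YAML parser.
        lines.append(f'        - to: "{net}"')
        lines.append("          type: blackhole")
    return "\n".join(lines) + "\n"
```

Calling `netplan_blackholes("enp1s0", ["10.0.0.0/8", "2001:db8::/32"])` returns text that could be written to a file under /etc/netplan/ and checked with `netplan try` before it is applied.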

Ingress routing appears to be a one-line addition. iptables can be told
to weed out packets with unroutable source addresses. My experiments
will add something like this line to the firewall:

# iptables -A INPUT -m addrtype -i enp1s0 --src-type BLACKHOLE -j DROP

THIS HAS NOT BEEN VERIFIED. I'm building a web server that will
integrate this idea, and try it out.
RE: BCP38 on public-facing Ubuntu servers
Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. On router gear it's called uRPF.

cat /proc/sys/net/ipv4/conf/default/rp_filter

There are two modes: loose and strict.

If your server is BGP multi-homed, then you must use loose. Loose is still very powerful and useful.

Basically, RP does what a router does, but in the opposite direction. When a packet arrives on your server, the kernel checks the routing table for the destination next hop, and RP also checks whether the packet arrived on the interface that would be used to reach its source. If your routing is asymmetric or the source is spoofed, RP drops the packet.
It's a nice feature, but it does a second route lookup, so it is certainly slightly slower. I'm not sure we can say it's twice as slow, though.

I assume your network is not asymmetric, so RP would help you with ingress traffic. For egress, add blackhole (null) routes, or use the bogon scripts in Python. I wouldn't use iptables for that, as it's purely routing, but there are many ways to achieve the same goal.

I recommend exploring rp_filter, as it might do what you're looking for.

As a side note, iptables is super slow when under attack and/or under heavy load.
There are a lot of limitations; for example, the kernel can only forward ~1.4 Mpps per CPU/socket with iptables. That's too slow in my opinion. This was still true recently, but I can't confirm it for the latest 5.x kernels; it could have been fixed or improved.

Finally, can you share with us which provider doesn't implement BCP38 on its uplinks? #JustCurious.

Jean



Re: BCP38 on public-facing Ubuntu servers
    And by that he means: "only a few" =D.

-----
Alain Hebert ahebert@pubnix.net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770 Beaconsfield, Quebec H9W 6G7
Tel: 514-990-5911 http://www.pubnix.net Fax: 514-990-9443

On 6/2/21 12:40 AM, Stephen Satchell wrote:
> Not every uplink service implements BCP38. [...]
Re: BCP38 on public-facing Ubuntu servers
On 6/2/21 4:35 AM, Jean St-Laurent via NANOG wrote:
> Maybe you can explore the in-kernel feature called RP filter, or
> reverse path filter. On router gear it's called uRPF.
>
> cat /proc/sys/net/ipv4/conf/default/rp_filter

+100 to rp_filter

> There are two modes: loose and strict.
>
> If your server is BGP multi-homed, then you must use loose. Loose is
> still very powerful and useful.

I think loose mode with any default route will fail to do what you want.
If you are running your router without a default route, then loose would
probably be okay.

> Basically, RP does what a router does, but in the opposite direction.
> When a packet arrives on your server, the kernel checks the routing
> table for the destination next hop, and RP also checks whether the
> packet arrived on the interface that would be used to reach its source.

For strict mode, the router allows the incoming packet if the incoming
interface would be the outgoing interface when sending a packet to the
incoming packet's source IP.

> If your routing is asymmetric or the source is spoofed, RP drops the
> packet. It's a nice feature, but it does a second route lookup, so it
> is certainly slightly slower. I'm not sure we can say it's twice as
> slow, though.

I'm confident that it is at least somewhat slower. However ...

I have a lowly AMD E-350 APU (lscpu says it's at 918 MHz) processing
several hundred Mbps on GPON against a full DFZ feed with no noticeable
delay. (I've never felt the need nor desire to instrument it.)

As such, I'm confident that any system that would be used in a
greenfield deployment will be able to *easily* handle the traffic that
most servers will see.

> I assume your network is not asymmetric, so RP would help you with
> ingress traffic. For egress, add blackhole (null) routes, or use the
> bogon scripts in Python. I wouldn't use iptables for that, as it's
> purely routing, but there are many ways to achieve the same goal.

"unreachable" routes (in Linux parlance) or "null" routes (in Cisco
parlance) combined with Reverse Path Filtering (RPF) is a HUGE win in my
book.

I've expanded this methodology to federate Fail2Ban between multiple
systems: EBGP via BIRD trades Fail2Ban-specific tables between
machines, and an ip rule makes sure the Fail2Ban table is processed.
Works great in my opinion.

> I recommend exploring rp_filter, as it might do what you're looking
> for.

+100

> As a side note, iptables is super slow when under attack and/or under
> heavy load. There are a lot of limitations; for example, the kernel
> can only forward ~1.4 Mpps per CPU/socket with iptables. That's too
> slow in my opinion. This was still true recently, but I can't confirm
> it for the latest 5.x kernels; it could have been fixed or improved.

That may be the case. However, that's Apples (iptables) to walnuts
(RPF). They are both food (processing packets), but they are
significantly different.



--
Grant. . . .
unix || die
Re: BCP38 on public-facing Ubuntu servers
On Wed, Jun 2, 2021 at 2:04 PM Grant Taylor via NANOG <nanog@nanog.org> wrote:
> On 6/2/21 4:35 AM, Jean St-Laurent via NANOG wrote:
> > Maybe you can explore the in-kernel feature called RP filter, or
> > reverse path filter. On router gear it's called uRPF.
> >
> > cat /proc/sys/net/ipv4/conf/default/rp_filter
>
> +100 to rp_filter

rp_filter is great until your network is slightly less than a perfect
hierarchy. Then your Linux "router" starts mysteriously dropping
packets and, as with allow_local, Linux doesn't have any way to
generate logs about it so you end up with these mysteriously
unexplained packet discards matching no conceivable rule in
iptables... This failure has too often been the bane of my existence
when using Linux for advanced networking.

Regards,
Bill Herrin


--
William Herrin
bill@herrin.us
https://bill.herrin.us/
Re: BCP38 on public-facing Ubuntu servers
On 6/3/21 8:44 AM, William Herrin wrote:
> rp_filter is great until your network is slightly less than a
> perfect hierarchy. Then your Linux "router" starts mysteriously
> dropping packets and, as with allow_local, Linux doesn't have any
> way to generate logs about it so you end up with these mysteriously
> unexplained packet discards matching no conceivable rule in
> iptables... This failure has too often been the bane of my existence
> when using Linux for advanced networking.

I don't remember the particulars, but I thought that was the domain of
log_martians (net.ipv4.conf.*.log_martians).

Without log_martians or explicitly looking for such, no, you won't get
any indication of such drops.



--
Grant. . . .
unix || die
Re: BCP38 on public-facing Ubuntu servers
Grant Taylor via NANOG <nanog@nanog.org> wrote:

>On 6/3/21 8:44 AM, William Herrin wrote:
>> rp_filter is great until your network is slightly less than a perfect
>> hierarchy. Then your Linux "router" starts mysteriously dropping packets
>> and, as with allow_local, Linux doesn't have any way to generate logs
>> about it so you end up with these mysteriously unexplained packet
>> discards matching no conceivable rule in iptables... This failure has
>> too often been the bane of my existence when using Linux for advanced
>> networking.
>
>I don't remember the particulars, but I thought that was the domain of
>log_martians (net.ipv4.conf.*.log_martians).
>
>Without log_martians or explicitly looking for such, no, you won't get any
>indication of such drops.

Yes, enabling the log_martians sysctl will generate a kernel log
message for each rp_filter failure (subject to rate limiting). There
are also stat counters in /proc/net/stat/rt_cache (one line per CPU) for
in_martian_dst and in_martian_src which increment regardless of the
log_martians setting.
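
A rough sketch of how those counters could be totalled across CPUs. It assumes the usual layout of /proc/net/stat/rt_cache: one header line of field names, followed by one line of hexadecimal counters per CPU. The function name is made up for this example.

```python
def martian_totals(text):
    """Sum in_martian_src and in_martian_dst over all per-CPU lines.

    Assumption: the first line names the fields and every later line
    holds the matching counters as hexadecimal numbers.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    totals = {"in_martian_src": 0, "in_martian_dst": 0}
    for line in lines[1:]:
        fields = line.split()
        for name in totals:
            # Find the column by name rather than by a fixed position.
            totals[name] += int(fields[header.index(name)], 16)
    return totals
```

On a live system the input would be the contents of /proc/net/stat/rt_cache; sampling it twice and comparing the totals shows whether martian drops are still occurring.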

The rp_filter sysctl defaults to strict mode (== 1) on Ubuntu,
but can be set to loose mode (== 2); the difference is, essentially, in
strict mode the reverse path must be the same interface as the ingress
interface, whereas in loose mode the reverse path can be any interface
(as long as the source address is reachable).
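
The difference between the two modes can be illustrated with a toy model. This is a simplified sketch and not the kernel's implementation: the FIB entries and interface names are invented, and an entry with no interface stands in for a blackhole route. It also shows why, as noted earlier in the thread, loose mode accepts almost any source once a default route is present.

```python
import ipaddress

# Toy FIB of (prefix, outgoing interface); None marks a blackhole route.
# All prefixes and interface names are invented for illustration.
FIB = [
    ("0.0.0.0/0", "enp1s0"),     # default route via the uplink
    ("192.0.2.0/24", "enp2s0"),  # an inside network
    ("10.0.0.0/8", None),        # blackholed bogon
]


def lookup(fib, address):
    """Longest-prefix match; returns (found, interface)."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, iface in fib:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    if best is None:
        return (False, None)
    return (True, best[1])


def rpf_accepts(fib, source, ingress, mode):
    """Model of the reverse path check; mode is 'strict' or 'loose'."""
    found, iface = lookup(fib, source)
    if not found or iface is None:
        return False  # no route, or only a blackhole route, back to the source
    if mode == "strict":
        return iface == ingress  # reverse path must use the ingress interface
    return True  # loose: reachable through any interface
```

With this FIB, a packet claiming source 192.0.2.10 that arrives on enp1s0 fails the strict check but passes the loose one. A source of 10.1.2.3 fails both because its best match is the blackhole entry, which is why blackhole routes and reverse path filtering complement each other.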

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst

-J

---
-Jay Vosburgh, jay.vosburgh@canonical.com
Re: BCP38 on public-facing Ubuntu servers
Hey,

To my knowledge there is no IPv6 equivalent for net.ipv4.conf.all.rp_filter.

Therefore I use netfilter to do the RP filtering for both address families.

ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP

Using the raw table consumes fewer resources, but you could also choose other tables.
Details about rpfilter can be found here [1].


This can also be achieved using nftables [2].
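
The ip(6)tables shorthand above stands for two commands, one per address family. A small sketch that spells them out as argument lists without running anything; the helper name is hypothetical, and the optional `--loose` flag is the rpfilter match's own loose-mode switch, included here as an assumption about what a multi-homed host might want.

```python
def rpfilter_commands(loose=False):
    """Return argv lists for the IPv4 and IPv6 variants of the rule."""
    match = ["-m", "rpfilter"]
    if loose:
        # Accept packets whose reverse path uses a different interface.
        match.append("--loose")
    # --invert selects packets that FAIL the reverse path test.
    match.append("--invert")
    rule = ["-t", "raw", "-I", "PREROUTING", *match, "-j", "DROP"]
    return [[tool, *rule] for tool in ("iptables", "ip6tables")]
```

Each list could be handed to `subprocess.run(cmd, check=True)` by a root-owned script; building the list separately makes it easy to print and review the rules first.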


Best

Fran

[1] https://ipset.netfilter.org/iptables-extensions.man.html#lbBX
[2] https://wiki.nftables.org/wiki-nftables/index.php/Matching_routing_information



Re: BCP38 on public-facing Ubuntu servers
On 6/8/21 2:38 PM, Fran via NANOG wrote:
> Hey,
>
> To my knowledge there is no IPv6 equivalent for
> net.ipv4.conf.all.rp_filter.
>
> Therefore I use netfilter to do the RP filtering for both address families.
>
> ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP
>
> Using the raw table consumes fewer resources, but you could also choose
> other tables.
> Details about rpfilter can be found here [1].
>
> This can also be achieved using nftables [2].

I've been in discussions on how to filter packets with bad source
addresses on several mailing lists, including this one. For the last
few weeks, I've been searching for all the information I can find on how
Linux implements rp_filter...which appears to have some holes.

Looking at /proc/sys/net/ipv6, there is no knob for rp_filter, so if
your system is IPv6 enabled you have to use the built-in firewall.

For IPv4, I found kernel documentation, but it doesn't tell the whole
story. For that, I had to comb the kernel sources to find out all the
details of rp_filter. I've prepared an RFC letter describing what I
think I found, to be sent to the kernel developers. Here is the text of
what I'll be sending, subject to any constructive criticism I get here:

Letter begins:

After looking at the source that appears to implement rp_filter
linux/net/ipv4/fib_frontend.c
I believe that I now understand the tests rp_filter performs to
validate the source address when net.ipv4.conf.*.rp_filter is
set to one or two for a given interface.

Does the new paragraph I have written accurately reflect what
happens? If so, then I will find out how to submit a patch to add the
clarification to the kernel document.

Description of rp_filter from
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
--------------------------------------------------------------------
rp_filter - INTEGER
    0 - No source validation.
    1 - Strict mode as defined in RFC3704 Strict Reverse Path
        Each incoming packet is tested against the FIB and if the
        interface is not the best reverse path the packet check will
        fail. By default failed packets are discarded.
    2 - Loose mode as defined in RFC3704 Loose Reverse Path
        Each incoming packet's source address is also tested against
        the FIB and if the source address is not reachable via any
        interface the packet check will fail.

    [*new text here]

    Current recommended practice in RFC3704 is to enable strict mode
    to prevent IP spoofing from DDos attacks. If using asymmetric
    routing or other complicated routing, then loose mode is
    recommended.

    The max value from conf/{all,interface}/rp_filter is used
    when doing source validation on the {interface}.

    Default value is 0. Note that some distributions enable it
    in startup scripts.
--------------------------------------------------------------------

Recommended addition where marked with "[*new text here]":
rp_filter will examine the source address of an incoming IP
packet by performing an FIB lookup. In loose mode (value 2),
the packet is rejected if the source address is neither
UNICAST nor LOCAL nor IPSEC. For strict mode (value 1) the
interface indicated by the FIB entry must also match the
interface on which the packet arrived.
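
The "max value" rule quoted from the kernel document can be checked per interface with a few lines of Python. This is only a sketch: the function names are made up, and the `conf_dir` parameter exists so the code can be pointed at a copy of the tree for testing.

```python
from pathlib import Path

MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(iface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Return max(conf/all/rp_filter, conf/<iface>/rp_filter) as an int."""
    base = Path(conf_dir)
    values = []
    for name in ("all", iface):
        values.append(int((base / name / "rp_filter").read_text().strip()))
    return max(values)


def describe(iface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Map the effective value to a mode name."""
    return MODES.get(effective_rp_filter(iface, conf_dir), "unknown")
```

One consequence of taking the maximum: with conf/all/rp_filter set to 2, an interface configured as 1 is validated in loose mode, not strict mode.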
RE: BCP38 on public-facing Ubuntu servers
Bingo!

With -t raw, you can bypass the ~1.2 Mpps per CPU/socket limitation in iptables, because the drop happens very early, before the packet traverses the rest of the iptables kernel modules.

With -t raw you can get close to wire speed, compared with using the same iptables rule without -t raw.

Jean
