Mailing List Archive

ASR920: egress ACL on BDIs
Hi,

quick question to the group - ACLs on BDIs on ASR920s, is this something
known as something you want to stay away from?

I'm trying to get rid of one of our remaining 6500/Sup720s - most VLANs
got moved to Aristas, but a few of them have egress ACLs on the SVI/BDI
(which does not really work well with the default Arista TCAM carving,
only 1000 entries...) - so I decided "make good use of the ASR920 on
that site, which isn't really doing much" and moved the three (3!) BDIs
over.

Bäm.

*Some* packets that are supposed to be permitted by very simple IPv4
ACLs are just not arriving. Like, TCP SYNs that should be matched
by a "permit ip host $source host $dest" rule, right at the top of
the ACL in question. Or ping, which is permitted in all our ACLs
with a "permit icmp any any" rule.

Removing and re-adding the ACLs (and checking with a sniffer port) has
confirmed that it's indeed the egress ACLs, not routing or anything else
which might eat packets.

Interestingly enough, the pattern shifts - so when you change something,
a non-working ACL entry "A" starts working, but something in ACL B
starts failing. Nothing interesting in the logs, ever.

This is an ASR920-12CZ with "Cisco IOS XE Software, Version 16.06.05a".

I have a TAC case open, which has proceeded nicely to "I will have a
look at your logs, but first I go on vacation".



I'm not looking for debugging advice right now, more for experience from
the field - like "yes, we've done egress ACLs with 16.06, and it just
does not work!" or "there is a hidden switch to make the ACL compiler
work correctly if you have <foo>" or maybe even "there is <this> hidden
command to force re-programming of ACLs, it is needed because <that>"...


This box does IPv4, IPv6 routing (BGP, EIGRP, OSPFv3) and EoMPLS/VPLS
things (LDP), on a fairly small scale (~250 IPv4 routes, ~900 IPv6 routes,
~8 bridge-domains, 2 VPLS groups and 2 EoMPLS circuits). So this should
be well within the limits of the architecture...

(I'm tempted to move these VLANs to an old 7301 - it's the backup uplinks
anyway, so falling down to ~500 Mbit/s in case the primary router fails
would be acceptable. But it irks me that I have this new and shiny box
which is not behaving...)

gert

--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: ASR920: egress ACL on BDIs
Hi,

We have been using small (<300 ACEs) egress ACLs under BDIs without any apparent issues until now.

Maybe have a look at the following outputs:

show platform hardware pp active tcam utilization acl detail 0
show platform hardware pp active tcam utilization egress-acl detail 0

Also check the limitations of your SDM template (i.e. https://www.cisco.com/c/en/us/td/docs/routers/asr920/configuration/guide/sdm/16-11-1/b-sys-sdm-xe-16-11-1-asr-920.html)

--
Tassos


_______________________________________________
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: ASR920: egress ACL on BDIs
Hi,

On Tue, Dec 31, 2019 at 12:00:00AM +0200, Tassos Chatzithomaoglou wrote:
> We have been using small (<300 ACEs) egress ACLs under BDIs without any apparent issues until now.

Which version of IOS XE? How many BDIs?

> Maybe have a look at the following outputs:
>
> show platform hardware pp active tcam utilization acl detail 0
> show platform hardware pp active tcam utilization egress-acl detail 0

That all looks reasonable... (15% and 57%)

> Also check the limitations of your SDM template (i.e. https://www.cisco.com/c/en/us/td/docs/routers/asr920/configuration/guide/sdm/16-11-1/b-sys-sdm-xe-16-11-1-asr-920.html)

We're on "default", and that should have enough of everything for
what the box is doing.

gert

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: ASR920: egress ACL on BDIs
Hi,

replying to myself with a few... interesting... discoveries we've made
in the meantime...

On Mon, Dec 30, 2019 at 11:57:54AM +0100, Gert Doering wrote:
> quick question to the group - ACLs on BDIs on ASR920s, is this something
> known as something you want to stay away from?

TAC was not exactly helpful ("can you add a line to that ACL, and take
another one away, does it work now?" - I'm still waiting for a single
"let's see what is programmed in the hardware!" question...) - but that
uncovered quite an interesting effect...

Namely:

- if I type in the ACL in question, line by line (or remove and re-add
the non-working line from "conf term") things *work*

- if I "bulk-config" the ACL by "copy tftp:$source running-config" or
"rcp $source router:running-config" - which is what our ACL provisioning
tool uses - things *fail*

So my gut says "it's related to the speed of updates" - push in changes
too fast (like, 100 lines in basically "a single instant"), and "something
gets overrun". We've now changed our ACL uploader to use SSH and put
the ACLs in line by line, and that seems to have fixed it for v4. Maybe.
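
A minimal sketch of the line-by-line upload idea described above, not the actual uploader. It assumes a hypothetical `send_line` callable that wraps an interactive SSH session to the router; the delay value and the helper names are illustrative assumptions as well.

```python
import time
from typing import Callable, Iterable, List


def config_lines(snippet: str) -> List[str]:
    """Split a config snippet into stripped, non-empty command lines."""
    return [line.strip() for line in snippet.splitlines() if line.strip()]


def push_paced(lines: Iterable[str],
               send_line: Callable[[str], None],
               delay_s: float = 0.05,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Send each line on its own and pause between lines.

    send_line is a hypothetical hook around an SSH session that is already
    in configuration mode. Returns the number of lines sent.
    """
    sent = 0
    for line in lines:
        send_line(line)
        sleep(delay_s)  # assumed pacing; the right value is not known
        sent += 1
    return sent


if __name__ == "__main__":
    snippet = """
    ip access-list extended FOOBAR
     permit icmp any any
    end
    """
    # print() stands in for the real SSH sender here
    push_paced(config_lines(snippet), print, delay_s=0.0)
```

Whether pacing alone avoids the problem is exactly the open question in this thread, so the delay is a knob to experiment with rather than a known fix.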


Now, IPv6 ACLs are not working right either, but they fail in different
ways - short ACLs seem to be working right, long ACLs fail-open, as in
"the platform claims it has been programmed, but all packets pass". Yay.

Haven't figured out the trigger on that one yet - like "a certain
combination of protocol/port matches creates a pass-all rule instead"
(but didn't have much time). Should be somewhat easy to bisect, "just
need time"...
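
The bisection mentioned above could be sketched roughly like this. It assumes a hypothetical `fails` predicate that installs the given ACL entries on a test interface and reports whether forbidden traffic gets through, and it assumes the failure is monotonic (once a prefix of the ACL fails open, every longer prefix does too), which may not hold for the real trigger.

```python
from typing import Callable, Sequence


def shortest_failing_prefix(entries: Sequence[str],
                            fails: Callable[[Sequence[str]], bool]) -> int:
    """Return the smallest n for which fails(entries[:n]) is True.

    fails is a hypothetical test hook (install ACL, probe with traffic).
    Assumes the empty ACL works and that failure is monotonic in length.
    """
    if not fails(entries):
        raise ValueError("full ACL does not fail; nothing to bisect")
    lo, hi = 0, len(entries)  # invariant: prefix lo works, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(entries[:mid]):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    acl = [f"permit tcp any any eq {port}" for port in range(1, 101)]
    # Stand-in predicate: pretend the platform breaks at 37 entries or more
    print(shortest_failing_prefix(acl, lambda a: len(a) >= 37))
```

If the trigger is a particular combination of entries rather than sheer length, the entry at the returned position is only a starting point for further narrowing.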

gert

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: ASR920: egress ACL on BDIs
> On 20/01/2020, at 12:22 AM, Gert Doering <gert@greenie.muc.de> wrote:
>
>
> Now, IPv6 ACLs are not working right either, but they fail in different
> ways - short ACLs seem to be working right, long ACLs fail-open, as in
> "the platform claims it has been programmed, but all packets pass". Yay.

This is what happens on J ACX boxes.. stunningly bad behaviour :-(

--
Nathan Ward

Re: ASR920: egress ACL on BDIs
Hi,

On Sun 19. Jan 2020 at 12:23, Gert Doering <gert@greenie.muc.de> wrote:

> replying to myself with a few... interesting... discoveries we've made
> in the meantime...
>
> On Mon, Dec 30, 2019 at 11:57:54AM +0100, Gert Doering wrote:
> > quick question to the group - ACLs on BDIs on ASR920s, is this something
> > known as something you want to stay away from?
>
> TAC was not exactly helpful ("can you add a line to that ACL, and take
> another one away, does it work now?" - I'm still waiting for a single
> "let's see what is programmed in the hardware!" question...) - but that
> uncovered quite an interesting effect...
>
> Namely:
>
> - if I type in the ACL in question, line by line (or remove and re-add
> the non-working line from "conf term") things *work*
>
> - if I "bulk-config" the ACL by "copy tftp:$source running-config" or
> "rcp $source router:running-config" - which is what our ACL provisioning
> tool uses - things *fail*
>
> So my gut says "it's related to the speed of updates" - push in changes
> too fast (like, 100 lines in basically "a single instant"), and "something
> gets overrun". We've now changed our ACL uploader to use SSH and put
> the ACLs in line by line, and that seems to have fixed it for v4. Maybe.
>
>
> Now, IPv6 ACLs are not working right either, but they fail in different
> ways - short ACLs seem to be working right, long ACLs fail-open, as in
> "the platform claims it has been programmed, but all packets pass". Yay.
>
> Haven't figured out the trigger on that one yet - like "a certain
> combination of protocol/port matches creates a pass-all rule instead"
> (but didn't have much time). Should be somewhat easy to bisect, "just
> need time"...


if you use „copy src dst“, then a „no $something“ line right at the
beginning of a new block of configuration lines (e.g. one used to first
deconfigure the whole ACL block before reapplying it) might not get
applied first, which leads to merge behavior instead of a full ACL
replace.

This bug not only affects ACLs but other commands as well. Unsure if it is
fixed in newest XE versions. Could this also affect you?

Cheers
Chris

Re: ASR920: egress ACL on BDIs
Hi,

On Mon, Jan 20, 2020 at 12:28:25AM +1300, Nathan Ward wrote:
> > On 20/01/2020, at 12:22 AM, Gert Doering <gert@greenie.muc.de> wrote:
> >
> > Now, IPv6 ACLs are not working right either, but they fail in different
> > ways - short ACLs seem to be working right, long ACLs fail-open, as in
> > "the platform claims it has been programmed, but all packets pass". Yay.
>
> This is what happens on J ACX boxes.. stunningly bad behaviour :-(

Ewww. Does it at least warn in a clearly visible way?

Our Aristas also like to run out of TCAM, but if that happens, a very
clear message is printed *and* the ACL config is not applied to the
interface (= you can see it in your RANCID diffs).

gert

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: ASR920: egress ACL on BDIs
Hi,

On Sun, Jan 19, 2020 at 12:39:18PM +0100, Christian Meutes wrote:
> if you use "copy src dst" then a "no $something" line right in the
> beginning of a new block of configuration lines (eg. for being used to
> first deconfigure the whole ACL block and then to reapply it again) might
> miss to apply the "no ..." initially first, which will lead to a merge
> behavior instead of a full ACL replace.
>
> This bug not only affects ACLs but other commands as well. Unsure if it is
> fixed in newest XE versions. Could this also affect you?

Our ACL config snippets do have

  no ip access-list extended FOOBAR
  ip access-list extended FOOBAR
   permit ...
   permit ...
   deny ...
  end

in them, so yes, this effect would result in "merge" behaviour (which
would very much puzzle me afterwards when looking at the resulting
config diff, I think :-) ).

It does not explain what we currently see - these ACLs have been installed
"from zero", and the resulting running- and startup-config have all the
lines "in". Just the filtering hardware doesn't...

gert

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: ASR920: egress ACL on BDIs
>
>
> This bug not only affects ACLs but other commands as well. Unsure if it is
> fixed in newest XE versions. Could this also affect you?
>
>
Aside from this behavior, XE in the enterprise access layer is full of bugs
related to ACLs. We've recently begun a practice of maintaining two
distinct versions of every ACL so we can swap them on interfaces after
modifying the unused one. Modifying a used one in-place results in some
degree of data plane failure on affected interfaces, i.e. they stop passing
all or some subset of traffic. Even on "fixed" code, the problem persists,
though less frequently.
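
A rough sketch of the two-copy practice described above, as a generator of config lines. The `-A`/`-B` naming convention, the function names and the example interface are assumptions for illustration; the idea is only to rewrite the copy that is not attached and then repoint the interface at it.

```python
from typing import List


def other_slot(active_name: str) -> str:
    """Given 'FOO-A' or 'FOO-B', return the name of the unused twin.

    The -A/-B suffix scheme is an assumed naming convention.
    """
    base, slot = active_name.rsplit("-", 1)
    if slot not in ("A", "B"):
        raise ValueError(f"unexpected ACL name: {active_name}")
    return f"{base}-{'B' if slot == 'A' else 'A'}"


def swap_commands(interface: str, active_name: str,
                  entries: List[str], direction: str = "out") -> List[str]:
    """Build config lines: rewrite the unused ACL, then attach it."""
    standby = other_slot(active_name)
    cmds = [f"no ip access-list extended {standby}",
            f"ip access-list extended {standby}"]
    cmds += [f" {entry}" for entry in entries]
    cmds += [f"interface {interface}",
             f" ip access-group {standby} {direction}",
             "end"]
    return cmds


if __name__ == "__main__":
    for line in swap_commands("BDI100", "CUST-OUT-A",
                              ["permit icmp any any", "deny ip any any"]):
        print(line)
```

The ACL that was active before the swap is left untouched, so it becomes the scratch copy for the next change.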
Re: ASR920: egress ACL on BDIs
> On 20 Jan 2020, at 00:15, Nathan Lannine <nathan.lannine@gmail.com> wrote:
>
>>
>>
>>
>> This bug not only affects ACLs but other commands as well. Unsure if it is
>> fixed in newest XE versions. Could this also affect you?
>>
>>
> Aside from this behavior, XE in the enterprise access layer is full of bugs
> related to ACLs. We've recently begun a practice of maintaining two
> distinct versions of every ACL so we can swap them on interfaces after
> modifying the unused one. Modifying a used one in-place results in some
> degree of data plane failure on affected interfaces, i.e. they stop passing
> all or some subset of traffic. Even on "fixed" code, the problem persists,
> though less frequently.

Do you happen to have a bug reference for this? We’ve been seeing this behaviour intermittently on some csr 1ks and haven’t had the time/energy to debate it with TAC yet.
Re: ASR920: egress ACL on BDIs
On Mon, 27 Jan 2020 at 23:30, Chris Jones <chrisj@aprole.com> wrote:

> > Aside from this behavior, XE in the enterprise access layer is full of bugs
> > related to ACLs. We've recently begun a practice of maintaining two
> > distinct versions of every ACL so we can swap them on interfaces after
> > modifying the unused one. Modifying a used one in-place results in some
> > degree of data plane failure on affected interfaces, i.e. they stop passing
> > all or some subset of traffic. Even on "fixed" code, the problem persists,
> > though less frequently.
>
> Do you happen to have a bug reference for this? We’ve been seeing this behaviour intermittently on some csr 1ks and haven’t had the time/energy to debate it with TAC yet.

Somewhat related: IOS (all flavours) updates ACLs in place unless you
use object ACLs. In-place updates essentially double your ACL scale if
you are running exactly one large ACL, but it is unpredictable what
happens while the ACL is being changed.
Many other devices, such as Juniper, program a new copy, then switch
the ACL pointer to the new copy and delete the old one. That makes the
update predictable, but halves the usable ACL size if you are running
exactly one large ACL, as you need double the space during reprogramming.

Consider old ACL

100 deny host 1.1.1.1
200 deny host 2.2.2.2
300 permit any

Consider new ACL

100 deny host 1.1.1.1
200 deny host 2.2.2.2
300 deny host 3.3.3.3
400 permit any

This change would interrupt traffic if the implied default is deny
(IOS-XR), because the ACL solver has to remove the '300 permit any'
entry to fit the new rules, and during this delta all packets hit the
implied deny. The implicit default thus optimizes for security rather
than hitlessness.
If instead of 300 permit any you had used 100000 permit any, then
during reprogramming you might have permitted something you should not
have (not in this case), but you would not have dropped anything you
should not. That may be much more desirable behaviour, for example for
iACL updates: you'd rather let packets pass for a few microseconds than
drop what should not be dropped.
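
The effect can be illustrated with a toy model. This is a sketch under stated assumptions, not how any real ACL solver works: it assumes entries that changed or vanished are removed first and new ones are then added in ascending sequence order, with an implicit deny at the end. The addresses and sequence numbers are the ones from the example above; 9.9.9.9 is an arbitrary unrelated source.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (action, source); source is a host address or "any"
Acl = Dict[int, Rule]   # sequence number -> rule


def evaluate(acl: Acl, src: str, implicit: str = "deny") -> str:
    """First match in sequence order wins; otherwise the implicit default."""
    for seq in sorted(acl):
        action, match = acl[seq]
        if match == "any" or match == src:
            return action
    return implicit


def in_place_steps(old: Acl, new: Acl) -> List[Acl]:
    """ACL states seen while old is rewritten into new, one entry at a time.

    Assumed order: delete changed or vanished entries, then add new ones.
    """
    state = dict(old)
    states = [dict(state)]
    for seq in sorted(old):
        if new.get(seq) != old[seq]:
            del state[seq]
            states.append(dict(state))
    for seq in sorted(new):
        if state.get(seq) != new[seq]:
            state[seq] = new[seq]
            states.append(dict(state))
    return states


base = {100: ("deny", "1.1.1.1"), 200: ("deny", "2.2.2.2")}
old_low = {**base, 300: ("permit", "any")}
new_low = {**base, 300: ("deny", "3.3.3.3"), 400: ("permit", "any")}
old_high = {**base, 100000: ("permit", "any")}
new_high = {**base, 300: ("deny", "3.3.3.3"), 100000: ("permit", "any")}

if __name__ == "__main__":
    for label, old, new in (("permit any at 300/400", old_low, new_low),
                            ("permit any at 100000", old_high, new_high)):
        verdicts = [evaluate(s, "9.9.9.9") for s in in_place_steps(old, new)]
        print(label, verdicts)
```

In this model, legitimate traffic hits the implicit deny during the first update, while in the second the only transient is that 3.3.3.3 stays permitted until its deny entry is programmed.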

--
++ytti
Re: ASR920: egress ACL on BDIs
>
> Do you happen to have a bug reference for this? We’ve been seeing this
> behaviour intermittently on some csr 1ks and haven’t had the time/energy to
> debate it with TAC yet.


Sorry, just saw this.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuw19907 . That's for the
Catalyst 4500x, which is just a 4500 Sup7L and has its own set of limiting
problems for us. We were on 15.2(4)E6 and were advised to update to
15.2(4)E8. After the update we still saw a subset of the problem. You'll
see that bug specifically references the use of the "log" keyword on an
ACE, which was true to our config. I would doubt a relationship between
our experience and yours because of the significant difference in the
platforms.
Re: ASR920: egress ACL on BDIs
>
> Somewhat related, IOS (all flavours) do in-place ACL unless you do
> object ACLs. In-place ACL update behaviour essentially doubles your
>

FWIW we are actually using object ACLs. What's the behavior then?
Copy-swap? Is there a real name for that which I'm not remembering?