Mailing List Archive

Next SPF run wait
Hi,

Any ideas what "Next SPF run wait" actually means in the following output? It is a large negative value whose absolute value keeps increasing (so it will never reach zero).

It looks like the box never runs SPF, as it doesn't install new routes. I see that it learns the LSAs but never uses them as routes: show ip routes / show ip ospf routes don't show these routes, while I see the corresponding LSAs in the OSPF DB. All the neighbor state machinery works well. But active routes are never updated, even if an interface and the corresponding neighbor fail.

SSH@CER#show ip ospf
OSPF Version                  Version 2
Router Id                     10.x.x.x
ASBR Status                   No
ABR Status                    No         (0)
Redistribute Ext Routes from
Initial SPF schedule delay    0          (msecs)
Minimum hold time for SPFs    0          (msecs)
Maximum hold time for SPFs    0          (msecs)
Next SPF run wait (msecs)     -177170950
External LSA Counter          0
External LSA Checksum Sum     00000000
Originate New LSA Counter     89
Rx New LSA Counter            3260
External LSA Limit            14447047
Database Overflow Interval    0
Database Overflow State :     NOT OVERFLOWED
RFC 1583 Compatibility :      Enabled
NSSA Translator:              Enabled
Nonstop Routing:              Disabled
Graceful Restart:             Disabled,   timer 120
Graceful Restart Helper:      Enabled
BFD:                          Enabled

I checked other NetIron boxes (CERs and MLXes), but this line is just not present in their output.

We saw this behavior on other CERs and fixed it with a reboot. Unfortunately I don't have "show ip ospf" output from those boxes from before the reboot; now this line (Next SPF run wait) is just not present on the rebooted boxes.

Looks like a bug, but maybe I am missing something. Could it be something like SPF throttling, and is there maybe a way to clear this state less disruptively? On the other hand, "clear ip ospf all" did not help yesterday.

--
Kind regards,
Pavel
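Since the affected boxes were rebooted before the output could be saved, a small check run against captured "show ip ospf" text may help catch the state next time. The following is a minimal sketch, assuming the output has already been collected as plain text by some other means; the function names are hypothetical, and the pattern matches only the line format quoted in this thread.

```python
import re

# Matches the line as quoted in this thread, e.g.
# "Next SPF run wait (msecs)     -177170950"
SPF_WAIT_RE = re.compile(r"Next SPF run wait \(msecs\)\s*(-?\d+)")


def next_spf_wait_ms(show_ip_ospf_text):
    """Return the 'Next SPF run wait' value in ms, or None if the line is absent."""
    match = SPF_WAIT_RE.search(show_ip_ospf_text)
    return int(match.group(1)) if match else None


def spf_looks_stuck(show_ip_ospf_text):
    """True only if the wait line is present and its value is negative."""
    wait = next_spf_wait_ms(show_ip_ospf_text)
    return wait is not None and wait < 0


sample = """Maximum hold time for SPFs    0          (msecs)
Next SPF run wait (msecs)     -177170950
External LSA Counter          0
"""
print(next_spf_wait_ms(sample), spf_looks_stuck(sample))
```

A missing line is reported as None rather than as an error, because healthy boxes in this thread do not print the line at all.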
  
Re: Next SPF run wait [ In reply to ]
Hi,

this looks like an integer overflow and could be a bug. SPF will never
run with a negative timer, and I wonder which condition triggers this.

What version are you running?
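To illustrate how a countdown can go negative and then keep growing in absolute value, here is a minimal sketch. It assumes, purely hypothetically, that the displayed value is "deadline minus current time" in signed 32-bit milliseconds and that the event that should fire at zero was missed. This is not NetIron code, and whether the real timer works this way is unknown. For scale, 177170950 ms is about 49 hours.

```python
def to_int32(value):
    """Interpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def remaining_ms(deadline_ms, now_ms):
    """Time left until the deadline as signed 32-bit ms (hypothetical timer arithmetic)."""
    return to_int32(deadline_ms - now_ms)


deadline = 1_000  # hypothetical: SPF was due at t = 1000 ms
for now in (0, 1_000, 61_000, 1_000 + 177_170_950):
    # Once 'now' passes the deadline without the timer firing,
    # the remaining time is negative and only gets more negative.
    print(now, remaining_ms(deadline, now))
```

In this model nothing ever brings the value back to zero short of re-arming the timer, which would be consistent with a reboot clearing the state.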

I would start with
debug ospf <spf, adj, bfd, error>
show cpu-utilization detail | include ospf
show ip os neighbor

and maybe try and set different timers and then restart the router?
E.g.
router ospf
timers throttle spf 5 1000 90000
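As a rough illustration of what the three throttle values would do, the sketch below assumes Cisco-like semantics: the values are the initial delay, the hold time, and the maximum hold time in milliseconds, and the hold time doubles after each SPF run while topology changes keep arriving, capped at the maximum. That model and the function name are assumptions, not something taken from NetIron documentation.

```python
def spf_wait_sequence(initial_ms, hold_ms, max_hold_ms, runs):
    """Waits before successive SPF runs under continuous churn (assumed backoff model)."""
    if runs <= 0:
        return []
    waits = [initial_ms]          # first SPF waits only the initial delay
    hold = hold_ms
    while len(waits) < runs:
        waits.append(min(hold, max_hold_ms))
        hold = min(hold * 2, max_hold_ms)  # double the hold time, capped at the maximum
    return waits


# Values from the suggested "timers throttle spf 5 1000 90000"
print(spf_wait_sequence(5, 1000, 90000, 10))
# All three timers at 0, as in the output quoted in this thread
print(spf_wait_sequence(0, 0, 0, 3))
```

Under this model, timers of 0 give a wait of 0 ms every time, so a large negative wait cannot be produced by the configured values alone.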

Jörg


On 2 Aug 2017, at 16:06, Pave Lunin wrote:

> Hi,
>
> Any ideas what "Next SPF run wait" actually means in the following
> output? It is a large negative value whose absolute value keeps
> increasing (so it will never reach zero).
>
> It looks like the box never runs SPF, as it doesn't install new routes.
> I see that it learns the LSAs but never uses them as routes: show
> ip routes / show ip ospf routes don't show these routes, while I see
> the corresponding LSAs in the OSPF DB. All the neighbor state
> machinery works well. But active routes are never updated, even if an
> interface and the corresponding neighbor fail.
>
> SSH@CER#show ip ospf
> OSPF Version Version 2
> Router Id 10.x.x.x
> ASBR Status No
> ABR Status No (0)
> Redistribute Ext Routes from
> Initial SPF schedule delay 0 (msecs)
> Minimum hold time for SPFs 0 (msecs)
> Maximum hold time for SPFs 0 (msecs)
> **Next SPF run wait (msecs) -177170950**
> External LSA Counter 0
> External LSA Checksum Sum 00000000
> Originate New LSA Counter 89
> Rx New LSA Counter 3260
> External LSA Limit 14447047
> Database Overflow Interval 0
> Database Overflow State : NOT OVERFLOWED
> RFC 1583 Compatibility : Enabled
> NSSA Translator: Enabled
> Nonstop Routing: Disabled
> Graceful Restart: Disabled, timer 120
> Graceful Restart Helper: Enabled
> BFD: Enabled
>
> I checked other NetIron boxes (CERs and MLXes), but this line is just
> not present in their output.
>
> We saw this behavior on other CERs and fixed it with a reboot.
> Unfortunately I don't have "show ip ospf" output from those boxes from
> before the reboot; now this line (Next SPF run wait) is just not
> present on the rebooted boxes.
>
> Looks like a bug, but maybe I am missing something. Could it be
> something like SPF throttling, and is there maybe a way to clear this
> state less disruptively? On the other hand, "clear ip ospf all" did
> not help yesterday.
>
> --
> Kind regards,
> Pavel
Re: Next SPF run wait [ In reply to ]
Hi Jörg,

Thanks for confirming my hypothesis.

We run 5.2.0gT183, which is quite old, I know, but in fact we are decommissioning most of our Brocade gear and prefer not to bother with testing and validation of the new bugs^W software. Things have been reasonably stable for us, but our NOC has seen this issue a few times during the last year or two. Normally the guys don't bother to troubleshoot; they just go ahead with a planned/urgent reboot, thanks to the fact that the CER reboots tremendously quickly. But at some point I came around to enjoy the hunt for a less brutal way to fix this state. Didn't succeed a lot though :)

> debug ospf <spf, adj, bfd, error>

I haven't found any debug option for SPF throttling. As SPF never runs, "debug ip ospf spf" shows nothing.

> timers throttle spf 5 1000 90000

Thanks for the idea. I tried to play around with these timers, but it doesn't affect the broken negative delay. Not sure I completely understood what you meant by "set different timers and reboot". Why reboot? Do you think we might reproduce it this way?

We don't have any SPF throttling configured, so, as can be seen from my output, these timers are set to 0 by default. This means, if I correctly understand the Cisco-like logic of NetIron software, that it should run SPF on demand as frequently as it receives LSA updates. It doesn't care about any CPU spikes during SPF calculation, even if we had them. But we don't, according to "show cpu-utilization detail | include ospf". This router doesn't struggle a lot with OSPF nor with anything else; it just carries fewer than 100 LSAs: pure MPLS core IGP (with TE), single area, just two neighbors (uplinks) + BFD.

As I said above, all the LSA flooding and neighbor state machinery works well. I see that it updates LSAs in the OSPF database as appropriate, but the routes are stuck even if the corresponding interface/neighbor goes down.

Well, I am not expecting any real help, rather just sharing my observations :)

What is strange, however, is that routes are still active even if the next-hop interface goes down. AFAIR, this should be handled purely at the data plane level: no forwarding next-hop, no FIB entry. However, the routes are still active, according to both "show ip route" and "show ip ospf routes". This makes me think that it might be something beyond OSPF. I am not an expert in Brocade implementation details though.

Regards,
Pavel

03.08.2017, 18:21, "Jörg Kost" <jk@ip-clear.de>:


Hi,

this looks like an integer overflow and could be a bug. SPF will never run with a negative timer, and I wonder which condition triggers this.

What version are you running?

I would start with
debug ospf <spf, adj, bfd, error>
show cpu-utilization detail | include ospf
show ip os neighbor

and maybe try and set different timers and then restart the router?
E.g.
router ospf
timers throttle spf 5 1000 90000

Jörg