
BGP crashes on SLX-OS
Hi all,

Since June 2023, we have been seeing BGP daemon crashes on SLX-OS releases from 20.2.2 up to the latest, 20.5.2. Our lab device had no service contract, and we had to deploy it at short notice to replace a defective SLX that did have one (everything goes in circles), so I first had to convince the vendor to work on the bug at all. One bug in BGP is simply one too many. Of course, I had already declined the vendor's friendly offer to help me with this BGP bug in exchange for additional payment.

Evidence of crashes:
Wed Jun 14 2023 02:45:50 AM CEST /core_files/bgpdd/core.4426
Wed Jun 28 2023 09:20:01 AM CEST /core_files/bgpdd/core.18139
Thu Aug 17 2023 11:01:07 AM CEST /core_files/bgpdd/core.28373
Fri Aug 18 2023 12:14:43 AM CEST /core_files/bgpdd/core.5934
Tue Aug 22 2023 12:15:55 PM CEST /core_files/bgpdd/core.23704
Thu Oct 26 2023 03:16:13 AM CEST /core_files/bgpdd/core.28370
Thu Nov 02 2023 02:05:02 AM CET /core_files/bgpdd/core.18924
Wed Nov 08 2023 03:18:02 AM CET /core_files/bgpdd/core.30966
Thu Nov 23 2023 03:54:05 AM CET /core_files/bgpdd/core.4716
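If you want to compare your own core dumps against ours, here is a small Python sketch that sorts the timestamps above and counts crashes per month. I transcribed the timestamps by hand into ISO form as local time (CET/CEST) and ignore time zones; the helper names parse() and per_month() are only illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple

# Crash timestamps (local CET/CEST time, zones ignored) and core files
# from /core_files/bgpdd/, transcribed from the list above.
CRASHES = [
    ("2023-06-28 09:20:01", "core.18139"),
    ("2023-08-18 00:14:43", "core.5934"),
    ("2023-11-02 02:05:02", "core.18924"),
    ("2023-08-17 11:01:07", "core.28373"),
    ("2023-06-14 02:45:50", "core.4426"),
    ("2023-08-22 12:15:55", "core.23704"),
    ("2023-11-08 03:18:02", "core.30966"),
    ("2023-10-26 03:16:13", "core.28370"),
    ("2023-11-23 03:54:05", "core.4716"),
]


def parse(entries: List[Tuple[str, str]]) -> List[Tuple[datetime, str]]:
    """Parse the timestamps and return (datetime, core file) pairs in chronological order."""
    return sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), core) for ts, core in entries
    )


def per_month(entries: List[Tuple[str, str]]) -> Counter:
    """Count crashes per calendar month, keyed as 'YYYY-MM'."""
    return Counter(dt.strftime("%Y-%m") for dt, _ in parse(entries))


if __name__ == "__main__":
    for dt, core in parse(CRASHES):
        print(dt.strftime("%Y-%m-%d %H:%M:%S"), core)
    for month, count in sorted(per_month(CRASHES).items()):
        print(month, count)
```

For our nine core files this gives 2 crashes in 2023-06, 3 in 2023-08, 1 in 2023-10 and 3 in 2023-11.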

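The BGP daemon crashes while processing BGP path attributes, specifically the COMMUNITIES attribute of received UPDATE messages (maybe someone is playing with BGP attributes). Here are two examples of the evidence I sent to the vendor, taken from the GNU debugger (gdb):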

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `bgpd -S ethsw -s 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000008e67de in bgp_allocate_memory()
[Current thread is 1 (Thread 0x7f8bb22f2680 (LWP 5584))]
(gdb)
(gdb) bt
#0 0x00000000008e67de in bgp_allocate_memory()
#1 0x000000000079e32d in bgp_general_counter_string_allocate_extra()
#2 0x0000000000838fe6 in bgp_process_community_attribute()
#3 0x0000000000833658 in bgp_process_path_attribute()
#4 0x0000000000830840 in bgp_process_update_message_field_by_field()

or, from another core file:

Core was generated by `bgpd -S ethsw -s 0 -R 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000087d770 in bgp_free_memory ()
[Current thread is 1 (Thread 0x7feba5fa3700 (LWP 31007))]
(gdb)
(gdb) bt
#0 0x000000000087d770 in bgp_free_memory ()
#1 0x0000000000758194 in bgp_general_counter_string_insert ()
#2 0x00000000007e3027 in bgp_process_community_attribute ()
#3 0x00000000007dd65b in bgp_process_path_attribute ()
#4 0x00000000007daf1b in bgp_process_update_message_field_by_field ()
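For anyone who wants to look at this from the input side: all frames above are in the COMMUNITIES path-attribute handling. Per RFC 1997, that attribute (type code 8) is a list of 4-octet values, each a 2-octet AS number followed by a 2-octet value, and RFC 7606 says an attribute whose length is not a non-zero multiple of 4 should be handled as treat-as-withdraw rather than resetting the session. The following Python sketch only illustrates that check; it is not the SLX-OS code, the names parse_communities() and MalformedCommunities are hypothetical, and I do not know which input actually triggers our crash.

```python
import struct
from typing import List


class MalformedCommunities(ValueError):
    """Hypothetical error for a COMMUNITIES value that violates RFC 7606, section 7.8."""


def parse_communities(value: bytes) -> List[str]:
    """Decode a COMMUNITIES (type 8) attribute value into 'ASN:value' strings.

    RFC 1997: each community is 4 octets, high 2 octets = AS, low 2 octets = value.
    RFC 7606: the length must be a non-zero multiple of 4; otherwise the UPDATE
    should be treated as a withdraw instead of being processed (or crashing the daemon).
    """
    if len(value) == 0 or len(value) % 4 != 0:
        raise MalformedCommunities(
            f"length {len(value)} is not a non-zero multiple of 4"
        )
    # "!HH" = two unsigned 16-bit integers in network byte order per community.
    return [f"{asn}:{val}" for asn, val in struct.iter_unpack("!HH", value)]
```

For example, parse_communities(bytes.fromhex("fde80064")) returns ["65000:100"], while a 3-octet value raises MalformedCommunities.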


Now we have a problem: I can no longer reproduce the bug because, you guessed it, this SLX also failed spontaneously, and we no longer want to install the RMA replacement SLX at this critical position in the network (CEO decision).

So my question is: has anyone else seen similar crashes of their BGP daemon since June 2023, and perhaps already has a ticket open, so that together we might get a fix faster? Or are we the "only ones" so far? I am concerned that the vendor may close this bug now that we can no longer reproduce it.


BR

Jörg
_______________________________________________
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp