Mailing List Archive

Issues with Routes in FreeBSD / PfSense New to Release 1.0
Hello.


Can anybody review this thread and let me know what they think? https://forum.pfsense.org/index.php?topic=111108.0

When failover is triggered, the failover link does not work with version 1.0; reverting to 0.99 makes the problem go away. The pfSense team seems to think it is related to the Zebra restart. Several users have confirmed this issue. See the thread for further info.

Thanks!
Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Just seeing this now…

On 3 Oct 2016, at 18:13, Reqlez Guy wrote:

> Hello.
>
> Can anybody review this and see what they think ?
> https://forum.pfsense.org/index.php?topic=111108.0
>
> When failover is triggered, the failover link does not work with
> version 1.0; reverting to 0.99 makes the problem go away. The pfSense
> team seems to think it is related to the Zebra restart. Several users
> have confirmed this issue. See the thread for further info.

What is the “normal” version of Quagga in pfSense (i.e. the output of
“zebra -v”)? Is it 1.0.20160315?

Can you post the output of “zebra -v” from it? (It should list some
compile options as well.) And what is the base OS? FreeBSD 10?

We are just about to get a new version out. If you are able to compile
your own version, it might be worthwhile to download and build from the
latest git master.

- Martin Winter


_______________________________________________
Quagga-users mailing list
Quagga-users@lists.quagga.net
https://lists.quagga.net/mailman/listinfo/quagga-users
Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Sorry, it seems I'm a lists noob and didn't realize this was not going to the list. I'm assuming that I should be emailing the list only, and not Martin or anybody else? I CCed Martin just in case.

As per Martin, he thinks the issue is triggered by the use of -9 to terminate the Quagga processes in the pfSense rc scripts. I did submit the debug logs to Martin; not sure if you need more. And no, I have not yet tested the routers with -9 removed from the pfSense scripts.

Martin: please see the response below from a person in the pfSense forum:


I see Martin's reply to you on Oct. 10, but I don't see anything after that. Are you emailing him off-list?

I was looking through the Quagga code last night and found something that I suspect could be the problem. Quagga (the zebra daemon) puts routes into the kernel with flag "1" (RTF_PROTO1, see the netstat man page). When zebra starts up, it is supposed to ignore (filter out) any kernel routes with flag "1", because it should assume it put those there to begin with. I think this was working before Quagga version 1, and in version >= 1 it pulls those kernel routes into the zebra RIB.
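
To make the flag check concrete, here is a minimal sketch in C. It is not Quagga code: the struct, the function names and the DEMO_RTF_PROTO1 value are assumptions made for illustration (real code would take RTF_PROTO1 from <net/route.h>). It only shows the idea of skipping routes that carry the daemon's own marker when the kernel table is read at startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the BSD RTF_PROTO1 routing flag (shown as "1"
 * by netstat -r). The numeric value is an assumption for this demo. */
#define DEMO_RTF_PROTO1 0x8000u

/* Hypothetical, simplified view of a route read back from the kernel. */
struct demo_kernel_route {
    const char *prefix;
    unsigned int flags;
};

/* True if the route carries the marker the daemon sets on its own routes. */
bool demo_is_self_route(const struct demo_kernel_route *r)
{
    assert(r != NULL);
    return (r->flags & DEMO_RTF_PROTO1) != 0;
}

/* Count the routes a starting daemon should treat as genuinely
 * kernel-owned, skipping the ones it installed during a previous run. */
size_t demo_count_foreign_routes(const struct demo_kernel_route *routes,
                                 size_t n)
{
    assert(routes != NULL || n == 0);
    size_t foreign = 0;
    for (size_t i = 0; i < n; i++) {
        if (!demo_is_self_route(&routes[i]))
            foreign++;
    }
    return foreign;
}
```

If the filter is missing, every previously installed route would be counted as foreign, which matches the symptom described in this thread.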

If I reboot a firewall and go to OSPF -> Status -> Zebra routes, I see a bunch of OSPF routes but barely any K (kernel) routes. If I make any change on the Global Settings or Interface Settings tab, Quagga restarts, and the zebra route list is then filled with kernel routes (one for each OSPF route).

Can you ask Martin to look at this:
Commit: https://github.com/Quagga/quagga/commit/0d0686f98e64017415071e590bde262f0ab5a4c9

File: zebra/zebra_rib.c
Function: rib_sweep_table

This function is commented out starting in version 1, but it was used in version 0.99.24. There is a block of code in it:

    if (rib->type == ZEBRA_ROUTE_KERNEL &&
        CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELFROUTE))
      {
        ret = rib_uninstall_kernel (rn, rib);
        if (! ret)
          rib_delnode (rn, rib);
      }

The rib_weed_tables function that is still being used does not seem to do the same thing, from what I can tell. This URL shows the two versions side by side: https://fossies.org/diffs/quagga/0.99.24.1_vs_1.0.20160315/zebra/zebra_rib.c-diff.html
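
The effect of the sweep can be sketched on a toy table. This is a simplified model, not the Quagga implementation: demo_rib_entry, DEMO_FLAG_SELFROUTE and demo_sweep are hypothetical names, and the real function also uninstalls the route from the kernel, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route origins, standing in for zebra's route types. */
enum demo_route_type { DEMO_ROUTE_KERNEL, DEMO_ROUTE_OSPF };

/* Hypothetical marker for "this daemon installed the route itself". */
#define DEMO_FLAG_SELFROUTE 0x01u

struct demo_rib_entry {
    const char *prefix;
    enum demo_route_type type;
    unsigned int flags;
};

/* Compact the array in place, dropping entries that were read back from
 * the kernel but are marked as self-installed. Returns the new length. */
size_t demo_sweep(struct demo_rib_entry *rib, size_t n)
{
    assert(rib != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int stale = rib[i].type == DEMO_ROUTE_KERNEL &&
                    (rib[i].flags & DEMO_FLAG_SELFROUTE) != 0;
        if (!stale)
            rib[kept++] = rib[i];
    }
    return kept;
}
```

Entries that are kernel-owned without the marker, and entries learned from a routing protocol, are left untouched; only the leftovers from a previous run are removed.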

If you can point me to the thread where you are discussing this with Martin, I can pass this along to him if you prefer.


________________________________
From: Martin Winter <mwinter@opensourcerouting.org>
Sent: 10 October 2016 04:42
To: Reqlez Guy
Cc: quagga-users@lists.quagga.net
Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense New to Release 1.0

Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Here are the responses from pfSense and another user in the forum:



----------

Spydre13:


I looked at the changelog too, and didn't see anything that would fix this. The main problem is that when Quagga restarts, it doesn't recognize the routes that it previously put in there, so it pulls them in as "kernel" routes and they will always take precedence. That's why it works fine until Quagga is restarted (which is basically kill & start, there is no graceful restart in Quagga). Since the rib_sweep_table() function isn't used anymore, when it starts up it doesn't remove routes from the list of kernel routes that it previously put there (which it flags as RTF_PROTO1, or "1" in netstat -r). I don't see how they aren't having more issues with this, unless the common scenario is that Quagga never gets restarted unless the whole OS is restarted.

I don't see why kill -9 matters here, because it worked fine before v1.0, and there is no graceful restart capability in Quagga. Ideally pfSense could use the Quagga VTY to make changes live without restarting, and then write changes to the config files for the next time it starts up, but I doubt anyone wants to take on a project like that.

If you want more details let me know, but it would probably make more sense to discuss on the Quagga list instead of here.
----------
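
Spydre13's point about precedence can be sketched as a lowest-distance selection. This is a toy model under stated assumptions, not zebra's route selection code: demo_candidate and demo_select_best are hypothetical names, and the distances are supplied by the caller. The values 0 for kernel routes and 110 for OSPF are used here only as the commonly documented defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate route for a single prefix. */
struct demo_candidate {
    const char *source;      /* e.g. "kernel" or "ospf" */
    unsigned char distance;  /* administrative distance; lower wins */
};

/* Return the index of the candidate with the lowest distance.
 * On a tie, the earlier candidate is kept. */
size_t demo_select_best(const struct demo_candidate *c, size_t n)
{
    assert(c != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (c[i].distance < c[best].distance)
            best = i;
    }
    return best;
}
```

With one OSPF candidate at distance 110 and one stale kernel candidate at distance 0 for the same prefix, the kernel entry is selected, so OSPF updates for that prefix no longer take effect.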



----------

Jimp from the pfSense team:


That sounds like the issue. Preventing it from restarting is a hackish workaround no matter what signal is used. It will get restarted at some point and failing to recover gracefully is a regression in quagga's behavior in 1.x.

It needs to recognize the flags it sets on routes in the table, and it isn't. Hopefully someone at Quagga can pick up and run with that on their list.
----------


Can anybody comment? This bug has been in the code for 8 months now, or more.


________________________________
From: Reqlez Guy
Sent: 18 October 2016 19:46
To: quagga-users@lists.quagga.net
Cc: Martin Winter
Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense New to Release 1.0


Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Wait, is this bug related to this one? https://lists.quagga.net/pipermail/quagga-dev/2016-February/014777.html It sounds very familiar (it looks similar to what people are experiencing here), and pfSense is on FreeBSD 10.


________________________________
From: Reqlez Guy
Sent: 30 October 2016 01:08
To: quagga-users@lists.quagga.net
Cc: Martin Winter
Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense New to Release 1.0


Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Reqlez,

On 12 Nov 2016, at 22:13, Reqlez Guy wrote:

> Wait ... is this bug related to this ???
> https://lists.quagga.net/pipermail/quagga-dev/2016-February/014777.html
> Sounds very familiar ( since this looks like similar thing people
> are experiencing... ) and Pfsense is on FreeBSD 10 ...

No, this is a different issue.
(I hope to have some closure on that issue from February soon as well,
but that issue is actually some updates from Zebra to FreeBSD getting
ignored, and it looks like some data corruption.)

The issue here (and what I still do not understand) is the reason for
pfSense to kill Quagga every time something changes. And it does this
in the most brutal way, with a “kill -9”, which gives the Quagga
processes no chance for cleanup.

Quagga is designed for dynamic routing, so an interface going down etc.
is what it is designed to work with. It recalculates the routing tables
based on this, and much faster than fully rebuilding all tables because
of a restart.
I assume there must be some reason for the kill (maybe another bug?),
and I would love to get an answer for this.
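
The difference between a catchable termination signal and “kill -9” can be sketched in C with POSIX sigaction. This is a generic illustration, not Quagga's or pfSense's code: all demo_ names are hypothetical, and "cleanup" is reduced to zeroing a counter of installed routes.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t demo_stop_requested = 0;

static void demo_on_sigterm(int signo)
{
    (void)signo;
    demo_stop_requested = 1;
}

/* Install a SIGTERM handler. Returns 0 on success, -1 on failure. */
int demo_install_term_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Stand-in for daemon cleanup: if a stop was requested, "withdraw" all
 * installed routes and return 1; otherwise leave them and return 0. */
int demo_shutdown_if_requested(int *installed_routes)
{
    assert(installed_routes != NULL);
    if (!demo_stop_requested)
        return 0;
    *installed_routes = 0;
    return 1;
}

/* SIGKILL cannot be caught: sigaction() refuses to install a handler.
 * Returns 1 if the attempt was rejected, as POSIX requires. */
int demo_sigkill_is_uncatchable(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = demo_on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGKILL, &sa, NULL) == -1;
}
```

A process sent SIGTERM gets the chance to run its cleanup path; a process sent SIGKILL never runs any of its own code again, so whatever it installed stays behind.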

- Martin

Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0
Martin,

The issue is that when zebra starts up it fails to recognize routes in the
kernel that it previously put there. To put it another way, it fails to
recognize that those kernel routes were put there by itself (with flag
"1"), so it assumes they are kernel routes and adds them to the RIB. From
then on, those routes will always take priority, and never go away, so in
effect OSPF is broken.

I have pointed out where (I think) the error comes from, both in this
thread and in a dev thread, but haven't received any response. Whether
zebra is killed in a brutal or nice way, zebra should recognize routes it
inserted into the kernel the next time it starts up, which it currently
does not do (since version >=1.0). Is this not a problem on other
operating systems, or is it just that in most cases zebra is never
restarted unless the entire operating system is restarted?

If you could suggest the proper way to restart zebra that would be
appreciated. However, that would still not be ideal because the routes
would be removed for a short period of time until they are re-inserted
(when zebra starts back up), which is not the intention. Of course it
would be ideal if pfSense could make config changes using the zebra VTY,
but it currently does not have that capability. Currently pfSense writes
changes to the Quagga config files (when a change is made through the web
interface) and then restarts both daemons.

Please let us know what other info we can provide, and how we can help to
get this issue resolved. Compensation for fixing this "bug" is a
possibility.

Thank you,
Nate Baker

On Tue, Nov 15, 2016 at 6:06 PM, Martin Winter <
mwinter@opensourcerouting.org> wrote:

> Reqlez,
>
> On 12 Nov 2016, at 22:13, Reqlez Guy wrote:
>
> Wait ... is this bug related to this ??? https://lists.quagga.net/piper
>> mail/quagga-dev/2016-February/014777.html Sounds very familiar ( since
>> this looks like similar thing people are experiencing... ) and Pfsense is
>> on FreeBSD 10 ...
>>
>
> No, this is a different issue.
> (I hope to have some closure on that issue from february soon as well, but
> that issue is actually some updates from Zebra to FreeBSD getting ignored
> and it looks like some
> data corruption)
>
> The issue here (and what I still lack to understand) is the reason for
> pfsense to kill Quagga every time something changes.
> And it does this the most brutal way with a “kill -9” which give all the
> quagga processes no chance for cleanup.
>
> Quagga is designed for dynamic routing, so an interface going down etc is
> what it is designed to work with and
> recalculate the routing tables based on this - and in a much faster way
> than a full rebuilding all tables because
> of a restart.
> I assume there must be some reason for the kill - maybe another bug? - and
> I would love to get an answer
> for this.
>
> - Martin
>
> ________________________________
>> From: Reqlez Guy
>> Sent: 30 October 2016 01:08
>> To: quagga-users@lists.quagga.net
>> Cc: Martin Winter
>> Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense
>> New to Release 1.0
>>
>> So here are the responses from Pfsense and another user in the forum:
>>
>> ----------
>>
>> Spydre13:
>>
>> I looked at the changelog too, and didn't see anything that would fix
>> this. The main problem is that when Quagga restarts, it doesn't recognize
>> the routes that it previously put in there, so it pulls them in as "kernel"
>> routes and they will always take precedence. That's why it works fine
>> until Quagga is restarted (which is basically kill & start, there is no
>> graceful restart in Quagga). Since the rib_sweep_table() function isn't
>> used anymore, when it starts up it doesn't remove routes from the list of
>> kernel routes that it previously put there (which it flags as RTF_PROTO1,
>> or "1" in netstat -r). I don't see how they aren't having more issues with
>> this, unless the common scenario is that Quagga never gets restarted unless
>> the whole OS is restarted.
>>
>> I don't see why kill -9 matters here, because it worked fine before v1.0,
>> and there is no graceful restart capability in Quagga. Ideally pfSense
>> could use the Quagga VTY to make changes live without restarting, and then
>> write changes to the config files for the next time it starts up, but I
>> doubt anyone wants to take on a project like that.
>>
>> If you want more details let me know, but it would probably make more
>> sense to discuss on the Quagga list instead of here.
>> ----------
>>
>> ----------
>>
>> Jimp from pfsense team:
>>
>> That sounds like the issue. Preventing it from restarting is a hackish
>> workaround no matter what signal is used. It will get restarted at some
>> point and failing to recover gracefully is a regression in quagga's
>> behavior in 1.x.
>>
>> It needs to recognize the flags it sets on routes in the table, and it
>> isn't. Hopefully someone at Quagga can pick up and run with that on their
>> list.
>> ----------
>>
>> Can anybody comment ? Since this bug has been in the code for 8 months
>> now ... or more ...
>>
>> ________________________________
>> From: Reqlez Guy
>> Sent: 18 October 2016 19:46
>> To: quagga-users@lists.quagga.net
>> Cc: Martin Winter
>> Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense
>> New to Release 1.0
>>
>> Sorry it seems I'm a lists noob and didnt realize this stuff was not
>> going into list... i'm assuming that i should be emailing the list only ?
>> and not Martin or anybody else ? I CCed Martin just in case.
>>
>> So as per Martin, he thinks what is triggering the issue is the use of -9
>> to terminate quagga process in pfsense rc scripts. I did submit the debug
>> logs to Martin... not sure if you need more. And no, I have no tested the
>> routers yet while eliminating -9 from pfsense scripts.
>>
>> Martin: Please see below response from a person in pfsense forum:
>>
>> I see Martin's reply to you on Oct. 10, but I don't see anything after
>> that. Are you emailing him off-list?
>>
>> I was looking through the Quagga code last night, and found something
>> that I'm wondering whether or not could be the problem. Quagga (zebra
>> daemon) puts routes into the kernel with flag "1" (RTF_PROTO1, see netstat
>> man page). When zebra starts up it's supposed to ignore (filter out) any
>> kernel routes with flag "1" because it should assume it put those there to
>> begin with. I think before Quagga version 1 this was working, and in
>> version >= 1 it pulls in those kernel routes into the zebra RIB.
>>
>> If I reboot a firewall and go to OSPF -> Status -> Zebra routes, I see a
>> bunch of OSPF routes but barely any K (kernel) routes. If I make any
>> change on the Global Settings or Interface Settings tab quagga restarts,
>> and then when looking at the zebra routes it is filled with kernel routes
>> (one for each OSPF route).
>>
>> Can you ask Martin to look at this:
>> Commit: https://github.com/Quagga/quagga/commit/0d0686f98e64017415071e590bde262f0ab5a4c9
>>
>> File: zebra/zebra_rib.c
>> Function: rib_sweep_table
>>
>> This function is commented out starting in version 1, but it was used in
>> version 0.99.24. There is a block of code in it:
>>
>> Code: [Select]
>>
>>   if (rib->type == ZEBRA_ROUTE_KERNEL &&
>>       CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELFROUTE))
>>     {
>>       ret = rib_uninstall_kernel (rn, rib);
>>       if (! ret)
>>         rib_delnode (rn, rib);
>>     }
>>
>> The rib_weed_tables function that is still being used doesn't seem to do
>> this same thing, from what I can tell. This URL shows them side-by-side:
>> https://fossies.org/diffs/quagga/0.99.24.1_vs_1.0.20160315/zebra/zebra_rib.c-diff.html
>>
>> If you can point me to the thread where you are discussing this with
>> Martin, I can pass this along to him if you prefer.
>>
>> ________________________________
>> From: Martin Winter <mwinter@opensourcerouting.org>
>> Sent: 10 October 2016 04:42
>> To: Reqlez Guy
>> Cc: quagga-users@lists.quagga.net
>> Subject: Re: [quagga-users 14440] Issues with Routes in FreeBSD / PfSense
>> New to Release 1.0
>>
>> Just seeing this now...
>>
>> On 3 Oct 2016, at 18:13, Reqlez Guy wrote:
>>
>> Hello.
>>>
>>> Can anybody review this and see what they think ?
>>> https://forum.pfsense.org/index.php?topic=111108.0
>>>
>> Major issue with QUAGGA-OSPF and VLANs (pfsense 2.3.0)
>> <https://forum.pfsense.org/index.php?topic=111108.0>
>>
>>
>>> When triggering failover ... the failover link does not work with
>>> version 1 .... reverting back to .99 no problems. Pfsense Team seems
>>> to think it's something regarding Zebra restart... Several users have
>>> confirmed this issue. See thread for further info.
>>>
>>
>> What is the "normal" version of Quagga in PfSense? (i.e. output of
>> "zebra -v")
>> Is this 1.0.20160315 ?
>>
>> Can you post the output of a "zebra -v" from it (it should give some
>> compile options as well)
>> And what is the base OS? FreeBSD 10?
>>
>> We are just about trying to get a new version out. If you are able to
>> compile your own version, then it might be worthwhile to download and
>> build from the latest git master.
>>
>> - Martin Winter
>>
>
>
>
Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0 [ In reply to ]
On 11/16/2016 1:49 PM, Nathan Baker wrote:
>
> If you could suggest the proper way to restart zebra that would be
> appreciated. However, that would still not be ideal because the routes

Not sure why they use a kill -9 to shut it down. If you gracefully shut
down bgpd, ospfd, and zebra, the routes are all properly withdrawn.
You could also do a route flush before starting the quagga daemons.

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0 [ In reply to ]
If there were a graceful restart, that would probably work fine. However,
I've read several times on the lists that there is no such thing in
Quagga. If we do a graceful shutdown and then start again, traffic will
drop during the short period after the routes are removed, until
OSPF learns the routes again and re-inserts them.

Why is everyone dancing around the main question here, which is: why did
someone decide to remove the code that filters out kernel routes that were
put there by zebra, before they are put into the RIB? What benefit is
there in removing that? These routes are inserted into the kernel with the
proper flag, and they are detected as zebra routes when zebra starts up,
but someone removed (actually commented out) the function that removes them
from the kernel route list before inserting into zebra's RIB. My guess is
that it was unintentional (hence it being commented out), either it was
meant to be put back or someone didn't understand what it was doing.
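
To make the filtering step concrete, here is a minimal C sketch of the idea. It is a model only, not Quagga's code: the struct, the enum, and MODEL_FLAG_SELFROUTE are hypothetical stand-ins for zebra's RIB entry, route type, and ZEBRA_FLAG_SELFROUTE. It also only drops entries from an in-memory table, whereas the snippet quoted earlier in the thread additionally calls rib_uninstall_kernel to remove the route from the kernel.

```c
#include <stddef.h>

/* Hypothetical stand-ins for zebra's route type and self-route flag. */
enum model_route_type { MODEL_ROUTE_KERNEL, MODEL_ROUTE_OSPF };
#define MODEL_FLAG_SELFROUTE 0x01u

struct model_route {
    const char *prefix;
    enum model_route_type type;
    unsigned flags;
};

/*
 * Compact `routes` in place, dropping every kernel-learned entry that
 * carries the self-route marker (i.e. a leftover from a previous run).
 * Returns the number of entries kept.
 */
size_t model_sweep_self_routes(struct model_route *routes, size_t count)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        int stale = routes[i].type == MODEL_ROUTE_KERNEL &&
                    (routes[i].flags & MODEL_FLAG_SELFROUTE) != 0;
        if (!stale)
            routes[kept++] = routes[i];
    }
    return kept;
}
```

In this model, a kernel route without the marker (a static default route, say) survives the sweep, while a marked kernel route is dropped so the routing daemon can re-learn it.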

-Nate

On Wed, Nov 16, 2016 at 3:24 PM, Mike Tancsa <mike@sentex.net> wrote:

> On 11/16/2016 1:49 PM, Nathan Baker wrote:
> >
> > If you could suggest the proper way to restart zebra that would be
> > appreciated. However, that would still not be ideal because the routes
>
> Not sure why they use a kill -9 to shut it down. If you gracefully shut
> down bgpd, ospfd, and zebra, the routes are all properly withdrawn.
> You could also do a route flush before starting the quagga daemons.
>
> ---Mike
>
>
>
Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0 [ In reply to ]
On 11/16/2016 3:53 PM, Nathan Baker wrote:
> If there was a graceful restart, that would probably work fine.

On FreeBSD, if I do

0(ps_spare3)# netstat -nr | wc
58 324 4108
0(ps_spare3)# killall bgpd
0(ps_spare3)# killall zebra
0(ps_spare3)# netstat -nr | wc
26 132 1804
0(ps_spare3)#

the shutdown clears out the routes that were added by bgpd so the
routing table is back to the way it was.
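
For comparison with kill -9, here is a minimal C sketch of a "terminate and wait" helper, assuming a POSIX system. The function name stop_gracefully and the 50 ms polling interval are illustrative choices, not anything taken from Quagga or the pfSense rc scripts. It sends SIGTERM (the default signal for killall, as used above) and waits for the process to exit, so the daemon gets the chance to withdraw its routes; it never escalates to SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/*
 * Send SIGTERM to `pid` and wait up to `timeout_ms` for it to exit.
 * Returns 0 once the process is gone, -1 on timeout or if the signal
 * could not be delivered.
 */
int stop_gracefully(pid_t pid, int timeout_ms)
{
    const struct timespec step = { 0, 50 * 1000 * 1000 }; /* 50 ms */

    if (kill(pid, SIGTERM) != 0)
        return errno == ESRCH ? 0 : -1; /* already gone counts as stopped */

    for (int waited = 0; waited <= timeout_ms; waited += 50) {
        /* Reap the process if it happens to be our child ... */
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 0;
        /* ... otherwise probe for existence with signal 0. */
        if (kill(pid, 0) != 0 && errno == ESRCH)
            return 0;
        nanosleep(&step, NULL);
    }
    return -1;
}
```

A caller would stop the protocol daemons first and zebra last, as in the session above, and treat a -1 return as a reason to investigate rather than to reach for -9 automatically.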

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Re: Issues with Routes in FreeBSD / PfSense New to Release 1.0 [ In reply to ]
Mike,

I explained earlier in this thread how it works (or used to work), and in a
separate thread on the dev list. It uses flags for the kernel routes, in
this case RTF_PROTO1 (shows up as "1" in netstat -r).
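
As a rough way to check for such leftovers, the following C sketch counts routes whose flags column contains "1" in netstat -rn style text. It is a sketch under assumptions: the function name is made up, and the parser assumes the FreeBSD layout where the third whitespace-separated column holds the flags. Lines with fewer than three columns (section titles such as "Internet:") are skipped.

```c
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of `netstat -rn`-style text whose third column (the
 * flags) contains the character '1', which is how RTF_PROTO1 is shown.
 */
int count_proto1_routes(const char *text)
{
    int count = 0;

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        char line[256], dest[64], gateway[64], flags[16];
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';
        if (sscanf(line, "%63s %63s %15s", dest, gateway, flags) == 3 &&
            strchr(flags, '1') != NULL)
            count++;

        text += len;
        if (*text == '\n')
            text++;
    }
    return count;
}
```

Comparing this count before and after a zebra restart would show whether previously installed routes are still sitting in the kernel table.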

Why are you trying so hard to resist this question? The functionality was
already in place; someone removed it as part of the first 1.0 release. If
it were still there, you wouldn't need to do a route flush before restarting
after a crash, rare as crashes are. If you flush the routes, there is
a period of time where traffic will not pass until OSPF or BGP finishes
starting up and inserts the routes again. Why would you rather have this
extra step and disruption instead of just leaving in the code that
worked perfectly fine?

-Nate

On Wed, Nov 16, 2016 at 4:31 PM, Mike Tancsa <mike@sentex.net> wrote:

> On 11/16/2016 4:06 PM, Nathan Baker wrote:
> > Mike - I understand, but that doesn't address the question. If zebra
> > crashed and you restarted it, you would run into the problem we're
> > talking about.
>
> Sure, I do a route flush in the cases where it crashes (which is
> thankfully rare)
>
> >
> > The reason for the kill/start is to pick up the new configuration, not
> > to clear the routes from the table.
>
> In this case it's more of a "quagga was not intended to be configured
> that way".... It's designed to be configured through vtysh or through the
> local telnet interface, no? I am not sure how it's supposed to know
> what is still a valid route from before and what is not. Depending on
> the routing protocol, it might not be able to make deterministic
> decisions that way.
>
>
> ---Mike
>
> >
> > On Wed, Nov 16, 2016 at 3:59 PM, Mike Tancsa <mike@sentex.net> wrote:
> >
> > On 11/16/2016 3:53 PM, Nathan Baker wrote:
> > > If there was a graceful restart, that would probably work fine.
> >
> > On FreeBSD, if I do
> >
> > 0(ps_spare3)# netstat -nr | wc
> > 58 324 4108
> > 0(ps_spare3)# killall bgpd
> > 0(ps_spare3)# killall zebra
> > 0(ps_spare3)# netstat -nr | wc
> > 26 132 1804
> > 0(ps_spare3)#
> >
> > the shutdown clears out the routes that were added by bgpd so the
> > routing table is back to the way it was.
> >
> > ---Mike
> >
> >
> >
> >
>
>
>