Mailing List Archive

don't retry a failed server
Hi,

I have a bit of a problem. If backhand sends a request to a server, and
the request fails, backhand tries to resend the request, but I can't
seem to get it to exclude the failed server.

Here's an example:

[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byAge(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 1 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byBusyChildren(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 1 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byRandom(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 1 0 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byLoad(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 2 1 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [removeSelf(NULL)] (3 -> 2)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 2 1 ]
[Fri May 14 22:20:47 2004] [notice] All funcs executed -> web2
..snip
[Fri May 14 22:20:47 2004] [error] mod_backhand: Tried... failed
..snip
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byAge(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 1 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byBusyChildren(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 1 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byRandom(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 1 2 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [byLoad(NULL)] (3 -> 3)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 0 2 1 ]
[Fri May 14 22:20:47 2004] [notice] Func executed for (null) [removeSelf(NULL)] (3 -> 2)
[Fri May 14 22:20:47 2004] [notice] New server list: [ 2 1 ]
[Fri May 14 22:20:47 2004] [notice] All funcs executed -> web2

My conf looks like:

Backhand byAge
BackHand byBusyChildren
BackHand byRandom
BackHand byLoad
BackHand removeSelf

(removeSelf is there because the main server only distributes requests to one of several mod_perl servers.)

I was playing with byRandom, and by putting it last, backhand will probably
try a different server if the first one fails, but then you lose the
distribution by load.

Any ideas?

Cheers,

Alex

--
Alex Krohn <alex@gossamer-threads.com>


_______________________________________________
backhand-users mailing list
backhand-users@lists.backhand.org
http://lists.backhand.org/mailman/listinfo/backhand-users
Re: don't retry a failed server
Alex Krohn wrote:

>Hi,
>
>I have a bit of a problem. If backhand sends a request to a server, and
>the request fails, backhand tries to resend the request, but I can't
>seem to get it to exclude the failed server.
>
>Here's an example:
>
>[Fri May 14 22:20:47 2004] [notice] All funcs executed -> web2
>..snip
>[Fri May 14 22:20:47 2004] [error] mod_backhand: Tried... failed
>..snip
>[Fri May 14 22:20:47 2004] [notice] All funcs executed -> web2
>
>My conf looks like:
>
> Backhand byAge
> BackHand byBusyChildren
> BackHand byRandom
> BackHand byLoad
> BackHand removeSelf
>
>(removeSelf as the main server only distributes requests to one of several mod_perl servers).
>
>I was playing with byRandom, and by putting it last, backhand will probably
>try a different server if the first one fails, but then you lose the
>distributing by load.
>
>Any ideas?
>
>
The first question is this:

mod_backhand is peer-based. The byAge rule says not to use servers that
are not running (ones you haven't heard from in 5 seconds). So, in the
example you gave, web2 is up, running and announcing that it is okay to
receive traffic. So, why is web2 failing to service the request?

--
// Theo Schlossnagle
// Principal Engineer -- http://www.omniti.com/~jesus/
// Postal Engine -- http://www.postalengine.com/
// Ecelerity: fastest MTA on Earth


Re: don't retry a failed server
Hi,

> The first question is this:
>
> mod_backhand is peer-based. The byAge rule says not to use servers that
> are not running (ones you haven't heard from in 5 seconds). So, in the
> example you gave, web2 is up, running and announcing that it is okay to
> receive traffic. So, why is web2 failing to service the request?

Thanks for the quick response!

I'm forcing web2 to fail, by restarting the mod_perl server, and hitting
the cluster while the mod_perl server is restarting. Due to the amount
of code that is preloaded, and the database connections that are set up, a mod_perl restart
takes about 10 seconds.

I'm trying to make it so that I can restart all the mod_perl servers in
serial, without the user seeing anything (so if they happen to be sent
to a server that just got restarted, they will get sent to a new one).

Cheers,

Alex
--
Alex Krohn <alex@gossamer-threads.com>

Re: don't retry a failed server
Alex Krohn wrote:

>Thanks for the quick response!
>
>I'm forcing web2 to fail, by restarting the mod_perl server, and hitting
>the cluster while the mod_perl server is restarting. Due to the amount
>of code that is preloaded, and the database connections that are set up, a mod_perl restart
>takes about 10 seconds.
>
>I'm trying to make it so that I can restart all the mod_perl servers in
>serial, without the user seeing anything (so if they happen to be sent
>to a server that just got restarted, they will get sent to a new one).
>
>
That is a valid problem. Ideally, there would be a maturity setting in
the front end servers, telling them when a server has matured. Also a
"failure" setting, but that is a bit dangerous, as overloaded servers can
fail once and succeed on an immediate reattempt (which is why the code
works as it does).

Like any good system, there is a hack workaround.

Block mod_backhand's outbound UDP traffic from your box until mod_perl
starts up completely.

Assuming you are running on FreeBSD and running MulticastStats as
225.2.3.4:4445,1

ipfw add 00010 deny udp from any to any 4445 out
Wait 5 seconds (or whatever your byAge timeout is)...
shutdown...
update code...
restart...
wait 'til restart is complete.
ipfw delete 00010

Something like this can be accomplished on Linux too, with ipchains or
iptables.
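
The FreeBSD steps above could be wrapped in a small script. This is only a
sketch: the rule number 00010 and port 4445 come from the example, while
RESTART_CMD, DRY_RUN and the function names are made up for illustration.
It defaults to a dry run that just prints the commands, and it leaves the
deny rule in place if the restart fails, so the peers keep ignoring a
server that did not come back.

```shell
#!/bin/sh
# Sketch of the block / wait / restart / unblock sequence using ipfw.
# Assumptions (not from the thread): RESTART_CMD is a placeholder for
# whatever restarts your mod_perl server; DRY_RUN=1 only prints commands.

RULE=00010
STATS_PORT=4445
BYAGE_TIMEOUT=5     # seconds; match your byAge setting
RESTART_CMD=${RESTART_CMD:-"echo restart mod_perl here"}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop announcing this box to its peers.
quiesce() {
    run ipfw add "$RULE" deny udp from any to any "$STATS_PORT" out
}

# Start announcing again.
announce() {
    run ipfw delete "$RULE"
}

rolling_restart() {
    quiesce
    # Give the peers time to age this server out of their lists.
    run sleep "$BYAGE_TIMEOUT"
    if ! run sh -c "$RESTART_CMD"; then
        echo "restart failed; leaving rule $RULE in place" >&2
        return 1
    fi
    announce
}
```

To use it for real, run it as root with DRY_RUN=0 and RESTART_CMD set to
your own restart command, then call rolling_restart.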

--
// Theo Schlossnagle
// Principal Engineer -- http://www.omniti.com/~jesus/
// Postal Engine -- http://www.postalengine.com/
// Ecelerity: fastest MTA on Earth


Re: don't retry a failed server
Hi Theo,

> That is a valid problem. Ideally, there would be a maturity setting in
> the front end servers, telling them when a server has matured. Also a
> "failure" setting, but that is a bit dangerous, as overloaded servers can
> fail once and succeed on an immediate reattempt (which is why the code
> works as it does).
>
> Like any good system, there is a hack workaround.
>
> Block mod_backhand's outbound UDP traffic from your box until mod_perl
> starts up completely.
>
> Assuming you are running on FreeBSD and running MulticastStats as
> 225.2.3.4:4445,1
>
> ipfw add 00010 deny udp from any to any 4445 out
> Wait 5 seconds (or whatever your byAge timeout is)...
> shutdown...
> update code...
> restart...
> wait 'til restart is complete.
> ipfw delete 00010
>
> Something like this can be accomplished on Linux too, with ipchains or
> iptables.

Thanks, works great! For anyone else, you can do this with iptables
using:

iptables -A OUTPUT -p udp --sport 4445 -j REJECT
sleep 5
restart mod_perl
iptables -D OUTPUT -p udp --sport 4445 -j REJECT
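
If the restart command returns before mod_perl is actually serving
requests, the rule may be removed too early. One way to cover the "wait
'til restart is complete" step is to poll the server before unblocking. A
rough sketch follows; wait_until is a made-up helper, and the restart
command and probe URL in the usage comment are placeholders, not something
from this setup.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...
# Rerun CMD until it succeeds, sleeping DELAY seconds after each failed
# attempt. Returns 0 on the first success, 1 after TRIES failures.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Usage sketch (as root). The restart command and the URL are
# hypothetical; substitute whatever fits your servers:
#
#   iptables -A OUTPUT -p udp --sport 4445 -j REJECT
#   sleep 5
#   /path/to/restart-mod_perl
#   if wait_until 30 1 curl -fs http://127.0.0.1:8080/; then
#       iptables -D OUTPUT -p udp --sport 4445 -j REJECT
#   else
#       echo "mod_perl did not come back; leaving stats blocked" >&2
#   fi
```

Leaving the rule in place on failure means the other servers keep
treating this box as down until someone looks at it.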

Cheers,

Alex
--
Alex Krohn <alex@gossamer-threads.com>
