Mailing List Archive: Interesting problem with ServerIron GT

Mar 6, 2006, 9:59 AM

Post #1 of 4

Hi.

So I've got an interesting problem on a ServerIron GT EGC16.

I have two mail servers (running postfix) that are being load balanced in
the normal, easy way[1]. See below[2] for the software version.

Every so often, we get pages from our alerting system (nagios). The pages
report that all three addresses (the two real mail servers and the VIP) are
down and unreachable.

They don't always go down at the same time, but they often do so in clusters:
one goes down, then comes back up, then another goes down. Or both go down
close in time to each other, then come back up close to each other. The
events haven't been observed to last longer than about 10 minutes. Most of
them last only 3-4 minutes.

I've checked the logging on the servers, and they show no interruption in
layer 1 connectivity (i.e., no log messages about the interfaces going down,
which would appear if they had). ARP timeouts occurred to me as a
possibility, but I've not been able to get any conclusive data.

The log messages on the ServerIron are brief, just stating that a server went
down and came back up. No useful information :^( Our cat6500 says nothing at
all in its logs during these events.

The real servers are in a VLAN, vlan 20. The nagios system is across our
network in another place. The foundry links to our catalyst 6509 via a
trunk group of four gig-E ports (i.e., "trunk switch ethe 3/15 to 3/16 ethe
4/15 to 4/16")

Network arch is roughly:

{corp offices with nagios probe}----[router]
                                       |
                                  [cat 6500]--------{Internet}
                                       |
{real servers}-------------------[SIGT EGC16]

We have run tcpdump on the servers and on clients during these events.
One thing that I do notice is that ARP requests appear to come from the
Foundry's configured management IP address, rather than the VIP. I don't
know if this is a problem or not, but it may be, as the VIP and the
management address are in different subnets. The log messages on the
servers confirm this:

arplookup 1.2.3.130 failed: host is not on local network
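
A rough way to sanity-check the subnet mismatch, as a minimal sketch only. The management address and mask come from the config excerpt[1]; the mask on the real servers is NOT stated anywhere here, so the sketch assumes they also use 255.255.255.192. The helper name on_link is made up for illustration.

```python
import ipaddress

# Management address and mask, as given in the config excerpt.
mgmt = ipaddress.ip_interface("1.2.3.130/255.255.255.192")

# ASSUMPTION: the real servers use the same mask as the management
# address. The thread does not state the servers' actual netmask.
real = ipaddress.ip_interface("1.2.3.102/255.255.255.192")
vip = ipaddress.ip_address("1.2.3.101")


def on_link(host, target):
    """Return True if target falls inside host's directly connected subnet."""
    return target in host.network


print("management subnet: ", mgmt.network)
print("real server subnet:", real.network)
print("mgmt on-link for real server:", on_link(real, mgmt.ip))
print("VIP on-link for real server: ", on_link(real, vip))
```

Under that assumed mask, a real server would treat an ARP request sourced from 1.2.3.130 as coming from a host that is not on its local network, which matches the arplookup message above.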

Anyway, it's really frustrating, and I'm unsure of where to look next.

Has anyone seen this behavior before?

Thanks for the help!
Gabriel

[1] Configuration excerpts: (IP subnet has been replaced with 1.2.3)

trunk switch ethe 3/15 to 3/16 ethe 4/15 to 4/16
!
server real mail1 1.2.3.102
 port smtp
!
server real mail2 1.2.3.103
 port smtp
!
!
server virtual mail-cluster 1.2.3.101
 port smtp
 bind smtp mail1 smtp mail2 smtp
!
vlan 20 name mail-servers by port
 tagged ethe 3/15 to 3/16 ethe 4/15 to 4/16
 untagged ethe 3/5 ethe 4/5
!
hostname sigt-sea-01
ip address 1.2.3.130 255.255.255.192
ip default-gateway 1.2.3.129

********

[2] show version:
SW: Version 09.3.01bTD2 Copyright (c) 1996-2003 Foundry Networks, Inc.
Compiled on Jul 07 2005 at 21:17:20 labeled as WXM09301b
(3769367 bytes) from Primary wxm09301b.bin
HW: ServerIronGT E-1 Switch, SYSIF version 21, Serial #: Non-exist

Slot 1 & 2 are:
SL 1: B0GMR WSM2 Management Module, SYSIF 2, M6, ACTIVE
Serial #: removed
0 MB SHM, 1 Application Processors
16384 KB BRAM, SMC version 5, BM version 21
SW: (1)09.3.01bTF2

Slots 3 & 4 are J-BxGC16 JetCore Gig Copper Module, SYSIF 2

--
Gabriel Cain Senior Systems Administrator
PopCap Games gabriel at popcap.com
Direct: (206) 256-4243 Mobile: (425) 418-8166

Interesting problem with ServerIron GT [ In reply to ]

Cliff at kodakgallery

Mar 6, 2006, 10:18 AM

Post #2 of 4

This is just a quick guess, but you may want to configure a server
source-ip on the subnet local to the real servers:

server source-ip <ip address> <mask> 0.0.0.0

This is done from the global configuration. If you read the docs you
will see that this is usually used for source NAT. But it does quite a
bit more, including sourcing keepalives from this address and possibly
ARP requests.
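
As a hedged illustration of picking that address, the sketch below checks that a candidate sits inside the real servers' subnet and does not collide with addresses already in use, then prints the command in the form shown above. The candidate 1.2.3.100 and the 1.2.3.64/26 subnet are assumptions made for the example (the thread does not give the servers' mask), and source_ip_command is a hypothetical helper, not anything provided by the ServerIron.

```python
import ipaddress


def source_ip_command(candidate, real_server_net, in_use):
    """Build a 'server source-ip' line after basic sanity checks.

    candidate       -- proposed source-ip address (hypothetical value)
    real_server_net -- subnet of the real servers (ASSUMED, not from the thread)
    in_use          -- addresses already taken on that subnet
    """
    net = ipaddress.ip_network(real_server_net)
    addr = ipaddress.ip_address(candidate)
    if addr not in net:
        raise ValueError(f"{addr} is not inside {net}")
    if addr in (net.network_address, net.broadcast_address):
        raise ValueError(f"{addr} is the network or broadcast address")
    if addr in {ipaddress.ip_address(a) for a in in_use}:
        raise ValueError(f"{addr} is already in use")
    # The trailing 0.0.0.0 is copied from the command form quoted above.
    return f"server source-ip {addr} {net.netmask} 0.0.0.0"


# VIP and real server addresses come from the original config excerpt.
taken = ["1.2.3.101", "1.2.3.102", "1.2.3.103"]
print(source_ip_command("1.2.3.100", "1.2.3.64/26", taken))
```

Passing the management address 1.2.3.130 as the candidate raises ValueError under the assumed /26, which is the same mismatch the original post describes.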

_______________________________________________
foundry-nsp mailing list
foundry-nsp at puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

Interesting problem with ServerIron GT [ In reply to ]

pclark at raindance

Mar 6, 2006, 3:34 PM

Post #3 of 4

What OS are your real servers running? Do they have multiple
interfaces? Is the VIP using DSR? I ask because we ran into a number
of ARP related problems with real servers with multiple ethernet
interfaces running Linux kernels of 2.2.x and 2.4.x.

Interesting problem with ServerIron GT [ In reply to ]

gabriel at popcap

Mar 6, 2006, 3:36 PM

Post #4 of 4

Peter Clark wrote:
> What OS are your real servers running?

FreeBSD for these.

> Do they have multiple
> interfaces?

Only 1 active.

> Is the VIP using DSR?

No.

> I ask because we ran into a number
> of ARP related problems with real servers with multiple ethernet
> interfaces running Linux kernels of 2.2.x and 2.4.x.

Good to know, but no, these aren't Linux.

Thanks!
Gabriel

--
Gabriel Cain Senior Systems Administrator
PopCap Games gabriel at popcap.com
Direct: (206) 256-4243 Mobile: (425) 418-8166