Cluster s/w specs:
Kernel: 2.6.32-431.17.1.el6.x86_64
OS: CentOS 6.5
corosync-1.4.1-17.el6_5.1.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
crmsh-2.0+git46-1.1.x86_64
Attached to this email are two text files: one contains the output of 'crm
configure show' (addresses sanitized) and the other contains the output of
'crm_simulate -sL'.
Here is the situation. We've encountered it multiple times now and
I've been unable to solve it:
* A machine in the cluster fails
* There is a spare node, unused, in the cluster available for
assignment
* The resource group that was on the failed machine, instead of
being put onto the spare, unused node, is placed on a node where
another resource group is already running
* The displaced resource group is then launched on the spare,
unused node
As an example, this morning the following occurred:
Resource Group NRTMASTER is running on system gpmhac01
Resource Group NRTPNODE1 is running on system gpmhac02
Resource Group NRTPNODE2 is running on system gpmhac05
Resource Group NRTPNODE3 is running on system gpmhac04
Resource Group NRTPNODE4 is running on system gpmhac03
system gpmhac06 is up, available, and unused
system gpmhac04 fails and powers off
Resource Group NRTPNODE3 is moved to system gpmhac05
Resource Group NRTPNODE2 is moved to system gpmhac06
One of the big things that seems to occur here is that while the group
NRTPNODE3 is being launched on gpmhac05, the group NRTPNODE2 is being shut
down on the same machine at the same time. This causes a race condition:
one start script is putting a state file in place while the stop script
is erasing it. The system is left in an unusable state because required
files, parameters, and settings are missing or corrupted.
Secondly, there is simply no reason to kill a perfectly healthy resource
group that is operating just fine in order to launch a resource group
whose machine has failed when:
1. There's a spare node available
2. The resource groups have equal priority with each other, i.e.
all of the NRTPNODE# resource groups have priority "60"
So I really need some help getting this set up so that it behaves the
way we *think* it should, based on what we understand of the Pacemaker
architecture. Obviously we're missing something, since this resource
group "shuffling" occurs whenever a system fails, despite an unused spare
node being available for immediate use, and it has bitten us several
times. The race condition between startup and shutdown, which leaves the
newly started system useless, is exacerbating the situation immensely.
Ideally, this is what we want:
1. If a system fails, the resources/resource group running on it
are moved to an unused, available system. No other resource
shuffling among systems occurs.
2. If a system fails and there is not an unused, available
system to fail over to, then IF the resource group has a higher
priority than another resource group, the group with the lower
priority is shut down. Only when that shutdown is complete does
the higher-priority resource group begin starting its
resources.
3. If a system fails and there is not an unused, available
system to fail over to, then IF the resource group has the same
or lower priority than all other resource groups, it will not
attempt to launch itself on any other node, nor cause any other
resource group to stop or migrate.
4. Unless specifically and manually ordered to move, OR unless its
hardware system fails, a resource group should remain on its
current hardware system. It should never be forced to migrate to
a new system because something of equal or lower priority failed
and migrated to a new system.
5. We do not need resource groups to fail back to their original
nodes. Once running, we want them to stay running on their current
system until/unless a hardware failure forces them off the system,
or we manually tell them to move.
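For clarity, the five rules above can be written down as a small
placement function. This is a hypothetical Python model of the behavior
we want, not Pacemaker's allocation algorithm and not cluster
configuration. The function name, the action tuples, the tie-break by
name, and the assumption that a numerically higher priority wins are all
illustrative. NRTMASTER's priority isn't stated above, so the value 100
used below is a placeholder, as is the priority of 50 in the last case.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]  # (verb, group, node); illustrative format


def plan_failover(
    placement: Dict[str, str],  # group -> node it currently runs on
    nodes: List[str],           # nodes that are up and eligible
    priority: Dict[str, int],   # group -> priority (assumed: higher wins)
    failed_node: str,
) -> List[Action]:
    """Return an ordered action list for rules 1-5 (hypothetical model).

    Groups not on failed_node never move (rules 4 and 5) unless one is
    the lower-priority victim picked under rule 2. No failback is planned.
    """
    actions: List[Action] = []
    orphans = sorted(g for g, n in placement.items() if n == failed_node)
    survivors = {g: n for g, n in placement.items() if n != failed_node}
    for group in orphans:
        busy = set(survivors.values())
        spare = sorted(n for n in nodes if n != failed_node and n not in busy)
        if spare:
            # Rule 1: take an unused node; nothing else moves.
            survivors[group] = spare[0]
            actions.append(("start", group, spare[0]))
            continue
        lower = [g for g in survivors if priority[g] < priority[group]]
        if lower:
            # Rule 2: displace the lowest-priority group. The stop is listed
            # first and must complete before the start is issued.
            victim = min(lower, key=lambda g: (priority[g], g))
            node = survivors.pop(victim)
            survivors[group] = node
            actions.append(("stop", victim, node))
            actions.append(("start", group, node))
        # Rule 3: equal or lower priority than everyone else -> do nothing.
    return actions


placement = {
    "NRTMASTER": "gpmhac01", "NRTPNODE1": "gpmhac02", "NRTPNODE2": "gpmhac05",
    "NRTPNODE3": "gpmhac04", "NRTPNODE4": "gpmhac03",
}
nodes = ["gpmhac01", "gpmhac02", "gpmhac03", "gpmhac04", "gpmhac05", "gpmhac06"]
priority = {"NRTMASTER": 100,  # placeholder value, not from our config
            "NRTPNODE1": 60, "NRTPNODE2": 60, "NRTPNODE3": 60, "NRTPNODE4": 60}

# This morning's failure: NRTPNODE3 should simply go to the spare gpmhac06.
print(plan_failover(placement, nodes, priority, "gpmhac04"))
# [('start', 'NRTPNODE3', 'gpmhac06')]

# No spare and equal priorities: rule 3, nothing happens.
no_spare = [n for n in nodes if n != "gpmhac06"]
print(plan_failover(placement, no_spare, priority, "gpmhac04"))  # []

# No spare, and a hypothetical lower-priority NRTPNODE4 (50): rule 2.
print(plan_failover(placement, no_spare, dict(priority, NRTPNODE4=50), "gpmhac04"))
# [('stop', 'NRTPNODE4', 'gpmhac03'), ('start', 'NRTPNODE3', 'gpmhac03')]
```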
Can someone please look over our configuration and the bizarre scores
that I see in the crm_simulate output, and help get me to the point
where I have an HA cluster that doesn't kill healthy resources in
some kind of game of musical chairs when there's an empty chair available?
Can you also explain why this happens, or help me ensure that a startup
doesn't occur until AFTER a shutdown is completely done?
Obviously we're misunderstanding or misapplying something in our
resource stickiness, our resource group priority, or somewhere else, and
we really need to get this resolved. My job is literally on the line
because of these failures to operate the way we expect, so all help is
appreciated.
Thanks
Tony
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.