Mailing List Archive: Q: Resource migration (Xen live migration)

Hello!

I have some questions about Pacemaker's resource migration. We have a Xen host with some problems (still to be investigated) that cause some VM disks not to be ready for use.

When trying to migrate a VM from the bad host to a good host through Pacemaker, the migration seemed to hang. At some point the "source VM" was no longer present on the bad host (Unable to find domain 'v09'), but Pacemaker still tried a migration:
crmd[6779]: notice: te_rsc_command: Initiating action 100: migrate_from prm_xen_v09_migrate_from_0 on h05
Only after the timeout did the CRM realize that there was a problem:
crmd[6779]: warning: status_from_rc: Action 100 (prm_xen_v09_migrate_from_0) on h05 failed (target: 0 vs. rc: 1): Error
After that the CRM still tried a stop on the "source host" (h10) (and on the destination host):
crmd[6779]: notice: te_rsc_command: Initiating action 98: stop prm_xen_v09_stop_0 on h10
crmd[6779]: notice: te_rsc_command: Initiating action 26: stop prm_xen_v09_stop_0 on h05

Q1: Is this the way it should work?

Before that we had the same situation (the bad host had been set to "standby"), when someone who was tired of waiting so long destroyed the affected Xen VMs on the source host while the cluster was migrating. Eventually the VMs came up (restarted instead of being live migrated) on the good hosts.

Then we shut down OpenAIS on the bad host, installed updates and rebooted the bad host (during reboot OpenAIS was started, still in standby).
To my surprise Pacemaker thought the VMs were still running on the bad host and initiated a migration. As there were no source VMs on the bad host, but all the affected VMs were running on some good host, the CRM shut down the VMs on the good hosts, just to restart them.

Q2: Is this expected behavior? I can hardly believe it!

Software is SLES11 SP3 with pacemaker-1.1.11-0.7.53 (and related) on all hosts.

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

> On 13 Feb 2015, at 8:38 pm, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
> Hello!
>
> I have some questions about Pacemaker's resource migration. We have a Xen host with some problems (still to be investigated) that cause some VM disks not to be ready for use.
>
> When trying to migrate a VM from the bad host to a good host through Pacemaker, the migration seemed to hang. At some point the "source VM" was no longer present on the bad host (Unable to find domain 'v09'), but Pacemaker still tried a migration:
> crmd[6779]: notice: te_rsc_command: Initiating action 100: migrate_from prm_xen_v09_migrate_from_0 on h05
> Only after the timeout did the CRM realize that there was a problem:
> crmd[6779]: warning: status_from_rc: Action 100 (prm_xen_v09_migrate_from_0) on h05 failed (target: 0 vs. rc: 1): Error
> After that the CRM still tried a stop on the "source host" (h10) (and on the destination host):
> crmd[6779]: notice: te_rsc_command: Initiating action 98: stop prm_xen_v09_stop_0 on h10
> crmd[6779]: notice: te_rsc_command: Initiating action 26: stop prm_xen_v09_stop_0 on h05
>
> Q1: Is this the way it should work?

Mostly, but the resource agent should have detected the missing domain earlier and returned an error right away (instead of running into the timeout).
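
To illustrate what "detect the condition earlier" could mean, here is a minimal sketch in Python. It is not the code of the actual Xen agent (which is a shell script); the function name and the two callables `domain_on_target` and `domain_on_source` are hypothetical stand-ins for whatever checks an agent would really perform. Only the return codes 0 and 1 are the standard OCF values for success and generic error.

```python
import time

OCF_SUCCESS = 0       # standard OCF return code: action succeeded
OCF_ERR_GENERIC = 1   # standard OCF return code: generic error

def migrate_from(domain_on_target, domain_on_source,
                 timeout=10.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Wait for a migrating domain to arrive, but fail fast if it is gone.

    domain_on_target / domain_on_source are hypothetical callables that
    return True while the domain is visible on the respective host.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if domain_on_target():
            return OCF_SUCCESS
        if not domain_on_source():
            # The source copy has vanished. Check the target once more to
            # cover the moment of hand-over, then give up immediately
            # instead of waiting for the timeout.
            return OCF_SUCCESS if domain_on_target() else OCF_ERR_GENERIC
        sleep(interval)
    # Domain never arrived within the allowed time.
    return OCF_ERR_GENERIC
```

With a check like this, a domain that exists on neither host produces an error on the first poll, so the cluster can start recovery without waiting out the operation timeout.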

>
> Before that we had the same situation (the bad host had been set to "standby"), when someone who was tired of waiting so long destroyed the affected Xen VMs on the source host while the cluster was migrating. Eventually the VMs came up (restarted instead of being live migrated) on the good hosts.
>
> Then we shut down OpenAIS on the bad host, installed updates and rebooted the bad host (during reboot OpenAIS was started, still in standby).
> To my surprise Pacemaker thought the VMs were still running on the bad host and initiated a migration.

That would be coming from the resource agent.

> As there were no source VMs on the bad host, but all the affected VMs were running on some good host, the CRM shut down the VMs on the good hosts, just to restart them.
>
> Q2: Is this expected behavior? I can hardly believe it!

Nope, fix the agent :)
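
As a rough illustration of the behaviour the cluster relies on: when a node rejoins, Pacemaker probes each resource there with a one-off monitor, and the agent has to answer 7 (OCF_NOT_RUNNING) if the resource is not active on that node. The sketch below is Python and purely illustrative; `probe` and `list_local_domains` are hypothetical names, not part of the Xen agent.

```python
OCF_SUCCESS = 0       # standard OCF return code: resource is running
OCF_NOT_RUNNING = 7   # standard OCF return code: resource is cleanly stopped

def probe(domain, list_local_domains):
    """Report whether `domain` is active on this host.

    list_local_domains is a hypothetical callable returning the names of
    the domains currently present on the local hypervisor.
    """
    running = set(list_local_domains())
    return OCF_SUCCESS if domain in running else OCF_NOT_RUNNING
```

If a probe wrongly reports success on the rebooted host, the cluster believes the VM is active there as well as on the good host, which would be consistent with the stop and restart described above.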

>
> Software is SLES11 SP3 with pacemaker-1.1.11-0.7.53 (and related) on all hosts.
>
> Regards,
> Ulrich
