Mailing List Archive

Fencing of bare-metal remote nodes
Hi!

Is fencing of bare-metal remote nodes implemented?

I tried echo c > /proc/sysrq-trigger on remote nodes, and no fencing occurred.

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: Fencing of bare-metal remote nodes
----- Original Message -----
> Hi!
>
> is subj implemented?
>
> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.

Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
fencing the remote-node? Fencing a remote-node requires a cluster node to
invoke the agent that actually performs the fencing action on the remote-node.
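A quick way to reason about the question above is to check which configured fencing devices list the remote node as a target. The sketch below is only an illustration: the device names and the dictionary layout are hypothetical, and it assumes each device declares its targets in a static pcmk_host_list parameter separated by whitespace, commas, or semicolons. Pacemaker has other ways of deciding which device covers a host, and this sketch ignores them.

```python
import re
from typing import Dict, List


def devices_able_to_fence(node: str, devices: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of devices whose pcmk_host_list mentions `node`.

    `devices` maps a (hypothetical) device name to its parameters.
    Devices without a pcmk_host_list are treated as not covering the node,
    which is an assumption of this sketch, not Pacemaker behaviour.
    """
    able = []
    for name, params in devices.items():
        raw = params.get("pcmk_host_list", "")
        hosts = [h for h in re.split(r"[\s,;]+", raw) if h]
        if node in hosts:
            able.append(name)
    return able


# Hypothetical configuration: one device for the remote node, one for cluster nodes.
example_devices = {
    "fence-rnode1": {"pcmk_host_list": "rnode1"},
    "fence-cluster": {"pcmk_host_list": "node1 node2"},
}
```

If the returned list is empty for a remote node, no cluster node has a device it could invoke against that node, which matches the situation the question is probing for.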

-- Vossel

Re: Fencing of bare-metal remote nodes
25.11.2014 23:41, David Vossel wrote:
>
>
> ----- Original Message -----
>> Hi!
>>
>> is subj implemented?
>>
>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
> fencing the remote-node? Fencing a remote-node requires a cluster node to
> invoke the agent that actually performs the fencing action on the remote-node.

Yes, if I invoke fencing action manually ('crm node fence <rnode>' in
crmsh syntax), node is fenced. So the issue seems to be related to the
detection of a "need fencing".

Comments in related git commits are a little bit terse in this area. So
could you please explain what exactly needs to happen on a remote node
to initiate fencing?

I tried so far:
* kill pacemaker_remoted when no resources are running. systemd restarted
it and crmd reconnected after some time.
* crash kernel when no resources are running
* crash kernel during massive start of resources

No fencing happened. In the last case the start actions hung and eventually
failed on timeout (which is rather long), and the node was not even listed as
failed. My customer asked me to stop crashing nodes because one of them
no longer boots (I "like" that modern UEFI hardware very much), so it is
hard for me to experiment further.

Best,
Vladislav


Re: Fencing of bare-metal remote nodes
On 25/11/14 03:15 AM, Vladislav Bogdanov wrote:
> Hi!
>
> is subj implemented?
>
> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Best,
> Vladislav

Please share your configuration(s) and application names/versions. OS
info wouldn't hurt, either. Relevant log entries on the surviving node
after panicking the other node would also be helpful.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

Re: Fencing of bare-metal remote nodes
----- Original Message -----
> 25.11.2014 23:41, David Vossel wrote:
> >
> >
> > ----- Original Message -----
> >> Hi!
> >>
> >> is subj implemented?
> >>
> >> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
> >
> > Yes, fencing remote-nodes works. Are you certain your fencing devices can
> > handle
> > fencing the remote-node? Fencing a remote-node requires a cluster node to
> > invoke the agent that actually performs the fencing action on the
> > remote-node.
>
> Yes, if I invoke fencing action manually ('crm node fence <rnode>' in
> crmsh syntax), node is fenced. So the issue seems to be related to the
> detection of a "need fencing".
>
> Comments in related git commits are a little bit terse in this area. So
> could you please explain what exactly needs to happen on a remote node
> to initiate fencing?
>
> I tried so far:
> * kill pacemaker_remoted when no resources are running. systemd restarted
> it and crmd reconnected after some time.
> * crash kernel when no resources are running
> * crash kernel during massive start of resources

This last one should definitely cause fencing. What version of Pacemaker are
you using? I've made changes in this area recently. Can you provide a crm_report?

-- David

Re: Fencing of bare-metal remote nodes
26.11.2014 18:36, David Vossel wrote:
>
>
> ----- Original Message -----
>> 25.11.2014 23:41, David Vossel wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> Hi!
>>>>
>>>> is subj implemented?
>>>>
>>>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>>>
>>> Yes, fencing remote-nodes works. Are you certain your fencing devices can
>>> handle
>>> fencing the remote-node? Fencing a remote-node requires a cluster node to
>>> invoke the agent that actually performs the fencing action on the
>>> remote-node.
>>
>> Yes, if I invoke fencing action manually ('crm node fence <rnode>' in
>> crmsh syntax), node is fenced. So the issue seems to be related to the
>> detection of a "need fencing".
>>
>> Comments in related git commits are a little bit terse in this area. So
>> could you please explain what exactly needs to happen on a remote node
>> to initiate fencing?
>>
>> I tried so far:
>> * kill pacemaker_remoted when no resources are running. systemd restarted
>> it and crmd reconnected after some time.
>> * crash kernel when no resources are running
>> * crash kernel during massive start of resources
>
> this last one should definitely cause fencing. What version of pacemaker are
> you using? I've made changes in this area recently. Can you provide a crm_report.

It's c191bf3.
The crm_report is ready, but I am still waiting for approval from the
customer to send it.


Re: Fencing of bare-metal remote nodes
----- Original Message -----
> 26.11.2014 18:36, David Vossel wrote:
> >
> >
> > ----- Original Message -----
> >> 25.11.2014 23:41, David Vossel wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> Hi!
> >>>>
> >>>> is subj implemented?
> >>>>
> >>>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing
> >>>> occurs.
> >>>
> >>> Yes, fencing remote-nodes works. Are you certain your fencing devices can
> >>> handle
> >>> fencing the remote-node? Fencing a remote-node requires a cluster node to
> >>> invoke the agent that actually performs the fencing action on the
> >>> remote-node.
> >>
> >> Yes, if I invoke fencing action manually ('crm node fence <rnode>' in
> >> crmsh syntax), node is fenced. So the issue seems to be related to the
> >> detection of a "need fencing".
> >>
> >> Comments in related git commits are a little bit terse in this area. So
> >> could you please explain what exactly needs to happen on a remote node
> >> to initiate fencing?
> >>
> >> I tried so far:
> >> * kill pacemaker_remoted when no resources are running. systemd restarted
> >> it and crmd reconnected after some time.

This should definitely cause the remote-node to be fenced. I tested this
earlier today after reading you were having problems and my setup fenced
the remote-node correctly.

> >> * crash kernel when no resources are running

If a remote-node connection is lost and pacemaker was able to verify the
node is clean before the connection is lost, pacemaker will attempt to
reconnect to the remote-node without issuing a fencing request.

I could see why either fencing or not fencing could be desired in this situation.
Maybe I should make it an option.
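The rule described above can be pictured as a small decision function. This is a toy model and not Pacemaker's code: the type and field names are hypothetical, and the fall-through to fencing when the node could not be verified clean is an assumption made for illustration. It does not attempt to cover the killed-daemon case discussed earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RECONNECT = "reconnect"
    FENCE = "fence"


@dataclass
class RemoteNodeState:
    """Hypothetical snapshot of what the cluster knew about a remote node."""
    connection_lost: bool
    verified_clean: bool  # node was confirmed clean before the connection dropped


def decide(state: RemoteNodeState) -> Optional[Action]:
    """Pick a recovery action for a remote node under the rule sketched above."""
    if not state.connection_lost:
        return None  # nothing to recover
    if state.verified_clean:
        # Clean before the loss: try to reconnect, no fencing request.
        return Action.RECONNECT
    # Assumption of this sketch: anything not known to be clean gets fenced.
    return Action.FENCE
```

Under this model, a kernel crash on an idle, verified-clean node yields a reconnect attempt, while a crash in the middle of resource starts yields a fencing request.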

> >> * crash kernel during massive start of resources

This should definitely cause the remote node to be fenced.

> >
> > this last one should definitely cause fencing. What version of pacemaker
> > are
> > you using? I've made changes in this area recently. Can you provide a
> > crm_report.
>
> It's c191bf3.
> crm_report is ready, but I still wait an approval from a customer to
> send it.

Great. I really need to see what you are all doing. Outside of my own setup, I have
not seen many setups where pacemaker remote is deployed on bare-metal nodes. It is possible
something in your configuration exposes an edge case I haven't encountered yet.

There's a US holiday Thursday and Friday, so I won't be able to look at this until next
week.

-- Vossel

Re: Fencing of bare-metal remote nodes
25.11.2014 23:41, David Vossel wrote:
>
>
> ----- Original Message -----
>> Hi!
>>
>> is subj implemented?
>>
>> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
>
> Yes, fencing remote-nodes works. Are you certain your fencing devices can handle
> fencing the remote-node? Fencing a remote-node requires a cluster node to
> invoke the agent that actually performs the fencing action on the remote-node.

David, a couple of questions.

I see that in your fencing tests you just stop the systemd unit.
Shouldn't pacemaker_remoted somehow notify crmd that it is being
shut down? And shouldn't crmd stop all resources on that remote node
before granting that shutdown?

Also, from what I see now it would be natural to hide the current
implementation of remote node configuration under <node/> syntax. Now
remote nodes do have almost all features of normal nodes, including node
attributes. What do you think about it?

Best,
Vladislav

Re: Fencing of bare-metal remote nodes
----- Original Message -----
> 25.11.2014 23:41, David Vossel wrote:
> >
> >
> > ----- Original Message -----
> >> Hi!
> >>
> >> is subj implemented?
> >>
> >> Trying echo c > /proc/sysrq-trigger on remote nodes and no fencing occurs.
> >
> > Yes, fencing remote-nodes works. Are you certain your fencing devices can
> > handle
> > fencing the remote-node? Fencing a remote-node requires a cluster node to
> > invoke the agent that actually performs the fencing action on the
> > remote-node.
>
> David, a couple of questions.
>
> I see that in your fencing tests you just stop the systemd unit.
> Shouldn't pacemaker_remoted somehow notify crmd that it is being
> shut down? And shouldn't crmd stop all resources on that remote node
> before granting that shutdown?

Yes, this needs to happen at some point.

Right now the shutdown method for a remote-node is to disable the connection
resource and wait for all the resources to stop before killing pacemaker_remoted
on the remote node. That isn't exactly ideal.
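The manual shutdown order described above can be sketched as a small driver. The three callables are hypothetical hooks that an administrator would wire to their own tooling (for example, a command that disables the connection resource and one that stops the daemon); none of them is a Pacemaker API, and the timeout and polling values are arbitrary assumptions.

```python
import time
from typing import Callable, Sequence


def shutdown_remote_node(
    disable_connection: Callable[[], None],
    active_resources: Callable[[], Sequence[str]],
    stop_remote_daemon: Callable[[], None],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Disable the connection, wait for resources to stop, then stop the daemon.

    Returns True if the daemon was stopped, False if resources were still
    active when the timeout expired (the daemon is then left running).
    """
    disable_connection()                 # step 1: disable the connection resource
    deadline = clock() + timeout_s
    while active_resources():            # step 2: wait until nothing is running
        if clock() >= deadline:
            return False
        sleep(poll_s)
    stop_remote_daemon()                 # step 3: only now stop pacemaker_remoted
    return True
```

Injecting the hooks keeps the ordering logic separate from any particular command-line tool, so the sequence can be exercised with fakes before it is pointed at a real node.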


> Also, from what I see now it would be natural to hide the current
> implementation of remote node configuration under <node/> syntax. Now
> remote nodes do have almost all features of normal nodes, including node
> attributes. What do you think about it?

Ha, well, yes. At this point that might make sense. I had originally never
planned on remote-nodes entering the actual <nodes> section, but eventually
that changed. I'd like usage of remote nodes to mature a bit before I
commit to changing something like this, though. I'm still a bit uncertain how
people are going to use bare-metal remote nodes. The use cases people come
up with keep surprising me. Keeping the remote node definition as a resource
gives us a bit more flexibility for configuration.

-- Vossel
