Hi,
We are using conserver to handle the consoles in our cluster. Everything
worked perfectly until several months ago, when the cluster started growing
larger. The cluster currently has 2,000 nodes and will grow to 16,000 nodes
in the near future, and we are already seeing problems at 2,000 nodes:
1. conserver starts responding slowly after it has been running for a while
(maybe several days; I am not sure exactly). When it is slow, it takes more
than 10 seconds to open a node console, and occasionally the console cannot
be opened at all. We have to restart conserver to fix the problem.
2. Restarting conserver takes a very long time, about 5 minutes, to finish
initialization with 2,000 nodes. During initialization, rcons gets a
"Connection refused" error.
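As a stopgap for the restart window in item 2, a caller could poll until the conserver port accepts TCP connections before running rcons. This is only a minimal sketch: the function name wait_for_port is hypothetical, the host and port are parameters you must supply for your own setup, and a successful connect only shows that something is listening, not that every console has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass.

    Returns True once a connection is accepted, False on timeout.
    "Connection refused" is raised as ConnectionRefusedError, which is a
    subclass of OSError, so it is handled by the except clause below.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Give up if another wait would run past the deadline.
            if time.monotonic() + interval > deadline:
                return False
            time.sleep(interval)
```

A wrapper script could call wait_for_port("aixsn1", port) with the port your conserver listens on, and start rcons only when it returns True.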
We tried some scaling tuning for conserver, listed below, but it does not
seem to help much. Could you give us some further guidance on tuning
conserver for scale? Thank you.
1. Hierarchy: we set up several conserver hosts in the cluster and use the
"master" keyword in the configuration on the central management node to
specify which conserver host each console goes to:
#xCAT BEGIN aixcn1 CONS
console aixcn1 {
type exec;
master aixsn1;
}
#xCAT END aixcn1 CONS
2. Consoles per process: we raised the number of consoles each conserver
process can handle to 64 by starting the daemon with -m 64.
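To make the two settings above concrete, here is a rough sketch of how the stanzas and the resulting process counts could be worked out. It assumes nodes are spread round-robin across the conserver hosts (our real assignment may differ), that -m caps the consoles handled by one conserver process, and all function names are made up for illustration.

```python
import math


def conserver_stanza(node, master):
    """Build one console stanza in the same xCAT-wrapped form as above."""
    return (
        f"#xCAT BEGIN {node} CONS\n"
        f"console {node} {{\n"
        f"type exec;\n"
        f"master {master};\n"
        f"}}\n"
        f"#xCAT END {node} CONS\n"
    )


def assign_masters(nodes, masters):
    """Assumption: spread nodes round-robin over the conserver hosts."""
    return {node: masters[i % len(masters)] for i, node in enumerate(nodes)}


def processes_per_host(total_nodes, hosts, consoles_per_process):
    """Estimate conserver processes per host for a given -m value."""
    nodes_per_host = math.ceil(total_nodes / hosts)
    return math.ceil(nodes_per_host / consoles_per_process)


if __name__ == "__main__":
    print(conserver_stanza("aixcn1", "aixsn1"))
    # One host, -m 64: 2,000 nodes vs. 16,000 nodes
    print(processes_per_host(2000, 1, 64))
    print(processes_per_host(16000, 1, 64))
```

With -m 64, a single host needs ceil(2000 / 64) = 32 processes for 2,000 consoles and ceil(16000 / 64) = 250 for 16,000, which is why the per-host node count matters as much as the -m value.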
Thanks,
-------------------------------------------------------------------------
Li, Guang Cheng (李光成)
IBM China Software Development Laboratory