Hi Lucene fans,
We use lucene-replicator to copy our indexes from a primary to replica nodes.
Startup and shutdown usually work fine; to shut down, we call PrimaryNode.close.
But in some edge cases (a dropped connection? an IOException? a crashed process?)
the call hangs in PrimaryNode.waitForAllRemotesToClose, which never returns.
I suspect we have a reference counting bug: in some exceptional case, we forget to release our CopyState.
This definitely should be fixed, but in the meantime, it's very unhelpful for the primary node to never shut down.
I am considering submitting a PR that adds a configurable timeout to the shutdown wait: once the timeout expires,
close would continue even though some replicas have not terminated.
Those replicas may later crash with an "IOException: directory closed", or may never come back at all.
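
To make the proposal concrete, here is a rough sketch of what a bounded wait could look like. It is a standalone illustration, not Lucene code: the class BoundedCloseWait, its counter, and the awaitZero method are all hypothetical names, and I am assuming the real wait boils down to polling a count of in-flight copies until it reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the primary's shutdown wait; not part of lucene-replicator. */
class BoundedCloseWait {
    // Assumed counter of in-flight copies: incremented on acquire, decremented on release.
    private final AtomicInteger copyingCount = new AtomicInteger();

    void acquire() { copyingCount.incrementAndGet(); }

    void release() { copyingCount.decrementAndGet(); }

    /**
     * Polls until the count reaches zero or the timeout elapses.
     * Returns true if every copy was released, false if we gave up waiting.
     */
    boolean awaitZero(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (copyingCount.get() != 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller decides whether to close anyway
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

On a false return, the caller could log the number of pending copies and carry on with closing, which is the behavior I am proposing.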
Does this sound like a welcome change? Is there a better way to avoid hanging here, other than to be bug-free?
It's quite challenging to figure out where the CopyState wasn't released, as only a count is kept.
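
One debugging aid I have been toying with on our side is to record a stack trace at each acquire, so that whatever is still outstanding at shutdown can be dumped. This is only a sketch under my own assumptions: CopyStateLeakTracker and its methods are hypothetical, and it would have to be wired in by hand next to our own acquire and release calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical debugging aid; not part of lucene-replicator. */
class CopyStateLeakTracker {
    private final AtomicLong nextId = new AtomicLong();
    // Maps each outstanding acquisition to a Throwable capturing where it happened.
    private final Map<Long, Throwable> outstanding = new ConcurrentHashMap<>();

    /** Call next to wherever the CopyState is acquired; keep the returned id. */
    long onAcquire(String who) {
        long id = nextId.incrementAndGet();
        outstanding.put(id, new Throwable("CopyState acquired by " + who));
        return id;
    }

    /** Call next to the matching release, passing the id from onAcquire. */
    void onRelease(long id) {
        outstanding.remove(id);
    }

    int outstandingCount() {
        return outstanding.size();
    }

    /** Prints the acquire-site stack trace of every acquisition that was never released. */
    void dumpLeaks() {
        outstanding.forEach((id, site) -> {
            System.err.println("Unreleased acquisition #" + id);
            site.printStackTrace();
        });
    }
}
```

Calling dumpLeaks() just before the shutdown wait would then show which code path took the reference that never came back, rather than just a nonzero count.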
Thanks!
Steven Schlansker
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org