Summary: I've been contemplating a simple enhancement to how SolrCloud
resolves files in a configSet: when a file isn't in ZooKeeper, fall back
to resolving it from the same-named configSet on the file system (which
SolrCloud normally ignores today). A further fallback to _default on the
file system could be useful as well. The mutable space would always be ZK:
if you edit a schema or configOverlay.json or whatever, the change lands in ZK.
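To make the proposed lookup order concrete, here's a rough Java sketch.
ConfigSource, LayeredConfigSetResolver and the in-memory stand-ins are
hypothetical names for illustration only, not existing Solr APIs. Real code
would presumably live behind SolrCloud's ZK-backed resource loader, and ZK
would remain the only layer that ever gets written to.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not an existing Solr API: resolve a configSet file by
// checking ZooKeeper first, then the same-named configSet on the file system,
// then the _default configSet on the file system.
final class LayeredConfigSetResolver {

  /** Hypothetical abstraction over one place where configSet files may live. */
  interface ConfigSource {
    Optional<String> read(String configSet, String fileName);
  }

  /** In-memory stand-in for ZK or a file system tree, keyed by "configSet/fileName". */
  static ConfigSource inMemory(Map<String, String> files) {
    return (configSet, fileName) -> Optional.ofNullable(files.get(configSet + "/" + fileName));
  }

  private final ConfigSource zk;         // the mutable layer; all edits would land here
  private final ConfigSource fileSystem; // read-only, e.g. baked into a Docker image

  LayeredConfigSetResolver(ConfigSource zk, ConfigSource fileSystem) {
    this.zk = zk;
    this.fileSystem = fileSystem;
  }

  /** Returns the first match in lookup order, or empty if no layer has the file. */
  Optional<String> resolve(String configSet, String fileName) {
    return zk.read(configSet, fileName)
        .or(() -> fileSystem.read(configSet, fileName))
        .or(() -> fileSystem.read("_default", fileName));
  }
}
```

So with schema.xml edited in ZK, solrconfig.xml shipped only in the image,
and stopwords.txt present only in _default, each lookup lands on a
different layer. A ZK copy always shadows the file system copy.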
My primary motivation is to make upgrades to plugins, configs, or Solr
itself easier in some scenarios (certainly not all!). Imagine that
you've got configOverlay.json (with some handlers defined) & params.json &
schema.xml in ZK, and solrconfig.xml on the file system, plus a partial
XML file of schema field types that schema.xml pulls in via "xi:include".
Assume that a custom Solr Docker image is used, including custom plugins
and with this configSet baked in. One day you add some new token filters,
add a new Lucene merge policy, and remove an outdated update request
processor. You make the plugin code changes and the xi:included field type
changes, edit solrconfig.xml, build all of this into your latest company
Solr Docker image, and deploy it using Kubernetes. Those changes can be
safe to deploy without touching any ZK-resident configSet. Other changes
might not be (e.g. removing a field type that is referenced, or changing
text analysis so incompatibly that a re-index is required), but my point
is that some are, and for those this would make upgrades easier.
An additional motivation is storing large, relatively static, common
resources on the file system. Where I work, I've got over a gig of them
:-). This can be worked around with solr.allow.unsafe.resourceloading=true,
but... it'd be nice not to have to resort to that.
Another benefit would be making it easier to separate one's own
configuration from that of the _default configSet you took from Solr when
starting a new project. Resolving the differences and then doing Solr
upgrades was a common task for me, both as a consultant and for my own
Solr upgrades. Granted, this is possible today, but perhaps if this overlay
approach were emphasized/embraced more, it would lead to this outcome. It's
still a problem that the stock solrconfig.xml & schema.xml are either too
bare-bones or say too much, but improving that is a separate issue for Solr.
A related, probably secondary, issue: if the SolrCloud configSet ZK node
were optional instead of required (in which case the configSet is assumed
to be entirely on the file system), it would bring other benefits. It would
allow users to use the "file store" or some network-mounted storage (NFS)
as the configSet location. It would also accelerate local experimentation
with SolrCloud in Docker. The biggest PITA anyone notices when first
exploring SolrCloud is that configs are fundamentally not on the file
system despite you seeing them there; it's all in ZK. And there's no
super convenient way to edit the configuration, not even a web UI. Using
the file system for configSets would be especially nice for that local
Docker experimentation, eliminating an annoying configSet deployment step.
I plan to file an issue, of course, but I think this deserves a dev list
discussion.
I know the new package manager could help with my primary motivating
use-case, but I think there are too many obstacles there at present.
A file system fallback is a simple thing by comparison.
Question: Does the k8s Solr Operator do anything to make configSet &
plugin upgrades better?
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley