On Mon, Aug 3, 2020 at 5:03 PM Erick Erickson <erickerickson@gmail.com>
wrote:
> Gus’s point about implementing something before removing it is well taken,
> but we can deprecate it immediately without removing it. Gus’s point about
> dynamic fields not being found until later in the cycle is well taken, but
> not enough to persuade me.
>
Fair enough :)

> I’m not enthusiastic about multiple getting started schemas. The whole
> motivation behind schemaless is that the user doesn’t need to know about
> schemas to get started. By providing multiple “getting started” schemas we
> require them to become aware of schemas again.
>
Here's my theory (which may or may not be persuasive :) )
My thinking behind that suggestion is that most of the problem comes from
people new to a technology latching onto its defaults as something to keep
until they have a good reason to change them. This is reasonable, because
changing things you don't understand willy-nilly is often a road to pain.
People DO want a safe starting point, and we should give it to them because
it makes their life easier once they get a little further down the road,
but this is not compatible with the easy-start schemaless mode.
Looking at
https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html I
see that the initial tutorial experience is fully scripted, and the user
likely won't notice if they are told to ignore _default or guessing-proto
in favor of the techproducts configset... BUT when they do get to the
point of looking at the config name, they'll see the more descriptive name.
So rather than seeing "_default" and thinking "Ah ha! Here's something I
can take as gospel and not change until I have a reason!" they'll see
"guessing-proto" or "dynamic-proto" and say "Hunh, I wonder what that
means?", which I think is a good question for them to ask.
The concept of a default carries a strong bias toward not touching it
(IMHO), which will be wrong most of the time no matter what we give them as
a default. If something must be a default, I'd favor a non-managed,
non-dynamic, non-guessing minimal schema with the required fields, an
id field, maybe a _text_ field, and a comment pointing to the section of
the ref guide where they can copy and paste in all the stuff that's
currently in our base schema as examples (things like the text_ga type), IF
they want it. I get really tired of seeing mile-long schemas that have a
ton of unused stuff that is retained because people didn't know if they
needed it or not...
Note that not having some default would break back compat for bin/solr, but
changing the default is also a break of sorts.
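
For concreteness, here is roughly what such a minimal, non-managed,
non-dynamic, non-guessing schema might contain, written as a small Python
script that emits the XML. The field and type names follow common Solr
conventions, but the function name and the exact attribute choices are
assumptions for illustration, not a proposal for the shipped file.

```python
import xml.etree.ElementTree as ET


def build_minimal_schema(name="minimal"):
    """Return schema XML with id, _version_ and _text_ fields and nothing else.

    Illustrative sketch only: attribute choices are assumptions.
    """
    schema = ET.Element("schema", attrib={"name": name, "version": "1.6"})
    schema.append(ET.Comment(
        " More field types (e.g. text_ga) can be copied from the ref guide IF needed. "
    ))
    ET.SubElement(schema, "uniqueKey").text = "id"

    # Only the field types that the three fields below actually use.
    ET.SubElement(schema, "fieldType",
                  attrib={"name": "string", "class": "solr.StrField"})
    ET.SubElement(schema, "fieldType",
                  attrib={"name": "plong", "class": "solr.LongPointField",
                          "docValues": "true"})
    text = ET.SubElement(schema, "fieldType",
                         attrib={"name": "text_general", "class": "solr.TextField"})
    analyzer = ET.SubElement(text, "analyzer")
    ET.SubElement(analyzer, "tokenizer",
                  attrib={"class": "solr.StandardTokenizerFactory"})
    ET.SubElement(analyzer, "filter",
                  attrib={"class": "solr.LowerCaseFilterFactory"})

    # Deliberately no dynamicField entries: every field must be declared.
    fields = [
        {"name": "id", "type": "string", "indexed": "true",
         "stored": "true", "required": "true"},
        {"name": "_version_", "type": "plong", "indexed": "false",
         "stored": "false"},
        {"name": "_text_", "type": "text_general", "indexed": "true",
         "stored": "false", "multiValued": "true"},
    ]
    for attrs in fields:
        ET.SubElement(schema, "field", attrib=attrs)

    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(build_minimal_schema())
```

Anything beyond this would be added by the user on purpose, which is the
point: nothing is in the file unless someone decided they needed it.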
>
> All that said, maybe we could rethink the approach. My two objections are:
> 1> schemaless, by updating the schema based on a very small sample set, is
> very susceptible to failing early and often
> 2> Constantly updating the config in ZK and reloading the collections
> seems very hard to get right.
>
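
A small Python sketch of objection 1, for illustration only: it is not
Solr's schemaless code, and the function names, type names and guessing
rules are simplified assumptions. It fixes each field's type from the first
value seen, the way guessing from a tiny sample would, and then collects the
later documents that no longer fit.

```python
def guess_type(value):
    """Guess a field type from one raw string value (simplified, assumed rules)."""
    for caster, type_name in ((int, "plong"), (float, "pdouble")):
        try:
            caster(value)
            return type_name
        except ValueError:
            pass
    return "text_general"


def fits(field_type, value):
    """Return True if value can be indexed into a field of field_type."""
    if field_type == "plong":
        return guess_type(value) == "plong"
    if field_type == "pdouble":
        return guess_type(value) in ("plong", "pdouble")
    return True  # text accepts anything


def index_schemaless(docs):
    """Lock each field's type on first sight; return the schema and rejected docs."""
    schema, rejected = {}, []
    for doc in docs:
        for field, value in doc.items():
            # The first value ever seen for a field decides its type for good.
            schema.setdefault(field, guess_type(value))
        if not all(fits(schema[field], value) for field, value in doc.items()):
            rejected.append(doc)
    return schema, rejected


if __name__ == "__main__":
    docs = [{"price": "10"}, {"price": "10.5"}, {"price": "call us"}]
    schema, rejected = index_schemaless(docs)
    print(schema)
    print(rejected)
```

With these three sample documents the first one makes "price" an integer
field, and both later documents are rejected, even though all three are
perfectly ordinary data.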
I have for some time thought the inability to upload and download a config
(or files within a config) via the web UI was a gap. But I found it easier
to write
https://plugins.gradle.org/plugin/com.needhamsoftware.solr-gradle than
add that feature to the UI :)
> So I can imagine a “getting started” mode that indexed to the glob field
> while creating a schema. Ideally, it would be necessary to enable it
> specifically rather than have it be the default. I’d imagine this being
> coupled with some kind of “export schema” button. So the process would be
> > start Solr with -Dsolr.learningmode.config=some_config_name.
> > index a bunch of documents, perhaps prototyping the search app on the
> dynamic glob field.
> > The admin UI should have a big, intrusive banner saying “RUNNING IN
> LEARNING MODE” with instructions on what to do next.
> > In that mode there’d need to be a “save schema” button or something.
> What I’d like that to do would be examine the index and write a new schema
> somewhere. If this was the mode, then you’d be able to run it any time.
>
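
The "save schema" step could look something like the Python sketch below.
It is hypothetical: no such export is specified in this thread, the
documents are assumed to be available as plain dicts (as they would be from
a JSON query response), and the type names and widening order are
simplifications. The point is that it examines every document before
settling on a type, rather than only the first few.

```python
# Narrowest to widest; a field ends up with the widest type any value needs.
WIDENING = ["plong", "pdouble", "text_general"]


def value_type(value):
    """Map one already-parsed value to an (assumed) Solr field type name."""
    if isinstance(value, int) and not isinstance(value, bool):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    # Everything else, including booleans in this simplified sketch, is text.
    return "text_general"


def infer_schema(docs):
    """Scan all docs and return {field: {"type": ..., "multiValued": ...}}."""
    schema = {}
    for doc in docs:
        for field, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            entry = schema.setdefault(
                field, {"type": WIDENING[0], "multiValued": False}
            )
            # A list anywhere in the data makes the field multiValued.
            entry["multiValued"] = entry["multiValued"] or isinstance(raw, list)
            for value in values:
                # Keep whichever type sits later in the widening order.
                entry["type"] = max(
                    entry["type"], value_type(value), key=WIDENING.index
                )
    return schema


if __name__ == "__main__":
    docs = [
        {"price": 10, "tags": ["a", "b"]},
        {"price": 10.5, "tags": "c"},
    ]
    for field, entry in infer_schema(docs).items():
        print(field, entry)
```

Because the result depends only on what is in the index at the time, an
export like this could be re-run whenever more documents have been added.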
+1 for anything that makes a round-trip of working with the schema easier,
but not really a fan of learning mode.