Mailing List Archive

Approach for a new Autoscaling framework
[I’m moving a discussion from the PR
<https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
<https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a
wider audience. This is about replacing the Autoscaling framework, now
removed from master, with a way for clients to write their own customized
placement code]

It took me a long time to write this mail and it's quite long, sorry.
Anybody interested in the future of Autoscaling (not only those I cc'ed),
please do read it and provide feedback. High-impact decisions have to be
made now.

Thanks Noble for your feedback.
I believe it is important that we are aligned on what we build here, esp.
at the early defining stages (now).

Let me try to address your concerns and, more generally, explain the
rationale behind the approach.

*> Anyone who wishes to implement this should not require to learn a lot
before even getting started*
For somebody who knows Solr (what is a Node, Collection, Shard, Replica)
and basic notions related to Autoscaling (getting variables representing
current state to make decisions), there’s not much to learn. The framework
uses the same concepts, often with the same names.

*> I don't believe we should have a set of interfaces that duplicate
existing classes just for this functionality.*
Where appropriate, existing classes can be the implementations of these
interfaces and be passed to the plugins; that would be perfectly OK.
The proposal doesn’t include implementations at this stage, so there’s no
duplication, at least not yet (we must get the interfaces right and agreed
upon before implementing them). If some interface methods in the proposal
are named differently from the equivalent methods in the internal classes
we plan to use, let's of course rename one or the other.

Existing internal abstractions are most of the time concrete classes and
not interfaces (Replica, Slice, DocCollection, ClusterState). Making these
visible to contrib code living elsewhere makes future refactoring hard, and
contrib code will most likely end up reaching for methods it shouldn’t be
using. If we define a clean set of interfaces for plugins, I wouldn’t
hesitate to break external plugins that reach out to other internal Solr
classes, but I would do everything possible to keep the API backward
compatible so that existing plugins can be recompiled without change.
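
To illustrate the point about existing classes implementing the plugin
interfaces, here is a minimal sketch. All names in it (Shard, InternalSlice,
PluginCode and their methods) are hypothetical stand-ins, not the interfaces
of the PR nor Solr's real classes. It only shows that plugin code compiled
against an interface does not see the internal methods of the class behind it.

```java
import java.util.List;

// Hypothetical plugin-facing interface: the only type a plugin compiles against.
interface Shard {
    String getShardName();
    List<String> getReplicaNames();
}

// Stand-in for an internal concrete class (NOT Solr's real Slice class).
final class InternalSlice implements Shard {
    private final String name;
    private final List<String> replicaNames;

    InternalSlice(String name, List<String> replicaNames) {
        this.name = name;
        this.replicaNames = List.copyOf(replicaNames);
    }

    @Override
    public String getShardName() {
        return name;
    }

    @Override
    public List<String> getReplicaNames() {
        return replicaNames;
    }

    // Internal-only method: not reachable through the Shard interface.
    String internalStateJson() {
        return "{\"name\":\"" + name + "\"}";
    }
}

final class PluginCode {
    // Depends only on the interface, so renaming or restructuring
    // InternalSlice would not require changing this method.
    static int replicaCount(Shard shard) {
        return shard.getReplicaNames().size();
    }

    public static void main(String[] args) {
        Shard shard = new InternalSlice("shard1", List.of("replica_n1", "replica_n2"));
        System.out.println(shard.getShardName() + " has " + replicaCount(shard) + " replicas");
    }
}
```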

*> 24 interfaces to do this is definitely over engineering*
I don’t consider the number of classes or interfaces a metric of complexity
or of engineering quality. There are sample
<https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
plugin implementations to serve as a base for plugin writers (and for us
defining this framework) and I believe the process is relatively simple.
Trying to do the same things with existing Solr classes might prove a lot
harder (but might be worth the effort for comparison purposes, to make sure
we agree on the approach). For example, getting the sister replicas of a
given replica in the proposed API is: replica.getShard()
<https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
.getReplicas()
<https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
Doing so with the internal classes likely involves getting the DocCollection
and Slice names from the Replica, then getting the DocCollection from the
cluster state, then getting the Slice from it by name, and finally calling
getReplicas() on the Slice. I consider the role of this new framework to be
making it as easy as possible to write placement code and the like, making
it easy for us to maintain, making it easy to write a simulation engine
(which should be at least an order of magnitude simpler than the previous
one), etc.
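
The sister replica navigation above can be sketched as follows. Only the
call chain replica.getShard().getReplicas() comes from the proposal; the
interface shapes, the in-memory SimpleShard and SimpleReplica classes and
the sisterNames helper are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shapes of the two interfaces involved (illustrative only).
interface Replica {
    String getReplicaName();
    Shard getShard();
}

interface Shard {
    String getShardName();
    List<Replica> getReplicas();
}

// Hypothetical in-memory implementations, just enough to run the example.
final class SimpleShard implements Shard {
    private final String name;
    private final List<Replica> replicas = new ArrayList<>();

    SimpleShard(String name) {
        this.name = name;
    }

    Replica addReplica(String replicaName) {
        Replica replica = new SimpleReplica(replicaName, this);
        replicas.add(replica);
        return replica;
    }

    @Override
    public String getShardName() {
        return name;
    }

    @Override
    public List<Replica> getReplicas() {
        return List.copyOf(replicas);
    }
}

final class SimpleReplica implements Replica {
    private final String name;
    private final Shard shard;

    SimpleReplica(String name, Shard shard) {
        this.name = name;
        this.shard = shard;
    }

    @Override
    public String getReplicaName() {
        return name;
    }

    @Override
    public Shard getShard() {
        return shard;
    }
}

final class SisterReplicas {
    // Names of all replicas of the same shard, excluding the given one.
    static List<String> sisterNames(Replica replica) {
        List<String> names = new ArrayList<>();
        for (Replica other : replica.getShard().getReplicas()) {
            if (other != replica) {
                names.add(other.getReplicaName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleShard shard = new SimpleShard("shard1");
        Replica first = shard.addReplica("replica_n1");
        shard.addReplica("replica_n2");
        shard.addReplica("replica_n3");
        System.out.println(sisterNames(first));
    }
}
```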

An example regarding readability and number of interfaces: rather than
defining an enum with runtime annotations for building its instances (
Variable.Type
<https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
and then very generic access methods, the proposal defines a specific
interface for each “variable type” (called properties
<https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
Rather than concatenating strings to specify the data to return from a
remote node (based on snitches
<https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
see doc
<https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
the proposal is explicit and strongly typed (here
<https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb>
is an example that gets a specific system property from a node). This
definitely does increase the number of interfaces, but IMO it reduces the
effort to code to these abstractions and provides a lot more compile-time
and IDE assistance.
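
A rough sketch of the difference between string-based and typed access
follows. Every name here (PropertyKey, SystemPropertyKey, CoreCountKey,
NodeData) is hypothetical and the key strings are illustrative; this is not
the API of the PR, only the general idea of letting the key carry the value
type so that the compiler and the IDE can help.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical typed key: the type parameter fixes the value type at compile time.
interface PropertyKey<T> {
    T parse(Map<String, String> rawNodeData);
}

// Typed key for one named system property of a node.
final class SystemPropertyKey implements PropertyKey<Optional<String>> {
    private final String name;

    SystemPropertyKey(String name) {
        this.name = name;
    }

    @Override
    public Optional<String> parse(Map<String, String> rawNodeData) {
        // The "sysprop." prefix is an illustrative key format.
        return Optional.ofNullable(rawNodeData.get("sysprop." + name));
    }
}

// Typed key for the number of cores on a node.
final class CoreCountKey implements PropertyKey<Integer> {
    @Override
    public Integer parse(Map<String, String> rawNodeData) {
        return Integer.parseInt(rawNodeData.getOrDefault("cores", "0"));
    }
}

final class NodeData {
    private final Map<String, String> raw;

    NodeData(Map<String, String> raw) {
        this.raw = Map.copyOf(raw);
    }

    // String-based access: the caller builds the key and parses the value.
    String getRaw(String key) {
        return raw.get(key);
    }

    // Typed access: the key knows both where the value lives and its type.
    <T> T get(PropertyKey<T> key) {
        return key.parse(raw);
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("cores", "4");
        raw.put("sysprop.rack", "rack-a");
        NodeData node = new NodeData(raw);

        // String-based: key concatenation and parsing are left to the caller.
        int coresFromString = Integer.parseInt(node.getRaw("cores"));
        String rackFromString = node.getRaw("sysprop." + "rack");

        // Typed: no string concatenation or casts at the call site.
        int cores = node.get(new CoreCountKey());
        Optional<String> rack = node.get(new SystemPropertyKey("rack"));

        System.out.println(coresFromString + " " + rackFromString);
        System.out.println(cores + " " + rack.orElse("unknown"));
    }
}
```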

The goal is to hide all the boilerplate code and machinery (and, to a
point, the complexity) in the implementations of these interfaces rather
than have each plugin writer deal with the same problems.

We’re moving from something that was complex and hard to read and debug, yet
functionally extremely rich, to something simpler for us and more demanding
for users (write code rather than policy config if there's a need for new
behavior), but that should not be less "expressive" in any significant way.
One could even imagine reimplementing the former Autoscaling config Domain
Specific Language on top of these APIs (maybe as a summer internship project
:)

*> This is a common mistake that we all do. When we design a feature we
think that is the most important thing.*
If by *"most important thing"* you mean investing the best reasonable
effort to do things right then yes.
If you mean trying to make a minor feature look more important and inflated
than it is, I disagree.
As a personal note, replica placement is not the aspect of SolrCloud I'm
most interested in, but it is the first bottleneck we hit when pushing the
scale of SolrCloud. I approach this with the mindset "let's do it right and
get it out of the way" so I can move to topics I really want to work on
(around distribution in SolrCloud and the role of Overseer). Implementing
Autoscaling in a way that simplifies future refactorings (or at least does
not make them harder than they already are) is therefore *very high* on my
priority list, to support modest changes (Slice to Shard renaming) and more
ambitious ones (replacing Zookeeper, removing Overseer, you name it).

Thanks for reading, again sorry for the long email, but I hope this helps
(at least helps the discussion),
Ilan


On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com> wrote:

> I don't believe we should have a set of interfaces that duplicate existing
> classes just for this functionality. This is a common mistake that we all
> do. When we design a feature we think that is the most important thing. We
> end up over-designing and over-engineering things. This feature will remain
> a tiny part of Solr. Anyone who wishes to implement this should not require
> to learn a lot before even getting started. Let's try to have a minimal set
> of interfaces so that people who try to implement them do not have a huge
> learning curve.
>
> Let's try to understand the requirement
>
> - Solr wants a set of positions to place a few replicas
> - The implementation wants to know what is the current state of the
> cluster so that it can make those decisions
>
> 24 interfaces to do this is definitely over engineering
Re: Approach for a new Autoscaling framework
I think we should move the discussion back to the PR because it has more
context and inline comments are possible. Having this discussion in 4
places (Jira, PR, Slack and dev list) is very hard to keep track of.

On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com> wrote:

> [I’m moving a discussion from the PR
> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a
> wider audience. This is about replacing the now (in master) gone
> Autoscaling framework with a way for clients to write their customized
> placement code]
Re: Approach for a new Autoscaling framework
I think this is a valid thing to discuss on the dev list, since this isn't
just about code comments.
It seems to me that Ilan wants to discuss the philosophy around how to
design plugins and the interfaces in Solr which the plugins will talk to.
This is broad and affects much more than just the Autoscaling framework.

As a community & product, we have so far agreed that Solr should be lighter
weight and additional features should live in plugins that are managed
separately from Solr itself.
At that point we need to think about the lifetime and support of these
plugins. People love to refactor stuff in the Solr core, which wasn't a
large issue before plugins.
However, if we now intend for many customers to rely on plugins, then
we need to come up with standards and guarantees so that these plugins
don't:

- Stall people from upgrading Solr (minor or major versions)
- Hinder the development of Solr Core
- Cause us more headaches trying to keep multiple repos of plugins up to
date with recent versions of Solr


I am not completely sure where I stand right now, but this is definitely
something that we should be thinking about when migrating all of this
functionality to plugins.

- Houston

On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
wrote:

> I think we should move the discussion back to the PR because it has more
> context and inline comments are possible. Having this discussion in 4
> places (Jira, PR, Slack and dev list) is very hard to keep track of.
Re: Approach for a new Autoscaling framework
Important discussion indeed.

I don’t have time to dive deep into the PR or to make up my mind about whether there is a simpler and more future-proof way of designing these APIs. But I understand that autoscaling is a complex beast and that it is important we get it right.

One question regarding having to write code vs config: is the plan to ship some very simple, lightweight default placement rules out of the box that give 80% of users what they need with simple config, or would every user need to write code to e.g. spread replicas across hosts/racks? I’d be interested in seeing an alternative proposal laid out, perhaps not in code but as a design that can be compared and discussed.

Jan Høydahl
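
To make the code vs config question concrete, here is a minimal sketch of
what "spread replicas across racks" could look like when written as code.
It is not the API of the PR: the RackSpreadPlacement class, its place method
and the node-to-rack map are assumptions made up for this example, and a
real plugin would obtain the rack of each node from the cluster state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class RackSpreadPlacement {
    // Picks nodes for new replicas, taking one node per rack in turn so that
    // the replicas are spread over as many racks as possible.
    static List<String> place(Map<String, String> rackOfNode, int replicaCount) {
        // Group node names by rack; sorted collections keep the result deterministic.
        Map<String, Deque<String>> nodesByRack = new TreeMap<>();
        for (String node : new TreeSet<>(rackOfNode.keySet())) {
            nodesByRack.computeIfAbsent(rackOfNode.get(node), rack -> new ArrayDeque<>()).add(node);
        }

        List<String> chosen = new ArrayList<>();
        while (chosen.size() < replicaCount) {
            boolean progressed = false;
            for (Deque<String> nodes : nodesByRack.values()) {
                if (chosen.size() == replicaCount) {
                    break;
                }
                String node = nodes.poll();
                if (node != null) {
                    chosen.add(node);
                    progressed = true;
                }
            }
            if (!progressed) {
                // Every rack is exhausted: fewer nodes than requested replicas.
                throw new IllegalArgumentException("Not enough nodes for " + replicaCount + " replicas");
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> rackOfNode = Map.of(
                "n1", "rackA",
                "n2", "rackA",
                "n3", "rackB",
                "n4", "rackC");
        System.out.println(place(rackOfNode, 3));
    }
}
```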

> On 23 Jul 2020 at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>
> I think this is a valid thing to discuss on the dev list, since this isn't just about code comments.
> It seems to me that Ilan wants to discuss the philosophy around how to design plugins and the interfaces in Solr which the plugins will talk to.
> This is broad and affects much more than just the Autoscaling framework.
>>>
>>> Thanks for reading, again sorry for the long email, but I hope this helps (at least helps the discussion),
>>> Ilan
>>>
>>>
>>>
>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com> wrote:
>>>> I don't believe we should have a set of interfaces that duplicate existing classes just for this functionality. This is a common mistake that we all do. When we design a feature we think that is the most important thing. We end up over designing and over engineering things. This feature will remain a tiny part of Solr. Anyone who wishes to implement this should not require to learn a lot before even getting started. Let's try to have a minimal set of interfaces so that people who try to implement them do not have a huge learning curve.
>>>>
>>>> Let's try to understand the requirement
>>>>
>>>> - Solr wants a set of positions to place a few replicas
>>>> - The implementation wants to know what is the current state of the cluster so that it can make those decisions
>>>>
>>>> 24 interfaces to do this is definitely over engineering
>>>>
>>>
>>>
Re: Approach for a new Autoscaling framework
In my opinion we have to (and therefore will) ship at least a basic
production-ready implementation on top of the API that does simple things
(not sure about racks, but for example balancing cores and disk size
without co-locating replicas of the same shard on the same node).
Without such an implementation, I suspect adoption will be low. Moreover,
it's always a lot friendlier to start coding from a working example than
from scratch.
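
To make the "simple things" above concrete, here is a minimal sketch of such a placement rule. It is not the PR's API: the class name, the method name and the plain Map/Set inputs are all hypothetical, and it only balances core counts (disk size and racks are left out). It picks the least-loaded node for each new replica and never reuses a node that already hosts a replica of the same shard.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; these names are not from the PR or from Solr. */
final class SimplePlacementSketch {

    /**
     * Chooses nodes for replicaCount new replicas of a single shard.
     *
     * @param coresPerNode      current number of cores on each candidate node
     * @param nodesHostingShard nodes that already hold a replica of this shard
     */
    static List<String> placeReplicas(Map<String, Integer> coresPerNode,
                                      Set<String> nodesHostingShard,
                                      int replicaCount) {
        // Sorted copy: ties are broken by node name, and the caller's map is untouched.
        Map<String, Integer> cores = new TreeMap<>(coresPerNode);
        Set<String> excluded = new HashSet<>(nodesHostingShard);
        List<String> chosen = new ArrayList<>();

        for (int i = 0; i < replicaCount; i++) {
            String best = null;
            for (Map.Entry<String, Integer> e : cores.entrySet()) {
                if (excluded.contains(e.getKey())) {
                    continue; // never co-locate two replicas of the same shard
                }
                if (best == null || e.getValue() < cores.get(best)) {
                    best = e.getKey();
                }
            }
            if (best == null) {
                throw new IllegalStateException("Not enough eligible nodes");
            }
            chosen.add(best);
            excluded.add(best);
            cores.merge(best, 1, Integer::sum); // account for the replica just placed
        }
        return chosen;
    }
}
```

With nodes n1..n4 holding 5, 1, 3 and 1 cores, and n2 already hosting the shard, asking for two replicas selects n4 and then n3.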

Not clear to me what type of "alternative proposal" you're thinking of Jan.
Alternative API proposal? Alternative approach to replace Autoscaling?

Ilan

On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com> wrote:

> Important discussion indeed.
>
> I don’t have time to dive deep into the PR or make up my mind whether
> there is a simpler and more future proof way of designing these APIs. But I
> understand that autoscaling is a complex beast and it is important we get
> it right.
>
> One question regarding having to write code vs config. Is the plan to ship
> some very simple light weight default placement rules ootb that gives 80%
> of users what they need with simple config, or would every user need to
> write code to e.g. spread replicas across hosts/racks? I’d be interested in
> seeing an alternative proposal laid out, perhaps not in code but with a
> design that can be compared and discussed.
>
> Jan Høydahl
>
> On 23 Jul 2020, at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>
> I think this is a valid thing to discuss on the dev list, since this isn't
> just about code comments.
> It seems to me that Ilan wants to discuss the philosophy around how to
> design plugins and the interfaces in Solr which the plugins will talk to.
> This is broad and affects much more than just the Autoscaling framework.
>
> As a community & product, we have so far agreed that Solr should be
> lighter weight and additional features should live in plugins that are
> managed separately from Solr itself.
> At that point we need to think about the lifetime and support of these
> plugins. People love to refactor stuff in the solr core, which before
> plugins wasn't a large issue.
> However if we are now intending for many customers to rely on plugins,
> then we need to come up with standards and guarantees so that these plugins
> don't:
>
> - Stall people from upgrading Solr (minor or major versions)
> - Hinder the development of Solr Core
> - Cause us more headaches trying to keep multiple repos of plugins up
> to date with recent versions of Solr
>
>
> I am not completely sure where I stand right now, but this is definitely
> something that we should be thinking about when migrating all of this
> functionality to plugins.
>
> - Houston
>
> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
> wrote:
>
>> I think we should move the discussion back to the PR because it has more
>> context and inline comments are possible. Having this discussion in 4
>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>
>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com> wrote:
>>
>>> [I’m moving a discussion from the PR
>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for
>>> a wider audience. This is about replacing the now (in master) gone
>>> Autoscaling framework with a way for clients to write their customized
>>> placement code]
>>>
>>> It took me a long time to write this mail and it's quite long, sorry.
>>> Please anybody interested in the future of Autoscaling (not only those I
>>> cc'ed) do read it and provide feedback. Very impacting decisions have to be
>>> made now.
>>>
>>> Thanks Noble for your feedback.
>>> I believe it is important that we are aligned on what we build here,
>>> esp. at the early defining stages (now).
>>>
>>> Let me try to elaborate on your concerns and provide in general the
>>> rationale behind the approach.
>>>
>>> *> Anyone who wishes to implement this should not require to learn a lot
>>> before even getting started*
>>> For somebody who knows Solr (what is a Node, Collection, Shard, Replica)
>>> and basic notions related to Autoscaling (getting variables representing
>>> current state to make decisions), there’s not much to learn. The framework
>>> uses the same concepts, often with the same names.
>>>
>>> *> I don't believe we should have a set of interfaces that duplicate
>>> existing classes just for this functionality.*
>>> Where appropriate we can have existing classes be the implementations
>>> for these interfaces and be passed to the plugins, that would be perfectly
>>> ok. The proposal doesn’t include implementations at this stage, therefore
>>> there’s no duplication, or not yet... (we must get the interfaces right and
>>> agreed upon before implementation). If some interface methods in the
>>> proposal have a different name from equivalent methods in internal classes
>>> we plan to use, of course let's rename one or the other.
>>>
>>> Existing internal abstractions are most of the time concrete classes and
>>> not interfaces (Replica, Slice, DocCollection, ClusterState). Making
>>> these visible to contrib code living elsewhere is making future refactoring
>>> hard and contrib code will most likely end up reaching to methods it
>>> shouldn’t be using. If we define a clean set of interfaces for plugins, I
>>> wouldn’t hesitate to break external plugins that reach out to other
>>> internal Solr classes, but will make everything possible to keep the API
>>> backward compatible so existing plugins can be recompiled without change.
>>>
>>> *> 24 interfaces to do this is definitely over engineering*
>>> I don’t consider the number of classes or interfaces a metric of
>>> complexity or of engineering quality. There are sample
>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>> plugin implementations to serve as a base for plugin writers (and for us
>>> defining this framework) and I believe the process is relatively simple.
>>> Trying to do the same things with existing Solr classes might prove a lot
>>> harder (but might be worth the effort for comparison purposes to make sure
>>> we agree on the approach? For example, getting sister replicas of a given
>>> replica in the proposed API is: replica.getShard()
>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>> .getReplicas()
>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>> Doing so with the internal classes likely involves getting the
>>> DocCollection and Slice name from the Replica, then get the
>>> DocCollection from the cluster state, there get the Slice based on its
>>> name and finally getReplicas() from the Slice). I consider the role of
>>> this new framework is to make life as easy as possible for writing
>>> placement code and the like, make life easy for us to maintain it, make it
>>> easy to write a simulation engine (should be at least an order of magnitude
>>> simpler than the previous one), etc.
>>>
>>> An example regarding readability and number of interfaces: rather than
>>> defining an enum with runtime annotation for building its instances (
>>> Variable.Type
>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>> and then very generic access methods, the proposal defines a specific
>>> interface for each “variable type” (called properties
>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>> Rather than concatenating strings to specify the data to return from a
>>> remote node (based on snitches
>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>> see doc
>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>> the proposal is explicit and strongly typed (here
>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>> to get a specific system property from a node). This definitely does
>>> increase the number of interfaces, but reduces IMO the effort to code to
>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>
>>> Goal is to hide all the boilerplate code and machinery (and to a point -
>>> complexity) in the implementations of these interfaces rather than have
>>> each plugin writer deal with the same problems.
>>>
>>> We’re moving from something that was complex and hard to read and debug
>>> yet functionally extremely rich, to something simpler for us, more
>>> demanding for users (write code rather than policy config if there's a need
>>> for new behavior) but that should not be less "expressive" in any
>>> significant way. One could even imagine reimplementing the former
>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>> summer internship project :)
>>>
>>> *> This is a common mistake that we all do. When we design a feature we
>>> think that is the most important thing.*
>>> If by *"most important thing"* you mean investing the best reasonable
>>> effort to do things right then yes.
>>> If you mean trying to make a minor feature look more important and
>>> inflated than it is, I disagree.
>>> As a personal note, replica placement is not the aspect of SolrCloud I'm
>>> most interested in, but the first bottleneck we hit when pushing the scale
>>> of SolrCloud. I approach this with a state of mind "let's do it right and
>>> get it out of the way" to move to topics I really want to work on (around
>>> distribution in SolrCloud and the role of Overseer). Implementing
>>> Autoscaling in a way that simplifies future refactoring (or that does not
>>> make them harder than they already are) is therefore *very high* on my
>>> priority list, to support modest changes (Slice to Shard renaming) and
>>> more ambitious ones (replacing Zookeeper, removing Overseer, you name it).
>>>
>>> Thanks for reading, again sorry for the long email, but I hope this
>>> helps (at least helps the discussion),
>>> Ilan
>>>
>>>
>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>> wrote:
>>>
>>>> I don't believe we should have a set of interfaces that duplicate
>>>> existing classes just for this functionality. This is a common mistake that
>>>> we all do. When we design a feature we think that is the most important
>>>> thing. We end up over designing and over engineering things. This feature
>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>> not require to learn a lot before even getting started. Let's try to have a
>>>> minimal set of interfaces so that people who try to implement them do not
>>>> have a huge learning curve.
>>>>
>>>> Let's try to understand the requirement
>>>>
>>>> - Solr wants a set of positions to place a few replicas
>>>> - The implementation wants to know what is the current state of the
>>>> cluster so that it can make those decisions
>>>>
>>>> 24 interfaces to do this is definitely over engineering
>>>>
>>>
>>>
Re: Approach for a new Autoscaling framework
> Not clear to me what type of "alternative proposal" you're thinking of Jan

That would be the responsibility of Noble and others who have concerns to detail, and to try to convince their peers.
It’s hard for me as a spectator to know whether to agree with Noble without a clear picture of what the alternative API or approach would look like.
I’m often a fan of loosely typed APIs since they tend to require less boilerplate code, but strong typing may indeed be a sound choice in this API.

Jan Høydahl

> On 24 Jul 2020, at 01:44, Ilan Ginzburg <ilansolr@gmail.com> wrote:
>
> In my opinion we have to (and therefore will) ship at least a basic prod ready implementation on top of the API that does simple things (not sure about rack, but for example balance cores and disk size without co locating replicas of same shard on same node).
> Without such an implementation, I suspect adoption will be low. Moreover, it's always a lot more friendly to start coding from a working example than from scratch.
>
> Not clear to me what type of "alternative proposal" you're thinking of Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>
> Ilan
>
Re: Approach for a new Autoscaling framework
Scanned through the PR and read some of this thread. I have likely missed
much other discussion, so forgive me if I'm dredging up some things that
have already been discussed elsewhere.

The idea of designing the interfaces defining what information is available
seems good here, but I worry that it's too autoscaling-focused. In my
imagination, I would see Solr having a standard informational interface
that is useful to any plugin of any sort. Autoscaling should be leveraging
that and we should be enhancing that to enable autoscaling. The current
state of the system is one key type of information, but another type of
information that should exist within Solr and be exposed to plugins
(including autoscaling) is events. For example, when a new node joins there
should be an event, so that plugins can listen for it rather than
incessantly polling and comparing the list of 100 nodes to a cached list of
100 nodes.
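
A rough sketch of the listen-rather-than-poll idea follows. Nothing here exists in Solr: ClusterEventListener, ClusterEventSource and liveNodesChanged are hypothetical names, and the assumption is that the core calls liveNodesChanged whenever it observes a new set of live nodes. The point is that the old-versus-new comparison happens once, in one place, and plugins only receive the resulting join and leave events.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener interface; not an existing Solr type. */
interface ClusterEventListener {
    void onNodeJoined(String nodeName);
    void onNodeLeft(String nodeName);
}

/** Hypothetical event source that diffs live-node sets on behalf of all plugins. */
final class ClusterEventSource {
    private final List<ClusterEventListener> listeners = new CopyOnWriteArrayList<>();
    private Set<String> liveNodes = new TreeSet<>();

    void register(ClusterEventListener listener) {
        listeners.add(listener);
    }

    /** Assumed to be called by the core each time it sees a new live-node set. */
    synchronized void liveNodesChanged(Set<String> current) {
        // Nodes present now but not before have joined.
        Set<String> joined = new TreeSet<>(current);
        joined.removeAll(liveNodes);
        // Nodes present before but not now have left.
        Set<String> left = new TreeSet<>(liveNodes);
        left.removeAll(current);
        liveNodes = new TreeSet<>(current);

        for (ClusterEventListener listener : listeners) {
            joined.forEach(listener::onNodeJoined);
            left.forEach(listener::onNodeLeft);
        }
    }
}
```

A plugin would register one listener at startup and keep no cached node list of its own.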

In the PR I see a bunch of classes all off in a separate package, which
looks like an autoscaling fiefdom which will be tempted if not forced to
duplicate lots of stuff relative to other plugins and/or core.

As a side note I would think the metrics system could be a plugin that
leverages the same set of informational interfaces....

So there should be 3 parts to this as I imagine it.

1) Enhancements to the **plugin system** that make information about the
Solr cluster available to ALL plugins.
2) Enhancements to the **plugin system** APIs provided to ALL plugins that
allow them to mutate Solr safely.
3) A plugin that we intend to support for our users currently using
autoscaling, which uses the enhanced information to provide a level of
functionality similar to what is *promised* by our current autoscaling
documentation. There might be some gaps or differences, but we should be
discussing what they are and providing recommended workarounds for users
that relied on those promises. Even if there were cases where we failed to
deliver, if there were at least some conditions under which we could
deliver the promised functionality, those should be supported. Only if we
never were able to deliver and it never worked under any circumstance
should we rip stuff out entirely.

Implicit in the above is the concept that there should be a facade between
plugins and the core of solr.

WRT #1, which will necessarily involve information collected from remote
nodes, we need to design it around the informational guarantees it
provides: latency, consistency, delivery, etc. We also need
to think about what is exposed in a read-only fashion vs what plugins might
write back to solr. Certainly there will be a lot of information that most
plugins ignore, and we might consider having groupings of information and
interfaces or annotations that indicate what info is provided, but the
simplest default state is to just give plugins a reference to a class that
they can use to drill into information about the cluster as needed.
(SolrInformationBooth? ... or less tongue in cheek... enhance
SolrInfoBean? )

Finally a fourth thing that occurs to me as I write is we need to consider
what information one plugin might make available to the rest of the solr
plugins. This might come later, and is hard because it's very hard to
anticipate what info might be generated by unknown plugins in the future.

So some humorous, not seriously suggested but hopefully memorable class
names encapsulating the concepts:

SolrInformationBooth (place to query)
SolrLoudspeaker (event announcements)
SolrControlLevers (mutate solr cluster)
SolrPluginFacebookPage (info published by the plugin that others can watch)

The "facade" provided to plugins by the plugin system should grow and
expand such that more and more plugins can rely on it. This effort should
grow it enough to move autoscaling onto it without dropping (much)
functionality that we've previously published.
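
As a sketch only, the four concepts above could map onto a facade like the following. Every name here is hypothetical (the tongue-in-cheek names appear only in comments), the method sets are deliberately tiny, and the in-memory bulletin exists just to show that the interfaces are implementable, not as a proposed design.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// All names below are hypothetical; they are not taken from Solr or from the PR.

/** Read-only cluster state ("SolrInformationBooth"). */
interface ClusterInfo {
    Set<String> liveNodes();
}

/** Event announcements ("SolrLoudspeaker"). */
interface ClusterEvents {
    void subscribe(Consumer<String> listener);
}

/** Safe mutations of the cluster ("SolrControlLevers"). */
interface ClusterControl {
    void requestAddReplica(String collection, String shard, String node);
}

/** Information one plugin publishes for others ("SolrPluginFacebookPage"). */
interface PluginBulletin {
    void publish(String plugin, String key, String value);
    Optional<String> read(String plugin, String key);
}

/** The single reference the plugin system would hand to every plugin. */
interface PluginFacade {
    ClusterInfo info();
    ClusterEvents events();
    ClusterControl control();
    PluginBulletin bulletin();
}

/** Trivial in-memory bulletin, only to show the interface can be implemented. */
final class InMemoryBulletin implements PluginBulletin {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public void publish(String plugin, String key, String value) {
        entries.put(plugin + "/" + key, value);
    }

    @Override
    public Optional<String> read(String plugin, String key) {
        return Optional.ofNullable(entries.get(plugin + "/" + key));
    }
}
```

Keeping the read, event, mutate and publish sides as separate interfaces would let a plugin be handed only the parts it is allowed to use.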

-Gus

On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com> wrote:

> Not clear to me what type of "alternative proposal" you're thinking of Jan
>
>
> That would be the responsibility of Noble and others who have concerns to
> detail - and try convince other peers.
> It’s hard for me as a spectator to know whether to agree with Noble
> without a clear picture of what the alternative API or approach would look
> like.
> I’m often a fan of loosely typed APIs since they tend to cause less
> boilerplate code, but strong typing may indeed be a sound choice in this
> API.
>
> Jan Høydahl
>
> On 24 Jul 2020, at 01:44, Ilan Ginzburg <ilansolr@gmail.com> wrote:
>
> In my opinion we have to (and therefore will) ship at least a basic prod
> ready implementation on top of the API that does simple things (not sure
> about rack, but for example balance cores and disk size without co locating
> replicas of same shard on same node).
> Without such an implementation, I suspect adoption will be low. Moreover,
> it's always a lot more friendly to start coding from a working example than
> from scratch.
>
> Not clear to me what type of "alternative proposal" you're thinking of
> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>
> Ilan
>
> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
> wrote:
>
>> Important discussion indeed.
>>
>> I don’t have time to dive deep into the PR or make up my mind whether
>> there is a simpler and more future proof way of designing these APIs. But I
>> understand that autoscaling is a complex beast and it is important we get
>> it right.
>>
>> One question regarding having to write code vs config. Is the plan to
>> ship some very simple light weight default placement rules ootb that gives
>> 80% of users what they need with simple config, or would every user need to
>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>> seeing an alternative proposal laid out, perhaps not in code but with a
>> design that can be compared and discussed.
>>
>> Jan Høydahl
>>
>> On 23 Jul 2020, at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>>
>> I think this is a valid thing to discuss on the dev list, since this
>> isn't just about code comments.
>> It seems to me that Ilan wants to discuss the philosophy around how to
>> design plugins and the interfaces in Solr which the plugins will talk to.
>> This is broad and affects much more than just the Autoscaling framework.
>>
>> As a community & product, we have so far agreed that Solr should be
>> lighter weight and additional features should live in plugins that are
>> managed separately from Solr itself.
>> At that point we need to think about the lifetime and support of these
>> plugins. People love to refactor stuff in the solr core, which before
>> plugins wasn't a large issue.
>> However if we are now intending for many customers to rely on plugins,
>> then we need to come up with standards and guarantees so that these plugins
>> don't:
>>
>> - Stall people from upgrading Solr (minor or major versions)
>> - Hinder the development of Solr Core
>> - Cause us more headaches trying to keep multiple repos of plugins up
>> to date with recent versions of Solr
>>
>>
>> I am not completely sure where I stand right now, but this is definitely
>> something that we should be thinking about when migrating all of this
>> functionality to plugins.
>>
>> - Houston
>>
>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
>> wrote:
>>
>>> I think we should move the discussion back to the PR because it has more
>>> context and inline comments are possible. Having this discussion in 4
>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>
>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com> wrote:
>>>
>>>> [I’m moving a discussion from the PR
>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for
>>>> a wider audience. This is about replacing the now (in master) gone
>>>> Autoscaling framework with a way for clients to write their customized
>>>> placement code]
>>>>
>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>> Please anybody interested in the future of Autoscaling (not only those
>>>> I cc'ed) do read it and provide feedback. Very impacting decisions have to
>>>> be made now.
>>>>
>>>> Thanks Noble for your feedback.
>>>> I believe it is important that we are aligned on what we build here,
>>>> esp. at the early defining stages (now).
>>>>
>>>> Let me try to elaborate on your concerns and provide in general the
>>>> rationale behind the approach.
>>>>
>>>> *> Anyone who wishes to implement this should not require to learn a
>>>> lot before even getting started*
>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>> representing current state to make decisions), there’s not much to learn.
>>>> The framework uses the same concepts, often with the same names.
>>>>
>>>> *> I don't believe we should have a set of interfaces that duplicate
>>>> existing classes just for this functionality.*
>>>> Where appropriate we can have existing classes be the implementations
>>>> for these interfaces and be passed to the plugins, that would be perfectly
>>>> ok. The proposal doesn’t include implementations at this stage, therefore
>>>> there’s no duplication, or not yet... (we must get the interfaces right and
>>>> agreed upon before implementation). If some interface methods in the
>>>> proposal have a different name from equivalent methods in internal classes
>>>> we plan to use, of course let's rename one or the other.
>>>>
>>>> Existing internal abstractions are most of the time concrete classes
>>>> and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>> Making these visible to contrib code living elsewhere is making future
>>>> refactoring hard and contrib code will most likely end up reaching to
>>>> methods it shouldn’t be using. If we define a clean set of interfaces for
>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to
>>>> other internal Solr classes, but will make everything possible to keep the
>>>> API backward compatible so existing plugins can be recompiled without
>>>> change.
>>>>
>>>> *> 24 interfaces to do this is definitely over engineering*
>>>> I don’t consider the number of classes or interfaces a metric of
>>>> complexity or of engineering quality. There are sample
>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>> defining this framework) and I believe the process is relatively simple.
>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>> we agree on the approach?). For example, getting sister replicas of a given
>>>> replica in the proposed API is: replica.getShard()
>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>> .getReplicas()
>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>> Doing so with the internal classes likely involves getting the
>>>> DocCollection and Slice names from the Replica, then getting the
>>>> DocCollection from the cluster state, from there getting the Slice by its
>>>> name, and finally calling getReplicas() on the Slice. I consider the role of
>>>> this new framework is to make life as easy as possible for writing
>>>> placement code and the like, make life easy for us to maintain it, make it
>>>> easy to write a simulation engine (should be at least an order of magnitude
>>>> simpler than the previous one), etc.
>>>>
>>>> An example regarding readability and number of interfaces: rather than
>>>> defining an enum with runtime annotation for building its instances (
>>>> Variable.Type
>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>> and then very generic access methods, the proposal defines a specific
>>>> interface for each “variable type” (called properties
>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>> Rather than concatenating strings to specify the data to return from a
>>>> remote node (based on snitches
>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>> see doc
>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>> the proposal is explicit and strongly typed (here
>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>> to get a specific system property from a node). This definitely does
>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>
>>>> Goal is to hide all the boilerplate code and machinery (and to a point
>>>> - complexity) in the implementations of these interfaces rather than have
>>>> each plugin writer deal with the same problems.
>>>>
>>>> We’re moving from something that was complex and hard to read and debug
>>>> yet functionally extremely rich, to something simpler for us, more
>>>> demanding for users (write code rather than policy config if there's a need
>>>> for new behavior) but that should not be less "expressive" in any
>>>> significant way. One could even imagine reimplementing the former
>>>> Autoscaling config Domain Specific Language on top of these APIs (maybe as a
>>>> summer internship project :)
>>>>
>>>> *> This is a common mistake that we all do. When we design a feature we
>>>> think that is the most important thing.*
>>>> If by *"most important thing"* you mean investing the best reasonable
>>>> effort to do things right then yes.
>>>> If you mean trying to make a minor feature look more important and
>>>> inflated than it is, I disagree.
>>>> As a personal note, replica placement is not the aspect of SolrCloud
>>>> I'm most interested in, but the first bottleneck we hit when pushing the
>>>> scale of SolrCloud. I approach this with a state of mind "let's do it right
>>>> and get it out of the way" to move to topics I really want to work on
>>>> (around distribution in SolrCloud and the role of Overseer). Implementing
>>>> Autoscaling in a way that simplifies future refactoring (or that does not
>>>> make them harder than they already are) is therefore *very high* on my
>>>> priority list, to support modest changes (Slice to Shard renaming) and
>>>> more ambitious ones (replacing Zookeeper, removing Overseer, you name it).
>>>>
>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>> helps (at least helps the discussion),
>>>> Ilan
>>>>
>>>>
>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>> wrote:
>>>>
>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>> existing classes just for this functionality. This is a common mistake that
>>>>> we all do. When we design a feature we think that is the most important
>>>>> thing. We end up over-designing and over-engineering things. This feature
>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>> have a huge learning curve.
>>>>>
>>>>> Let's try to understand the requirement
>>>>>
>>>>> - Solr wants a set of positions to place a few replicas
>>>>> - The implementation wants to know what is the current state of
>>>>> the cluster so that it can make those decisions
>>>>>
>>>>> 24 interfaces to do this is definitely over engineering
>>>>>
>>>>
>>>>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
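
As an illustration of the sister-replica lookup described in Ilan's quoted
mail above, here is a minimal Java sketch. Only the call chain
replica.getShard().getReplicas() comes from the thread; the Shard and Replica
interfaces below, their getName() methods, the SimpleShard/SimpleReplica
classes and sisterNames() are hypothetical stand-ins, not the interfaces from
the PR and not Solr's internal classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SisterReplicasSketch {

    // Hypothetical plugin-facing interfaces. Only getShard() and
    // getReplicas() are named in the thread; everything else is assumed.
    interface Shard {
        String getName();
        List<Replica> getReplicas();
    }

    interface Replica {
        String getName();
        Shard getShard();
    }

    // Toy in-memory implementations, for illustration only.
    static final class SimpleShard implements Shard {
        private final String name;
        private final List<Replica> replicas = new ArrayList<>();

        SimpleShard(String name) { this.name = name; }

        public String getName() { return name; }

        public List<Replica> getReplicas() {
            return Collections.unmodifiableList(replicas);
        }

        Replica addReplica(String replicaName) {
            Replica replica = new SimpleReplica(replicaName, this);
            replicas.add(replica);
            return replica;
        }
    }

    static final class SimpleReplica implements Replica {
        private final String name;
        private final Shard shard;

        SimpleReplica(String name, Shard shard) {
            this.name = name;
            this.shard = shard;
        }

        public String getName() { return name; }

        public Shard getShard() { return shard; }
    }

    // Sisters: every replica of the same shard except the starting one.
    static List<String> sisterNames(Replica replica) {
        List<String> names = new ArrayList<>();
        for (Replica other : replica.getShard().getReplicas()) {
            if (other != replica) {
                names.add(other.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleShard shard1 = new SimpleShard("shard1");
        Replica first = shard1.addReplica("core_node1");
        shard1.addReplica("core_node2");
        shard1.addReplica("core_node3");
        System.out.println(sisterNames(first)); // prints [core_node2, core_node3]
    }
}
```

With the internal classes, the thread suggests the same lookup would instead
go through names (collection and slice names from the Replica, then
ClusterState, DocCollection and Slice), which is the indirection the proposed
API tries to hide from plugin writers.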
Re: Approach for a new Autoscaling framework
Thanks Gus!
This makes a lot of sense, but IMO it significantly increases the scope and
effort of defining an "Autoscaling" framework interface.

I'd be happy to try to see what concepts could be shared and how a generic
plugin facade could be defined.

What are the other types of plugins that would share such a unified
approach? Do they already exist in another form, or are they just projects at
this stage, like Autoscaling plugins?

But... Assuming this is the first "facade" layer to be defined between Solr
and external code, it might be hard to make it generic and get it right.
There's value in starting simple, understanding the tradeoffs and
generalizing later.

Also I'd like to make sure we're not paying a performance "genericity tax"
in Autoscaling for unneeded features.

Ilan

On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.heck@gmail.com> wrote:

> Scanned through the PR and read some of this thread. I likely have missed
> much other discussion, so forgive me if I'm dredging up some things that are
> already discussed elsewhere.
>
> The idea of designing the interfaces defining what information is
> available seems good here, but I worry that it's too auto-scaling focused.
> In my imagination, I would see solr having a standard informational
> interface that is useful to any plugin of any sort. Autoscaling should be
> leveraging that and we should be enhancing that to enable autoscaling. The
> current state of the system is one key type of information, but another
> type of information that should exist within solr and be exposed to plugins
> (including autoscaling) is events. When a new node joins there should be an
> event for example so that plugins can listen for that rather than
> incessantly polling and comparing the list of 100 nodes to a cached list of
> 100 nodes.
>
> In the PR I see a bunch of classes all off in a separate package, which
> looks like an autoscaling fiefdom which will be tempted if not forced to
> duplicate lots of stuff relative to other plugins and/or core.
>
> As a side note I would think the metrics system could be a plugin that
> leverages the same set of informational interfaces....
>
> So there should be 3 parts to this as I imagine it.
>
> 1) Enhancements to the **plugin system** that make information about the
> cluster available to ALL plugins
> 2) Enhancements to the **plugin system** API's provided to ALL plugins
> that allow them to mutate solr safely.
> 3) A plugin that we intend to support for our users currently using auto
> scaling utilizes the enhanced information to provide a similar level of
> functionality as is *promised* by our current documentation of autoscaling,
> there might be some gaps or differences but we should be discussing what
> they are and providing recommended workarounds for users that relied on
> those promises to the users. Even if there were cases where we failed to
> deliver, if there were at least some conditions under which we could
> deliver the promised functionality those should be supported. Only if we
> never were able to deliver and it never worked under any circumstance
> should we rip stuff out entirely.
>
> Implicit in the above is the concept that there should be a facade between
> plugins and the core of solr.
>
> WRT #1 which will necessarily involve information collected from remote
> nodes, we need to be designing that thinking about what informational
> guarantees it provides. Latency, consistency, delivery, etc. We also need
> to think about what is exposed in a read-only fashion vs what plugins might
> write back to solr. Certainly there will be a lot of information that most
> plugins ignore, and we might consider having groupings of information and
> interfaces or annotations that indicate what info is provided, but the
> simplest default state is to just give plugins a reference to a class that
> they can use to drill into information about the cluster as needed.
> (SolrInformationBooth? ... or less tongue in cheek... enhance
> SolrInfoBean? )
>
> Finally a fourth thing that occurs to me as I write is we need to consider
> what information one plugin might make available to the rest of the solr
> plugins. This might come later, and is hard because it's very hard to
> anticipate what info might be generated by unknown plugins in the future.
>
> So some humorous, not seriously suggested but hopefully memorable class
> names encapsulating the concepts:
>
> SolrInformationBooth (place to query)
> SolrLoudspeaker (event announcements)
> SolrControlLevers (mutate solr cluster)
> SolrPluginFacebookPage (info published by the plugin that others can watch)
>
> The "facade" provided to plugins by the plugin system should grow and
> expand such that more and more plugins can rely on it. This effort should
> grow it enough to move autoscaling onto it without dropping (much)
> functionality that we've previously published.
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
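
Gus's point about events versus polling (quoted above) can be sketched as
follows. This is only an illustration under assumptions: the Loudspeaker class
borrows its name from the tongue-in-cheek SolrLoudspeaker, and its methods
(onNodeJoined, nodeJoined, liveNodes) as well as newNodesByPolling are
invented here; nothing like this is claimed to exist in Solr or in the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class NodeEventSketch {

    // Hypothetical event source; not an existing Solr class.
    static final class Loudspeaker {
        private final Set<String> liveNodes = new LinkedHashSet<>();
        private final List<Consumer<String>> nodeJoinedListeners = new ArrayList<>();

        // A plugin registers interest once instead of polling.
        void onNodeJoined(Consumer<String> listener) {
            nodeJoinedListeners.add(listener);
        }

        // Called by the (simulated) cluster when a node appears.
        // Listeners are only notified for nodes not already known.
        void nodeJoined(String nodeName) {
            if (liveNodes.add(nodeName)) {
                for (Consumer<String> listener : nodeJoinedListeners) {
                    listener.accept(nodeName);
                }
            }
        }

        // Snapshot copy, so callers cannot mutate internal state.
        Set<String> liveNodes() {
            return new LinkedHashSet<>(liveNodes);
        }
    }

    // Polling alternative: diff the current node set against a cached copy.
    static Set<String> newNodesByPolling(Set<String> cached, Set<String> current) {
        Set<String> added = new LinkedHashSet<>(current);
        added.removeAll(cached);
        return added;
    }

    public static void main(String[] args) {
        Loudspeaker speaker = new Loudspeaker();
        List<String> seen = new ArrayList<>();
        speaker.onNodeJoined(seen::add);

        Set<String> cached = speaker.liveNodes(); // empty snapshot
        speaker.nodeJoined("node1:8983_solr");
        speaker.nodeJoined("node2:8983_solr");

        // Event path: the listener was told about each new node directly.
        System.out.println(seen);
        // Polling path: the same answer, but only after comparing two full sets.
        System.out.println(newNodesByPolling(cached, speaker.liveNodes()));
    }
}
```

Both paths report the same two nodes; the difference is that the polling path
has to fetch and compare the whole node list each time, which is the cost Gus
describes for a 100-node cluster.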
Re: Approach for a new Autoscaling framework
Hi Ilan,

I like where we're going with
https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am wrong,
but my understanding of this PR is that we're defining the interfaces for
creating policies.

What's not clear to me is how existing collection APIs like
create-collections/add-replica etc. will make use of it. Is that something that
has been discussed somewhere that I could read up on?



On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilansolr@gmail.com> wrote:

> Thanks Gus!
> This makes a lot of sense but significantly increases IMO the scope and
> effort to define an "Autoscaling" framework interface.
>
> I'd be happy to try to see what concepts could be shared and how a generic
> plugin facade could be defined.
>
> What are the other types of plugins that would share such a unified
> approach? Do they already exist under another form or are just projects at
> this stage, like Autoscaling plugins?
>
> But... Assuming this is the first "facade" layer to be defined between
> Solr and external code, it might be hard to make it generic and get it
> right. There's value in starting simple, understanding the tradeoffs and
> generalizing later.
>
> Also I'd like to make sure we're not paying a performance "genericity tax"
> in Autoscaling for unneeded features.
>
> Ilan
>
> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.heck@gmail.com> wrote:
>
>> Scanned through the PR and read some of this thread. I likely have missed
>> much other discussion, so forgive me if I'm dredging up some things that are
>> already discussed elsewhere.
>>
>> The idea of designing the interfaces defining what information is
>> available seems good here, but I worry that it's too auto-scaling focused.
>> In my imagination, I would see solr having a standard informational
>> interface that is useful to any plugin of any sort. Autoscaling should be
>> leveraging that and we should be enhancing that to enable autoscaling. The
>> current state of the system is one key type of information, but another
>> type of information that should exist within solr and be exposed to plugins
>> (including autoscaling) is events. When a new node joins there should be an
>> event for example so that plugins can listen for that rather than
>> incessantly polling and comparing the list of 100 nodes to a cached list of
>> 100 nodes.
>>
>> In the PR I see a bunch of classes all off in a separate package, which
>> looks like an autoscaling fiefdom which will be tempted if not forced to
>> duplicate lots of stuff relative to other plugins and/or core.
>>
>> As a side note I would think the metrics system could be a plugin that
>> leverages the same set of informational interfaces....
>>
>> So there should be 3 parts to this as I imagine it.
>>
>> 1) Enhancements to the **plugin system** that make information about the
>> cluster available to ALL plugins
>> 2) Enhancements to the **plugin system** API's provided to ALL plugins
>> that allow them to mutate solr safely.
>> 3) A plugin that we intend to support for our users currently using auto
>> scaling utilizes the enhanced information to provide a similar level of
>> functionality as is *promised* by our current documentation of autoscaling,
>> there might be some gaps or differences but we should be discussing what
>> they are and providing recommended workarounds for users that relied on
>> those promises to the users. Even if there were cases where we failed to
>> deliver, if there were at least some conditions under which we could
>> deliver the promised functionality those should be supported. Only if we
>> never were able to deliver and it never worked under any circumstance
>> should we rip stuff out entirely.
>>
>> Implicit in the above is the concept that there should be a facade
>> between plugins and the core of solr.
>>
>> WRT #1 which will necessarily involve information collected from remote
>> nodes, we need to be designing that thinking about what informational
>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>> to think about what is exposed in a read-only fashion vs what plugins might
>> write back to solr. Certainly there will be a lot of information that most
>> plugins ignore, and we might consider having groupings of information and
>> interfaces or annotations that indicate what info is provided, but the
>> simplest default state is to just give plugins a reference to a class that
>> they can use to drill into information about the cluster as needed.
>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>> SolrInfoBean? )
>>
>> Finally a fourth thing that occurs to me as I write is we need to
>> consider what information one plugin might make available to the rest of
>> the solr plugins. This might come later, and is hard because it's very hard
>> to anticipate what info might be generated by unknown plugins in the future.
>>
>> So some humorous, not seriously suggested but hopefully memorable class
>> names encapsulating the concepts:
>>
>> SolrInformationBooth (place to query)
>> SolrLoudspeaker (event announcements)
>> SolrControlLevers (mutate solr cluster)
>> SolrPluginFacebookPage (info published by the plugin that others can
>> watch)
>>
>> The "facade" provided to plugins by the plugin system should grow and
>> expand such that more and more plugins can rely on it. This effort should
>> grow it enough to move autoscaling onto it without dropping (much)
>> functionality that we've previously published.
>>
>> -Gus
>>
>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>> wrote:
>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan
>>>
>>>
>>> That would be the responsibility of Noble and others who have concerns
>>> to detail - and to try to convince other peers.
>>> It’s hard for me as a spectator to know whether to agree with Noble
>>> without a clear picture of what the alternative API or approach would look
>>> like.
>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>> API.
>>>
>>> Jan Høydahl
>>>
>>> On 24 Jul 2020 at 01:44, Ilan Ginzburg <ilansolr@gmail.com> wrote:
>>>
>>> In my opinion we have to (and therefore will) ship at least a basic prod
>>> ready implementation on top of the API that does simple things (not sure
>>> about rack, but for example balance cores and disk size without co locating
>>> replicas of same shard on same node).
>>> Without such an implementation, I suspect adoption will be low.
>>> Moreover, it's always a lot more friendly to start coding from a working
>>> example than from scratch.
>>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>
>>> Ilan
>>>
>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>> wrote:
>>>
>>>> Important discussion indeed.
>>>>
>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>> there is a simpler and more future proof way of designing these APIs. But I
>>>> understand that autoscaling is a complex beast and it is important we get
>>>> it right.
>>>>
>>>> One question regarding having to write code vs config. Is the plan to
>>>> ship some very simple light weight default placement rules ootb that gives
>>>> 80% of users what they need with simple config, or would every user need to
>>>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>>>> seeing an alternative proposal laid out, perhaps not in code but with a
>>>> design that can be compared and discussed.
>>>>
>>>> Jan Høydahl
>>>>
>>>> On 23 Jul 2020 at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>>>>
>>>> I think this is a valid thing to discuss on the dev list, since this
>>>> isn't just about code comments.
>>>> It seems to me that Ilan wants to discuss the philosophy around how to
>>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>>> This is broad and affects much more than just the Autoscaling
>>>> framework.
>>>>
>>>> As a community & product, we have so far agreed that Solr should be
>>>> lighter weight and additional features should live in plugins that are
>>>> managed separately from Solr itself.
>>>> At that point we need to think about the lifetime and support of these
>>>> plugins. People love to refactor stuff in the solr core, which before
>>>> plugins wasn't a large issue.
>>>> However if we are now intending for many customers to rely on plugins,
>>>> then we need to come up with standards and guarantees so that these plugins
>>>> don't:
>>>>
>>>> - Stall people from upgrading Solr (minor or major versions)
>>>> - Hinder the development of Solr Core
>>>> - Cause us more headaches trying to keep multiple repos of plugins
>>>> up to date with recent versions of Solr
>>>>
>>>>
>>>> I am not completely sure where I stand right now, but this is
>>>> definitely something that we should be thinking about when migrating all of
>>>> this functionality to plugins.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
>>>> wrote:
>>>>
>>>>> I think we should move the discussion back to the PR because it has
>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>
>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> [I’m moving a discussion from the PR
>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>> placement code]
>>>>>>
>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>> have to be made now.
>>>>>>
>>>>>> Thanks Noble for your feedback.
>>>>>> I believe it is important that we are aligned on what we build here,
>>>>>> esp. at the early defining stages (now).
>>>>>>
>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>> rationale behind the approach.
>>>>>>
>>>>>> *> Anyone who wishes to implement this should not require to learn a
>>>>>> lot before even getting started*
>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>
>>>>>> *> I don't believe we should have a set of interfaces that duplicate
>>>>>> existing classes just for this functionality.*
>>>>>> Where appropriate we can have existing classes be the implementations
>>>>>> for these interfaces and be passed to the plugins, that would be perfectly
>>>>>> ok. The proposal doesn’t include implementations at this stage, therefore
>>>>>> there’s no duplication, or not yet... (we must get the interfaces right and
>>>>>> agreed upon before implementation). If some interface methods in the
>>>>>> proposal have a different name from equivalent methods in internal classes
>>>>>> we plan to use, of course let's rename one or the other.
>>>>>>
>>>>>> Existing internal abstractions are most of the time concrete classes
>>>>>> and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>>>> Making these visible to contrib code living elsewhere makes future
>>>>>> refactoring hard, and contrib code will most likely end up reaching for
>>>>>> methods it shouldn’t be using. If we define a clean set of interfaces for
>>>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to
>>>>>> other internal Solr classes, but I will do everything possible to keep the
>>>>>> API backward compatible so existing plugins can be recompiled without
>>>>>> change.
>>>>>>
>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>> complexity or of engineering quality. There are sample
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>> replica in the proposed API is: replica.getShard()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>> .getReplicas()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>> Doing so with the internal classes likely involves getting the
>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>> DocCollection from the cluster state, there get the Slice based on
>>>>>> its name and finally getReplicas() from the Slice). I consider the
>>>>>> role of this new framework is to make life as easy as possible for writing
>>>>>> placement code and the like, make life easy for us to maintain it, make it
>>>>>> easy to write a simulation engine (should be at least an order of magnitude
>>>>>> simpler than the previous one), etc.
>>>>>>
>>>>>> An example regarding readability and number of interfaces: rather
>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>> Variable.Type
>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>> interface for each “variable type” (called properties
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>> remote node (based on snitches
>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>> see doc
>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>> the proposal is explicit and strongly typed (here
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>> to get a specific system property from a node). This definitely does
>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>
>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>> have each plugin writer deal with the same problems.
>>>>>>
>>>>>> We’re moving from something that was complex and hard to read and
>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>> significant way. One could even imagine reimplementing the former
>>>>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>>>>> summer internship project :)
>>>>>>
>>>>>> *> This is a common mistake that we all do. When we design a feature
>>>>>> we think that is the most important thing.*
>>>>>> If by *"most important thing"* you mean investing the best
>>>>>> reasonable effort to do things right then yes.
>>>>>> If you mean trying to make a minor feature look more important and
>>>>>> inflated than it is, I disagree.
>>>>>> As a personal note, replica placement is not the aspect of SolrCloud
>>>>>> I'm most interested in, but the first bottleneck we hit when pushing the
>>>>>> scale of SolrCloud. I approach this with a state of mind "let's do it right
>>>>>> and get it out of the way" to move to topics I really want to work on
>>>>>> (around distribution in SolrCloud and the role of Overseer). Implementing
>>>>>> Autoscaling in a way that simplifies future refactoring (or that does not
>>>>>> make them harder than they already are) is therefore *very high* on
>>>>>> my priority list, to support modest changes (Slice to Shard
>>>>>> renaming) and more ambitious ones (replacing Zookeeper, removing Overseer,
>>>>>> you name it).
>>>>>>
>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>> helps (at least helps the discussion),
>>>>>> Ilan
>>>>>>
>>>>>>
>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>> existing classes just for this functionality. This is a common mistake that
>>>>>>> we all do. When we design a feature we think that is the most important
>>>>>>> thing. We end up over designing and over engineering things. This feature
>>>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>>>> have a huge learning curve.
>>>>>>>
>>>>>>> Let's try to understand the requirement
>>>>>>>
>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>> - The implementation wants to know what is the current state of
>>>>>>> the cluster so that it can make those decisions
>>>>>>>
>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
Re: Approach for a new Autoscaling framework [ In reply to ]
Varun, you're correct.
This PR was built based on what's needed for creation (the easiest starting
point for me and likely the most urgent need). It's still totally WIP, and the
following steps include building the API required for move and other
placement-based needs, then also everything related to triggers (see the
Jira).

Collection API commands (Solr-provided implementation, not a plug-in) will
build the requests they need, then call the plug-in (a custom one or a default
one), and use the returned "work items" (more types of work items will be
introduced of course) to do the job (know where to place, where to move,
what to remove or add, etc.).
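
To make that flow a bit more concrete, here is a minimal Java sketch of the
request / plug-in / work item round trip. Everything in it is an assumption
made up for illustration: PlacementRequest, CreateReplicaWorkItem,
PlacementPlugin, FewestCoresPlugin and addReplicas are hypothetical names, not
the interfaces from the PR, and the "fewest cores first" rule is only a
stand-in for whatever a default plug-in might do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlacementFlowSketch {

  /** Hypothetical request built by a Collection API command. */
  static final class PlacementRequest {
    final String collection;
    final String shard;
    final int replicaCount;

    PlacementRequest(String collection, String shard, int replicaCount) {
      this.collection = collection;
      this.shard = shard;
      this.replicaCount = replicaCount;
    }
  }

  /** Hypothetical "work item": create one replica of a shard on a node. */
  static final class CreateReplicaWorkItem {
    final String shard;
    final String node;

    CreateReplicaWorkItem(String shard, String node) {
      this.shard = shard;
      this.node = node;
    }
  }

  /** Hypothetical plug-in contract: turn a request into work items. */
  interface PlacementPlugin {
    List<CreateReplicaWorkItem> computePlacement(
        PlacementRequest request, Map<String, Integer> coresPerNode);
  }

  /** Stand-in default plug-in: prefer nodes with the fewest cores. */
  static final class FewestCoresPlugin implements PlacementPlugin {
    @Override
    public List<CreateReplicaWorkItem> computePlacement(
        PlacementRequest request, Map<String, Integer> coresPerNode) {
      if (request.replicaCount > coresPerNode.size()) {
        throw new IllegalArgumentException("Not enough nodes for distinct placement");
      }
      List<String> nodes = new ArrayList<>(coresPerNode.keySet());
      // Sort by core count, then by node name to keep the result deterministic.
      nodes.sort((a, b) -> {
        int byCores = Integer.compare(coresPerNode.get(a), coresPerNode.get(b));
        return byCores != 0 ? byCores : a.compareTo(b);
      });
      List<CreateReplicaWorkItem> items = new ArrayList<>();
      // One replica per node, so replicas of the shard are never co-located.
      for (int i = 0; i < request.replicaCount; i++) {
        items.add(new CreateReplicaWorkItem(request.shard, nodes.get(i)));
      }
      return items;
    }
  }

  /** Command side: build the request, call the plug-in, consume the work items. */
  static List<String> addReplicas(PlacementPlugin plugin, Map<String, Integer> coresPerNode,
                                  String collection, String shard, int count) {
    PlacementRequest request = new PlacementRequest(collection, shard, count);
    List<String> targetNodes = new ArrayList<>();
    for (CreateReplicaWorkItem item : plugin.computePlacement(request, coresPerNode)) {
      // A real command would create the replica here; the sketch only records the node.
      targetNodes.add(item.node);
    }
    return targetNodes;
  }

  public static void main(String[] args) {
    Map<String, Integer> cores = new HashMap<>();
    cores.put("node1", 5);
    cores.put("node2", 1);
    cores.put("node3", 3);
    // prints [node2, node3]
    System.out.println(addReplicas(new FewestCoresPlugin(), cores, "books", "shard1", 2));
  }
}
```

The point of the split is that the command never decides where replicas go;
it only executes what the plug-in returns, so a custom plug-in can be swapped
in without touching the command.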

Ilan

On Sun, Jul 26, 2020 at 04:13, Varun Thacker <varun@vthacker.in> wrote:

> Hi Ilan,
>
> I like where we're going with
> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
> wrong, but my understanding of this PR is that we're defining the interfaces
> for creating policies.
>
> What's not clear to me is how existing collection APIs like
> create-collections/add-replica etc. will make use of it. Is that something
> that has been discussed somewhere that I could read up on?
>
>
>
> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilansolr@gmail.com> wrote:
>
>> Thanks Gus!
>> This makes a lot of sense but significantly increases IMO the scope and
>> effort to define an "Autoscaling" framework interface.
>>
>> I'd be happy to try to see what concepts could be shared and how a
>> generic plugin facade could be defined.
>>
>> What are the other types of plugins that would share such a unified
>> approach? Do they already exist under another form or are just projects at
>> this stage, like Autoscaling plugins?
>>
>> But... Assuming this is the first "facade" layer to be defined between
>> Solr and external code, it might be hard to make it generic and get it
>> right. There's value in starting simple, understanding the tradeoffs and
>> generalizing later.
>>
>> Also I'd like to make sure we're not paying a performance "genericity
>> tax" in Autoscaling for unneeded features.
>>
>> Ilan
>>
>> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.heck@gmail.com> wrote:
>>
>>> Scanned through the PR and read some of this thread. I likely have
>>> missed much other discussion, so forgive me if I'm dredging up some things
>>> that are already discussed elsewhere.
>>>
>>> The idea of designing the interfaces defining what information is
>>> available seems good here, but I worry that it's too auto-scaling focused.
>>> In my imagination, I would see solr having a standard informational
>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>> current state of the system is one key type of information, but another
>>> type of information that should exist within solr and be exposed to plugins
>>> (including autoscaling) is events. When a new node joins there should be an
>>> event for example so that plugins can listen for that rather than
>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>> 100 nodes.
>>>
>>> In the PR I see a bunch of classes all off in a separate package, which
>>> looks like an autoscaling fiefdom which will be tempted if not forced to
>>> duplicate lots of stuff relative to other plugins and/or core.
>>>
>>> As a side note I would think the metrics system could be a plugin that
>>> leverages the same set of informational interfaces....
>>>
>>> So there should be 3 parts to this as I imagine it.
>>>
>>> 1) Enhancements to the **plugin system** that make information about the
>>> solr cluster available to ALL plugins
>>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins
>>> that allow them to mutate solr safely.
>>> 3) A plugin that we intend to support for our users currently using
>>> autoscaling, which uses the enhanced information to provide a level of
>>> functionality similar to what is *promised* by our current autoscaling
>>> documentation. There might be some gaps or differences, but we should be
>>> discussing what they are and providing recommended workarounds for users
>>> who relied on those promises. Even if there were cases where we failed to
>>> deliver, if there were at least some conditions under which we could
>>> deliver the promised functionality, those should be supported. Only if we
>>> never were able to deliver and it never worked under any circumstance
>>> should we rip stuff out entirely.
>>>
>>> Implicit in the above is the concept that there should be a facade
>>> between plugins and the core of solr.
>>>
>>> WRT #1 which will necessarily involve information collected from remote
>>> nodes, we need to be designing that thinking about what informational
>>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>>> to think about what is exposed in a read-only fashion vs what plugins might
>>> write back to solr. Certainly there will be a lot of information that most
>>> plugins ignore, and we might consider having groupings of information and
>>> interfaces or annotations that indicate what info is provided, but the
>>> simplest default state is to just give plugins a reference to a class that
>>> they can use to drill into information about the cluster as needed.
>>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>>> SolrInfoBean? )
>>>
>>> Finally a fourth thing that occurs to me as I write is we need to
>>> consider what information one plugin might make available to the rest of
>>> the solr plugins. This might come later, and is hard because it's very hard
>>> to anticipate what info might be generated by unknown plugins in the future.
>>>
>>> So some humorous, not seriously suggested but hopefully memorable class
>>> names encapsulating the concepts:
>>>
>>> SolrInformationBooth (place to query)
>>> SolrLoudspeaker (event announcements)
>>> SolrControlLevers (mutate solr cluster)
>>> SolrPluginFacebookPage (info published by the plugin that others can
>>> watch)
>>>
>>> The "facade" provided to plugins by the plugin system should grow and
>>> expand such that more and more plugins can rely on it. This effort should
>>> grow it enough to move autoscaling onto it without dropping (much)
>>> functionality that we've previously published.
>>>
>>> -Gus
>>>
>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>>> wrote:
>>>
>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>> Jan
>>>>
>>>>
>>>> That would be the responsibility of Noble and others who have concerns
>>>> to detail - and to try to convince other peers.
>>>> It’s hard for me as a spectator to know whether to agree with Noble
>>>> without a clear picture of what the alternative API or approach would look
>>>> like.
>>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>>> API.
>>>>
>>>> Jan Høydahl
>>>>
>>>> On 24 Jul 2020 at 01:44, Ilan Ginzburg <ilansolr@gmail.com> wrote:
>>>>
>>>> In my opinion we have to (and therefore will) ship at least a basic
>>>> prod ready implementation on top of the API that does simple things (not
>>>> sure about rack, but for example balance cores and disk size without co
>>>> locating replicas of same shard on same node).
>>>> Without such an implementation, I suspect adoption will be low.
>>>> Moreover, it's always a lot more friendly to start coding from a working
>>>> example than from scratch.
>>>>
>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>>
>>>> Ilan
>>>>
>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>>> wrote:
>>>>
>>>>> Important discussion indeed.
>>>>>
>>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>>> there is a simpler and more future proof way of designing these APIs. But I
>>>>> understand that autoscaling is a complex beast and it is important we get
>>>>> it right.
>>>>>
>>>>> One question regarding having to write code vs config. Is the plan to
>>>>> ship some very simple light weight default placement rules ootb that gives
>>>>> 80% of users what they need with simple config, or would every user need to
>>>>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>>>>> seeing an alternative proposal laid out, perhaps not in code but with a
>>>>> design that can be compared and discussed.
>>>>>
>>>>> Jan Høydahl
>>>>>
>>>>> On 23 Jul 2020 at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>>>>>
>>>>> I think this is a valid thing to discuss on the dev list, since this
>>>>> isn't just about code comments.
>>>>> It seems to me that Ilan wants to discuss the philosophy around how to
>>>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>>>> This is broad and affects much more than just the Autoscaling
>>>>> framework.
>>>>>
>>>>> As a community & product, we have so far agreed that Solr should be
>>>>> lighter weight and additional features should live in plugins that are
>>>>> managed separately from Solr itself.
>>>>> At that point we need to think about the lifetime and support of these
>>>>> plugins. People love to refactor stuff in the solr core, which before
>>>>> plugins wasn't a large issue.
>>>>> However if we are now intending for many customers to rely on plugins,
>>>>> then we need to come up with standards and guarantees so that these plugins
>>>>> don't:
>>>>>
>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>> - Hinder the development of Solr Core
>>>>> - Cause us more headaches trying to keep multiple repos of plugins
>>>>> up to date with recent versions of Solr
>>>>>
>>>>>
>>>>> I am not completely sure where I stand right now, but this is
>>>>> definitely something that we should be thinking about when migrating all of
>>>>> this functionality to plugins.
>>>>>
>>>>> - Houston
>>>>>
>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I think we should move the discussion back to the PR because it has
>>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>>
>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> [I’m moving a discussion from the PR
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>>> placement code]
>>>>>>>
>>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>>> have to be made now.
>>>>>>>
>>>>>>> Thanks Noble for your feedback.
>>>>>>> I believe it is important that we are aligned on what we build here,
>>>>>>> esp. at the early defining stages (now).
>>>>>>>
>>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>>> rationale behind the approach.
>>>>>>>
>>>>>>> *> Anyone who wishes to implement this should not require to learn a
>>>>>>> lot before even getting started*
>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>>
>>>>>>> *> I don't believe we should have a set of interfaces that duplicate
>>>>>>> existing classes just for this functionality.*
>>>>>>> Where appropriate we can have existing classes be the
>>>>>>> implementations for these interfaces and be passed to the plugins, that
>>>>>>> would be perfectly ok. The proposal doesn’t include implementations at this
>>>>>>> stage, therefore there’s no duplication, or not yet... (we must get the
>>>>>>> interfaces right and agreed upon before implementation). If some interface
>>>>>>> methods in the proposal have a different name from equivalent methods in
>>>>>>> internal classes we plan to use, of course let's rename one or the other.
>>>>>>>
>>>>>>> Existing internal abstractions are most of the time concrete classes
>>>>>>> and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>>>>> Making these visible to contrib code living elsewhere makes future
>>>>>>> refactoring hard, and contrib code will most likely end up reaching for
>>>>>>> methods it shouldn’t be using. If we define a clean set of interfaces for
>>>>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to
>>>>>>> other internal Solr classes, but I will do everything possible to keep the
>>>>>>> API backward compatible so existing plugins can be recompiled without
>>>>>>> change.
>>>>>>>
>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>>> complexity or of engineering quality. There are sample
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>>> replica in the proposed API is: replica.getShard()
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>>> .getReplicas()
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>>> Doing so with the internal classes likely involves getting the
>>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>>> DocCollection from the cluster state, there get the Slice based on
>>>>>>> its name and finally getReplicas() from the Slice). I consider the
>>>>>>> role of this new framework is to make life as easy as possible for writing
>>>>>>> placement code and the like, make life easy for us to maintain it, make it
>>>>>>> easy to write a simulation engine (should be at least an order of magnitude
>>>>>>> simpler than the previous one), etc.
>>>>>>>
>>>>>>> An example regarding readability and number of interfaces: rather
>>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>>> Variable.Type
>>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>>> interface for each “variable type” (called properties
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>>> remote node (based on snitches
>>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>>> see doc
>>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>>> the proposal is explicit and strongly typed (here
>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>>> to get a specific system property from a node). This definitely does
>>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>>
>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>>> have each plugin writer deal with the same problems.
>>>>>>>
>>>>>>> We’re moving from something that was complex and hard to read and
>>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>>> significant way. One could even imagine reimplementing the former
>>>>>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>>>>>> summer internship project :)
>>>>>>>
>>>>>>> *> This is a common mistake that we all do. When we design a feature
>>>>>>> we think that is the most important thing.*
>>>>>>> If by *"most important thing"* you mean investing the best
>>>>>>> reasonable effort to do things right then yes.
>>>>>>> If you mean trying to make a minor feature look more important and
>>>>>>> inflated than it is, I disagree.
>>>>>>> As a personal note, replica placement is not the aspect of SolrCloud
>>>>>>> I'm most interested in, but the first bottleneck we hit when pushing the
>>>>>>> scale of SolrCloud. I approach this with a state of mind "let's do it right
>>>>>>> and get it out of the way" to move to topics I really want to work on
>>>>>>> (around distribution in SolrCloud and the role of Overseer). Implementing
>>>>>>> Autoscaling in a way that simplifies future refactoring (or that does not
>>>>>>> make them harder than they already are) is therefore *very high* on
>>>>>>> my priority list, to support modest changes (Slice to Shard
>>>>>>> renaming) and more ambitious ones (replacing Zookeeper, removing Overseer,
>>>>>>> you name it).
>>>>>>>
>>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>>> helps (at least helps the discussion),
>>>>>>> Ilan
>>>>>>>
>>>>>>>
>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>>> existing classes just for this functionality. This is a common mistake that
>>>>>>>> we all do. When we design a feature we think that is the most important
>>>>>>>> thing. We end up over designing and over engineering things. This feature
>>>>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>>>>> have a huge learning curve.
>>>>>>>>
>>>>>>>> Let's try to understand the requirement
>>>>>>>>
>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>> - The implementation wants to know what is the current state of
>>>>>>>> the cluster so that it can make those decisions
>>>>>>>>
>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>
Re: Approach for a new Autoscaling framework
"There's value in starting simple, understanding the tradeoffs and
generalizing later"

Yes, this is what I was alluding to when I said the facade should "grow and
expand such that more plugins can rely on it". One plugin that might
reuse some of the same informational APIs is the HealthCheckHandler. One
of the difficulties of the current situation is that it is hard to
identify what is and is not a plugin in the code base.
PluginBag<T> means that any interface could indicate a plugin if someone
has added a plugin bag for that type somewhere... I think we really ought
to have a base interface for all plugins. Another issue is that the
"plug-in" designation doesn't carry much meaning in Solr other
than that it can be loaded by a config somewhere. Metrics currently have
reporters that "plug in" but are held in a plain HashMap, not a PluginBag.
The metrics system is another place where the info APIs used by
autoscaling might be reused.
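
To make the "base interface for all plugins" idea concrete, here is a minimal
sketch. Everything in it is hypothetical: SolrPlugin, PlacementPlugin,
MetricsReporterPlugin and PluginRegistry are illustration-only names, not
existing Solr types, and this is not a proposal for the actual API shape. It
only shows how a common base type would let one place answer "what is a
plugin here?".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these types exist in Solr. */
final class PluginMarkerSketch {

  /** Assumed common base type that every plugin would implement. */
  interface SolrPlugin {
    String pluginName();
  }

  /** Two illustrative plugin kinds sharing the base type. */
  interface PlacementPlugin extends SolrPlugin {}

  interface MetricsReporterPlugin extends SolrPlugin {}

  /** A single registry for all plugin kinds, queryable by type. */
  static final class PluginRegistry {
    private final List<SolrPlugin> plugins = new ArrayList<>();

    void register(SolrPlugin plugin) {
      plugins.add(plugin);
    }

    /** Returns every registered plugin that is an instance of the given type. */
    <T extends SolrPlugin> List<T> ofType(Class<T> type) {
      List<T> result = new ArrayList<>();
      for (SolrPlugin p : plugins) {
        if (type.isInstance(p)) {
          result.add(type.cast(p));
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    PluginRegistry registry = new PluginRegistry();
    registry.register((PlacementPlugin) () -> "minimize-cores");
    registry.register((MetricsReporterPlugin) () -> "jmx-reporter");
    // Asking for the base type lists everything that is a plugin.
    for (SolrPlugin p : registry.ofType(SolrPlugin.class)) {
      System.out.println(p.pluginName());
    }
  }
}
```

With something like this, metrics reporters and placement plugins would be
discoverable the same way, instead of some living in a PluginBag and others
in a plain map.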

As I understand it, the state of the plugin system is that we've plumbed in
tools for fetching jars, replicating them to the nodes, installing them at
the cluster level and deploying them, which is all great (I learned much of
this from Dave Smiley's talk at a local meetup
<https://servicenow.zoom.us/rec/play/uJd7dbipqj43ToWVtgSDUfYqW9S-Lqis0iAa_aYEmRy9BnkBM1aiNLAXNrCbLaJTTjnUcxGOkabW7NjC>,
starting at 0:30:29). I'm unaware of what's been done relative to the
environment in which plugins live and may not be up to date on developments
since then. As painful as it is, I think we will regret it if we don't
cordon off internal (behind the facade) and external (the facade) APIs for
plugins. Other issues that come up include class loading, especially as
concerns different versions of dependencies: differing from something in
Solr, or differing from something loaded by some other plugin. I
recently did some work in JesterJ to enable such, and I would like to look
at what we are doing there (in some other discussion, perhaps after I
refresh my memory on SolrResourceLoader's details etc.). I suspect that has
been thought about. Also, there should be an easy way for a plugin to load
a resource and have it be shared across cores. Many moons ago I contributed
a means by which to share large resources across cores, and for
multi-tenant systems with lots of collections all created from the same
config this can be important (the motivating case involved a gigabyte-sized
map of geo locations to be applied at query time that was being
duplicated across 40 cores and crushing the servers).
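
As a rough illustration of the "load once, share across cores" idea, here is
a minimal sketch. It is not the mechanism I contributed back then and not an
existing Solr API: SharedResourceSketch and getOrLoad are made-up names, and
the sketch ignores eviction, reload and class-loader questions entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical node-level cache shared by all cores; not an existing Solr API. */
final class SharedResourceSketch {
  private final ConcurrentMap<String, Object> resources = new ConcurrentHashMap<>();

  /**
   * Loads the resource at most once per key. Later callers, e.g. other cores
   * created from the same config, get the instance that was already loaded.
   */
  <T> T getOrLoad(String key, Class<T> type, Supplier<T> loader) {
    Object value = resources.computeIfAbsent(key, k -> loader.get());
    return type.cast(value);
  }

  public static void main(String[] args) {
    SharedResourceSketch nodeCache = new SharedResourceSketch();
    // Two "cores" ask for the same large resource; only the first call loads it.
    StringBuilder first = nodeCache.getOrLoad(
        "geo-map", StringBuilder.class, () -> new StringBuilder("big geo map"));
    StringBuilder second = nodeCache.getOrLoad(
        "geo-map", StringBuilder.class, () -> new StringBuilder("never built"));
    System.out.println(first == second);
  }
}
```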

So something has to be the starting point for thinking about the
"environment" in which plugins reside once they are deployed... Autoscaling
poked its head up ;)... I think it's going to be big enough we really
should build it on a foundation we want to keep.

The alternative of course is that everything in solr is available for
plugins to touch and we just break any number of plugins any time we change
anything... That's the current state really, but the majority of useful
plugins are in the same code base and have unit tests that force us to
maintain them. As that goes away, and as we (hopefully) encourage an
ecosystem of 3rd party plugins this becomes a bigger issue because we *won't
know* what we are breaking.

-Gus

On Sat, Jul 25, 2020 at 5:03 PM Ilan Ginzburg <ilansolr@gmail.com> wrote:

> Thanks Gus!
> This makes a lot of sense but significantly increases IMO the scope and
> effort to define an "Autoscaling" framework interface.
>
> I'd be happy to try to see what concepts could be shared and how a generic
> plugin facade could be defined.
>
> What are the other types of plugins that would share such a unified
> approach? Do they already exist under another form or are just projects at
> this stage, like Autoscaling plugins?
>
> But... Assuming this is the first "facade" layer to be defined between
> Solr and external code, it might be hard to make it generic and get it
> right. There's value in starting simple, understanding the tradeoffs and
> generalizing later.
>
> Also I'd like to make sure we're not paying a performance "genericity tax"
> in Autoscaling for unneeded features.
>
> Ilan
>
> Le sam. 25 juil. 2020 à 16:02, Gus Heck <gus.heck@gmail.com> a écrit :
>
>> Scanned through the PR and read some of this thread. I likely have missed
>> much other discussion, so forgive me if I'm dredging up some things that are
>> already discussed elsewhere.
>>
>> The idea of designing the interfaces defining what information is
>> available seems good here, but I worry that it's too auto-scaling focused.
>> In my imagination, I would see solr having a standard informational
>> interface that is useful to any plugin of any sort. Autoscaling should be
>> leveraging that and we should be enhancing that to enable autoscaling. The
>> current state of the system is one key type of information, but another
>> type of information that should exist within solr and be exposed to plugins
>> (including autoscaling) is events. When a new node joins there should be an
>> event for example so that plugins can listen for that rather than
>> incessantly polling and comparing the list of 100 nodes to a cached list of
>> 100 nodes.
>>
>> In the PR I see a bunch of classes all off in a separate package, which
>> looks like an autoscaling fiefdom which will be tempted if not forced to
>> duplicate lots of stuff relative to other plugins and/or core.
>>
>> As a side note I would think the metrics system could be a plugin that
>> leverages the same set of informational interfaces....
>>
>> So there should be 3 parts to this as I imagine it.
>>
>> 1) Enhancements to the **plugin system** that make information about the
>> cluster available to ALL plugins
>> 2) Enhancements to the **plugin system** API's provided to ALL plugins
>> that allow them to mutate solr safely.
>> 3) A plugin that we intend to support for our users currently using auto
>> scaling utilizes the enhanced information to provide a similar level of
>> functionality as is *promised* by our current documentation of autoscaling,
>> there might be some gaps or differences but we should be discussing what
>> they are and providing recommended workarounds for users that relied on
>> those promises to the users. Even if there were cases where we failed to
>> deliver, if there were at least some conditions under which we could
>> deliver the promised functionality those should be supported. Only if we
>> never were able to deliver and it never worked under any circumstance
>> should we rip stuff out entirely.
>>
>> Implicit in the above is the concept that there should be a facade
>> between plugins and the core of solr.
>>
>> WRT #1 which will necessarily involve information collected from remote
>> nodes, we need to be designing that thinking about what informational
>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>> to think about what is exposed in a read-only fashion vs what plugins might
>> write back to solr. Certainly there will be a lot of information that most
>> plugins ignore, and we might consider having groupings of information and
>> interfaces or annotations that indicate what info is provided, but the
>> simplest default state is to just give plugins a reference to a class that
>> they can use to drill into information about the cluster as needed.
>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>> SolrInfoBean? )
>>
>> Finally a fourth thing that occurs to me as I write is we need to
>> consider what information one plugin might make available to the rest of
>> the solr plugins. This might come later, and is hard because it's very hard
>> to anticipate what info might be generated by unknown plugins in the future.
>>
>> So some humorous, not seriously suggested but hopefully memorable class
>> names encapsulating the concepts:
>>
>> SolrInformationBooth (place to query)
>> SolrLoudspeaker (event announcements)
>> SolrControlLevers (mutate solr cluster)
>> SolrPluginFacebookPage (info published by the plugin that others can
>> watch)
>>
>> The "facade" provided to plugins by the plugin system should grow and
>> expand such that more and more plugins can rely on it. This effort should
>> grow it enough to move autoscaling onto it without dropping (much)
>> functionality that we've previously published.
>>
>> -Gus
>>
>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>> wrote:
>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan
>>>
>>>
>>> That would be the responsibility of Noble and others who have concerns
>>> to detail - and try to convince other peers.
>>> It’s hard for me as a spectator to know whether to agree with Noble
>>> without a clear picture of what the alternative API or approach would look
>>> like.
>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>> API.
>>>
>>> Jan Høydahl
>>>
>>> 24. jul. 2020 kl. 01:44 skrev Ilan Ginzburg <ilansolr@gmail.com>:
>>>
>>> In my opinion we have to (and therefore will) ship at least a basic prod
>>> ready implementation on top of the API that does simple things (not sure
>>> about rack, but for example balance cores and disk size without co locating
>>> replicas of same shard on same node).
>>> Without such an implementation, I suspect adoption will be low.
>>> Moreover, it's always a lot more friendly to start coding from a working
>>> example than from scratch.
>>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>
>>> Ilan
>>>
>>> Ilan
>>>
>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>> wrote:
>>>
>>>> Important discussion indeed.
>>>>
>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>> there is a simpler and more future proof way of designing these APIs. But I
>>>> understand that autoscaling is a complex beast and it is important we get
>>>> it right.
>>>>
>>>> One question regarding having to write code vs config. Is the plan to
>>>> ship some very simple light weight default placement rules ootb that gives
>>>> 80% of users what they need with simple config, or would every user need to
>>>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>>>> seeing an alternative proposal laid out, perhaps not in code but with a
>>>> design that can be compared and discussed.
>>>>
>>>> Jan Høydahl
>>>>
>>>> 23. jul. 2020 kl. 17:53 skrev Houston Putman <houstonputman@gmail.com>:
>>>>
>>>> I think this is a valid thing to discuss on the dev list, since this
>>>> isn't just about code comments.
>>>> It seems to me that Ilan wants to discuss the philosophy around how to
>>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>>> This is broad and affects much more than just the Autoscaling
>>>> framework.
>>>>
>>>> As a community & product, we have so far agreed that Solr should be
>>>> lighter weight and additional features should live in plugins that are
>>>> managed separately from Solr itself.
>>>> At that point we need to think about the lifetime and support of these
>>>> plugins. People love to refactor stuff in the solr core, which before
>>>> plugins wasn't a large issue.
>>>> However if we are now intending for many customers to rely on plugins,
>>>> then we need to come up with standards and guarantees so that these plugins
>>>> don't:
>>>>
>>>> - Stall people from upgrading Solr (minor or major versions)
>>>> - Hinder the development of Solr Core
>>>> - Cause us more headaches trying to keep multiple repos of plugins
>>>> up to date with recent versions of Solr
>>>>
>>>>
>>>> I am not completely sure where I stand right now, but this is
>>>> definitely something that we should be thinking about when migrating all of
>>>> this functionality to plugins.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <ishan@apache.org>
>>>> wrote:
>>>>
>>>>> I think we should move the discussion back to the PR because it has
>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>
>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> [I’m moving a discussion from the PR
>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>> placement code]
>>>>>>
>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>> have to be made now.
>>>>>>
>>>>>> Thanks Noble for your feedback.
>>>>>> I believe it is important that we are aligned on what we build here,
>>>>>> esp. at the early defining stages (now).
>>>>>>
>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>> rationale behind the approach.
>>>>>>
>>>>>> *> Anyone who wishes to implement this should not require to learn a
>>>>>> lot before even getting started*
>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>
>>>>>> *> I don't believe we should have a set of interfaces that duplicate
>>>>>> existing classes just for this functionality.*
>>>>>> Where appropriate we can have existing classes be the implementations
>>>>>> for these interfaces and be passed to the plugins, that would be perfectly
>>>>>> ok. The proposal doesn’t include implementations at this stage, therefore
>>>>>> there’s no duplication, or not yet... (we must get the interfaces right and
>>>>>> agreed upon before implementation). If some interface methods in the
>>>>>> proposal have a different name from equivalent methods in internal classes
>>>>>> we plan to use, of course let's rename one or the other.
>>>>>>
>>>>>> Existing internal abstractions are most of the time concrete classes
>>>>>> and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>>>> Making these visible to contrib code living elsewhere is making future
>>>>>> refactoring hard and contrib code will most likely end up reaching to
>>>>>> methods it shouldn’t be using. If we define a clean set of interfaces for
>>>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to
>>>>>> other internal Solr classes, but will make everything possible to keep the
>>>>>> API backward compatible so existing plugins can be recompiled without
>>>>>> change.
>>>>>>
>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>> complexity or of engineering quality. There are sample
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>> replica in the proposed API is: replica.getShard()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>> .getReplicas()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>> Doing so with the internal classes likely involves getting the
>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>> DocCollection from the cluster state, there get the Slice based on
>>>>>> its name and finally getReplicas() from the Slice). I consider the
>>>>>> role of this new framework is to make life as easy as possible for writing
>>>>>> placement code and the like, make life easy for us to maintain it, make it
>>>>>> easy to write a simulation engine (should be at least an order of magnitude
>>>>>> simpler than the previous one), etc.
>>>>>>
>>>>>> An example regarding readability and number of interfaces: rather
>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>> Variable.Type
>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>> interface for each “variable type” (called properties
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>> remote node (based on snitches
>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>> see doc
>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>> the proposal is explicit and strongly typed (here
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>> to get a specific system property from a node). This definitely does
>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>
>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>> have each plugin writer deal with the same problems.
>>>>>>
>>>>>> We’re moving from something that was complex and hard to read and
>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>> significant way. One could even imagine reimplementing the former
>>>>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>>>>> summer internship project :)
>>>>>>
>>>>>> *> This is a common mistake that we all do. When we design a feature
>>>>>> we think that is the most important thing.*
>>>>>> If by *"most important thing"* you mean investing the best
>>>>>> reasonable effort to do things right then yes.
>>>>>> If you mean trying to make a minor feature look more important and
>>>>>> inflated than it is, I disagree.
>>>>>> As a personal note, replica placement is not the aspect of SolrCloud
>>>>>> I'm most interested in, but the first bottleneck we hit when pushing the
>>>>>> scale of SolrCloud. I approach this with a state of mind "let's do it right
>>>>>> and get it out of the way" to move to topics I really want to work on
>>>>>> (around distribution in SolrCloud and the role of Overseer). Implementing
>>>>>> Autoscaling in a way that simplifies future refactoring (or that does not
>>>>>> make them harder than they already are) is therefore *very high* on
>>>>>> my priority list, to support modest changes (Slice to Shard
>>>>>> renaming) and more ambitious ones (replacing Zookeeper, removing Overseer,
>>>>>> you name it).
>>>>>>
>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>> helps (at least helps the discussion),
>>>>>> Ilan
>>>>>>
>>>>>>
>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>> existing classes just for this functionality. This is a common mistake that
>>>>>>> we all do. When we design a feature we think that is the most important
>>>>>>> thing. We end up over designing and over engineering things. This feature
>>>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>>>> have a huge learning curve.
>>>>>>>
>>>>>>> Let's try to understand the requirement
>>>>>>>
>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>> - The implementation wants to know what is the current state of
>>>>>>> the cluster so that it can make those decisions
>>>>>>>
>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Approach for a new Autoscaling framework
On Sun, Jul 26, 2020 at 1:05 AM Ilan Ginzburg <ilansolr@gmail.com> wrote:

> Varun, you're correct.
> This PR was built based on what's needed for creation (easiest starting
> point for me and likely most urgent need). It's still totally WIP and
> following steps include building the API required for move and other
> placement based needs, then also everything related to triggers (see the
> Jira).
>
> Collection API commands (Solr provided implementation, not a plug-in) will
> build the requests they need, then call the plug-in (custom one or a default
> one), and use the returned "work items" (more types of work items will be
> introduced of course) to do the job (know where to place or where to move
> or what to remove or add etc.)
>

This sounds perfect!

I'd be interested to see how we can use SamplePluginMinimizeCores for, say,
create collection but use FooPluginMinimizeLoad for add-replica.
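
To show what I have in mind, here is a minimal sketch of picking a plugin per
Collection API command. All of it is assumed for illustration: the Command
enum, the PlacementPlugin shape, the Node class and the Dispatcher are
hypothetical and are not the interfaces from the PR; MINIMIZE_CORES and
MINIMIZE_LOAD merely stand in for SamplePluginMinimizeCores and
FooPluginMinimizeLoad.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are the ones defined in the PR. */
final class PlacementDispatchSketch {

  enum Command { CREATE_COLLECTION, ADD_REPLICA }

  /** Simplified view of a node: only what the two sample strategies need. */
  static final class Node {
    final String name;
    final int cores;
    final double load;

    Node(String name, int cores, double load) {
      this.name = name;
      this.cores = cores;
      this.load = load;
    }
  }

  /** Assumed plugin shape: choose one node out of a non-empty candidate list. */
  interface PlacementPlugin {
    Node pick(List<Node> candidates);
  }

  /** Stand-in for SamplePluginMinimizeCores: node with the fewest cores. */
  static final PlacementPlugin MINIMIZE_CORES =
      nodes -> Collections.min(nodes, Comparator.comparingInt((Node n) -> n.cores));

  /** Stand-in for FooPluginMinimizeLoad: node with the lowest load. */
  static final PlacementPlugin MINIMIZE_LOAD =
      nodes -> Collections.min(nodes, Comparator.comparingDouble((Node n) -> n.load));

  /** Routes each command to its configured plugin, else to the default one. */
  static final class Dispatcher {
    private final Map<Command, PlacementPlugin> byCommand = new EnumMap<>(Command.class);
    private final PlacementPlugin fallback;

    Dispatcher(PlacementPlugin fallback) {
      this.fallback = fallback;
    }

    void use(Command command, PlacementPlugin plugin) {
      byCommand.put(command, plugin);
    }

    Node place(Command command, List<Node> candidates) {
      return byCommand.getOrDefault(command, fallback).pick(candidates);
    }
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(new Node("n1", 2, 0.9), new Node("n2", 5, 0.1));
    Dispatcher dispatcher = new Dispatcher(MINIMIZE_CORES);
    dispatcher.use(Command.ADD_REPLICA, MINIMIZE_LOAD);
    System.out.println(dispatcher.place(Command.CREATE_COLLECTION, nodes).name); // n1
    System.out.println(dispatcher.place(Command.ADD_REPLICA, nodes).name); // n2
  }
}
```

Whether the mapping lives in config or in the command implementation itself
is exactly the part I'd like to see discussed.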

>
> Ilan
>
> Le dim. 26 juil. 2020 à 04:13, Varun Thacker <varun@vthacker.in> a écrit :
>
>> Hi Ilan,
>>
>> I like where we're going with
>> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
>> wrong, but my understanding of this PR is we're defining the interfaces for
>> creating policies
>>
>> What's not clear to me is how will existing collection APIs like
>> create-collections/add-replica etc make use of it? Is that something that
>> has been discussed somewhere that I could read up on?
>>
>>
>>
>> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilansolr@gmail.com> wrote:
>>
>>> Thanks Gus!
>>> This makes a lot of sense but significantly increases IMO the scope and
>>> effort to define an "Autoscaling" framework interface.
>>>
>>> I'd be happy to try to see what concepts could be shared and how a
>>> generic plugin facade could be defined.
>>>
>>> What are the other types of plugins that would share such a unified
>>> approach? Do they already exist under another form or are just projects at
>>> this stage, like Autoscaling plugins?
>>>
>>> But... Assuming this is the first "facade" layer to be defined between
>>> Solr and external code, it might be hard to make it generic and get it
>>> right. There's value in starting simple, understanding the tradeoffs and
>>> generalizing later.
>>>
>>> Also I'd like to make sure we're not paying a performance "genericity
>>> tax" in Autoscaling for unneeded features.
>>>
>>> Ilan
>>>
>>> Le sam. 25 juil. 2020 à 16:02, Gus Heck <gus.heck@gmail.com> a écrit :
>>>
>>>> Scanned through the PR and read some of this thread. I likely have
>>>> missed much other discussion, so forgive me if I'm dredging up some things
>>>> that are already discussed elsewhere.
>>>>
>>>> The idea of designing the interfaces defining what information is
>>>> available seems good here, but I worry that it's too auto-scaling focused.
>>>> In my imagination, I would see solr having a standard informational
>>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>>> current state of the system is one key type of information, but another
>>>> type of information that should exist within solr and be exposed to plugins
>>>> (including autoscaling) is events. When a new node joins there should be an
>>>> event for example so that plugins can listen for that rather than
>>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>>> 100 nodes.
>>>>
>>>> In the PR I see a bunch of classes all off in a separate package, which
>>>> looks like an autoscaling fiefdom which will be tempted if not forced to
>>>> duplicate lots of stuff relative to other plugins and/or core.
>>>>
>>>> As a side note I would think the metrics system could be a plugin that
>>>> leverages the same set of informational interfaces....
>>>>
>>>> So there should be 3 parts to this as I imagine it.
>>>>
>>>> 1) Enhancements to the **plugin system** that make information about
>>>> the cluster available to ALL plugins
>>>> 2) Enhancements to the **plugin system** API's provided to ALL plugins
>>>> that allow them to mutate solr safely.
>>>> 3) A plugin that we intend to support for our users currently using
>>>> auto scaling utilizes the enhanced information to provide a similar level
>>>> of functionality as is *promised* by our current documentation of
>>>> autoscaling, there might be some gaps or differences but we should be
>>>> discussing what they are and providing recommended workarounds for users
>>>> that relied on those promises to the users. Even if there were cases where
>>>> we failed to deliver, if there were at least some conditions under which we
>>>> could deliver the promised functionality those should be supported. Only if
>>>> we never were able to deliver and it never worked under any circumstance
>>>> should we rip stuff out entirely.
>>>>
>>>> Implicit in the above is the concept that there should be a facade
>>>> between plugins and the core of solr.
>>>>
>>>> WRT #1 which will necessarily involve information collected from remote
>>>> nodes, we need to be designing that thinking about what informational
>>>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>>>> to think about what is exposed in a read-only fashion vs what plugins might
>>>> write back to solr. Certainly there will be a lot of information that most
>>>> plugins ignore, and we might consider having groupings of information and
>>>> interfaces or annotations that indicate what info is provided, but the
>>>> simplest default state is to just give plugins a reference to a class that
>>>> they can use to drill into information about the cluster as needed.
>>>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>>>> SolrInfoBean? )
>>>>
>>>> Finally a fourth thing that occurs to me as I write is we need to
>>>> consider what information one plugin might make available to the rest of
>>>> the solr plugins. This might come later, and is hard because it's very hard
>>>> to anticipate what info might be generated by unknown plugins in the future.
>>>>
>>>> So some humorous, not seriously suggested but hopefully memorable class
>>>> names encapsulating the concepts:
>>>>
>>>> SolrInformationBooth (place to query)
>>>> SolrLoudspeaker (event announcements)
>>>> SolrControlLevers (mutate solr cluster)
>>>> SolrPluginFacebookPage (info published by the plugin that others can
>>>> watch)
>>>>
>>>> The "facade" provided to plugins by the plugin system should grow and
>>>> expand such that more and more plugins can rely on it. This effort should
>>>> grow it enough to move autoscaling onto it without dropping (much)
>>>> functionality that we've previously published.
>>>>
>>>> -Gus
>>>>
>>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>>>> wrote:
>>>>
>>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>>> Jan
>>>>>
>>>>>
>>>>> That would be the responsibility of Noble and others who have concerns
>>>>> to detail - and try to convince other peers.
>>>>> It’s hard for me as a spectator to know whether to agree with Noble
>>>>> without a clear picture of what the alternative API or approach would look
>>>>> like.
>>>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>>>> API.
>>>>>
>>>>> Jan Høydahl
>>>>>
>>>>> 24. jul. 2020 kl. 01:44 skrev Ilan Ginzburg <ilansolr@gmail.com>:
>>>>>
>>>>> In my opinion we have to (and therefore will) ship at least a basic
>>>>> prod ready implementation on top of the API that does simple things (not
>>>>> sure about rack, but for example balance cores and disk size without co
>>>>> locating replicas of same shard on same node).
>>>>> Without such an implementation, I suspect adoption will be low.
>>>>> Moreover, it's always a lot more friendly to start coding from a working
>>>>> example than from scratch.
>>>>>
>>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>>>
>>>>> Ilan
>>>>>
>>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>>>> wrote:
>>>>>
>>>>>> Important discussion indeed.
>>>>>>
>>>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>>>> there is a simpler and more future proof way of designing these APIs. But I
>>>>>> understand that autoscaling is a complex beast and it is important we get
>>>>>> it right.
>>>>>>
>>>>>> One question regarding having to write code vs config. Is the plan to
>>>>>> ship some very simple light weight default placement rules ootb that gives
>>>>>> 80% of users what they need with simple config, or would every user need to
>>>>>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>>>>>> seeing an alternative proposal laid out, perhaps not in code but with a
>>>>>> design that can be compared and discussed.
>>>>>>
>>>>>> Jan Høydahl
>>>>>>
>>>>>> 23. jul. 2020 kl. 17:53 skrev Houston Putman <houstonputman@gmail.com
>>>>>> >:
>>>>>>
>>>>>> I think this is a valid thing to discuss on the dev list, since this
>>>>>> isn't just about code comments.
>>>>>> It seems to me that Ilan wants to discuss the philosophy around how
>>>>>> to design plugins and the interfaces in Solr which the plugins will talk to.
>>>>>> This is broad and affects much more than just the Autoscaling
>>>>>> framework.
>>>>>>
>>>>>> As a community & product, we have so far agreed that Solr should be
>>>>>> lighter weight and additional features should live in plugins that are
>>>>>> managed separately from Solr itself.
>>>>>> At that point we need to think about the lifetime and support of
>>>>>> these plugins. People love to refactor stuff in the solr core, which before
>>>>>> plugins wasn't a large issue.
>>>>>> However if we are now intending for many customers to rely on
>>>>>> plugins, then we need to come up with standards and guarantees so that
>>>>>> these plugins don't:
>>>>>>
>>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>>> - Hinder the development of Solr Core
>>>>>> - Cause us more headaches trying to keep multiple repos of
>>>>>> plugins up to date with recent versions of Solr
>>>>>>
>>>>>>
>>>>>> I am not completely sure where I stand right now, but this is
>>>>>> definitely something that we should be thinking about when migrating all of
>>>>>> this functionality to plugins.
>>>>>>
>>>>>> - Houston
>>>>>>
>>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <
>>>>>> ishan@apache.org> wrote:
>>>>>>
>>>>>>> I think we should move the discussion back to the PR because it has
>>>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>>>
>>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> [I’m moving a discussion from the PR
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>>>> placement code]
>>>>>>>>
>>>>>>>> It took me a long time to write this mail and it's quite long,
>>>>>>>> sorry.
>>>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>>>> have to be made now.
>>>>>>>>
>>>>>>>> Thanks Noble for your feedback.
>>>>>>>> I believe it is important that we are aligned on what we build
>>>>>>>> here, esp. at the early defining stages (now).
>>>>>>>>
>>>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>>>> rationale behind the approach.
>>>>>>>>
>>>>>>>> *> Anyone who wishes to implement this should not require to learn
>>>>>>>> a lot before even getting started*
>>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>>>
>>>>>>>> *> I don't believe we should have a set of interfaces that
>>>>>>>> duplicate existing classes just for this functionality.*
>>>>>>>> Where appropriate we can have existing classes be the
>>>>>>>> implementations for these interfaces and be passed to the plugins, that
>>>>>>>> would be perfectly ok. The proposal doesn’t include implementations at this
>>>>>>>> stage, therefore there’s no duplication, or not yet... (we must get the
>>>>>>>> interfaces right and agreed upon before implementation). If some interface
>>>>>>>> methods in the proposal have a different name from equivalent methods in
>>>>>>>> internal classes we plan to use, of course let's rename one or the other.
>>>>>>>>
>>>>>>>> Existing internal abstractions are most of the time concrete
>>>>>>>> classes and not interfaces (Replica, Slice, DocCollection,
>>>>>>>> ClusterState). Making these visible to contrib code living
>>>>>>>> elsewhere is making future refactoring hard and contrib code will most
>>>>>>>> likely end up reaching to methods it shouldn’t be using. If we define a
>>>>>>>> clean set of interfaces for plugins, I wouldn’t hesitate to break external
>>>>>>>> plugins that reach out to other internal Solr classes, but will make
>>>>>>>> everything possible to keep the API backward compatible so existing plugins
>>>>>>>> can be recompiled without change.
>>>>>>>>
>>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>>>> complexity or of engineering quality. There are sample
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>>>> replica in the proposed API is: replica.getShard()
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>>>> .getReplicas()
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>>>> Doing so with the internal classes likely involves getting the
>>>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>>>> DocCollection from the cluster state, there get the Slice based on
>>>>>>>> its name and finally getReplicas() from the Slice). I consider the
>>>>>>>> role of this new framework is to make life as easy as possible for writing
>>>>>>>> placement code and the like, make life easy for us to maintain it, make it
>>>>>>>> easy to write a simulation engine (should be at least an order of magnitude
>>>>>>>> simpler than the previous one), etc.
>>>>>>>>
>>>>>>>> An example regarding readability and number of interfaces: rather
>>>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>>>> Variable.Type
>>>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>>>> interface for each “variable type” (called properties
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>>>> remote node (based on snitches
>>>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>>>> see doc
>>>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>>>> the proposal is explicit and strongly typed (here
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>>>> to get a specific system property from a node). This definitely does
>>>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>>>
>>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>>>> have each plugin writer deal with the same problems.
>>>>>>>>
>>>>>>>> We’re moving from something that was complex and hard to read and
>>>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>>>> significant way. One could even imagine reimplementing the former
>>>>>>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>>>>>>> summer internship project :)
>>>>>>>>
>>>>>>>> *> This is a common mistake that we all do. When we design a
>>>>>>>> feature we think that is the most important thing.*
>>>>>>>> If by *"most important thing"* you mean investing the best
>>>>>>>> reasonable effort to do things right then yes.
>>>>>>>> If you mean trying to make a minor feature look more important and
>>>>>>>> inflated than it is, I disagree.
>>>>>>>> As a personal note, replica placement is not the aspect of
>>>>>>>> SolrCloud I'm most interested in, but the first bottleneck we hit when
>>>>>>>> pushing the scale of SolrCloud. I approach this with a state of mind "let's
>>>>>>>> do it right and get it out of the way" to move to topics I really want to
>>>>>>>> work on (around distribution in SolrCloud and the role of Overseer).
>>>>>>>> Implementing Autoscaling in a way that simplifies future refactoring (or
>>>>>>>> that does not make them harder than they already are) is therefore *very
>>>>>>>> high* on my priority list, to support modest changes (Slice to
>>>>>>>> Shard renaming) and more ambitious ones (replacing Zookeeper,
>>>>>>>> removing Overseer, you name it).
>>>>>>>>
>>>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>>>> helps (at least helps the discussion),
>>>>>>>> Ilan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>>>> existing classes just for this functionality. This is a common mistake that
>>>>>>>>> we all do. When we design a feature we think that is the most important
>>>>>>>>> thing. We end up over designing and over engineering things. This feature
>>>>>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>>>>>> have a huge learning curve.
>>>>>>>>>
>>>>>>>>> Let's try to understand the requirement
>>>>>>>>>
>>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>>> - The implementation wants to know what is the current state
>>>>>>>>> of the cluster so that it can make those decisions
>>>>>>>>>
>>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>>
>>>>>>>>> —
>>>>>>>>> You are receiving this because you authored the thread.
>>>>>>>>> Reply to this email directly, view it on GitHub
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684#issuecomment-662837142>,
>>>>>>>>> or unsubscribe
>>>>>>>>> <https://github.com/notifications/unsubscribe-auth/AKIOMCFT5GU2II347GZ4HTTR47IVTANCNFSM4PC3HDKQ>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
Re: Approach for a new Autoscaling framework
To everyone and especially Gus: I think the "plugin" word in this thread
is basically a stop-word to the intent/scope of the thread. A plugin to
Solr both has been and will be nothing more than a class that's loaded
*dynamically* by a configurable name -- as opposed to a class within Solr
that isn't pluggable (*statically* referenced). Whether a class is
statically loaded or dynamically loaded, it has some sort of API to itself
where it receives and provides other abstractions provided by Solr. I
*think* what's being proposed in this thread are some better higher level
abstractions within Solr that could be used to hide implementation details
that are found in some APIs currently in Solr. Good ol' software
engineering practices. Am I missing something?
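
To make the static vs. dynamic distinction concrete, here is a minimal
sketch. All names in it (PlacementStrategy, MinimizeCores, loadByName) are
hypothetical and made up for illustration; they are not Solr classes or the
interfaces from the PR. The only point is that the same abstraction is used
whether the implementation is referenced at compile time or resolved from a
configured class name.

```java
import java.util.Map;

public class PluginLoadingSketch {

    /** Hypothetical abstraction handed to placement code (not a Solr interface). */
    interface PlacementStrategy {
        String pickNode(Map<String, Integer> coresPerNode);
    }

    /** Hypothetical implementation: picks the node that currently has the fewest cores. */
    public static class MinimizeCores implements PlacementStrategy {
        @Override
        public String pickNode(Map<String, Integer> coresPerNode) {
            return coresPerNode.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElseThrow(() -> new IllegalArgumentException("no nodes"));
        }
    }

    /** "Plugin" style: the class is resolved at runtime from a configurable name. */
    static PlacementStrategy loadByName(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(PlacementStrategy.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> cores = Map.of("node1", 5, "node2", 2, "node3", 7);

        // Statically referenced: the implementation is fixed at compile time.
        PlacementStrategy fixed = new MinimizeCores();

        // Dynamically loaded: in practice the name would come from configuration.
        String configuredName = PluginLoadingSketch.class.getName() + "$MinimizeCores";
        PlacementStrategy plugged = loadByName(configuredName);

        System.out.println(fixed.pickNode(cores));
        System.out.println(plugged.pickNode(cores));
    }
}
```

Both calls go through the same interface and pick the same node; only the
way the implementation class is found differs, which is why the shape of
the abstraction matters more than the word "plugin".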

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jul 26, 2020 at 6:11 PM Varun Thacker <varun@vthacker.in> wrote:

>
>
> On Sun, Jul 26, 2020 at 1:05 AM Ilan Ginzburg <ilansolr@gmail.com> wrote:
>
>> Varun, you're correct.
>> This PR was built based on what's needed for creation (easiest starting
>> point for me and likely most urgent need). It's still totally WIP and
>> following steps include building the API required for move and other
>> placement based needs, then also everything related to triggers (see the
>> Jira).
>>
>> Collection API commands (Solr provided implementation, not a plug-in)
>> will build the requests they need, then call the plug-in (custom one or a
>> default one), and use the returned "work items" (more types of work items
>> will be introduced of course) to do the job (know where to place or where
>> to move or what to remove or add etc.)
>>
>
> This sounds perfect!
>
> I'd be interested to see how we can use SamplePluginMinimizeCores for, say,
> create collection but use FooPluginMinimizeLoad for add-replica.
>
>>
>> Ilan
>>
>> Le dim. 26 juil. 2020 à 04:13, Varun Thacker <varun@vthacker.in> a
>> écrit :
>>
>>> Hi Ilan,
>>>
>>> I like where we're going with
>>> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
>>> wrong, but my understanding of this PR is we're defining the interfaces for
>>> creating policies.
>>>
>>> What's not clear to me is how existing collection APIs like
>>> create-collections/add-replica etc. will make use of it. Is that something
>>> that has been discussed somewhere that I could read up on?
>>>
>>>
>>>
>>> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilansolr@gmail.com>
>>> wrote:
>>>
>>>> Thanks Gus!
>>>> This makes a lot of sense but significantly increases IMO the scope and
>>>> effort to define an "Autoscaling" framework interface.
>>>>
>>>> I'd be happy to try to see what concepts could be shared and how a
>>>> generic plugin facade could be defined.
>>>>
>>>> What are the other types of plugins that would share such a unified
>>>> approach? Do they already exist under another form or are just projects at
>>>> this stage, like Autoscaling plugins?
>>>>
>>>> But... Assuming this is the first "facade" layer to be defined between
>>>> Solr and external code, it might be hard to make it generic and get it
>>>> right. There's value in starting simple, understanding the tradeoffs and
>>>> generalizing later.
>>>>
>>>> Also I'd like to make sure we're not paying a performance "genericity
>>>> tax" in Autoscaling for unneeded features.
>>>>
>>>> Ilan
>>>>
>>>> Le sam. 25 juil. 2020 à 16:02, Gus Heck <gus.heck@gmail.com> a écrit :
>>>>
>>>>> Scanned through the PR and read some of this thread. I likely have
>>>>> missed much other discussion, so forgive me if I'm dredging up some things
>>>>> that are already discussed elsewhere.
>>>>>
>>>>> The idea of designing the interfaces defining what information is
>>>>> available seems good here, but I worry that it's too auto-scaling focused.
>>>>> In my imagination, I would see solr having a standard informational
>>>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>>>> current state of the system is one key type of information, but another
>>>>> type of information that should exist within solr and be exposed to plugins
>>>>> (including autoscaling) is events. When a new node joins there should be an
>>>>> event for example so that plugins can listen for that rather than
>>>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>>>> 100 nodes.
>>>>>
>>>>> In the PR I see a bunch of classes all off in a separate package,
>>>>> which looks like an autoscaling fiefdom which will be tempted if not forced
>>>>> to duplicate lots of stuff relative to other plugins and/or core.
>>>>>
>>>>> As a side note I would think the metrics system could be a plugin that
>>>>> leverages the same set of informational interfaces....
>>>>>
>>>>> So there should be 3 parts to this as I imagine it.
>>>>>
>>>>> 1) Enhancements to the **plugin system** that make information about
>>>>> the cluster available to ALL plugins
>>>>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins
>>>>> that allow them to mutate solr safely.
>>>>> 3) A plugin that we intend to support for our users currently using
>>>>> autoscaling, which utilizes the enhanced information to provide a similar
>>>>> level of functionality as is *promised* by our current documentation of
>>>>> autoscaling. There might be some gaps or differences, but we should be
>>>>> discussing what they are and providing recommended workarounds for users
>>>>> that relied on those promises. Even if there were cases where
>>>>> we failed to deliver, if there were at least some conditions under which we
>>>>> could deliver the promised functionality those should be supported. Only if
>>>>> we never were able to deliver and it never worked under any circumstance
>>>>> should we rip stuff out entirely.
>>>>>
>>>>> Implicit in the above is the concept that there should be a facade
>>>>> between plugins and the core of solr.
>>>>>
>>>>> WRT #1 which will necessarily involve information collected from
>>>>> remote nodes, we need to be designing that thinking about what
>>>>> informational guarantees it provides. Latency, consistency, delivery, etc.
>>>>> We also need to think about what is exposed in a read-only fashion vs what
>>>>> plugins might write back to solr. Certainly there will be a lot of
>>>>> information that most plugins ignore, and we might consider having
>>>>> groupings of information and interfaces or annotations that indicate what
>>>>> info is provided, but the simplest default state is to just give plugins a
>>>>> reference to a class that they can use to drill into information about the
>>>>> cluster as needed. (SolrInformationBooth? ... or less tongue in cheek...
>>>>> enhance SolrInfoBean? )
>>>>>
>>>>> Finally a fourth thing that occurs to me as I write is we need to
>>>>> consider what information one plugin might make available to the rest of
>>>>> the solr plugins. This might come later, and is hard because it's very hard
>>>>> to anticipate what info might be generated by unknown plugins in the future.
>>>>>
>>>>> So some humorous, not seriously suggested but hopefully memorable
>>>>> class names encapsulating the concepts:
>>>>>
>>>>> SolrInformationBooth (place to query)
>>>>> SolrLoudspeaker (event announcements)
>>>>> SolrControlLevers (mutate solr cluster)
>>>>> SolrPluginFacebookPage (info published by the plugin that others can
>>>>> watch)
>>>>>
>>>>> The "facade" provided to plugins by the plugin system should grow and
>>>>> expand such that more and more plugins can rely on it. This effort should
>>>>> grow it enough to move autoscaling onto it without dropping (much)
>>>>> functionality that we've previously published.
>>>>>
>>>>> -Gus
>>>>>
>>>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>>>>> wrote:
>>>>>
>>>>>> Not clear to me what type of "alternative proposal" you're thinking
>>>>>> of Jan
>>>>>>
>>>>>>
>>>>>> That would be the responsibility of Noble and others who have
>>>>>> concerns to detail - and try to convince other peers.
>>>>>> It’s hard for me as a spectator to know whether to agree with Noble
>>>>>> without a clear picture of what the alternative API or approach would look
>>>>>> like.
>>>>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>>>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>>>>> API.
>>>>>>
>>>>>> Jan Høydahl
>>>>>>
>>>>>> 24. jul. 2020 kl. 01:44 skrev Ilan Ginzburg <ilansolr@gmail.com>:
>>>>>>
>>>>>> In my opinion we have to (and therefore will) ship at least a basic
>>>>>> prod ready implementation on top of the API that does simple things (not
>>>>>> sure about rack, but for example balance cores and disk size without co
>>>>>> locating replicas of same shard on same node).
>>>>>> Without such an implementation, I suspect adoption will be low.
>>>>>> Moreover, it's always a lot more friendly to start coding from a working
>>>>>> example than from scratch.
>>>>>>
>>>>>> Not clear to me what type of "alternative proposal" you're thinking
>>>>>> of Jan. Alternative API proposal? Alternative approach to replace
>>>>>> Autoscaling?
>>>>>>
>>>>>> Ilan
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Important discussion indeed.
>>>>>>>
>>>>>>> I don’t have time to dive deep into the PR or make up my mind
>>>>>>> whether there is a simpler and more future proof way of designing these
>>>>>>> APIs. But I understand that autoscaling is a complex beast and it is
>>>>>>> important we get it right.
>>>>>>>
>>>>>>> One question regarding having to write code vs config. Is the plan
>>>>>>> to ship some very simple light weight default placement rules ootb that
>>>>>>> gives 80% of users what they need with simple config, or would every user
>>>>>>> need to write code to e.g. spread replicas across hosts/racks? I’d be
>>>>>>> interested in seeing an alternative proposal laid out, perhaps not in code
>>>>>>> but with a design that can be compared and discussed.
>>>>>>>
>>>>>>> Jan Høydahl
>>>>>>>
>>>>>>> 23. jul. 2020 kl. 17:53 skrev Houston Putman <
>>>>>>> houstonputman@gmail.com>:
>>>>>>>
>>>>>>> I think this is a valid thing to discuss on the dev list, since this
>>>>>>> isn't just about code comments.
>>>>>>> It seems to me that Ilan wants to discuss the philosophy around how
>>>>>>> to design plugins and the interfaces in Solr which the plugins will talk to.
>>>>>>> This is broad and affects much more than just the Autoscaling
>>>>>>> framework.
>>>>>>>
>>>>>>> As a community & product, we have so far agreed that Solr should be
>>>>>>> lighter weight and additional features should live in plugins that are
>>>>>>> managed separately from Solr itself.
>>>>>>> At that point we need to think about the lifetime and support of
>>>>>>> these plugins. People love to refactor stuff in the solr core, which before
>>>>>>> plugins wasn't a large issue.
>>>>>>> However if we are now intending for many customers to rely on
>>>>>>> plugins, then we need to come up with standards and guarantees so that
>>>>>>> these plugins don't:
>>>>>>>
>>>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>>>> - Hinder the development of Solr Core
>>>>>>> - Cause us more headaches trying to keep multiple repos of
>>>>>>> plugins up to date with recent versions of Solr
>>>>>>>
>>>>>>>
>>>>>>> I am not completely sure where I stand right now, but this is
>>>>>>> definitely something that we should be thinking about when migrating all of
>>>>>>> this functionality to plugins.
>>>>>>>
>>>>>>> - Houston
>>>>>>>
>>>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <
>>>>>>> ishan@apache.org> wrote:
>>>>>>>
>>>>>>>> I think we should move the discussion back to the PR because it has
>>>>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>>>>
>>>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> [I’m moving a discussion from the PR
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev
>>>>>>>>> list for a wider audience. This is about replacing the now (in master) gone
>>>>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>>>>> placement code]
>>>>>>>>>
>>>>>>>>> It took me a long time to write this mail and it's quite long,
>>>>>>>>> sorry.
>>>>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>>>>> have to be made now.
>>>>>>>>>
>>>>>>>>> Thanks Noble for your feedback.
>>>>>>>>> I believe it is important that we are aligned on what we build
>>>>>>>>> here, esp. at the early defining stages (now).
>>>>>>>>>
>>>>>>>>> Let me try to elaborate on your concerns and provide in general
>>>>>>>>> the rationale behind the approach.
>>>>>>>>>
>>>>>>>>> *> Anyone who wishes to implement this should not require to learn
>>>>>>>>> a lot before even getting started*
>>>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>>>>
>>>>>>>>> *> I don't believe we should have a set of interfaces that
>>>>>>>>> duplicate existing classes just for this functionality.*
>>>>>>>>> Where appropriate we can have existing classes be the
>>>>>>>>> implementations for these interfaces and be passed to the plugins, that
>>>>>>>>> would be perfectly ok. The proposal doesn’t include implementations at this
>>>>>>>>> stage, therefore there’s no duplication, or not yet... (we must get the
>>>>>>>>> interfaces right and agreed upon before implementation). If some interface
>>>>>>>>> methods in the proposal have a different name from equivalent methods in
>>>>>>>>> internal classes we plan to use, of course let's rename one or the other.
>>>>>>>>>
>>>>>>>>> Existing internal abstractions are most of the time concrete
>>>>>>>>> classes and not interfaces (Replica, Slice, DocCollection,
>>>>>>>>> ClusterState). Making these visible to contrib code living
>>>>>>>>> elsewhere is making future refactoring hard and contrib code will most
>>>>>>>>> likely end up reaching to methods it shouldn’t be using. If we define a
>>>>>>>>> clean set of interfaces for plugins, I wouldn’t hesitate to break external
>>>>>>>>> plugins that reach out to other internal Solr classes, but will make
>>>>>>>>> everything possible to keep the API backward compatible so existing plugins
>>>>>>>>> can be recompiled without change.
>>>>>>>>>
>>>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>>>>> complexity or of engineering quality. There are sample
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>>>>> replica in the proposed API is: replica.getShard()
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>>>>> .getReplicas()
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>>>>> Doing so with the internal classes likely involves getting the
>>>>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>>>>> DocCollection from the cluster state, there get the Slice based
>>>>>>>>> on its name and finally getReplicas() from the Slice). I consider
>>>>>>>>> the role of this new framework is to make life as easy as possible for
>>>>>>>>> writing placement code and the like, make life easy for us to maintain it,
>>>>>>>>> make it easy to write a simulation engine (should be at least an order of
>>>>>>>>> magnitude simpler than the previous one), etc.
>>>>>>>>>
>>>>>>>>> An example regarding readability and number of interfaces: rather
>>>>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>>>>> Variable.Type
>>>>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>>>>> interface for each “variable type” (called properties
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>>>>> remote node (based on snitches
>>>>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>>>>> see doc
>>>>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>>>>> the proposal is explicit and strongly typed (here
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>>>>> to get a specific system property from a node). This definitely does
>>>>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>>>>
>>>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>>>>> have each plugin writer deal with the same problems.
>>>>>>>>>
>>>>>>>>> We’re moving from something that was complex and hard to read and
>>>>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>>>>> significant way. One could even imagine reimplementing the former
>>>>>>>>> Autoscaling config Domain Specific Language on top of these API (maybe as a
>>>>>>>>> summer internship project :)
>>>>>>>>>
>>>>>>>>> *> This is a common mistake that we all do. When we design a
>>>>>>>>> feature we think that is the most important thing.*
>>>>>>>>> If by *"most important thing"* you mean investing the best
>>>>>>>>> reasonable effort to do things right then yes.
>>>>>>>>> If you mean trying to make a minor feature look more important and
>>>>>>>>> inflated than it is, I disagree.
>>>>>>>>> As a personal note, replica placement is not the aspect of
>>>>>>>>> SolrCloud I'm most interested in, but the first bottleneck we hit when
>>>>>>>>> pushing the scale of SolrCloud. I approach this with a state of mind "let's
>>>>>>>>> do it right and get it out of the way" to move to topics I really want to
>>>>>>>>> work on (around distribution in SolrCloud and the role of Overseer).
>>>>>>>>> Implementing Autoscaling in a way that simplifies future refactoring (or
>>>>>>>>> that does not make them harder than they already are) is therefore *very
>>>>>>>>> high* on my priority list, to support modest changes (Slice to
>>>>>>>>> Shard renaming) and more ambitious ones (replacing Zookeeper,
>>>>>>>>> removing Overseer, you name it).
>>>>>>>>>
>>>>>>>>> Thanks for reading, again sorry for the long email, but I hope
>>>>>>>>> this helps (at least helps the discussion),
>>>>>>>>> Ilan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>>>>> existing classes just for this functionality. This is a common mistake that
>>>>>>>>>> we all do. When we design a feature we think that is the most important
>>>>>>>>>> thing. We end up over designing and over engineering things. This feature
>>>>>>>>>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>>>>>>>>>> not require to learn a lot before even getting started. Let's try to have a
>>>>>>>>>> minimal set of interfaces so that people who try to implement them do not
>>>>>>>>>> have a huge learning curve.
>>>>>>>>>>
>>>>>>>>>> Let's try to understand the requirement
>>>>>>>>>>
>>>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>>>> - The implementation wants to know what is the current state
>>>>>>>>>> of the cluster so that it can make those decisions
>>>>>>>>>>
>>>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> http://www.needhamsoftware.com (work)
>>>>> http://www.the111shift.com (play)
>>>>>
>>>>
Re: Approach for a new Autoscaling framework
Good 'ol software engineering practices are certainly the core of what I
intend, though I am also raising the question of whether or not we want to
consign some defined set of things that plug in to the far side of those
APIs, and whether or not that entails a more explicit notion of what
constitutes a "plugin".

On Mon, Jul 27, 2020 at 9:18 AM David Smiley <dsmiley@apache.org> wrote:

> To everyone and especially Gus: I think the "plugin" word in this thread
> is basically a stop-word to the intent/scope of the thread. A plugin to
> Solr both has been and will be nothing more than a class that's loaded
> *dynamically* by a configurable name -- as opposed to a class within Solr
> that isn't pluggable (*statically* referenced). Whether a class is
> statically loaded or dynamically loaded, it has some sort of API to itself
> where it receives and provides other abstractions provided by Solr. I
> *think* what's being proposed in this thread are some better higher level
> abstractions within Solr that could be used to hide implementation details
> that are found in some APIs currently in Solr. Good 'ol software
> engineering practices. Am I missing something?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Jul 26, 2020 at 6:11 PM Varun Thacker <varun@vthacker.in> wrote:
>
>>
>>
>> On Sun, Jul 26, 2020 at 1:05 AM Ilan Ginzburg <ilansolr@gmail.com> wrote:
>>
>>> Varun, you're correct.
>>> This PR was built based on what's needed for creation (easiest starting
>>> point for me and likely most urgent need). It's still totally WIP and
>>> following steps include building the API required for move and other
>>> placement based needs, then also everything related to triggers (see the
>>> Jira).
>>>
>>> Collection API commands (Solr provided implementation, not a plug-in)
>>> will build the requests they need, then call the plug-in (custom one or a
>>> default one), and use the returned "work items" (more types of work items
>>> will be introduced of course) to do the job (know where to place or where
>>> to move or what to remove or add etc.)
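
The flow described above (a Collection API command builds a request, calls the plug-in, then acts on the returned work items) could be sketched roughly as follows. This is only an illustration: every type and method name here (PlacementRequest, CreateReplicaWorkItem, PlacementPlugin, FewestCoresPlugin, AddReplicasCommand) is hypothetical and is not taken from the PR, and the cluster state is reduced to a map of core counts per node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical request built by a Collection API command.
class PlacementRequest {
  final String shardName;
  final int replicaCount;

  PlacementRequest(String shardName, int replicaCount) {
    this.shardName = shardName;
    this.replicaCount = replicaCount;
  }
}

// Hypothetical "work item": create one replica of a shard on a given node.
class CreateReplicaWorkItem {
  final String shardName;
  final String nodeName;

  CreateReplicaWorkItem(String shardName, String nodeName) {
    this.shardName = shardName;
    this.nodeName = nodeName;
  }
}

// Hypothetical plug-in contract: the plug-in only decides, it does not act.
interface PlacementPlugin {
  List<CreateReplicaWorkItem> computePlacement(
      PlacementRequest request, Map<String, Integer> coresPerNode);
}

// Toy plug-in: pick the nodes with the fewest cores, ties broken by node name.
class FewestCoresPlugin implements PlacementPlugin {
  @Override
  public List<CreateReplicaWorkItem> computePlacement(
      PlacementRequest request, Map<String, Integer> coresPerNode) {
    if (request.replicaCount > coresPerNode.size()) {
      throw new IllegalArgumentException("not enough nodes for the requested replicas");
    }
    List<String> nodes = new ArrayList<>(coresPerNode.keySet());
    nodes.sort((a, b) -> {
      int byCores = Integer.compare(coresPerNode.get(a), coresPerNode.get(b));
      return byCores != 0 ? byCores : a.compareTo(b);
    });
    List<CreateReplicaWorkItem> items = new ArrayList<>();
    for (String node : nodes.subList(0, request.replicaCount)) {
      items.add(new CreateReplicaWorkItem(request.shardName, node));
    }
    return items;
  }
}

// Stand-in for a Solr-provided command: builds the request, calls the plug-in,
// then "executes" each work item (here it just records what it would do).
class AddReplicasCommand {
  static List<String> execute(
      PlacementPlugin plugin, String shardName, int count, Map<String, Integer> coresPerNode) {
    List<String> executed = new ArrayList<>();
    PlacementRequest request = new PlacementRequest(shardName, count);
    for (CreateReplicaWorkItem item : plugin.computePlacement(request, coresPerNode)) {
      executed.add(item.shardName + "@" + item.nodeName);
    }
    return executed;
  }
}
```

The point of the split is that the command owns the side effects while the plug-in only returns decisions, so the same plug-in could presumably be driven by a simulator as well.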
>>>
>>
>> This sounds perfect!
>>
>> I'd be interested to see how can we use SamplePluginMinimizeCores for say
>> create collection but use FooPluginMinimizeLoad for add-replica
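
One possible way to use a different plug-in per command, sketched under assumptions: a small registry keyed by command type with a default fallback. The class names SamplePluginMinimizeCores and FooPluginMinimizeLoad are borrowed from the question above, but the classes below are empty toy stand-ins, and CollectionCommand, PlacementStrategy and PlacementStrategyRegistry are hypothetical names that do not come from the PR.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical set of Collection API commands that need placement decisions.
enum CollectionCommand { CREATE_COLLECTION, ADD_REPLICA, MOVE_REPLICA }

// Hypothetical plug-in contract, reduced to a description for this sketch.
interface PlacementStrategy {
  String describe();
}

// Toy stand-ins named after the plug-ins mentioned in the question.
class SamplePluginMinimizeCores implements PlacementStrategy {
  @Override
  public String describe() { return "minimize cores"; }
}

class FooPluginMinimizeLoad implements PlacementStrategy {
  @Override
  public String describe() { return "minimize load"; }
}

// Registry: a per-command override, falling back to a default strategy.
class PlacementStrategyRegistry {
  private final Map<CollectionCommand, PlacementStrategy> perCommand =
      new EnumMap<>(CollectionCommand.class);
  private final PlacementStrategy defaultStrategy;

  PlacementStrategyRegistry(PlacementStrategy defaultStrategy) {
    this.defaultStrategy = defaultStrategy;
  }

  void use(CollectionCommand command, PlacementStrategy strategy) {
    perCommand.put(command, strategy);
  }

  PlacementStrategy strategyFor(CollectionCommand command) {
    return perCommand.getOrDefault(command, defaultStrategy);
  }
}
```

With such a registry, create-collection could fall through to the default while add-replica is explicitly overridden; how this would be configured in Solr is left open here.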
>>
>>>
>>> Ilan
>>>
>>> On Sun, Jul 26, 2020 at 04:13, Varun Thacker <varun@vthacker.in> wrote:
>>>
>>>> Hi Ilan,
>>>>
>>>> I like where we're going with
>>>> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
>>>> wrong, but my understanding of this PR is we're defining the interfaces for
>>>> creating policies
>>>>
>>>> What's not clear to me is how will existing collection APIs like
>>>> create-collections/add-replica etc make use of it? Is that something that
>>>> has been discussed somewhere that I could read up on?
>>>>
>>>>
>>>>
>>>> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilansolr@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Gus!
>>>>> This makes a lot of sense but significantly increases IMO the scope
>>>>> and effort to define an "Autoscaling" framework interface.
>>>>>
>>>>> I'd be happy to try to see what concepts could be shared and how a
>>>>> generic plugin facade could be defined.
>>>>>
>>>>> What are the other types of plugins that would share such a unified
>>>>> approach? Do they already exist under another form or are just projects at
>>>>> this stage, like Autoscaling plugins?
>>>>>
>>>>> But... Assuming this is the first "facade" layer to be defined between
>>>>> Solr and external code, it might be hard to make it generic and get it
>>>>> right. There's value in starting simple, understanding the tradeoffs and
>>>>> generalizing later.
>>>>>
>>>>> Also I'd like to make sure we're not paying a performance "genericity
>>>>> tax" in Autoscaling for unneeded features.
>>>>>
>>>>> Ilan
>>>>>
>>>>> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.heck@gmail.com> wrote:
>>>>>
>>>>>> Scanned through the PR and read some of this thread. I likely have
>>>>>> missed much other discussion, so forgive me if I'm dredging up some things
>>>>>> that are already discussed elsewhere.
>>>>>>
>>>>>> The idea of designing the interfaces defining what information is
>>>>>> available seems good here, but I worry that it's too auto-scaling focused.
>>>>>> In my imagination, I would see solr having a standard informational
>>>>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>>>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>>>>> current state of the system is one key type of information, but another
>>>>>> type of information that should exist within solr and be exposed to plugins
>>>>>> (including autoscaling) is events. When a new node joins there should be an
>>>>>> event for example so that plugins can listen for that rather than
>>>>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>>>>> 100 nodes.
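
The event idea above (listen for a node joining rather than polling and diffing node lists) might look roughly like the sketch below. All names here (NodeEventListener, ClusterEventBus, JoinRecorder) are hypothetical and only illustrate the listener pattern; nothing like this is claimed to exist in Solr or in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback a plugin would implement to hear about new nodes.
interface NodeEventListener {
  void onNodeJoined(String nodeName);
}

// Hypothetical event source owned by Solr: plugins register once and are
// notified on each change, so they never need to diff cached node lists.
class ClusterEventBus {
  private final List<NodeEventListener> listeners = new ArrayList<>();

  void register(NodeEventListener listener) {
    listeners.add(listener);
  }

  // Called by the cluster side when a node joins.
  void nodeJoined(String nodeName) {
    for (NodeEventListener listener : listeners) {
      listener.onNodeJoined(nodeName);
    }
  }
}

// Toy plugin that simply remembers which nodes it was told about.
class JoinRecorder implements NodeEventListener {
  final List<String> seen = new ArrayList<>();

  @Override
  public void onNodeJoined(String nodeName) {
    seen.add(nodeName);
  }
}
```

Delivery guarantees (ordering, at-least-once, latency) are deliberately ignored in this sketch; they are exactly the kind of thing such an interface would have to specify.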
>>>>>>
>>>>>> In the PR I see a bunch of classes all off in a separate package,
>>>>>> which looks like an autoscaling fiefdom which will be tempted if not forced
>>>>>> to duplicate lots of stuff relative to other plugins and/or core.
>>>>>>
>>>>>> As a side note I would think the metrics system could be a plugin
>>>>>> that leverages the same set of informational interfaces....
>>>>>>
>>>>>> So there should be 3 parts to this as I imagine it.
>>>>>>
>>>>>> 1) Enhancements to the **plugin system** that make information about
>>>>>> the solr cluster available to ALL plugins
>>>>>> 2) Enhancements to the **plugin system** API's provided to ALL
>>>>>> plugins that allow them to mutate solr safely.
>>>>>> 3) A plugin that we intend to support for our users currently using
>>>>>> autoscaling, which uses the enhanced information to provide a level of
>>>>>> functionality similar to what is *promised* by our current documentation
>>>>>> of autoscaling. There might be some gaps or differences, but we should be
>>>>>> discussing what they are and providing recommended workarounds for users
>>>>>> that relied on those promises. Even if there were cases where
>>>>>> we failed to deliver, if there were at least some conditions under which we
>>>>>> could deliver the promised functionality those should be supported. Only if
>>>>>> we never were able to deliver and it never worked under any circumstance
>>>>>> should we rip stuff out entirely.
>>>>>>
>>>>>> Implicit in the above is the concept that there should be a facade
>>>>>> between plugins and the core of solr.
>>>>>>
>>>>>> WRT #1 which will necessarily involve information collected from
>>>>>> remote nodes, we need to be designing that thinking about what
>>>>>> informational guarantees it provides. Latency, consistency, delivery, etc.
>>>>>> We also need to think about what is exposed in a read-only fashion vs what
>>>>>> plugins might write back to solr. Certainly there will be a lot of
>>>>>> information that most plugins ignore, and we might consider having
>>>>>> groupings of information and interfaces or annotations that indicate what
>>>>>> info is provided, but the simplest default state is to just give plugins a
>>>>>> reference to a class that they can use to drill into information about the
>>>>>> cluster as needed. (SolrInformationBooth? ... or less tongue in cheek...
>>>>>> enhance SolrInfoBean? )
>>>>>>
>>>>>> Finally a fourth thing that occurs to me as I write is we need to
>>>>>> consider what information one plugin might make available to the rest of
>>>>>> the solr plugins. This might come later, and is hard because it's very hard
>>>>>> to anticipate what info might be generated by unknown plugins in the future.
>>>>>>
>>>>>> So some humorous, not seriously suggested but hopefully memorable
>>>>>> class names encapsulating the concepts:
>>>>>>
>>>>>> SolrInformationBooth (place to query)
>>>>>> SolrLoudspeaker (event announcements)
>>>>>> SolrControlLevers (mutate solr cluster)
>>>>>> SolrPluginFacebookPage (info published by the plugin that others can
>>>>>> watch)
>>>>>>
>>>>>> The "facade" provided to plugins by the plugin system should grow and
>>>>>> expand such that more and more plugins can rely on it. This effort should
>>>>>> grow it enough to move autoscaling onto it without dropping (much)
>>>>>> functionality that we've previously published.
>>>>>>
>>>>>> -Gus
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan.asf@cominvent.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > Not clear to me what type of "alternative proposal" you're thinking
>>>>>>> > of Jan
>>>>>>>
>>>>>>>
>>>>>>> That would be the responsibility of Noble and others who have
>>>>>>> concerns to detail - and try to convince other peers.
>>>>>>> It’s hard for me as a spectator to know whether to agree with Noble
>>>>>>> without a clear picture of what the alternative API or approach would look
>>>>>>> like.
>>>>>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>>>>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>>>>>> API.
>>>>>>>
>>>>>>> Jan Høydahl
>>>>>>>
>>>>>>> On 24 Jul 2020 at 01:44, Ilan Ginzburg <ilansolr@gmail.com> wrote:
>>>>>>>
>>>>>>> In my opinion we have to (and therefore will) ship at least a basic
>>>>>>> prod ready implementation on top of the API that does simple things (not
>>>>>>> sure about rack, but for example balance cores and disk size without co
>>>>>>> locating replicas of same shard on same node).
>>>>>>> Without such an implementation, I suspect adoption will be low.
>>>>>>> Moreover, it's always a lot more friendly to start coding from a working
>>>>>>> example than from scratch.
>>>>>>>
>>>>>>> Not clear to me what type of "alternative proposal" you're thinking
>>>>>>> of Jan. Alternative API proposal? Alternative approach to replace
>>>>>>> Autoscaling?
>>>>>>>
>>>>>>> Ilan
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan.asf@cominvent.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Important discussion indeed.
>>>>>>>>
>>>>>>>> I don’t have time to dive deep into the PR or make up my mind
>>>>>>>> whether there is a simpler and more future proof way of designing these
>>>>>>>> APIs. But I understand that autoscaling is a complex beast and it is
>>>>>>>> important we get it right.
>>>>>>>>
>>>>>>>> One question regarding having to write code vs config. Is the plan
>>>>>>>> to ship some very simple light weight default placement rules ootb that
>>>>>>>> gives 80% of users what they need with simple config, or would every user
>>>>>>>> need to write code to e.g. spread replicas across hosts/racks? I’d be
>>>>>>>> interested in seeing an alternative proposal laid out, perhaps not in code
>>>>>>>> but with a design that can be compared and discussed.
>>>>>>>>
>>>>>>>> Jan Høydahl
>>>>>>>>
>>>>>>>> On 23 Jul 2020 at 17:53, Houston Putman <houstonputman@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I think this is a valid thing to discuss on the dev list, since
>>>>>>>> this isn't just about code comments.
>>>>>>>> It seems to me that Ilan wants to discuss the philosophy around how
>>>>>>>> to design plugins and the interfaces in Solr which the plugins will talk to.
>>>>>>>> This is broad and affects much more than just the Autoscaling
>>>>>>>> framework.
>>>>>>>>
>>>>>>>> As a community & product, we have so far agreed that Solr should be
>>>>>>>> lighter weight and additional features should live in plugins that are
>>>>>>>> managed separately from Solr itself.
>>>>>>>> At that point we need to think about the lifetime and support of
>>>>>>>> these plugins. People love to refactor stuff in the solr core, which before
>>>>>>>> plugins wasn't a large issue.
>>>>>>>> However if we are now intending for many customers to rely on
>>>>>>>> plugins, then we need to come up with standards and guarantees so that
>>>>>>>> these plugins don't:
>>>>>>>>
>>>>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>>>>> - Hinder the development of Solr Core
>>>>>>>> - Cause us more headaches trying to keep multiple repos of
>>>>>>>> plugins up to date with recent versions of Solr
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not completely sure where I stand right now, but this is
>>>>>>>> definitely something that we should be thinking about when migrating all of
>>>>>>>> this functionality to plugins.
>>>>>>>>
>>>>>>>> - Houston
>>>>>>>>
>>>>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <
>>>>>>>> ishan@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I think we should move the discussion back to the PR because it
>>>>>>>>> has more context and inline comments are possible. Having this discussion
>>>>>>>>> in 4 places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>>>>>
>>>>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilansolr@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> [I’m moving a discussion from the PR
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev
>>>>>>>>>> list for a wider audience. This is about replacing the now (in master) gone
>>>>>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>>>>>> placement code]
>>>>>>>>>>
>>>>>>>>>> It took me a long time to write this mail and it's quite long,
>>>>>>>>>> sorry.
>>>>>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>>>>>> have to be made now.
>>>>>>>>>>
>>>>>>>>>> Thanks Noble for your feedback.
>>>>>>>>>> I believe it is important that we are aligned on what we build
>>>>>>>>>> here, esp. at the early defining stages (now).
>>>>>>>>>>
>>>>>>>>>> Let me try to elaborate on your concerns and provide in general
>>>>>>>>>> the rationale behind the approach.
>>>>>>>>>>
>>>>>>>>>> *> Anyone who wishes to implement this should not require to
>>>>>>>>>> learn a lot before even getting started*
>>>>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>>>>>
>>>>>>>>>> *> I don't believe we should have a set of interfaces that
>>>>>>>>>> duplicate existing classes just for this functionality.*
>>>>>>>>>> Where appropriate we can have existing classes be the
>>>>>>>>>> implementations for these interfaces and be passed to the plugins, that
>>>>>>>>>> would be perfectly ok. The proposal doesn’t include implementations at this
>>>>>>>>>> stage, therefore there’s no duplication, or not yet... (we must get the
>>>>>>>>>> interfaces right and agreed upon before implementation). If some interface
>>>>>>>>>> methods in the proposal have a different name from equivalent methods in
>>>>>>>>>> internal classes we plan to use, of course let's rename one or the other.
>>>>>>>>>>
>>>>>>>>>> Existing internal abstractions are most of the time concrete
>>>>>>>>>> classes and not interfaces (Replica, Slice, DocCollection,
>>>>>>>>>> ClusterState). Making these visible to contrib code living
>>>>>>>>>> elsewhere is making future refactoring hard and contrib code will most
>>>>>>>>>> likely end up reaching to methods it shouldn’t be using. If we define a
>>>>>>>>>> clean set of interfaces for plugins, I wouldn’t hesitate to break external
>>>>>>>>>> plugins that reach out to other internal Solr classes, but will make
>>>>>>>>>> everything possible to keep the API backward compatible so existing plugins
>>>>>>>>>> can be recompiled without change.
>>>>>>>>>>
>>>>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>>>>>> complexity or of engineering quality. There are sample
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>>>>>> harder (but might be worth the effort for comparison purposes to make sure
>>>>>>>>>> we agree on the approach? For example, getting sister replicas of a given
>>>>>>>>>> replica in the proposed API is: replica.getShard()
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>>>>>> .getReplicas()
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>>>>>> Doing so with the internal classes likely involves getting the
>>>>>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>>>>>> DocCollection from the cluster state, there get the Slice based
>>>>>>>>>> on its name and finally getReplicas() from the Slice). I
>>>>>>>>>> consider the role of this new framework is to make life as easy as possible
>>>>>>>>>> for writing placement code and the like, make life easy for us to maintain
>>>>>>>>>> it, make it easy to write a simulation engine (should be at least an order
>>>>>>>>>> of magnitude simpler than the previous one), etc.
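
To make the navigation example above concrete, here is a minimal sketch. Only the method names getShard() and getReplicas() come from the mail; the getName() accessors, the in-memory classes SimpleShard and SimpleReplica, and the helper SisterReplicas are assumptions made up for illustration and say nothing about how the real implementations would look.

```java
import java.util.ArrayList;
import java.util.List;

// Plugin-facing interfaces; getShard() and getReplicas() are named in the mail,
// getName() is an assumption added for this sketch.
interface Shard {
  String getName();
  List<Replica> getReplicas();
}

interface Replica {
  String getName();
  Shard getShard();
}

// Toy in-memory implementations standing in for whatever Solr would provide.
class SimpleShard implements Shard {
  private final String name;
  private final List<Replica> replicas = new ArrayList<>();

  SimpleShard(String name) {
    this.name = name;
  }

  @Override
  public String getName() { return name; }

  @Override
  public List<Replica> getReplicas() { return replicas; }

  // Creates a replica that knows its owning shard and registers it.
  Replica addReplica(String replicaName) {
    Replica replica = new SimpleReplica(replicaName, this);
    replicas.add(replica);
    return replica;
  }
}

class SimpleReplica implements Replica {
  private final String name;
  private final Shard shard;

  SimpleReplica(String name, Shard shard) {
    this.name = name;
    this.shard = shard;
  }

  @Override
  public String getName() { return name; }

  @Override
  public Shard getShard() { return shard; }
}

class SisterReplicas {
  // One hop from replica to shard, one hop back to all replicas of that shard;
  // the replica itself is filtered out so only its "sisters" remain.
  static List<String> sisterNames(Replica replica) {
    List<String> names = new ArrayList<>();
    for (Replica other : replica.getShard().getReplicas()) {
      if (other != replica) {
        names.add(other.getName());
      }
    }
    return names;
  }
}
```

Plugin code written this way never sees collection or slice names as strings, which is the contrast with the internal-class route described above.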
>>>>>>>>>>
>>>>>>>>>> An example regarding readability and number of interfaces: rather
>>>>>>>>>> than defining an enum with runtime annotation for building its instances (
>>>>>>>>>> Variable.Type
>>>>>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>>>>>> interface for each “variable type” (called properties
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>>>>>> remote node (based on snitches
>>>>>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>>>>>> see doc
>>>>>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>>>>>> the proposal is explicit and strongly typed (here
>>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example
>>>>>>>>>> to get a specific system property from a node). This definitely does
>>>>>>>>>> increase the number of interfaces, but reduces IMO the effort to code to
>>>>>>>>>> these abstractions and provides a lot more compile time and IDE assistance.
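
A small sketch of the contrast drawn above between string-keyed access and explicit, strongly typed access. Everything here is hypothetical: StringKeyedNodeValues, NodeProperties and InMemoryNodeProperties are invented for illustration, the "sysprop." key prefix is used only as an example of a concatenated key, and the interfaces in the PR may well be shaped differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Loosely typed style: the caller concatenates a string key and casts the result.
// A typo in the key or a wrong cast only shows up at runtime.
class StringKeyedNodeValues {
  private final Map<String, Object> values = new HashMap<>();

  void put(String key, Object value) {
    values.put(key, value);
  }

  Object get(String key) {
    return values.get(key);
  }
}

// Strongly typed style: one explicit method per kind of information, so the
// compiler and the IDE know the return type and can offer completion.
interface NodeProperties {
  Optional<String> getSystemProperty(String name);
  int getCoresCount();
}

// Toy implementation; a real one would hide the remote fetch behind the interface.
class InMemoryNodeProperties implements NodeProperties {
  private final Map<String, String> systemProperties;
  private final int coresCount;

  InMemoryNodeProperties(Map<String, String> systemProperties, int coresCount) {
    this.systemProperties = new HashMap<>(systemProperties);
    this.coresCount = coresCount;
  }

  @Override
  public Optional<String> getSystemProperty(String name) {
    return Optional.ofNullable(systemProperties.get(name));
  }

  @Override
  public int getCoresCount() {
    return coresCount;
  }
}
```

In the first style a caller writes something like (String) values.get("sysprop." + "zone"); in the second, props.getSystemProperty("zone") returns an Optional<String> and a missing property is visible in the type rather than surfacing as a null or a ClassCastException.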
>>>>>>>>>>
>>>>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>>>>>> point - complexity) in the implementations of these interfaces rather than
>>>>>>>>>> have each plugin writer deal with the same problems.
>>>>>>>>>>
>>>>>>>>>> We’re moving from something that was complex and hard to read and
>>>>>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>>>>>> demanding for users (write code rather than policy config if there's a need
>>>>>>>>>> for new behavior) but that should not be less "expressive" in any
>>>>>>>>>> significant way. One could even imagine reimplementing the former
>>>>>>>>>> Autoscaling config Domain Specific Language on top of these APIs (maybe as a
>>>>>>>>>> summer internship project :)
>>>>>>>>>>
>>>>>>>>>> *> This is a common mistake that we all do. When we design a
>>>>>>>>>> feature we think that is the most important thing.*
>>>>>>>>>> If by *"most important thing"* you mean investing the best
>>>>>>>>>> reasonable effort to do things right then yes.
>>>>>>>>>> If you mean trying to make a minor feature look more important
>>>>>>>>>> and inflated than it is, I disagree.
>>>>>>>>>> As a personal note, replica placement is not the aspect of
>>>>>>>>>> SolrCloud I'm most interested in, but the first bottleneck we hit when
>>>>>>>>>> pushing the scale of SolrCloud. I approach this with a state of mind "let's
>>>>>>>>>> do it right and get it out of the way" to move to topics I really want to
>>>>>>>>>> work on (around distribution in SolrCloud and the role of Overseer).
>>>>>>>>>> Implementing Autoscaling in a way that simplifies future refactoring (or
>>>>>>>>>> that does not make them harder than they already are) is therefore *very
>>>>>>>>>> high* on my priority list, to support modest changes (Slice to
>>>>>>>>>> Shard renaming) and more ambitious ones (replacing Zookeeper,
>>>>>>>>>> removing Overseer, you name it).
>>>>>>>>>>
>>>>>>>>>> Thanks for reading, again sorry for the long email, but I hope
>>>>>>>>>> this helps (at least helps the discussion),
>>>>>>>>>> Ilan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notifications@github.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't believe we should have a set of interfaces that
>>>>>>>>>>> duplicate existing classes just for this functionality. This is a common
>>>>>>>>>>> mistake that we all do. When we design a feature we think that is the most
>>>>>>>>>>> important thing. We end up over designing and over engineering things. This
>>>>>>>>>>> feature will remain a tiny part of Solr. Anyone who wishes to implement
>>>>>>>>>>> this should not require to learn a lot before even getting started. Let's
>>>>>>>>>>> try to have a minimal set of interfaces so that people who try to implement
>>>>>>>>>>> them do not have a huge learning curve.
>>>>>>>>>>>
>>>>>>>>>>> Let's try to understand the requirement
>>>>>>>>>>>
>>>>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>>>>> - The implementation wants to know what is the current state
>>>>>>>>>>> of the cluster so that it can make those decisions
>>>>>>>>>>>
>>>>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)