Mailing List Archive

[DISCUSS][SOLR] Multiple Repos For Contributions
Solr Devs,

We've slowly been moving into a multi-repository model, and I wanted to
bring some more attention to it and have a more focused discussion. We've
recently embarked upon the acceptance of solr-operator as a distinct
repo[1] under the care of the Lucene (soon to be Solr) PMC. I expect that
there will be more cases of this as we transition additional contribs out
of core, or as more plugins, packages, and integrations mature. Some will
make sense as externally maintained code bases, but I believe other
contributions may benefit our community more as part of the Apache
Foundation.

I think there was a very insightful comment[2] made by GP regarding
adopting a model similar to Apache Commons governance; I'm bringing attention
to it here because I fear it may have gotten lost deep in the thread. Based
on observations of Commons and a few other Apache projects with multi-repo
setups, there thankfully does not appear to be a limit on how many
repositories a PMC can maintain. The size and scope of each individual
repository can vary greatly. I see potential ideas for anything that could
be standalone and not tied to a release cycle (Admin UI, DIH, etc...), or
anything that bridges integrations between Solr and other systems (k8s,
HDFS, etc...).

The risks that new repos face are similar to the risks they would have
encountered as contrib modules, but I don't think they should dissuade us.
Each project would need to start with a champion or sponsor and a
discussion on the mailing list. From there, we can vote to accept the code,
or just the idea if there is no code yet, as a community and create the
repo. As part of a natural lifecycle, if there's not enough momentum or
adoption over time, then we can update the README and docs and "retire"
certain projects. The exact mechanisms can be undetermined for now; maybe
it's a repo rename, maybe it's marking the repo read-only, maybe it's
something else.

The Commons model is that everyone is a committer on everything. There are
other governance models, like Hadoop, with "area committers" who are
limited to the specific repositories they have contributed frequently to.
I'm not sure which model ultimately suits us better, but I think that
leveraging area committers would allow us to recognize and empower
contributors sooner and more frequently. Releases would still need to be
voted on and approved by the singular PMC.

There are no real action items here; it's more of a discussion prompt. If it
looks like we have general consensus on this approach, then I'll start
putting together individual proposals for a few repos to exercise the
process and get more contributions going. I'll probably put the proposals
together even if there are no replies here, but I'd much rather have some
acknowledgement from the community that I'm headed in a sustainable
direction!

Mike

[1]:
https://lists.apache.org/thread.html/rb90f530155dc6edc6f1ccd5f056db1618142fdfcbd32d83f539d984b%40%3Cdev.lucene.apache.org%3E
[2]:
https://lists.apache.org/thread.html/r9965cb693369d927a942f805c134bfeb45c5e80f447ad0fe2f663fae%40%3Cdev.lucene.apache.org%3E
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
Thanks for shining a spotlight on this, Mike.
I have some questions to consider. I'll call these additional repos,
"external contribs", or just contribs for short here; perhaps our internal
contribs would migrate.

Q: Would each contrib be released at its own cadence unrelated to Solr? I
suppose so.
Q: Would each contrib have its own release vote? I suppose so, as it has
its own artifact. I think the ASF requires this.
Q: Is it "okay" to release new Solr versions that break any of these
external contribs? Knowingly or unknowingly -- does it matter?
Q: What technical work is needed to extricate an internal contrib to an
external?
* source control history. (note: I've done this kind of single-folder git
history extraction before, with a popular Stack Overflow answer)
* mandatory ASF files, e.g. license, notice
* more files that we may want: CHANGES.txt
* More build files; either copying the rules/setup/standards of the Solr
mothership, which will no doubt diverge over time, or just the KISS
principle: no sharing; simple Maven projects.
Q: Could & should many contribs live in one repo (no more internal
contribs), yet each still have its own release cycle? This could make
sharing build infrastructure easier, and detecting Solr compatibility with
them easier. Although it would mean sharing GitHub project area, thus
sharing issues/PRs.
Q: Should we create a separate JIRA for these contribs... or ditch JIRA
entirely for them, relying on GitHub alone?
Q: Would contribs be treated as first class citizens in the Solr Reference
Guide (they are still in the ASF after all), or would they be banished like
the DIH was?
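
A minimal sketch of the source control history step listed above, assuming
`git subtree split` is the extraction technique (this is one common way to
do it, not necessarily the approach from the Stack Overflow answer mentioned).
The function names, the contrib path, the branch name, and the old repo
location are all hypothetical, chosen for illustration only. The code only
builds the command lines; it does not run git.

```python
import shlex


def extraction_commands(subdir: str, branch: str, old_repo: str) -> list:
    """Return git commands (as argv lists) for a subtree-split extraction.

    subdir, branch and old_repo are placeholders supplied by the caller.
    """
    return [
        # Run inside the old repo: create a branch whose history contains
        # only the commits that touched `subdir`, rewritten so that
        # `subdir` becomes the repository root.
        ["git", "subtree", "split", f"--prefix={subdir}", "-b", branch],
        # Run inside a new, empty directory: start the new repository...
        ["git", "init"],
        # ...and pull the extracted branch from the old repo into it.
        ["git", "pull", old_repo, branch],
    ]


def as_shell(commands: list) -> str:
    """Render the argv lists as shell-quoted lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical contrib path and branch name.
    print(as_shell(extraction_commands(
        "solr/contrib/example", "example-only", "../lucene-solr")))
```

The mandatory ASF files and any build files would still need to be added to
the new repo by hand after the history is pulled in.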

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Nov 12, 2020 at 6:40 PM Mike Drob <mdrob@apache.org> wrote:

> [...]
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
Thanks for sending this email, Mike, and thanks for the follow-up, David.

The idea of having multiple repos under the project seems like the
reasonable way to go for our project. This allows us to support more
features/tooling/etc. without having to link them to Solr or Lucene
releases.

An important thing here is to understand that if it comes from under the
same umbrella, it should be treated with the same care and respect - at
least we should attempt to.

> Q: Is it "okay" to release new Solr versions that break any of these
> external contribs? Knowingly or unknowingly -- does it matter?

I think it's really important to understand that breaking compat here
should be a well thought out thing, especially as that's the
differentiating factor for code that resides under the project vs. external
repos. It doesn't mean that compat breaks can't happen, it's just that
there would be more responsibility to provide a smooth upgrade path for
users in case of compat breaks.

From my perspective, the code in the external repos here would be just like
the code in the core repo, just with a different release cadence.

> Q: Would contribs be treated as first class citizens in the Solr Reference
> Guide (they are still in the ASF after all), or would they be banished like
> the DIH was?

The repos are supposed to grow, and with that, adding more to the current
ref guide would just be a bad user experience. In addition, the different
release cadence would make it difficult to support documentation for the
code in these repos via the ref guide that would be released with the core.
We should certainly aim for the same quality of documentation, but not make
it part of the ref guide.



On Sat, Nov 14, 2020 at 8:54 PM David Smiley <dsmiley@apache.org> wrote:

> [...]
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
Thank you for the replies so far.

I think that each contrib would necessarily have to have its own release
schedule and release vote. I suspect that there might be frequent releases
at first, and then these will smooth out into basically once per major
release. I also think that contribs having releases could reduce the number
of minor releases that we need to do, if a certain feature is well
contained.

Compatibility breaks will happen, but I feel like we should try to avoid
them. Sometimes they're inevitable though, and we'll need to clearly mark
that version X of the contrib is only compatible with version X of Solr,
and for newer versions of Solr you have to use Y. Maybe we'll be able to
release contrib Y first, and have it bridge the Solr releases. I think
we'll need to invest in CI tooling to catch these kinds of situations
sooner.
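
A minimal sketch of how the compatibility marking described above could be
recorded and checked, for example as part of CI. Everything here is an
assumption for illustration: the class and function names, the version
numbers, and the idea of a half-open range of supported Solr versions per
contrib release are not taken from any existing tooling.

```python
from dataclasses import dataclass


def parse(version: str) -> tuple:
    """Turn a dotted version string such as '8.7.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class CompatRange:
    contrib_version: str
    min_solr: tuple  # inclusive lower bound
    max_solr: tuple  # exclusive upper bound


# Hypothetical data: contrib 2.0 overlaps with 1.0 so that it can
# "bridge" the Solr releases, as suggested in the paragraph above.
COMPAT = [
    CompatRange("1.0", parse("8.6"), parse("9.0")),
    CompatRange("2.0", parse("8.7"), parse("10.0")),
]


def compatible_contribs(solr_version: str) -> list:
    """List the contrib versions marked compatible with a Solr version."""
    v = parse(solr_version)
    return [r.contrib_version for r in COMPAT if r.min_solr <= v < r.max_solr]


if __name__ == "__main__":
    for solr in ("8.6.3", "8.7.0", "9.1", "10.0"):
        print(solr, compatible_contribs(solr))
```

A CI job could run each contrib's tests against every Solr version in its
declared range and fail when the declared range and the test results
disagree.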

> * More build files; copying the rules/setup/standards of the Solr
> mothership and will become divergent over time no doubt. Or just KISS
> principle; no sharing; simple Maven projects.

I wonder what the Gradle equivalent would be here. In Maven-land, we can
define a parent pom and attach a bunch of configuration and rules and
plugins to it, and reuse it across repositories and projects. Maybe the Gradle
build rules turn into an externally referenced project as well. I don't
know what we'll need, but being able to apply all of our validation and
precommit rules consistently to the contribs seems important.

> Q: Could & should many contribs live in one repo (no more internal
> contribs), yet each still have its own release cycle? This could make
> sharing build infrastructure easier, and detecting Solr compatibility with
> them easier. Although it would mean sharing GitHub project area, thus
> sharing issues/PRs.

I don't know. It would make source releases more complicated, which are
what the ASF releases provide. I think it would make testing a contrib
against multiple versions of Solr more difficult as well.

> Q: Should we create a separate JIRA for these contribs... or ditch JIRA
> entirely for them, relying on GitHub alone?

I'd start with the same JIRA, with a separate component or label. I don't think
GH issues would be good because it becomes harder to link between core and
contrib issues in case of compat or tandem feature development.

> Q: Would contribs be treated as first class citizens in the Solr
> Reference Guide (they are still in the ASF after all), or would they be
> banished like the DIH was?

Probably a link in the reference guide to a list of contribs, and then each
contrib has its own documentation.

On Tue, Nov 17, 2020 at 10:00 AM Anshum Gupta <anshum@apache.org> wrote:

> [...]
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
>
> > Q: Should we create a separate JIRA for these contribs... or ditch JIRA
> entirely for them, relying on GitHub alone?
>
> I'd start with same JIRA, with a separate component or label. I don't
> think GH issues would be good because it becomes harder to link between
> core and contrib issues in case of compat or tandem feature development.
>

By "hard to link", are you basically saying that pasting URLs is hard? ;-)
There was a committer meeting in Montreal where some folks like Jan Høydahl
and Varun (if I'm not mistaken; I may be) advocated for considering more
GitHub-centric issue tracking. I was not in favor of that... however, for
contribs/modules that get their own separate repos, it affords an
opportunity for a break with the past in the interest of simplicity and
of using what contributors are already familiar with.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Nov 17, 2020 at 7:42 PM Mike Drob <mdrob@apache.org> wrote:

> [...]
>>>> HDFS, etc...).
>>>>
>>>> The risks that new repos face are similar to the risks they would have
>>>> encountered as contrib modules, but I don't think they should dissuade us.
>>>> Each project would need to start with a champion or sponsor and a
>>>> discussion on the mailing list. From there, we can vote to accept the code,
>>>> or just the idea if there is no code yet, as a community and create the
>>>> repo. As part of a natural lifecycle, if there's not enough momentum or
>>>> adoption over time, then we can update the README and docs and "retire"
>>>> certain projects. The exact mechanisms can be undetermined for now; maybe
>>>> it's a repo rename, maybe it's marking the repo read-only, maybe it's
>>>> something else.
>>>>
>>>> The Commons model is that everyone is a committer on everything. There
>>>> are other governance models, like Hadoop, with "area committers" who are
>>>> limited to the specific repositories they have contributed frequently to.
>>>> I'm not sure which model ultimately suits us better, but I think that
>>>> leveraging area committers would allow us to recognize and empower
>>>> contributors sooner and more frequently. Releases would still need to be
>>>> voted on and approved by the singular PMC.
>>>>
>>>> There are no real action items here; it's more of a discussion prompt. If
>>>> it looks like we have general consensus on this approach, then I'll start
>>>> putting together individual proposals for a few repos to exercise the
>>>> process and get more contributions going. I'll probably put the proposals
>>>> together even if there are no replies here, but I'd much rather have some
>>>> acknowledgement from the community that I'm headed in a sustainable
>>>> direction!
>>>>
>>>> Mike
>>>>
>>>> [1]:
>>>> https://lists.apache.org/thread.html/rb90f530155dc6edc6f1ccd5f056db1618142fdfcbd32d83f539d984b%40%3Cdev.lucene.apache.org%3E
>>>> [2]:
>>>> https://lists.apache.org/thread.html/r9965cb693369d927a942f805c134bfeb45c5e80f447ad0fe2f663fae%40%3Cdev.lucene.apache.org%3E
>>>>
>>>
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
Let each subproject decide for itself. PYLUCENE has its own svn repo and its own Jira space.
Solr-operator should be allowed to continue with GH issues and PRs, IMO. No need to force them into JIRA as long as the ASF allows projects to choose.

Jan

> On 24 Nov 2020, at 20:59, David Smiley <dsmiley@apache.org> wrote:
>
> > Q: Should we create a separate JIRA for these contribs... or ditch JIRA entirely for them, relying on GitHub alone?
>
> I'd start with the same JIRA, with a separate component or label. I don't think GH issues would be good because it becomes harder to link between core and contrib issues in case of compat or tandem feature development.
>
> By "hard to link", are you basically saying pasting URLs is hard ;-)? There was a committer meeting in Montreal where some folks like Jan Høydahl and Varun (if I'm not mistaken; I may be) advocated for considering more GitHub-centric issue tracking. I was not in favor of that... however, for contribs/modules that get their own separate repos, it affords an opportunity for a break with the past in the interest of simplicity and of what contributors are already familiar with.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
Re: [DISCUSS][SOLR] Multiple Repos For Contributions
> then I'll start putting together individual proposals for a few repos to exercise the process and get more contributions going

solr-operator is a great example of PMC-maintained code that makes
sense to have in a separate repository. It's written primarily in a
different language, it's an integration with 3rd party software, etc.

But there are downsides to managing multiple repositories that make me
hesitant about the idea more generally. There's no easy way to
prevent changes in one repo from unintentionally breaking another.
There's at least some duplication in the maintenance of things all
repos need (build systems, etc.). It may add overhead on release
volunteers and the PMC if there are more releases.

I'm not sure how much those'll cause problems in practice. Hopefully
they'll be minimal, but it's possible they won't be. They might end
up outweighing the benefits. I'm not saying we should be afraid of
additional repositories where it makes sense for the domain. But
maybe it'd make sense to use solr-operator as a test case for a few
releases before putting in the effort to move out our current contribs
or change our process of adopting new ones. Since this is more about
long term management, and less about getting in a particular feature
or value for users, we've got a cool opportunity to let the
solr-operator experiment play out before we necessarily need to decide
how to handle similar scenarios.

Just my two cents.

Jason

On Wed, Nov 25, 2020 at 8:23 AM Jan Høydahl <jan.asf@cominvent.com> wrote:
>
> Let each subproject decide for itself. PYLUCENE has its own svn repo and its own Jira space.
> Solr-operator should be allowed to continue with GH issues and PRs, IMO. No need to force them into JIRA as long as the ASF allows projects to choose.
>
> Jan
>
> On 24 Nov 2020, at 20:59, David Smiley <dsmiley@apache.org> wrote:
>
>> > Q: Should we create a separate JIRA for these contribs... or ditch JIRA entirely for them, relying on GitHub alone?
>>
>> I'd start with the same JIRA, with a separate component or label. I don't think GH issues would be good because it becomes harder to link between core and contrib issues in case of compat or tandem feature development.
>
>
> By "hard to link", are you basically saying pasting URLs is hard ;-)? There was a committer meeting in Montreal where some folks like Jan Høydahl and Varun (if I'm not mistaken; I may be) advocated for considering more GitHub-centric issue tracking. I was not in favor of that... however, for contribs/modules that get their own separate repos, it affords an opportunity for a break with the past in the interest of simplicity and of what contributors are already familiar with.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Nov 17, 2020 at 7:42 PM Mike Drob <mdrob@apache.org> wrote:
>>
>> Thank you for the replies so far.
>>
>> I think that each contrib would necessarily have to have their own release schedule and release vote. I suspect that there might be frequent releases at first, and then these will smooth out into basically once per major release. I also think that contribs having releases could reduce the number of minor releases that we need to do, if a certain feature is well contained.
>>
>> Compatibility breaks will happen, but I feel like we should try to avoid them. Sometimes they're inevitable though, and we'll need to clearly mark that version X of the contrib is only compatible with version X of Solr, and for newer versions of Solr you have to use Y. Maybe we'll be able to release contrib Y first, and have it bridge the Solr releases. I think we'll need to invest in CI tooling to catch these kinds of situations sooner.
>>
>> > * More build files; copying the rules/setup/standards of the Solr mothership, which will no doubt diverge over time. Or just KISS principle; no sharing; simple Maven projects.
>>
>> I wonder what the Gradle equivalent would be here. In maven-land, we can define a parent pom and attach a bunch of configuration and rules and plugins to it, and reuse across repositories and projects. Maybe the gradle build rules turn into an externally referenced project as well. I don't know what we'll need, but being able to apply all of our validation and precommit rules consistently to the contribs seems important.
>>
>> > Q: Could & should many contribs live in one repo (no more internal contribs), yet each still have its own release cycle? This could make sharing build infrastructure easier, and detecting Solr compatibility with them easier. Although it would mean sharing GitHub project area, thus sharing issues/PRs.
>>
>> I don't know. It would make source releases more complicated, which are what the ASF releases provide. I think it would make testing a contrib against multiple versions of Solr more difficult as well.
>>
>> > Q: Should we create a separate JIRA for these contribs... or ditch JIRA entirely for them, relying on GitHub alone?
>>
>> I'd start with the same JIRA, with a separate component or label. I don't think GH issues would be good because it becomes harder to link between core and contrib issues in case of compat or tandem feature development.
>>
>> > Q: Would contribs be treated as first class citizens in the Solr Reference Guide (they are still in the ASF after all), or would they be banished like the DIH was?
>> Probably a link in the reference guide to a list of contribs, and then each contrib has its own documentation.
>>
>> On Tue, Nov 17, 2020 at 10:00 AM Anshum Gupta <anshum@apache.org> wrote:
>>>
>>> Thanks for sending this email, Mike and thanks for the follow up, David.
>>>
>>> The idea of having multiple repos under the project seems like the reasonable way to go for our project. This allows us to support more features/tooling/etc. without having to link them to Solr or Lucene releases.
>>>
>>> An important thing here is to understand that if it comes from under the same umbrella, it should be treated with the same care and respect - at least we should attempt to.
>>>
>>>> Q: Is it "okay" to release new Solr versions that break any of these external contribs? Knowingly or unknowingly -- does it matter?
>>>
>>> I think it's really important to understand that breaking compat here should be a well-thought-out thing, especially as that's the differentiating factor for code that resides under the project vs. external repos. It doesn't mean that compat breaks can't happen; it's just that there would be more responsibility to provide a smooth upgrade path for users in case of compat breaks.
>>>
>>> From my perspective, the code in the external repos here would be just like the code in the core repo, just with a different release cadence.
>>>
>>>> Q: Would contribs be treated as first class citizens in the Solr Reference Guide (they are still in the ASF after all), or would they be banished like the DIH was?
>>>
>>> The repos are supposed to grow, and with that, adding more to the current ref guide would just be a bad user experience. In addition, the different release cadence would make it difficult to support documentation for the code in these repos via the ref guide that would be released with the core. We should certainly aim for the same quality of documentation, but not make it a part of the ref guide.
>>>
>>>
>>>
>>> On Sat, Nov 14, 2020 at 8:54 PM David Smiley <dsmiley@apache.org> wrote:
>>>>
>>>> Thanks for shining a spotlight on this Mike.
>>>> I have some questions to consider. I'll call these additional repos, "external contribs", or just contribs for short here; perhaps our internal contribs would migrate.
>>>>
>>>> Q: Would each contrib be released at its own cadence unrelated to Solr? I suppose so.
>>>> Q: Would each contrib have its own release vote? I suppose so, as it has its own artifact. I think the ASF requires this.
>>>> Q: Is it "okay" to release new Solr versions that break any of these external contribs? Knowingly or unknowingly -- does it matter?
>>>> Q: What technical work is needed to extricate an internal contrib to an external?
>>>> * source control history. (note: I've done this single-folder git history extraction before, with a popular Stack Overflow answer)
>>>> * mandatory ASF files, e.g. license, notice
>>>> * more files that we may want: CHANGES.txt
>>>> * More build files; copying the rules/setup/standards of the Solr mothership, which will no doubt diverge over time. Or just KISS principle; no sharing; simple Maven projects.
>>>> Q: Could & should many contribs live in one repo (no more internal contribs), yet each still have its own release cycle? This could make sharing build infrastructure easier, and detecting Solr compatibility with them easier. Although it would mean sharing GitHub project area, thus sharing issues/PRs.
>>>> Q: Should we create a separate JIRA for these contribs... or ditch JIRA entirely for them, relying on GitHub alone?
>>>> Q: Would contribs be treated as first class citizens in the Solr Reference Guide (they are still in the ASF after all), or would they be banished like the DIH was?
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Thu, Nov 12, 2020 at 6:40 PM Mike Drob <mdrob@apache.org> wrote:
>>>>>
>>>>> Solr Devs,
>>>>>
>>>>> We've slowly been moving into a multi-repository model, and I wanted to bring some more attention to it and have a more focused discussion. We've recently embarked upon the acceptance of solr-operator as a distinct repo[1] under the care of the Lucene (soon to be Solr) PMC. I expect that there will be more cases of this as we transition additional contribs out of core, or as more plugins, packages, and integrations mature. Some will make sense as externally maintained code bases, but I believe other contributions may benefit our community more as part of the Apache Foundation.
>>>>>
>>>>> I think there was a very insightful comment[2] made by GP regarding adopting a similar model to Apache Commons governance, bringing attention to it here because I fear it may have gotten lost deep in the thread. Based on observations of Commons and a few other Apache projects with multi-repo setups, there thankfully does not appear to be a limit on how many repositories a PMC can maintain. The size and scope of each individual repository can vary greatly. I see potential ideas for anything that could be standalone and not tied to a release cycle (Admin UI, DIH, etc...), or anything that bridges integrations between Solr and other systems (k8s, HDFS, etc...).
>>>>>
>>>>> The risks that new repos face are similar to the risks they would have encountered as contrib modules, but I don't think they should dissuade us. Each project would need to start with a champion or sponsor and a discussion on the mailing list. From there, we can vote to accept the code, or just the idea if there is no code yet, as a community and create the repo. As part of a natural lifecycle, if there's not enough momentum or adoption over time, then we can update the README and docs and "retire" certain projects. The exact mechanisms can be undetermined for now; maybe it's a repo rename, maybe it's marking the repo read-only, maybe it's something else.
>>>>>
>>>>> The Commons model is that everyone is a committer on everything. There are other governance models, like Hadoop, with "area committers" who are limited to the specific repositories they have contributed frequently to. I'm not sure which model ultimately suits us better, but I think that leveraging area committers would allow us to recognize and empower contributors sooner and more frequently. Releases would still need to be voted on and approved by the singular PMC.
>>>>>
>>>>> There are no real action items here; it's more of a discussion prompt. If it looks like we have general consensus on this approach, then I'll start putting together individual proposals for a few repos to exercise the process and get more contributions going. I'll probably put the proposals together even if there are no replies here, but I'd much rather have some acknowledgement from the community that I'm headed in a sustainable direction!
>>>>>
>>>>> Mike
>>>>>
>>>>> [1]:https://lists.apache.org/thread.html/rb90f530155dc6edc6f1ccd5f056db1618142fdfcbd32d83f539d984b%40%3Cdev.lucene.apache.org%3E
>>>>> [2]:https://lists.apache.org/thread.html/r9965cb693369d927a942f805c134bfeb45c5e80f447ad0fe2f663fae%40%3Cdev.lucene.apache.org%3E
>
>

Re: [DISCUSS][SOLR] Multiple Repos For Contributions
> By "hard to link", are you basically saying pasting URLs is hard

Both JIRA and GH give nice visual indicators of the state of an issue when
it is linked from another issue. When you manually create links, you lose
that. That's all I was saying.

I like JIRA because everything is in JIRA and I do not envy the person who
has to move mountains to migrate it to whatever new system. I think Jan is
right here - let each project choose for themselves.

> But there are downsides to managing multiple repositories that make me
> hesitant about the idea more generally. There's no easy way to
> prevent changes in one repo from unintentionally breaking another.
> There's at least some duplication in the maintenance of things all
> repos need (build systems, etc.). It may add overhead on release
> volunteers and the PMC if there are more releases.

I tried to address some of these in my second mail to the thread. We'll
need to invest in additional CI, and possibly in some ways to refactor and
DRY the build config.

I don't think adding overhead on releases is a problem - releases are kind
of the reason we are all here anyway. If there's not enough motivation to
make releases happen, then they won't happen, and that will be the end of
it.

> But maybe it'd make sense to use solr-operator as a test case for a few
> releases before putting in the effort to move out our current contribs
> or change our process of adopting new ones

I agree with this, but with a huge caveat. Big things have been deprecated
and removed, and users are going to be very confused about what to do when 9.0
comes around. I don't know to what extent it will impact adoption. But I'd
rather have a plan and a narrative than go into it hoping for the best.
At the same time, I really don't want to have to wait for the solr-operator
experiment to settle before we can get a green light on 9.0 in general. But
that's a different conversation.

Mike

On Wed, Nov 25, 2020 at 10:46 AM Jason Gerlowski <gerlowskija@gmail.com>
wrote:

> > then I'll start putting together individual proposals for a few repos to
> > exercise the process and get more contributions going
>
> solr-operator is a great example of PMC-maintained code that makes
> sense to have in a separate repository. It's written primarily in a
> different language, it's an integration with 3rd party software, etc.
>
> But there are downsides to managing multiple repositories that make me
> hesitant about the idea more generally. There's no easy way to
> prevent changes in one repo from unintentionally breaking another.
> There's at least some duplication in the maintenance of things all
> repos need (build systems, etc.). It may add overhead on release
> volunteers and the PMC if there are more releases.
>
> I'm not sure how much those'll cause problems in practice. Hopefully
> they'll be minimal, but it's possible they won't be. They might end
> up outweighing the benefits. I'm not saying we should be afraid of
> additional repositories where it makes sense for the domain. But
> maybe it'd make sense to use solr-operator as a test case for a few
> releases before putting in the effort to move out our current contribs
> or change our process of adopting new ones. Since this is more about
> long term management, and less about getting in a particular feature
> or value for users, we've got a cool opportunity to let the
> solr-operator experiment play out before we necessarily need to decide
> how to handle similar scenarios.
>
> Just my two cents.
>
> Jason
>
> On Wed, Nov 25, 2020 at 8:23 AM Jan Høydahl <jan.asf@cominvent.com> wrote:
> >
> > Let each subproject decide for itself. PYLUCENE has its own svn
> > repo and its own Jira space.
> > Solr-operator should be allowed to continue with GH issues and PRs,
> > IMO. No need to force them into JIRA as long as the ASF allows projects
> > to choose.
> >
> > Jan
> >
> > 24. nov. 2020 kl. 20:59 skrev David Smiley <dsmiley@apache.org>:
> >
> >> > Q: Should we create a separate JIRA for these contribs... or ditch
> JIRA entirely for them, relying on GitHub alone?
> >>
> >> I'd start with same JIRA, with a separate component or label. I don't
> think GH issues would be good because it becomes harder to link between
> core and contrib issues in case of compat or tandem feature development.
> >
> >
> > By "hard to link", are you basically saying pasting URLs is hard? ;-)
> There was a committer meeting in Montreal where some folks like Jan
> Høydahl and Varun (if I'm not mistaken; I may be) advocated for considering
> more GitHub-centric issue tracking. I was not in favor of that. However,
> for contribs/modules that get their own separate repos, it affords an
> opportunity for a break with the past in the interests of simplicity and
> of what contributors are already familiar with.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Nov 17, 2020 at 7:42 PM Mike Drob <mdrob@apache.org> wrote:
> >>
> >> Thank you for the replies so far.
> >>
> >> I think that each contrib would necessarily have to have its own
> release schedule and release vote. I suspect that there might be frequent
> releases at first, and then these will smooth out into basically once per
> major release. I also think that contribs having releases could reduce the
> number of minor releases that we need to do, if a certain feature is well
> contained.
> >>
> >> Compatibility breaks will happen, but I feel like we should try to
> avoid them. Sometimes they're inevitable though, and we'll need to clearly
> mark that version X of the contrib is only compatible with version X of
> Solr, and for newer versions of Solr you have to use Y. Maybe we'll be able
> to release contrib Y first, and have it bridge the Solr releases. I think
> we'll need to invest in CI tooling to catch these kinds of situations
> sooner.
> >>
> >> > * More build files; copying the rules/setup/standards of the Solr
> mothership, and these will no doubt diverge over time. Or just KISS
> principle; no sharing; simple Maven projects.
> >>
> >> I wonder what the Gradle equivalent would be here. In maven-land, we
> can define a parent pom and attach a bunch of configuration and rules and
> plugins to it, and reuse across repositories and projects. Maybe the gradle
> build rules turn into an externally referenced project as well. I don't
> know what we'll need, but being able to apply all of our validation and
> precommit rules consistently to the contribs seems important.
> >>
> >> > Q: Could & should many contribs live in one repo (no more internal
> contribs), yet each still have its own release cycle? This could make
> sharing build infrastructure easier, and detecting Solr compatibility with
> them easier. Although it would mean sharing GitHub project area, thus
> sharing issues/PRs.
> >>
> >> I don't know. It would complicate source releases, which are what the
> ASF releases provide. I think it would make testing a contrib
> against multiple versions of Solr more difficult as well.
> >>
> >> > Q: Should we create a separate JIRA for these contribs... or ditch
> JIRA entirely for them, relying on GitHub alone?
> >>
> >> I'd start with same JIRA, with a separate component or label. I don't
> think GH issues would be good because it becomes harder to link between
> core and contrib issues in case of compat or tandem feature development.
> >>
> >> > Q: Would contribs be treated as first class citizens in the Solr
> Reference Guide (they are still in the ASF after all), or would they be
> banished like the DIH was?
> >>
> >> Probably a link in the reference guide to a list of contribs, and then
> each contrib has its own documentation.
> >>
> >> On Tue, Nov 17, 2020 at 10:00 AM Anshum Gupta <anshum@apache.org>
> wrote:
> >>>
> >>> Thanks for sending this email, Mike and thanks for the follow up,
> David.
> >>>
> >>> The idea of having multiple repos under the project seems like the
> reasonable way to go for our project. This allows us to support more
> features/tooling/etc. without having to link them to Solr or Lucene
> releases.
> >>>
> >>> An important thing here is to understand that if it comes from under
> the same umbrella, it should be treated with the same care and respect - at
> least we should attempt to.
> >>>
> >>>> Q: Is it "okay" to release new Solr versions that break any of these
> external contribs? Knowingly or unknowingly -- does it matter?
> >>>
> >>> I think it's really important to understand that breaking compat here
> should be a well-thought-out thing, especially as that's the
> differentiating factor for code that resides under the project vs. external
> repos. It doesn't mean that compat breaks can't happen, it's just that
> there would be more responsibility to provide a smooth upgrade path for
> users in case of compat breaks.
> >>>
> >>> From my perspective, the code in the external repos here would be just
> like the code in the core repo, just with a different release cadence.
> >>>
> >>>> Q: Would contribs be treated as first class citizens in the Solr
> Reference Guide (they are still in the ASF after all), or would they be
> banished like the DIH was?
> >>>
> >>> The repos are supposed to grow, and with that, adding more to the
> current ref guide would be just bad user experience. In addition, the
> different release cadence would make it difficult to support documentation
> for the code in these repos via the ref guide that would be released with
> the core. We should certainly aim for the same quality of documentation,
> but not make it a part of the ref guide.
> >>>
> >>>
> >>>
> >>> On Sat, Nov 14, 2020 at 8:54 PM David Smiley <dsmiley@apache.org>
> wrote:
> >>>>
> >>>> Thanks for shining a spotlight on this Mike.
> >>>> I have some questions to consider. I'll call these additional repos,
> "external contribs", or just contribs for short here; perhaps our internal
> contribs would migrate.
> >>>>
> >>>> Q: Would each contrib be released at its own cadence unrelated to
> Solr? I suppose so.
> >>>> Q: Would each contrib have its own release vote? I suppose so, as
> it has its own artifact. I think the ASF requires this.
> >>>> Q: Is it "okay" to release new Solr versions that break any of these
> external contribs? Knowingly or unknowingly -- does it matter?
> >>>> Q: What technical work is needed to extricate an internal contrib to
> an external?
> >>>> * source control history. (note: I've done this extraction of a
> single folder's git history before, with a popular Stack Overflow answer)
> >>>> * mandatory ASF files, e.g. license, notice
> >>>> * more files that we may want: CHANGES.txt
> >>>> * More build files; copying the rules/setup/standards of the Solr
> mothership, and these will no doubt diverge over time. Or just KISS
> principle; no sharing; simple Maven projects.
> >>>> Q: Could & should many contribs live in one repo (no more internal
> contribs), yet each still have its own release cycle? This could make
> sharing build infrastructure easier, and detecting Solr compatibility with
> them easier. Although it would mean sharing GitHub project area, thus
> sharing issues/PRs.
> >>>> Q: Should we create a separate JIRA for these contribs... or ditch
> JIRA entirely for them, relying on GitHub alone?
> >>>> Q: Would contribs be treated as first class citizens in the Solr
> Reference Guide (they are still in the ASF after all), or would they be
> banished like the DIH was?
> >>>>
> >>>> ~ David Smiley
> >>>> Apache Lucene/Solr Search Developer
> >>>> http://www.linkedin.com/in/davidwsmiley
> >>>>
> >>>>
> >>>> On Thu, Nov 12, 2020 at 6:40 PM Mike Drob <mdrob@apache.org> wrote:
> >>>>>
> >>>>> Solr Devs,
> >>>>>
> >>>>> We've slowly been moving into a multi-repository model, and I wanted
> to bring some more attention to it and have a more focused discussion.
> We've recently embarked upon the acceptance of solr-operator as a distinct
> repo[1] under the care of the Lucene (soon to be Solr) PMC. I expect that
> there will be more cases of this as we transition additional contribs out
> of core, or as more plugins, packages, and integrations mature. Some will
> make sense as externally maintained code bases, but I believe other
> contributions may benefit our community more as part of the Apache
> Foundation.
> >>>>>
> >>>>> I think there was a very insightful comment[2] made by GP regarding
> adopting a model similar to Apache Commons governance; I'm bringing attention
> to it here because I fear it may have gotten lost deep in the thread. Based
> on observations of Commons and a few other Apache projects with multi-repo
> setups, there thankfully does not appear to be a limit on how many
> repositories a PMC can maintain. The size and scope of each individual
> repository can vary greatly. I see potential ideas for anything that could
> be standalone and not tied to a release cycle (Admin UI, DIH, etc...), or
> anything that bridges integrations between Solr and other systems (k8s,
> HDFS, etc...).
> >>>>>
> >>>>> The risks that new repos face are similar to the risks they would
> have encountered as contrib modules, but I don't think they should dissuade
> us. Each project would need to start with a champion or sponsor and a
> discussion on the mailing list. From there, we can vote to accept the code,
> or just the idea if there is no code yet, as a community and create the
> repo. As part of a natural lifecycle, if there's not enough momentum or
> adoption over time, then we can update the README and docs and "retire"
> certain projects. The exact mechanisms can be undetermined for now; maybe
> it's a repo rename, maybe it's marking the repo read-only, maybe it's
> something else.
> >>>>>
> >>>>> The Commons model is that everyone is a committer on everything.
> There are other governance models, like Hadoop, with "area committers" who
> are limited to the specific repositories they have contributed frequently
> to. I'm not sure which model ultimately suits us better, but I think that
> leveraging area committers would allow us to recognize and empower
> contributors sooner and more frequently. Releases would still need to be
> voted on and approved by the singular PMC.
> >>>>>
> >>>>> There are no real action items here; it's more of a discussion prompt.
> If it looks like we have general consensus on this approach, then I'll
> start putting together individual proposals for a few repos to exercise the
> process and get more contributions going. I'll probably put the proposals
> together even if there are no replies here, but I'd much rather have some
> acknowledgement from the community that I'm headed in a sustainable
> direction!
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>> [1]:
> https://lists.apache.org/thread.html/rb90f530155dc6edc6f1ccd5f056db1618142fdfcbd32d83f539d984b%40%3Cdev.lucene.apache.org%3E
> >>>>> [2]:
> https://lists.apache.org/thread.html/r9965cb693369d927a942f805c134bfeb45c5e80f447ad0fe2f663fae%40%3Cdev.lucene.apache.org%3E
> >
> >
>