Yes, I agree these are what this thread is about (despite the divergence into
locking).
As I see it, the question is twofold: whether we should try to do major
releases on the order of once a year, rather than on the current 2+ year
schedule, and how best to handle fixing bad behavior when producing
tokens that previous applications rely on.
On the first point, in the past we said we would try to do minor
releases more frequently (on the order of once a quarter), but so far
that hasn't happened. However, it has only been one release, and it
did have a lot of big changes that warranted longer testing. I do
agree with Michael M. that we have done a good job of maintaining
backward compatibility. I still don't know whether cleaning out
deprecations once a year puts an onerous burden on people upgrading,
as opposed to doing it every two years. Do people really have code
that they never compile or work on for over a year? If they do, do
they care about upgrading? It likely means they are happy with Lucene
and don't need any bug fixes. I can understand this being a bigger
issue if releases were on the order of every 6 months or less, but
that isn't what I am proposing. My suggestion would be that we try to
get back to the once-a-quarter release goal, which will more than
likely lead to a major release in the 1-1.5 year time frame. That
being said, I am fine with maintaining the status quo concerning
backward compatibility, as I think those arguments are compelling.

On the interface issue, I wish there were an @introducing annotation
that could announce the presence of a new method and would give a
warning up until the specified version is reached, at which point it
would break the compile, but I realize the semantics of that are
pretty weird, so...
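Nothing like an @introducing annotation exists in Java or Lucene; as a rough sketch of the idea, a runtime-retained annotation could record when a method was added and when implementing it becomes mandatory, with tooling that warns until that release and errors afterwards. Every name below (`Introducing`, `mandatoryIn`, the `TokenStream` stand-in) is invented for illustration:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Purely hypothetical sketch: neither Java nor Lucene has an
// @Introducing annotation. A method added to an interface carries the
// release that introduced it and the release at which implementing it
// becomes mandatory; a checker warns until then, and errors afterwards.
public class IntroducingSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Introducing {
        String since();        // release that added this method
        String mandatoryIn();  // release at which absence becomes an error
    }

    interface TokenStream {
        @Introducing(since = "2.4", mandatoryIn = "3.0")
        default boolean incrementToken() {
            return false; // stopgap default until enforcement kicks in
        }
    }

    // Crude "major.minor" comparison, enough for the sketch.
    static boolean atLeast(String current, String required) {
        String[] c = current.split("\\.");
        String[] r = required.split("\\.");
        int majorDiff = Integer.parseInt(c[0]) - Integer.parseInt(r[0]);
        if (majorDiff != 0) return majorDiff > 0;
        return Integer.parseInt(c[1]) >= Integer.parseInt(r[1]);
    }

    // What a checker might report on seeing a class that does not
    // override an @Introducing method.
    static String check(String currentRelease) throws Exception {
        Introducing ann = TokenStream.class
            .getMethod("incrementToken")
            .getAnnotation(Introducing.class);
        if (atLeast(currentRelease, ann.mandatoryIn())) {
            return "ERROR: incrementToken is mandatory as of " + ann.mandatoryIn();
        }
        return "WARN: incrementToken (since " + ann.since()
            + ") becomes mandatory in " + ann.mandatoryIn();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("2.4")); // still just a warning
        System.out.println(check("3.0")); // now an error
    }
}
```

The compile-breaking part is exactly the weird bit: it would need annotation-processor or build-tool support rather than plain javac, which is presumably why the semantics feel off.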
As for the other issue, concerning things like the token bugs, I think
it is reasonable to fix the bug and just let people know it will change
indexing, but to try to allow for the old way if that is not too
onerous. Chances are most people aren't even aware of the behavior, and
telling them about it may actually cause them to reconsider it. For
things like maxFieldLength, etc., backward compatibility is a
reasonable thing to preserve.
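Allowing for the old way behind a flag could look like this minimal sketch. The class name, the system property, and the token-typing logic are all invented for illustration; this is not how Lucene actually resolved the StandardTokenizer issues:

```java
// Minimal sketch of retaining buggy legacy behavior behind an opt-in
// system property. All names here (the class, the property, the
// token-type logic) are hypothetical, not real Lucene code.
public class LegacyBehaviorSketch {

    // Hypothetical flag: running with -Dlucene.legacyTokenTypes=true
    // keeps the old (incorrect) labels that existing indexes may rely on.
    static final boolean LEGACY_TYPES =
        Boolean.getBoolean("lucene.legacyTokenTypes");

    // Corrected behavior labels host-like tokens as <HOST>; legacy mode
    // preserves the old mislabeling as <ALPHANUM>.
    static String typeOf(String token) {
        boolean looksLikeHost = token.contains(".") && !token.endsWith(".");
        if (looksLikeHost && !LEGACY_TYPES) {
            return "<HOST>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        // With the property unset, users get the corrected labeling.
        System.out.println(typeOf("www.example.com")); // <HOST>
        System.out.println(typeOf("hello"));           // <ALPHANUM>
    }
}
```

Note the default is the corrected behavior, with the buggy behavior as the explicit opt-in, matching the principle Hoss argues for elsewhere in this thread.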
Cheers,
Grant
On Jan 23, 2008, at 6:24 PM, DM Smith wrote:
> Top posting because this is a response to the thread as a whole.
>
> It appears that this thread has identified some different reasons
> for "needing" to break compatibility:
> 1) A current behavior is now deemed bad or wrong. Examples: the
> silent truncation of large documents or an analyzer that works
> incorrectly.
> 2) Performance tuning such as seen in Token, allowing reuse.
> 3) Support of a new language feature, e.g. generics, that makes the
> code "better".
> 4) A new feature requires a change to the existing API.
>
> Perhaps there were others? Maybe specifics are in Jira.
>
> It seems to me that the Lucene developers have done an excellent job
> at figuring out how to maintain compatibility. This is a testament
> to how well grounded the design of the API actually is, from early
> on and even now. And changes seem to be well thought out, well
> designed and carefully implemented.
>
> I think that when it really gets down to it, the Lucene API will
> stay very stable because of this.
>
> On a side note, the cLucene project seems to be languishing (still
> trying to get to 2.0) and any stability of the API is a good thing
> for it. And perhaps for the other "ports" as well.
>
> Again many thanks for all your hard work,
> DM Smith, a thankful "parasite" :)
>
> On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:
>
>>
>> Chris Hostetter wrote:
>>
>>>
>>> : I do like the idea of a static/system property to match legacy
>>> : behavior. For example, the bugs around how StandardTokenizer
>>> : mislabels tokens (eg LUCENE-1100), this would be the perfect
>>> solution.
>>> : Clearly those are silly bugs that should be fixed, quickly, with
>>> this
>>> : back-compatible mode to keep the bug in place.
>>> :
>>> : We might want to, instead, have ctors for many classes take a
>>> required
>>> : arg which states the version of Lucene you are using? So if you
>>> are
>>> : writing a new app you would pass in the current version. Then, on
>>> : dropping in a future Lucene JAR, we could use that arg to
>>> enforce the
>>> : right backwards compatibility. This would save users from
>>> having to
>>> : realize they are hitting one of these situations and then know
>>> to go
>>> : set the right static/property to retain the buggy behavior.
>>>
>>> I'm not sure that this would be better though ... when i write my
>>> code, i pass "2.3" to all these constructors (or factory methods),
>>> and then later i want to upgrade to 2.4 to get all the new
>>> performance goodness ... i shouldn't have to change all those
>>> constructor calls to get all the 2.4 goodness, i should be able to
>>> leave my code as is -- but if i do that, then i might not get all
>>> the 2.4 goodness (like improved tokenization, or more precise
>>> segment merging) because some of that goodness violates previous
>>> assumptions that some code might have had ... my code doesn't have
>>> those assumptions, i know nothing about them, i'll take whatever
>>> behavior the Lucene Developers recommend (unless i see evidence
>>> that it breaks something, in which case i'll happily set a system
>>> property or something that the release notes say will force the
>>> old behavior).
>>>
>>> The basic principle being: by default, give users the behavior
>>> that is
>>> generally viewed as "correct" -- but give them the option to force
>>> "uncorrect" legacy behavior.
>>
>> OK, I agree: the vast majority of users upgrading would in fact
>> want all of the changes in the new release. And then the rare user
>> who is affected by that bug fix to StandardTokenizer would have to
>> set the compatibility mode. So it makes sense for you to get all
>> changes on upgrading (and NOT specify the legacy version in all
>> ctors).
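The ctor-argument idea quoted above can be sketched generically. The names below are invented for illustration (though Lucene did later adopt a similar `Version` "matchVersion" parameter for analyzers):

```java
// Sketch of the "pass the version you developed against" pattern from
// the quoted proposal. All names are hypothetical stand-ins, not real
// Lucene classes.
public class VersionedAnalyzerSketch {

    enum Version { LUCENE_23, LUCENE_24 }

    static class Tokenizer {
        private final Version matchVersion;

        Tokenizer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }

        // Behavior keyed off the version the app was written against:
        // code that passed LUCENE_23 keeps the old token labeling even
        // after dropping in a newer JAR.
        String typeOf(String token) {
            boolean looksLikeHost = token.contains(".");
            if (looksLikeHost
                    && matchVersion.compareTo(Version.LUCENE_24) >= 0) {
                return "<HOST>"; // corrected labeling, 2.4 and later
            }
            return "<ALPHANUM>"; // legacy labeling
        }
    }

    public static void main(String[] args) {
        Tokenizer oldApp = new Tokenizer(Version.LUCENE_23);
        Tokenizer newApp = new Tokenizer(Version.LUCENE_24);
        System.out.println(oldApp.typeOf("www.example.com")); // <ALPHANUM>
        System.out.println(newApp.typeOf("www.example.com")); // <HOST>
    }
}
```

The trade-off debated in the thread is visible here: the constructor argument pins behavior to what the app was written against, whereas a system property defaults everyone to the corrected behavior.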
>>
>>> : Also, backporting is extremely costly over time. I'd much
>>> rather keep
>>> : compatibility for longer on our forward releases, than spend our
>>> : scarce resources moving changes back.
>>>
>>> +1
>>>
>>> : So to summarize ... I think we should have (keep) a high
>>> tolerance for
>>> : cruft to maintain API compatibility. I think our current approach
>>> : (try hard to keep compatibility during "minor" releases, then
>>> : deprecate, then remove APIs on a major release; do major
>>> releases only
>>> : when truly required) is a good one.
>>>
>>> i'm with you for the most part, it's just the definition of "when
>>> truly required" that tends to hang people up ... there's a chicken
>>> vs egg problem of deciding whether the code should drive what the
>>> next release number is: "i've added a bitch'n feature but it
>>> requires adding a method to an interface, therefore the next
>>> release must be called 4.0" ... vs the mindset that "we just had a
>>> 3.0 release, it's too soon for another major release, the next
>>> release should be called 3.1, so we need to hold off on committing
>>> non-backwards-compatible changes for a while."
>>>
>>> I'm in the first camp: version numbers should be descriptive,
>>> information-carrying labels for releases -- but the version number
>>> of a release should be dictated by the code contained in that
>>> release. (if that means the next version after 3.0.0 is 4.0.0,
>>> then so be it.)
>>
>> Well, I am wary of doing major releases too often. Though I do
>> agree that the version number should be a "fastmatch" for reading
>> through CHANGES.txt.
>>
>> Say we do this, and zoom forward 2 years when we're up to 6.0, then
>> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>>
>> One of the amazing things about Lucene, to me, is how many really
>> major changes we have been able to make while not in fact breaking
>> backwards compatibility (too much). Being very careful not to make
>> things public, intentionally not committing to exactly when a
>> flush, commit, or merge actually happens, marking new APIs as
>> experimental and freely subject to change, and using abstract
>> classes rather than interfaces are all wonderful tools that Lucene
>> employs (and should continue to employ) to enable sizable changes
>> in the future while keeping backwards compatibility.
>>
>> Allowing for future backwards compatibility is one of the most
>> important things we all do when we make changes to Lucene!
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ