Mailing List Archive

Code longevity statistics
Hi all,

I came across an interesting article [1] on patterns of code evolution. I got
curious, ran their analysis on the Lucene repo, and produced a breakdown of
lines of code per year and the chance of a line of code still existing after
5 years (see attached images).

I think the big drop in the first plot corresponds to Solr moving to its own
project. Not sure about the other jumps - maybe someone else has insight
there.
The second plot shows that there is relatively little churn in Lucene; 45% of
the code written 5 years ago is still around. For many modern projects, this
curve is a steeper exponential. In Lucene, it's closer to linear. The article
argues that this could point to better design and more modularity, which
makes it so code isn't rewritten much.
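
To put rough numbers on the 45% figure, here is a back-of-the-envelope sketch.
It is not part of the analysis behind the plots; the two decay models and the
function names are assumptions made purely for illustration. It takes the one
data point above (45% surviving after 5 years) and asks what an exponential
curve and a straight line through that point would each imply.

```python
import math

# Single data point from the survival plot: 45% of lines remain after 5 years.
YEARS = 5.0
SURVIVING = 0.45


def exponential_survival(t, years=YEARS, surviving=SURVIVING):
    """Fraction left at time t if a fixed share of remaining code is lost per year."""
    return surviving ** (t / years)


def linear_survival(t, years=YEARS, surviving=SURVIVING):
    """Fraction left at time t if a fixed share of the original code is lost per year."""
    loss_per_year = (1.0 - surviving) / years
    return max(0.0, 1.0 - loss_per_year * t)


def exponential_half_life(years=YEARS, surviving=SURVIVING):
    """Time until 50% remains under the exponential model."""
    return years * math.log(2) / math.log(1.0 / surviving)


def linear_half_life(years=YEARS, surviving=SURVIVING):
    """Time until 50% remains under the linear model."""
    return 0.5 * years / (1.0 - surviving)


exp_hl = exponential_half_life()
lin_hl = linear_half_life()
print(f"half-life, exponential model: {exp_hl:.1f} years")
print(f"half-life, linear model:      {lin_hl:.1f} years")
print(f"left after 10 years, exponential: {exponential_survival(10):.0%}")
print(f"left after 10 years, linear:      {linear_survival(10):.0%}")
```

Both models agree at the 5-year mark by construction, so they give similar
half-lives; they differ in shape. The exponential curve loses the most code
early and then flattens into a long tail, while the straight line loses code
at a constant rate. Which shape actually fits Lucene is something only the
full curve in the plot can show, not this single data point.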

Just a fun thing to share!

Stefan

[1] https://erikbern.com/2016/12/05/the-half-life-of-code.html
Re: Code longevity statistics
IMO another factor is that old stuff gets left around because removing it
takes effort, and not the fun kind: proposing to remove something that
someone else may still like or use, and polling users with "hey, is XYZ
used?". No fun. Lucene has several roughly equivalent ways to do the same
thing (in highlighting, for example).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Sep 27, 2023 at 1:35 PM Stefan Vodita <stefan.vodita@gmail.com> wrote:

> Hi all,
>
> I came across an interesting article [1] on patterns of code evolution. I got
> curious, ran their analysis on the Lucene repo, and produced a breakdown of
> lines of code per year and the chance of a line of code still existing after
> 5 years (see attached images).
>
> I think the big drop in the first plot corresponds to Solr moving to its own
> project. Not sure about the other jumps - maybe someone else has insight
> there.
> The second plot shows that there is relatively little churn in Lucene; 45% of
> the code written 5 years ago is still around. For many modern projects, this
> curve is a steeper exponential. In Lucene, it's closer to linear. The article
> argues that this could point to better design and more modularity, which
> makes it so code isn't rewritten much.
>
> Just a fun thing to share!
>
> Stefan
>
> [1] https://erikbern.com/2016/12/05/the-half-life-of-code.html