Mailing List Archive

Query about the GitHub statistics for Lucene
Hi,

In preparation for the project’s upcoming ASF board report, I came across and reported [1] an issue with the GH statistics, available at: https://reporter.apache.org/wizard/statistics?lucene

It appears that there is no GH activity for 2024! Clearly this is incorrect. I’ve yet to track down what’s going on with this. Familiar to anyone here?

@Mike. Would it be possible to add a “Past 3 months” to https://githubsearch.mikemccandless.com/search.py ? Which would be helpful when reporting.

-Chris

[1] https://lists.apache.org/thread/78fh8hb68zybbkz63odb0hzohgrddzkq
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Query about the GitHub statistics for Lucene [ In reply to ]
On Tue, Mar 5, 2024 at 4:50 AM Chris Hegarty
<christopher.hegarty@elastic.co.invalid> wrote:
> It appears that there is no GH activity for 2024! Clearly this is incorrect. I’ve yet to track down what’s going on with this. Familiar to anyone here?
>

Last time I looked at this, it appeared it is looking at the incorrect
github repositories, for example https://github.com/apache/lucene-solr
and not https://github.com/apache/lucene

Re: Query about the GitHub statistics for Lucene [ In reply to ]
> On 5 Mar 2024, at 13:26, Robert Muir <rcmuir@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 4:50 AM Chris Hegarty
> <christopher.hegarty@elastic.co.invalid> wrote:
>> It appears that there is no GH activity for 2024! Clearly this is incorrect. I’ve yet to track down what’s going on with this. Familiar to anyone here?
>>
>
> Last time I looked at this, it appeared it is looking at the incorrect
> github repositories, for example https://github.com/apache/lucene-solr
> and not https://github.com/apache/lucene

Ah, that could explain it!!

I’ll try to take a look at what repo those report stats are generated from, and how we might be able to get that updated. Mostly for convenience, and also having a single source of truth.

Anyway, thankfully git and GH are good enough to get the kind of basic stats we typically want - just that it’s not as clear when comparing to previously gathered stats. Well… commits are commits, and counting PRs should not result in different numbers, but you know ... ;-)

Thanks,
-Chris.
Re: Query about the GitHub statistics for Lucene [ In reply to ]
Perhaps this is what you meant by 'gh' but wanted to mention it -
https://github.com/apache/lucene/pulse/monthly

On Tue, Mar 5, 2024 at 4:34 PM Chris Hegarty
<christopher.hegarty@elastic.co.invalid> wrote:

>
> > On 5 Mar 2024, at 13:26, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > On Tue, Mar 5, 2024 at 4:50 AM Chris Hegarty
> > <christopher.hegarty@elastic.co.invalid> wrote:
> >> It appears that there is no GH activity for 2024! Clearly this is
> >> incorrect. I’ve yet to track down what’s going on with this. Familiar to
> >> anyone here?
> >>
> >
> > Last time I looked at this, it appeared it is looking at the incorrect
> > github repositories, for example https://github.com/apache/lucene-solr
> > and not https://github.com/apache/lucene
>
> Ah, that could explain it!!
>
> I’ll try to take a look at what repo those report stats are generated
> from, and how we might be able to get that updated. Mostly for convenience,
> and also having a single source of truth.
>
> Anyway, thankfully git and GH are good enough to get the kind of basic
> stats we typically want - just that it’s not as clear when comparing to
> previously gathered stats. Well… commits are commits, and counting PRs
> should not result in different numbers, but you know ... ;-)
>
> Thanks,
> -Chris.
Re: Query about the GitHub statistics for Lucene [ In reply to ]
On Tue, Mar 5, 2024 at 4:49 AM Chris Hegarty <christopher.hegarty@elastic.co>
wrote:

> In preparation for the project’s upcoming ASF board report, I came across
> and reported [1] an issue with the GH statistics, available at:
> https://reporter.apache.org/wizard/statistics?lucene
>
> It appears that there is no GH activity for 2024! Clearly this is
> incorrect. I’ve yet to track down what’s going on with this. Familiar to
> anyone here?

There is a long-standing INFRA issue about this. Lemme try to locate it
...

> @Mike. Would it be possible to add a “Past 3 months” to
> https://githubsearch.mikemccandless.com/search.py ? Which would be
> helpful when reporting.

Good idea! Done!
https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated&dd=status%3AOpen&dd=updated%3APast+3+months

Mike McCandless

http://blog.mikemccandless.com
Re: Query about the GitHub statistics for Lucene [ In reply to ]
Found the prior discussion/issue:
https://lists.apache.org/thread/fhzw0y7kpnf48cxfml8t0313sdswdv6b

And a prior prior discussion:
https://lists.apache.org/thread/6rsr8v982fjqgyopprqzw057cpzfnz3z

Issue: https://issues.apache.org/jira/browse/COMDEV-425. Jan seemed to get
close to fixing the (regexp?) bug!

Mike McCandless

http://blog.mikemccandless.com


On Tue, Mar 5, 2024 at 1:03 PM Michael McCandless <lucene@mikemccandless.com>
wrote:

> On Tue, Mar 5, 2024 at 4:49 AM Chris Hegarty <christopher.hegarty@elastic.co> wrote:
>
>> In preparation for the project’s upcoming ASF board report, I came across
>> and reported [1] an issue with the GH statistics, available at:
>> https://reporter.apache.org/wizard/statistics?lucene
>>
>> It appears that there is no GH activity for 2024! Clearly this is
>> incorrect. I’ve yet to track down what’s going on with this. Familiar to
>> anyone here?
>
> There is a long-standing INFRA issue about this. Lemme try to locate it
> ...
>
>> @Mike. Would it be possible to add a “Past 3 months” to
>> https://githubsearch.mikemccandless.com/search.py ? Which would be
>> helpful when reporting.
>
> Good idea! Done!
> https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated&dd=status%3AOpen&dd=updated%3APast+3+months
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: Query about the GitHub statistics for Lucene [ In reply to ]
Hi,

Seems that I’ve fallen into the newbie PMC Chair rabbit hole! ;-) - the reporting tool has long standing issues. Maybe they’re fixable, maybe not, but it’s possible we don’t necessarily need it now.

> On 5 Mar 2024, at 18:22, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> ...
> @Mike. Would it be possible to add a “Past 3 months” to https://githubsearch.mikemccandless.com/search.py ? Which would be helpful when reporting.
>
> Good idea! Done! https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated&dd=status%3AOpen&dd=updated%3APast+3+months

Cool. Thanks.

The stats I’m trying to retrieve are for PRs created in the past 3 months. GitHub allows me to get that with:
https://github.com/apache/lucene/pulls?q=is%3Apr+created%3A%3E2023-12-05

which (when run today) shows 36 open and 163 closed PRs.

Another interesting stat is PRs UPDATED in the past 3 months, e.g.
https://github.com/apache/lucene/pulls?q=is%3Apr+updated%3A%3E2023-12-05+
That shows ~355 PRs updated (which we can also see from Mike’s githubsearch [1]).
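
For anyone repeating this for a later report, here is a minimal Python sketch that rebuilds the two GitHub search URLs above from a cutoff date. The helper names (months_ago, pr_search_url) are made up for illustration and are not part of any tool mentioned in this thread; the "three months" cutoff is computed by calendar month, which is an assumption about how the date was picked.

```python
import calendar
from datetime import date
from urllib.parse import quote_plus


def months_ago(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the end of the target month,
    e.g. May 31 minus three months lands on the last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def pr_search_url(field, since, repo="apache/lucene"):
    """Build a GitHub pull request search URL for PRs whose `field`
    ("created" or "updated") is later than the date `since`."""
    if field not in ("created", "updated"):
        raise ValueError("field must be 'created' or 'updated'")
    query = f"is:pr {field}:>{since.isoformat()}"
    return f"https://github.com/{repo}/pulls?q={quote_plus(query)}"


if __name__ == "__main__":
    cutoff = months_ago(date.today(), 3)
    print(pr_search_url("created", cutoff))
    print(pr_search_url("updated", cutoff))
```

With a cutoff of 2023-12-05, the "created" URL matches the one pasted above; the open/closed counts still have to be read off the GitHub page.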

@Mike is it possible to add a “created since” filter?

Another very rough approximation of activity / health is commits, e.g.

$ git log --pretty='format:%cd' --since='3 months ago' | wc -l
244
$ git log --all --pretty='format:%cd' --since='3 months ago' | wc -l
472

So 472 commits on all branches in the past 3 months.
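
A hedged alternative for scripting the same counts: the Python sketch below wraps `git rev-list --count`, which prints the number of commits directly. The function names are hypothetical, and it assumes git is on the PATH and that repo_dir is a Lucene checkout. One caveat on the pipeline above: `format:` separates entries with newlines rather than terminating each one, so `wc -l` can come out one lower than the real commit count.

```python
import subprocess


def rev_list_count_cmd(since, all_branches=False):
    """Build a `git rev-list --count` command for commits newer than `since`."""
    cmd = ["git", "rev-list", "--count", f"--since={since}"]
    # --all walks every ref; HEAD walks only the currently checked-out branch.
    cmd.append("--all" if all_branches else "HEAD")
    return cmd


def count_commits(repo_dir, since="3 months ago", all_branches=False):
    """Run the command inside `repo_dir` and return the count as an int."""
    result = subprocess.run(
        rev_list_count_cmd(since, all_branches),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Calling count_commits(".", all_branches=True) from inside a checkout would correspond to the second command above.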

-Chris

[1] https://githubsearch.mikemccandless.com/search.py?chg=du&text=&a1=status&a2=undefined&page=0&searcher=29577&sort=recentlyUpdated&format=list&id=uzz5ht9buk98&dd=status%3AOpen&dd=updated%3APast+3+months&dd=issue_or_pr%3APR&newText=


Re: Query about the GitHub statistics for Lucene [ In reply to ]
On Wed, Mar 6, 2024 at 4:41 AM Chris Hegarty <christopher.hegarty@elastic.co>
wrote:

> Seems that I’ve fallen into the newbie PMC Chair rabbit hole! ;-) - the
> reporting tool has long standing issues. Maybe they’re fixable, maybe not,
> but it’s possible we don’t necessarily need it now.

Sorry :) Seems to be a rite-of-passage at this point! It should be
mentioned in the handover instructions... or, we should simply merge Daniel
Gruno's one-line fix to the regexp that Kibble/Whimsy/reporter tool uses:
https://issues.apache.org/jira/browse/COMDEV-425?focusedCommentId=17823767&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17823767

> @Mike is it possible to add “created since” filter?

Ahh good idea, done!
https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated&dd=created%3APast+3+months&dd=issue_or_pr%3APR
(this is PRs created in the Past 3 months ... it shows 36 open and 162
closed right now, close to the GitHub counts you found).

Here's the luceneserver commit that adds it:
https://github.com/mikemccand/luceneserver/commit/397942573bed3e2c4fd00ab0a324a19fd014bfd4

Mike McCandless

http://blog.mikemccandless.com
Re: Query about the GitHub statistics for Lucene [ In reply to ]
Hi Mike,

> On 6 Mar 2024, at 10:47, Michael McCandless <lucene@mikemccandless.com> wrote:
>
> On Wed, Mar 6, 2024 at 4:41 AM Chris Hegarty <christopher.hegarty@elastic.co> wrote:
>
> Seems that I’ve fallen into the newbie PMC Chair rabbit hole! ;-) - the reporting tool has long standing issues. Maybe they’re fixable, maybe not, but it’s possible we don’t necessarily need it now.
>
> Sorry :) Seems to be a rite-of-passage at this point!

Ha! Just happy that I’m not alone on this.

> It should be mentioned in the handover instructions... or, we should simply merge Daniel Gruno's one-line fix to the regexp that Kibble/Whimsy/reporter tool uses: https://issues.apache.org/jira/browse/COMDEV-425?focusedCommentId=17823767&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17823767

That would be great, but I’m not sure why it’s not been done before at this point. I’ll add a note to future handover instructions if it cannot be resolved.

> @Mike is it possible to add “created since” filter?
>
> Ahh good idea, done! https://githubsearch.mikemccandless.com/search.py?sort=recentlyUpdated&dd=created%3APast+3+months&dd=issue_or_pr%3APR (this is PRs created in the Past 3 months ... it shows 36 open and 162 closed right now, close to the GitHub counts you found).

This looks right, thanks. I think we can use Githubsearch going forward. :-)

> Here's the luceneserver commit that adds it: https://github.com/mikemccand/luceneserver/commit/397942573bed3e2c4fd00ab0a324a19fd014bfd4

Thank you,
-Chris.
Re: Query about the GitHub statistics for Lucene [ In reply to ]
Hi,

Yes, we should contact INFRA so they get all the repository links
up to date. They should maybe send us a list of tracked repos/issue
trackers for us to review. There were also some crazy things, like the
temporary repository that we used to migrate our issues from JIRA to
GitHub being used for statistics, but NOT the apache/lucene one.

The statistics for JIRA are clearly wrong, too. The last change in JIRA
was Aug 19, 2022.

Uwe

On 5 Mar 2024 at 14:26, Robert Muir wrote:
> On Tue, Mar 5, 2024 at 4:50 AM Chris Hegarty
> <christopher.hegarty@elastic.co.invalid> wrote:
>> It appears that there is no GH activity for 2024! Clearly this is incorrect. I’ve yet to track down what’s going on with this. Familiar to anyone here?
>>
> Last time I looked at this, it appeared it is looking at the incorrect
> github repositories, for example https://github.com/apache/lucene-solr
> and not https://github.com/apache/lucene
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

