Mailing List Archive

Who has access to Google Analytics for Lucene site?
Hi,

Who has access to the Lucene site GA account? If it is dead in the water, I'd like to set up a new one for Lucene as well.

I plan to publish the new web sites today, and it would be nice to track and graph the traffic ramp-up.

Jan
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
Let's avoid GA for our website. I don't think it is a good idea to give a
private company (Google Inc) data about our website traffic, our visitors'
IP addresses, etc.

On Wed, Mar 3, 2021 at 1:40 PM Jan Høydahl <jan.asf@cominvent.com> wrote:

> Hi,
>
> Who has access to the Lucene site GA account? If it is dead in the waters,
> I'd like to setup a new one also for Lucene.
>
> I plan to publish the new web sites today, would be nice to track and
> graph the traffic ramp-up.
>
> Jan
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
Lucene/Solr website has "always" had a GA script. And I believe the reason was to understand download number estimates for Solr, which we don't get from ASF.
I think we can proceed with GA for a few months to learn about the Solr migration and any problems users may have with the site. Then discontinue it and fall back to old boring httpd stats like this https://uls.apache.org/exports/lucene.apache.org.yaml

The question was: who has access to the current website stats at GA?

Jan

> On 3 Mar 2021, at 10:55, Ishan Chattopadhyaya <ichattopadhyaya@gmail.com> wrote:
>
> Lets avoid GA for our website. I don't think it is a good idea to give a private company (Google Inc) data about our website traffic and their IP addresses etc.
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
if nobody has access or is looking at the numbers, then we damn sure don't
need this tracking shit

On Wed, Mar 3, 2021 at 6:15 AM Jan Høydahl <jan.asf@cominvent.com> wrote:

> Lucene/Solr website has "always" had a GA script. And I believe the reason
> was to understand download number estimates for Solr, which we don't get
> from ASF.
> I think we can proceed with GA for a few months to learn about the Solr
> migration and any problems users may have with the site. Then discontinue
> it and fall back to old boring httpd stats like this
> https://uls.apache.org/exports/lucene.apache.org.yaml
>
> The question was - who has acces to current website stats at GA?
>
> Jan
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
Hi,
I have access.
Uwe

On March 3, 2021 8:10:29 AM UTC, "Jan Høydahl" <jan.asf@cominvent.com> wrote:
>Hi,
>
>Who has access to the Lucene site GA account? If it is dead in the
>waters, I'd like to setup a new one also for Lucene.
>
>I plan to publish the new web sites today, would be nice to track and
>graph the traffic ramp-up.
>
>Jan

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
RE: Who has access to Google Analytics for Lucene site? [ In reply to ]
Hi,

sorry, I just noticed that the account disappeared from my Google Analytics profile.

It was set up by Grant Ingersoll; maybe he can give us access again. If it is no longer there, we have lost the data, but we can create a new one.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

From: Uwe Schindler <uwe@thetaphi.de>
Sent: Wednesday, March 3, 2021 1:10 PM
To: dev@lucene.apache.org
Subject: Re: Who has access to Google Analytics for Lucene site?



Hi,
I have access.
Uwe

Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
+1 Rob

On Wed, 3 Mar, 2021, 5:55 pm Uwe Schindler, <uwe@thetaphi.de> wrote:

> Hi,
>
>
>
> sorry, I just noticed that the account disappeared from my google
> analytics profile.
>
>
>
> It was setup by Grant Ingersoll, maybe he can give us access again. If it
> is no longer there, we lost the data, but we can recreate one.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: uwe@thetaphi.de
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
I am offering to look at the numbers, if I can get access.

We can do that for a couple of months and then take it out.

I am not clear whether I negated Rob's position here from the full
propositional logic though, as he used 'or'.

I do agree that there is no point in analytics that nobody uses. So, let's
use it and get a clear picture of its value.

Regards,
Alex

On Wed., Mar. 3, 2021, 7:46 a.m. Ishan Chattopadhyaya, <
ichattopadhyaya@gmail.com> wrote:

> +1 Rob
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
For sure any analytics are useless if nobody is looking at the numbers :)

But I don't like google analytics myself, I think in 2021 using such third
party tracking/cookies stuff makes our website look cheap and spammy.

It also isn't even reasonably accurate in 2021. If I visit lucene.apache.org,
I don't get counted, because like ~25% of the general population I have an
ad blocker on my device, and I see uBlock Origin preventing the request to
Google Analytics. I don't know the stats for *developers* who have an ad
blocker installed, but the share may skew much higher than 25%, since
developers tend to be more technical.


On Wed, Mar 3, 2021 at 8:07 AM Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> I am offering to look at the numbers, if I can get access.
>
> We can do that for a couple of months and then take it out.
>
> I am not clear whether I negated Rob's position here from the full
> propositional logic though, as he used 'or'.
>
> I do agree that there is no point for analytics that is not used. So,
> let's use it and have clear picture of its value.
>
> Regards,
> Alex
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
Before you look, should we have a betting pool on the number of
downloads/day? I will arrange for a bottle of some excellent liquid to
be sent to the closest guess at the number of redirects to the mirror
sites, as determined by Alexandre. Also, has it been increasing over
the last year? Finally, if we can predict these trends using activity
on the main apache site, maybe we don't need to track independently.

I tried to find stats for our pages, but all I came up with was this:
https://projects.apache.org/statistics.html which actually (at the
very bottom) has a graph showing "Worldwide download mirror activity."
I wonder what data drives that?

-Mike

I have absolutely no idea how many. I'll guess 1000 redirects/day

random question: does GoogleBot accept GA tracking cookies?

On Wed, Mar 3, 2021 at 8:07 AM Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>
> I am offering to look at the numbers, if I can get access.
>
> We can do that for a couple of months and then take it out.
>
> I am not clear whether I negated Rob's position here from the full propositional logic though, as he used 'or'.
>
> I do agree that there is no point for analytics that is not used. So, let's use it and have clear picture of its value.
>
> Regards,
> Alex
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msokolov@gmail.com> wrote:

> Before you look, should we have a betting pool on the number of
> downloads/day? I will arrange for a bottle of some excellent liquid to
> be sent to the closest guess at the number of redirects to the mirror
> sites, as determined by Alexandre. Also, has it been increasing over
> the last year? Finally, if we can predict these trends using activity
> on the main apache site, maybe we don't need to track independently.
>

Why do we even care?

How many users are downloading lucene tgz from the site versus using an
artifact in maven repositories (via maven, gradle, etc)? How many users are
downloading solr tgz from the site versus using solr official image from
docker hub?

I'm just asking these questions to try to understand the need for the
google tracking.
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
I block any analytics I can find. I am with you on the overall positioning.
And yes, the absolute numbers lie.

At the same time, we can get a lot of relative numbers and trends that are
valuable in other ways.

For example:
1) Do the social media announcements of new releases drive people to
download Solr?
2) Which Ref Guide pages (if we had GA there) are most popular, and why
can't we convince users to use the latest version instead of 6.6 (looking
at referrals)? My specific peeve is that I think the URPs page should be a
lot more visible. I would love to test my assumptions by seeing whether
people discover that page, relative to other pages.
3) What is the page flow on the website? Are there any pages that are
completely invisible because of how we linked to them? Are there super
popular pages that are completely out of date?
4) Do we see an increase or decrease in traffic matching specific events?
5) Is there a specific partner/agency site that is driving a lot of
attention to Solr? Can we replicate that with others?
6) Do we even count downloads in GA? GA tracks HTML pages only by
default.
7) If any of this is valuable but we want to pull out GA anyway, this
would help us know what tracking information we would like from Apache
Infra.

In general, these kinds of questions are the domain of a Developer
Relations role. The Lucene/Solr project does not have one as such, which
may explain why not many people understand the value of modern analytics
solutions. I am offering my time to make the value of analytics concrete,
so that we make the next decision based on reality rather than our
collective imagination of what analytics actually does or does not do.

Regards,
Alex.




On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcmuir@gmail.com> wrote:

>
>
> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msokolov@gmail.com> wrote:
>
>> Before you look, should we have a betting pool on the number of
>> downloads/day? I will arrange for a bottle of some excellent liquid to
>> be sent to the closest guess at the number of redirects to the mirror
>> sites, as determined by Alexandre. Also, has it been increasing over
>> the last year? Finally, if we can predict these trends using activity
>> on the main apache site, maybe we don't need to track independently.
>>
>
> Why do we even care?
>
> How many users are downloading lucene tgz from the site versus using an
> artifact in maven repositories (via maven, gradle, etc)? How many users are
> downloading solr tgz from the site versus using solr official image from
> docker hub?
>
> I'm just asking these questions to try to understand the need for the
> google tracking.
>
>
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
I'm not trying to come across as anti-analytics; I'm not. But I feel a lot
of those questions can be answered by the aggregate stats already provided
by Apache (presumably from the httpd access_log), without adding
privacy-invading Google tracker JavaScript and cookies. So, while your
answers are good, they don't justify Google Analytics in my eyes.

As an example, let's look at
https://uls.apache.org/exports/lucene.apache.org.yaml and consider your list:
1. You can see a breakdown of pageviews and "visitors" by day. I don't know
how they determine a unique "visitor" since it isn't cookie tracking: maybe
some combo of (IP address, TLS session ID, user agent), but whatever they
have is good enough for me.
2. I can see the most popular pages and your 6.6 ref guide stuff.
3. Top referrers give you a rough idea of where people are coming from
(including internal referrers). So people are clicking links on those
pages.
4. See #1.
5. See #3. Google provides no additional magic here; this is the referer (sic)
header either way.
6. I think the download process is actually hacked up/convoluted just to
force some GA tracking. At least I know that if I disable JavaScript, the
download buttons still work.
7. What is missing?
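
As a rough illustration of the aggregate numbers discussed in the list above, here is a minimal sketch that derives page views, referrers, and an approximate daily visitor count from httpd access log lines. It assumes the standard "combined" log format and treats a distinct (IP address, user agent) pair per day as one visitor. How the Apache statistics tool actually identifies visitors is not stated in this thread, and the function name `summarize` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Apache httpd "combined" log format (assumed; the real server config may differ).
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Aggregate page views, referrers and approximate visitors per day."""
    pages, referrers, visitors = Counter(), Counter(), {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["status"] != "200":
            continue  # skip unparseable lines and non-successful requests
        pages[m["path"]] += 1
        if m["referer"] not in ("", "-"):
            referrers[m["referer"]] += 1
        # Assumption: a "visitor" is a distinct (IP, user agent) pair per day.
        visitors.setdefault(m["day"], set()).add((m["ip"], m["agent"]))
    return pages, referrers, {day: len(v) for day, v in visitors.items()}

# Made-up sample lines using documentation-only IP addresses.
SAMPLE = [
    '203.0.113.5 - - [03/Mar/2021:13:40:00 +0000] "GET /solr/downloads.html HTTP/1.1" 200 5120 "https://example.org/news" "Mozilla/5.0"',
    '203.0.113.5 - - [03/Mar/2021:13:41:10 +0000] "GET /solr/guide/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [03/Mar/2021:14:02:33 +0000] "GET /solr/downloads.html HTTP/1.1" 200 5120 "https://example.org/news" "curl/7.68.0"',
    '198.51.100.7 - - [03/Mar/2021:14:03:00 +0000] "GET /missing HTTP/1.1" 404 312 "-" "curl/7.68.0"',
]

if __name__ == "__main__":
    pages, referrers, visitors = summarize(SAMPLE)
    print(pages.most_common(5))
    print(referrers.most_common(5))
    print(visitors)
```

Nothing in this sketch needs cookies or client-side JavaScript, but it only yields aggregates; per-page referrer breakdowns or page flow would require keeping more detail per request.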


On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> I block any analytics I can find. I am with you on the overall
> positioning. And yes, the absolute numbers lie.
>
> At the same time, we can get a lot of relative numbers and trends that are
> valuable in other ways.
>
> For example:
> 1) Are the social media announcements of new releases drive people to
> download Solr?
> 2) Which Ref Guide pages (if we had GA there) are most popular and why
> can't we convince users to use the latest version instead of 6.6 (looking
> at referrals). My specific peeve is that I think URPs page should be a lot
> more visible, I would love to see if my assumptions are true by seeing if
> people discover that page, relative to other pages.
> 3) What is the page flow on the website? Are there any pages that are
> complete invisible because of how we linked to them? Are there super
> popular pages that are completely out of date?
> 4) Do we have increase or decrease in traffic matching specific events
> 5) Is there a specific partner/agency site that is driving a lot of
> attention to Solr; can we replicate that with others?
> 6) Do we even count downloads in GA? Because GA is for HTML pages only by
> default
> 7) If any of this is valuable, but we want to pull out GA anyway, this
> would help to know what tracking information we would like from Apache
> Infra?
>
> In general, these kinds of questions are the domain of Developer
> Relationships role. Lucene/Solr project does not have one as such, which
> may explain why not many people understand the values of modern analytics
> solutions. I am offering my time to make the value of analytics concrete,
> so we are making the next decision based on reality rather than our
> collective imagination of what analytics actually does or does not.
>
> Regards,
> Alex.
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
I am opposed, in principle, to letting a private company have access to the
IP addresses and other personally identifiable information of our users.

On Wed, Mar 3, 2021 at 8:11 PM Robert Muir <rcmuir@gmail.com> wrote:

> I'm not trying to come across as anti-analytics, i'm not. But I feel a lot
> of those questions can be answered by the aggregate stats already provided
> by apache (presumably from httpd access_log), without adding
> privacy-invading-google-tracker javascripts and cookies. So, while your
> answers are good, they don't justify google analytics in my eyes.
>
> As an example, lets look at
> https://uls.apache.org/exports/lucene.apache.org.yaml and consider your
> list
> 1. You can see breakdown of pageviews and "visitors" by day. I don't know
> how they determine unique "visitor" since it isn't cookie tracking: maybe
> some combo of (IP address, TLS session ID, user agent), but whatever they
> have is good enough for me.
> 2. I can see most popular pages and your 6.6 ref guide stuff
> 3. Top referrers gives you a rough idea of where people are coming from
> (including internal referrers). So people are clicking links on those
> pages.
> 4. see #1.
> 5. see #3. Google provides no additional magic here, this is referer (sic)
> header either way.
> 6. i think the download process is actually hacked up/convoluted just to
> force some GA tracking. At least i know if i disable javascript, the
> download buttons still work.
> 7. what is missing?
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
What is missing is the disaggregated statistics (referrers to specific
pages, etc.), and possibly a lot more; I just pulled a couple of
examples off the top of my head. I am not a GA specialist; it is just
one of many things I do in my overall job. The specific metrics
available will actually depend on the version of the GA tracker being run,
on the options enabled in GA Admin, etc.

And if people don't see the value in having more detailed statistics,
I will not waste my time on doing it. I have no commercial interest
riding on the decision.

My understanding was that the analytics was in place but there was
nobody volunteering to leverage it, so we were paying the "information
leakage tax" without getting anything out of it. I've offered to solve
the "nobody volunteered" part to - at least - have a fully informed
discussion.

This conversation feels like it is veering towards a formal vote on
"information leakage tax". If that's actually what we want to do, I am
+0 on keeping it for at least 3 months for Lucene and +1 on having it
for Solr, with a review at the end of that period.

Regards,
Alex.

On Wed, 3 Mar 2021 at 09:41, Robert Muir <rcmuir@gmail.com> wrote:
>
> I'm not trying to come across as anti-analytics, i'm not. But I feel a lot of those questions can be answered by the aggregate stats already provided by apache (presumably from httpd access_log), without adding privacy-invading-google-tracker javascripts and cookies. So, while your answers are good, they don't justify google analytics in my eyes.
>
> As an example, lets look at https://uls.apache.org/exports/lucene.apache.org.yaml and consider your list
> 1. You can see breakdown of pageviews and "visitors" by day. I don't know how they determine unique "visitor" since it isn't cookie tracking: maybe some combo of (IP address, TLS session ID, user agent), but whatever they have is good enough for me.
> 2. I can see most popular pages and your 6.6 ref guide stuff
> 3. Top referrers gives you a rough idea of where people are coming from (including internal referrers). So people are clicking links on those pages.
> 4. see #1.
> 5. see #3. Google provides no additional magic here, this is referer (sic) header either way.
> 6. i think the download process is actually hacked up/convoluted just to force some GA tracking. At least i know if i disable javascript, the download buttons still work.
> 7. what is missing?
>
>
> On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>>
>> I block any analytics I can find. I am with you on the overall positioning. And yes, the absolute numbers lie.
>>
>> At the same time, we can get a lot of relative numbers and trends that are valuable in other ways.
>>
>> For example:
>> 1) Are the social media announcements of new releases drive people to download Solr?
>> 2) Which Ref Guide pages (if we had GA there) are most popular and why can't we convince users to use the latest version instead of 6.6 (looking at referrals). My specific peeve is that I think URPs page should be a lot more visible, I would love to see if my assumptions are true by seeing if people discover that page, relative to other pages.
>> 3) What is the page flow on the website? Are there any pages that are complete invisible because of how we linked to them? Are there super popular pages that are completely out of date?
>> 4) Do we have increase or decrease in traffic matching specific events
>> 5) Is there a specific partner/agency site that is driving a lot of attention to Solr; can we replicate that with others?
>> 6) Do we even count downloads in GA? Because GA is for HTML pages only by default
>> 7) If any of this is valuable, but we want to pull out GA anyway, this would help to know what tracking information we would like from Apache Infra?
>>
>> In general, these kinds of questions are the domain of Developer Relationships role. Lucene/Solr project does not have one as such, which may explain why not many people understand the values of modern analytics solutions. I am offering my time to make the value of analytics concrete, so we are making the next decision based on reality rather than our collective imagination of what analytics actually does or does not.
>>
>> Regards,
>> Alex.
>>
>>
>>
>>
>> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcmuir@gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msokolov@gmail.com> wrote:
>>>>
>>>> Before you look, should we have a betting pool on the number of
>>>> downloads/day? I will arrange for a bottle of some excellent liquid to
>>>> be sent to the closest guess at the number of redirects to the mirror
>>>> sites, as determined by Alexandre. Also, has it been increasing over
>>>> the last year? Finally, if we can predict these trends using activity
>>>> on the main apache site, maybe we don't need to track independently.
>>>
>>>
>>> Why do we even care?
>>>
>>> How many users are downloading lucene tgz from the site versus using an artifact in maven repositories (via maven, gradle, etc)? How many users are downloading solr tgz from the site versus using solr official image from docker hub?
>>>
>>> I'm just asking these questions to try to understand the need for the google tracking.
>>>

Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
On Wed, Mar 3, 2021 at 10:02 AM Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> What is missing is the disaggregated statistics (Referrers to specific
> pages, etc). And possibly a lot more, as I just pulled a couple of
> examples off the top of my head. I am not a GA specialist; it is just
> one of many things I do in my overall job. The specific metrics
> available will actually depend on the version of GA tracker being run,
> on options enabled in GA Admin, etc.
>

OK, good. I'm not sure what tool apache is using, what the privacy
limitations are here, etc. Sometimes referer itself could have some privacy
implications, just due to what is in URLs. In general, as soon as you start
"disaggregating" the data you potentially start to tread on privacy
concerns. But maybe we could ask for some improvements for the in-house
tool?
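
To make the aggregation point concrete, here is a minimal Python sketch of
the kind of coarse stats discussed in this thread, built from an httpd-style
access log. It is only an illustration: it assumes the common Apache
"combined" log format, the sample lines are made up, and the names LOG_RE,
referer_host and aggregate are hypothetical. It is not the tool Apache Infra
runs, which this thread does not identify. Referers are reduced to their
host so that paths and query strings never reach the report.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed input: Apache "combined" log format. Names here are illustrative.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_host(referer):
    """Keep only the host of a referer, dropping path and query string."""
    if referer in ("", "-"):
        return "(none)"
    return urlsplit(referer).hostname or "(unparsed)"


def aggregate(lines):
    """Count successful GET requests per day and per referer host."""
    per_day, per_referer = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Skip unparseable lines, non-GET requests and non-2xx responses.
        if not m or m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        per_day[m["day"]] += 1
        per_referer[referer_host(m["referer"])] += 1
    return per_day, per_referer


# Made-up sample lines using documentation-reserved addresses and hosts.
SAMPLE = [
    '192.0.2.1 - - [03/Mar/2021:09:15:02 +0000] "GET /solr/downloads.html HTTP/1.1" '
    '200 5120 "https://www.example.com/search?q=private+terms" "Mozilla/5.0"',
    '192.0.2.2 - - [03/Mar/2021:09:16:40 +0000] "GET /core/index.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [04/Mar/2021:11:00:00 +0000] "GET /missing.html HTTP/1.1" '
    '404 512 "https://www.example.com/page" "Mozilla/5.0"',
]

if __name__ == "__main__":
    days, referers = aggregate(SAMPLE)
    print(dict(days))
    print(dict(referers))
```

Reducing the referer to a host trades detail for privacy: the report still
shows which sites send traffic, but not which page or search query did.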


>
> And if people don't see the value in having more detailed statistics,
> I will not waste my time on doing it. I have no commercial interest
> riding on the decision.
>
> My understanding was that the analytics was in place but there was
> nobody volunteering to leverage it, so we were paying the "information
> leakage tax" without getting anything out of it. I've offered to solve
> the "nobody volunteered" part to - at least - have a fully informed
> discussion.
>

Great, my sentiments exactly on the tax. I'm unhappy about the third party
tracking and stuff, but I am mostly concerned about long term versus "fix
it now". I feel like we had similar discussions many, many years ago, and
all the GA stuff was set up, but then for YEARS nobody used it, so the only
entity doing any analytics is Google on our users.

For the short term, using what is already set up makes sense.

If you are volunteering to do the work, I feel it's a little unfair to ask
you to do additional work, so I just propose that for any of your high
level findings, we separately take a look at "what is missing" from the
apache in-house stats and at least provide the feedback constructively to
improve it, so that long term we can stop relying on third party tracking.
Perhaps Google Analytics has some fancy GUI with lots of fancy stuff today,
but I imagine a bunch of those features are unsustainable anyway. See their
blog post from today:
https://blog.google/products/ads-commerce/a-more-privacy-first-web/


>
> This conversation feels like it is veering towards a formal vote on
> "information leakage tax". If that's actually what we want to do, I am
> +0 on keeping it for at least 3 months for Lucene and +1 for having it
> for Solr with a review at the end of that.
>
> Regards,
> Alex.
>
> On Wed, 3 Mar 2021 at 09:41, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > I'm not trying to come across as anti-analytics, I'm not. But I feel a
> lot of those questions can be answered by the aggregate stats already
> provided by apache (presumably from httpd access_log), without adding
> privacy-invading-google-tracker javascripts and cookies. So, while your
> answers are good, they don't justify google analytics in my eyes.
> >
> > As an example, let's look at
> https://uls.apache.org/exports/lucene.apache.org.yaml and consider your
> list
> > 1. You can see breakdown of pageviews and "visitors" by day. I don't
> know how they determine unique "visitor" since it isn't cookie tracking:
> maybe some combo of (IP address, TLS session ID, user agent), but whatever
> they have is good enough for me.
> > 2. I can see most popular pages and your 6.6 ref guide stuff
> > 3. Top referrers gives you a rough idea of where people are coming from
> (including internal referrers). So people are clicking links on those pages.
> > 4. see #1.
> > 5. see #3. Google provides no additional magic here, this is referer
> (sic) header either way.
> > 6. I think the download process is actually hacked up/convoluted just to
> force some GA tracking. At least I know if I disable javascript, the
> download buttons still work.
> > 7. what is missing?
> >
> >
> > On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
> >>
> >> I block any analytics I can find. I am with you on the overall
> positioning. And yes, the absolute numbers lie.
> >>
> >> At the same time, we can get a lot of relative numbers and trends that
> are valuable in other ways.
> >>
> >> For example:
> >> 1) Do the social media announcements of new releases drive people to
> download Solr?
> >> 2) Which Ref Guide pages (if we had GA there) are most popular and why
> can't we convince users to use the latest version instead of 6.6 (looking
> at referrals). My specific peeve is that I think URPs page should be a lot
> more visible, I would love to see if my assumptions are true by seeing if
> people discover that page, relative to other pages.
> >> 3) What is the page flow on the website? Are there any pages that are
> completely invisible because of how we linked to them? Are there super
> popular pages that are completely out of date?
> >> 4) Do we have an increase or decrease in traffic matching specific events?
> >> 5) Is there a specific partner/agency site that is driving a lot of
> attention to Solr; can we replicate that with others?
> >> 6) Do we even count downloads in GA? Because GA is for HTML pages only
> by default.
> >> 7) If any of this is valuable, but we want to pull out GA anyway, this
> would help to know what tracking information we would like from Apache
> Infra?
> >>
> >> In general, these kinds of questions are the domain of a Developer
> Relations role. The Lucene/Solr project does not have one as such, which
> may explain why not many people understand the value of modern analytics
> solutions. I am offering my time to make the value of analytics concrete,
> so that we make the next decision based on reality rather than our
> collective imagination of what analytics actually does or does not do.
> >>
> >> Regards,
> >> Alex.
> >>
> >>
> >>
> >>
> >> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcmuir@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >>>>
> >>>> Before you look, should we have a betting pool on the number of
> >>>> downloads/day? I will arrange for a bottle of some excellent liquid to
> >>>> be sent to the closest guess at the number of redirects to the mirror
> >>>> sites, as determined by Alexandre. Also, has it been increasing over
> >>>> the last year? Finally, if we can predict these trends using activity
> >>>> on the main apache site, maybe we don't need to track independently.
> >>>
> >>>
> >>> Why do we even care?
> >>>
> >>> How many users are downloading lucene tgz from the site versus using
> an artifact in maven repositories (via maven, gradle, etc)? How many users
> are downloading solr tgz from the site versus using solr official image
> from docker hub?
> >>>
> >>> I'm just asking these questions to try to understand the need for the
> google tracking.
> >>>
Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
Hi,

As discussed above, it's probably best not to use Google Analytics, as the ASF currently discourages its use. Please see: https://issues.apache.org/jira/browse/LEGAL-470

Privacy is likely to ask projects to remove it in the near future.

Infra should be able to get you download stats if you need those.

Thanks,
Justin

Re: Who has access to Google Analytics for Lucene site? [ In reply to ]
We have already discussed and agreed on removing it when the time
comes. This discussion is specifically about trying to regain access
to the analytics that we already collected, at a higher granularity
than what current Infra stats seem to provide.

Even if we took analytics out today, discarding years of potential
insights seems wasteful. I have reviewed the discussion linked and it
does not seem to present any additional arguments to contradict the
current position.

Regards,
Alex.

On Fri, 19 Mar 2021 at 00:12, Justin Mclean <jmclean@apache.org> wrote:
>
> Hi,
>
> As discussed above, it's probably best not to use Google Analytics, as the ASF currently discourages its use. Please see: https://issues.apache.org/jira/browse/LEGAL-470
>
> Privacy is likely to ask projects to remove it in the near future.
>
> Infra should be able to get you download stats if you need those.
>
> Thanks,
> Justin