On Wed, Mar 3, 2021 at 10:02 AM Alexandre Rafalovitch <arafalov@gmail.com>
wrote:
> What is missing is the disaggregated statistics (referrers to specific
> pages, etc.). And possibly a lot more, as I just pulled a couple of
> examples off the top of my head. I am not a GA specialist; it is just
> one of many things I do in my overall job. The specific metrics
> available will actually depend on the version of GA tracker being run,
> on options enabled in GA Admin, etc.
>
OK, good. I'm not sure what tool Apache is using, what the privacy
limitations are here, etc. Sometimes the referer itself could have some privacy
implications, just due to what is in URLs. In general, as soon as you start
"disaggregating" the data you potentially start to tread on privacy
concerns. But maybe we could ask for some improvements for the in-house
tool?
>
> And if people don't see the value in having more detailed statistics,
> I will not waste my time on doing it. I have no commercial interest
> riding on the decision.
>
> My understanding was that the analytics was in place but there was
> nobody volunteering to leverage it, so we were paying the "information
> leakage tax" without getting anything out of it. I've offered to solve
> the "nobody volunteered" part to - at least - have a fully informed
> discussion.
>
Great, my sentiments exactly on the tax. I'm unhappy about the third party
tracking and stuff, but I am mostly concerned about long term versus "fix
it now". I feel like we had similar discussions many, many years ago, and
all the GA stuff was set up, but then for YEARS nobody used it, so the only
entity doing any analytics is Google on our users.
For the short term, using what is already set up makes sense.
If you are volunteering to do the work, I feel it's a little unfair to ask
you to do additional work, so I just propose that for any of your high
level findings, we separately take a look at "what is missing" from the
Apache in-house stats and at least provide the feedback constructively to
improve it, so that long term we can stop relying on third party tracking.
Perhaps Google Analytics has some fancy GUI with lots of fancy stuff today,
but I imagine a bunch of those features are unsustainable anyway. See their
blog post from today:
https://blog.google/products/ads-commerce/a-more-privacy-first-web/
>
> This conversation feels like it is veering towards a formal vote on
> "information leakage tax". If that's actually what we want to do, I am
> +0 on keeping it for at least 3 months for Lucene and +1 for having it
> for Solr with a review at the end of that.
>
> Regards,
> Alex.
>
> On Wed, 3 Mar 2021 at 09:41, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > I'm not trying to come across as anti-analytics; I'm not. But I feel a
> lot of those questions can be answered by the aggregate stats already
> provided by apache (presumably from httpd access_log), without adding
> privacy-invading-google-tracker javascripts and cookies. So, while your
> answers are good, they don't justify google analytics in my eyes.
> >
> > As an example, let's look at
> https://uls.apache.org/exports/lucene.apache.org.yaml and consider your
> list
> > 1. You can see breakdown of pageviews and "visitors" by day. I don't
> know how they determine unique "visitor" since it isn't cookie tracking:
> maybe some combo of (IP address, TLS session ID, user agent), but whatever
> they have is good enough for me.
> > 2. I can see most popular pages and your 6.6 ref guide stuff
> > 3. Top referrers gives you a rough idea of where people are coming from
> (including internal referrers). So people are clicking links on those pages.
> > 4. see #1.
> > 5. see #3. Google provides no additional magic here, this is referer
> (sic) header either way.
> > 6. I think the download process is actually hacked up/convoluted just to
> force some GA tracking. At least I know if I disable JavaScript, the
> download buttons still work.
> > 7. what is missing?
> >
> >
> > On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
> >>
> >> I block any analytics I can find. I am with you on the overall
> positioning. And yes, the absolute numbers lie.
> >>
> >> At the same time, we can get a lot of relative numbers and trends that
> are valuable in other ways.
> >>
> >> For example:
> >> 1) Do the social media announcements of new releases drive people to
> download Solr?
> >> 2) Which Ref Guide pages (if we had GA there) are most popular, and why
> can't we convince users to use the latest version instead of 6.6 (looking
> at referrals)? My specific peeve is that I think the URPs page should be a lot
> more visible. I would love to see if my assumptions are true by seeing if
> people discover that page, relative to other pages.
> >> 3) What is the page flow on the website? Are there any pages that are
> completely invisible because of how we linked to them? Are there super
> popular pages that are completely out of date?
> >> 4) Do we have an increase or decrease in traffic matching specific events?
> >> 5) Is there a specific partner/agency site that is driving a lot of
> attention to Solr; can we replicate that with others?
> >> 6) Do we even count downloads in GA? Because GA is for HTML pages only
> by default.
> >> 7) If any of this is valuable, but we want to pull out GA anyway, this
> would help us know what tracking information we would like from Apache
> Infra.
> >>
> >> In general, these kinds of questions are the domain of a Developer
> Relations role. The Lucene/Solr project does not have one as such, which
> may explain why not many people understand the value of modern analytics
> solutions. I am offering my time to make the value of analytics concrete,
> so we are making the next decision based on reality rather than our
> collective imagination of what analytics actually does or does not do.
> >>
> >> Regards,
> >> Alex.
> >>
> >>
> >>
> >>
> >> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir, <rcmuir@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov <msokolov@gmail.com>
> wrote:
> >>>>
> >>>> Before you look, should we have a betting pool on the number of
> >>>> downloads/day? I will arrange for a bottle of some excellent liquid to
> >>>> be sent to the closest guess at the number of redirects to the mirror
> >>>> sites, as determined by Alexandre. Also, has it been increasing over
> >>>> the last year? Finally, if we can predict these trends using activity
> >>>> on the main apache site, maybe we don't need to track independently.
> >>>
> >>>
> >>> Why do we even care?
> >>>
> >>> How many users are downloading lucene tgz from the site versus using
> an artifact in maven repositories (via maven, gradle, etc)? How many users
> are downloading solr tgz from the site versus using solr official image
> from docker hub?
> >>>
> >>> I'm just asking these questions to try to understand the need for the
> google tracking.
> >>>