Mailing List Archive

Google, Wikipedia and internal affairs
Hi,

I have noticed that Google also has in its index the "Talk:something", "User:something" and "User talk:something" namespaces on the English and the Dutch Wikipedia. I do not know about the others.

It also has MetaWikipedia in its index.

I think Google should not index our internal affairs: only the articles, nothing else. MetaWikipedia is not especially meant for the general public either, so it should also not be in Google's index.

Am I alone in this?

Giskart
Re: Google, Wikipedia and internal affairs
On Mon, 2002-12-16 at 12:38, giskart wrote:

> I think Google should not index our internal affairs.

I disagree. Google is not a Wikipedia search engine, it's an Internet
search engine. The goal is to collect and index the knowledge of the
Net. Part of this "knowledge" are our user pages and even our
discussions, which often have nothing to do with particular articles.
Furthermore, people should discover through Google that Wikipedia is not
just an encyclopedia but a community.

If people want to search only particular namespaces, they can use the
Wikipedia search engine.

Regards,
Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Re: Google, Wikipedia and internal affairs
On Mon, 2002-12-16 at 03:38, giskart wrote:
> I have noticed that Google also has in its index the "Talk:something", "User:something" and "User talk:something" namespaces on the English and the Dutch Wikipedia. I do not know about the others.
>
> It also has MetaWikipedia in its index.
>
> I think Google should not index our internal affairs: only the articles, nothing else. MetaWikipedia is not especially meant for the general public either, so it should also not be in Google's index.
>
> Am I alone in this?

Our internal affairs aren't exactly internal -- it's an open, public
discussion and everyone's welcome to participate. If Google helps people
find the discussion, that's fine by me.

If something isn't meant for the general public, it shouldn't be put on
a public web server with an open content license. ;) If you're afraid
to have your name or your customary handle attached to your opinions in
the public record, don't use your real name when you speak in public.

-- brion vibber (brion @ pobox.com)
Re: Google, Wikipedia and internal affairs
Eloquence wrote:

>Giskart wrote:

>>I think Google should not index our internal affairs.

>I disagree. Google is not a Wikipedia search engine, it's an Internet
>search engine. The goal is to collect and index the knowledge of the
>Net. Part of this "knowledge" are our user pages and even our
>discussions, which often have nothing to do with particular articles.
>Furthermore, people should discover through Google that Wikipedia is not
>just an encyclopedia but a community.

I agree.
But it's also worth noting that there's probably no point
in Google's indexing page histories, edit pages,
what links here, user contributions, and the like.


-- Toby
Re: Google, Wikipedia and internal affairs
giskart wrote:

>Hi,
>
>I have noticed that Google also has in its index the "Talk:something", "User:something" and "User talk:something" namespaces on the English and the Dutch Wikipedia. I do not know about the others.
>
>It also has MetaWikipedia in its index.
>
>I think Google should not index our internal affairs: only the articles, nothing else. MetaWikipedia is not especially meant for the general public either, so it should also not be in Google's index.
>
>Am I alone in this?
>
You're not alone. Much of what happens on talk pages is pretty
"wild-west". I think that we all want to continue that as part of the
process for arriving at consensus on NPOV for articles.

Perhaps one of our more technically minded Wikipedians can respond about
the technical feasibility of keeping talk pages out of the Google index.

Eclecticology
Re: Google, Wikipedia and internal affairs
Brion Vibber wrote:
> Our internal affairs aren't exactly internal -- it's an open, public
> discussion and everyone's welcome to participate.

I know and understand that.

>If Google helps people
> find the discussion, that's fine by me.

I don't really have a problem with the fact that Google also finds the
non-articles, but I do with the fact that talk pages sometimes have a
higher ranking than the articles. Those pages compete with the articles.

I only bring it up because I had the impression that a couple of months
ago it was decided to exclude the non-articles from the search engines.

> If something isn't meant for the general public, it shouldn't be put on
> a public web server with an open content license. ;) If you're afraid
> to have your name or your customary handle attached to your opinions in
> the public record, don't use your real name when you speak in public.

I am not afraid of that. I stand behind my (broken) words. My real-life
name is on my user page. I think a sysop should show his or her real name.

> -- brion vibber (brion @ pobox.com)
Re: Re: Google, Wikipedia and internal affairs
On Mon, Dec 16, 2002 at 11:03:06AM -0800, Brion Vibber wrote:
>Not that I'm aware of. We exclude _edit pages_ since they're the
>equivalent of 404s; and we exclude some dynamically generated pages
>because spidering them puts extra load on the server as zillions of
>"show next XX items" links are followed.

Brion, what is the mechanism for telling Google not to follow a link?
Pragma: no-cache? In mod_wiki, it would be cool to make
"googleable" be an attribute of the page one could turn on and off.

Jonathan

--
Geek House Productions, Ltd.

Providing Unix & Internet Contracting and Consulting,
QA Testing, Technical Documentation, Systems Design & Implementation,
General Programming, E-commerce, Web & Mail Services since 1998

Phone: 604-435-1205
Email: djw@reactor-core.org
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC V5R2W2
Re: Re: Google, Wikipedia and internal affairs
On Mon, 2002-12-16 at 10:49, Giskart wrote:
> I don't really have a problem with the fact that Google also finds the
> non-articles, but I do with the fact that talk pages sometimes have a
> higher ranking than the articles. Those pages compete with the articles.

Then the article better be brought up to snuff so it gets a better
ranking! ;)

> I only bring it up because I had the impression that a couple of months
> ago it was decided to exclude the non-articles from the search engines.

Not that I'm aware of. We exclude _edit pages_ since they're the
equivalent of 404s; and we exclude some dynamically generated pages
because spidering them puts extra load on the server as zillions of
"show next XX items" links are followed.

-- brion vibber (brion @ pobox.com)
Re: Google, Wikipedia and internal affairs
On Mon, 2002-12-16 at 10:34, Ray Saintonge wrote:
> Perhaps one of our more technically minded Wikipedians can respond about
> the technical feasibility of keeping talk pages out of the Google index.

It would be very easy, but I don't think it's desirable to do so.

-- brion vibber (brion @ pobox.com)
Re: Re: Google, Wikipedia and internal affairs
On Mon, Dec 16, 2002 at 10:55:44AM -0800, Jonathan Walther wrote:
> On Mon, Dec 16, 2002 at 11:03:06AM -0800, Brion Vibber wrote:
> >Not that I'm aware of. We exclude _edit pages_ since they're the
> >equivalent of 404s; and we exclude some dynamically generated pages
> >because spidering them puts extra load on the server as zillions of
> >"show next XX items" links are followed.
>
> Brion, what is the mechanism for telling Google not to follow a link?
> Pragma: no-cache? In mod_wiki, it would be cool to make
> "googleable" be an attribute of the page one could turn on and off.

A robots.txt file is the right solution.
And googlability is a property of classes of pages, not of individual
pages, so such an attribute would be misdesigned.
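
To make the "classes of pages" point concrete, here is a minimal sketch
using Python's standard urllib.robotparser. The rules and the example
URLs are assumptions for illustration only (the /w/wiki.phtml prefix is
the script path mentioned elsewhere in this thread); this is not the
live Wikipedia robots.txt. One prefix rule covers a whole class of
URLs, and no per-page attribute is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one prefix blocks a whole class of URLs
# (everything served directly by the script); the rest stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/wiki.phtml
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Return True if a well-behaved crawler may fetch this path."""
    return PARSER.can_fetch(agent, path)


# A plain page view (hypothetical URL) is not covered by the prefix rule.
print(crawlable("/wiki/Talk:Example"))
# An edit link goes through the script path, so it falls in the blocked class.
print(crawlable("/w/wiki.phtml?title=Example&action=edit"))
```

Note that robots.txt matches on URL prefixes, so it only works well when
each class of pages has its own distinguishable URL pattern.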
Re: Re: Google, Wikipedia and internal affairs
On Mon, 2002-12-16 at 10:55, Jonathan Walther wrote:
> Brion, what is the mechanism for telling Google not to follow a link?
> Pragma: no-cache? In mod_wiki, it would be cool to make
> "googleable" be an attribute of the page one could turn on and off.

The cache header obviously doesn't kill Google, or our entire site would
be missing from their index. ;)

What we use seems to be:
<meta name="robots" content="noindex,nofollow">

Supposedly it works, though I've made no attempt to test it. (Our
robots.txt blocks off everything that directly uses the /w/wiki.phtml
link, i.e. everything but simple page views and the default views of
special pages.)
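
For illustration, a minimal sketch of how a page renderer might pick
that meta tag per class of view. The view names and the policy table
are hypothetical and not taken from the actual Wikipedia code; only the
noindex,nofollow tag itself is the one quoted above.

```python
from html import escape

# Hypothetical policy: which robots directives each class of view gets.
# Only the "edit" case reflects what the thread says is excluded.
ROBOTS_POLICY = {
    "view": "index,follow",
    "edit": "noindex,nofollow",
}

# Unknown view types default to the restrictive setting (an assumption).
DEFAULT_DIRECTIVES = "noindex,nofollow"


def robots_meta(view: str) -> str:
    """Build the robots meta tag for the given class of view."""
    directives = ROBOTS_POLICY.get(view, DEFAULT_DIRECTIVES)
    return f'<meta name="robots" content="{escape(directives, quote=True)}">'


print(robots_meta("edit"))
print(robots_meta("view"))
```

Unlike robots.txt, the tag is only seen after the crawler has fetched
the page, so it keeps a page out of the index but does not save the
server the request.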

-- brion vibber (brion @ pobox.com)