After some delays and bug-hunting, my script for building the static
HTML versions is in acceptable shape.
Here you can see an example, built from an SQL dump from a few weeks
ago (don't try the Search box! I explain why below):
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Please don't DoS the connection; it's not a very fast line.
Interested parties can find the script here:
http://www.arcetri.astro.it/~puglisi/wiki/wiki2static.txt
(renamed to .txt due to some server misconfig)
Use a wide terminal for this one. Everything (HTML code included) is in
a single file. The whitespace may look odd because I use 4-space tabs.
There's no need to tell me you don't like the coding style; I already
know :-)))
Some issues:
- the topbar links do not work (known bug :-). The Edit link goes to the
online Wikipedia site.
- interlanguage links are ignored
- some wiki markup is not recognized yet.
- no images are present (of course!)
- filenames should be OK for most filesystems that aren't "8.3"-limited
(max 63 chars, only a-z, 0-9 and underscore); see the filename sketch
after this list.
- despite the two-letter subdirectories, some of them contain over
4,000 files!
- Time: the script takes more than 2 hours on my 1.3 GHz Athlon...
- Size: this dump is about 800 MB (the tar.gz is just 110 MB). I think
I can bring it down to 600-650 MB with a bit of trimming and by
eliminating unnecessary redirects. BUT, without some form of
compression, the English Wikipedia will soon overflow a single CD.
Maybe we should target DVDs? :-)
- Images: no images are present here. AFAIK, each of them has an SQL
record (which my script skips), but the actual image data is not
included. How many megabytes of images do we have? I think it will be
impossible to store the full-size images on a CD; it's certainly
possible on a DVD. Maybe a low-res version could be included on a CD.
- Search: I tried a JavaScript search that worked well for small
databases: it's basically a big array of strings (article titles and
filenames) with a few lines of code that do a regexp match against
them (see the search sketch after this list). For a full-sized
database like this one, the search page becomes an 8-megabyte monster
that takes forever to process (IE grabs 100 MB of memory and stops
there; Opera is even worse). I'll see if I can find a different
solution.
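To make the filename rules above concrete, here is a rough sketch of
the title-to-filename mapping (in TypeScript, purely for illustration;
the function name and exact rules are my guess, not the actual code
from the script):

    // Sketch only: map an article title to a safe filename
    // (lowercase a-z, 0-9 and underscore, max 63 chars) plus the
    // two-letter subdirectory, as in dump/ma/main_page.html.
    function titleToFilename(title: string): string {
      const safe = title
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "_")   // collapse everything else to "_"
        .replace(/^_+|_+$/g, "")       // trim leading/trailing underscores
        .slice(0, 63);                 // respect the 63-character limit
      const subdir = safe.slice(0, 2); // two-letter subdirectory
      return subdir + "/" + safe + ".html";
    }

    // titleToFilename("Main Page") gives "ma/main_page.html"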
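For the curious, the JavaScript search idea is roughly the following
(a sketch in TypeScript with made-up entries; the real generated page
has one entry per article, which is where the 8 MB comes from):

    // Sketch only: one big array of [title, filename] pairs plus a
    // regexp scan over the titles.
    const index: [string, string][] = [
      ["Main Page", "ma/main_page.html"],
      ["Mathematics", "ma/mathematics.html"],
      // ...tens of thousands more entries in the generated page...
    ];

    function search(query: string): [string, string][] {
      const re = new RegExp(query, "i");  // case-insensitive match
      return index.filter(([title]) => re.test(title));
    }

    // search("math") returns [["Mathematics", "ma/mathematics.html"]]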
Enough for now. While I carry on with development, any input is
welcome.
Ciao,
Alfio