Mailing List Archive

Using TeX
Hi!

If you remember, I have set up a small very local wiki using the wikipedia
scripts. Thanks for the comments I received last time. They were very
valuable.

I have not solved my problem with page editing; I intended to reload all
the scripts. I got an account on SourceForge, but have not (yet) managed the
download: I have a password problem. Does one need to be registered in the
project itself to download?

One of the reasons why I looked at Wikipedia was maths and the use of TeX.

However, when I looked at the details, I found (to my taste) too many
limitations in the approach followed. So I wrote an extension to
outputpage.php that replaces the call to texvc with calls to pdflatex and
then ImageMagick convert. This has reached the stage of minimal
functionality (translation and caching). It gives me full maths, without
having to remember what is supported, unsupported, or modified compared to
LaTeX. In addition, we intend to use it for small music scores (using
MusiXTeX or possibly PMX), and I have some other uses of LaTeX in mind.

Please understand that I'm not trying to (re)open a debate. I've read the
"math markup" page on meta.wikipedia (and music markup as well), and I'm
aware of (some of) the drawbacks of the approach I followed. I mention
what I did just in conformance with the GPL: if there is any interest in this
small piece of code (which I doubt, it's rather trivial!), just say so.

Michel Mouly
Re: Using TeX [ In reply to ]
Michel Mouly wrote:

>However, when I looked at the details, I found (to my taste) too many
>limitations in the approach followed. So I wrote an extension to
>outputpage.php that replaces the call to texvc with calls to pdflatex and
>then ImageMagick convert. This has reached the stage of minimal
>functionality (translation and caching). It gives me full maths, without
>having to remember what is supported, unsupported, or modified compared to
>LaTeX. In addition, we intend to use it for small music scores (using
>MusiXTeX or possibly PMX), and I have some other uses of LaTeX in mind.

Is your version safe against DoS attacks with long scripts?
Is it safe against running TeX commands that access files?
OTOH, does it allow inclusion of additional TeX packages (like Xypic)
with a simple modification to the code opening up the package?
If so, then some of us (me and AxelBoldt, I guess)
might well prefer your code to the current texvc --
at least when producing PNG output instead of HTML.

>Please understand that I'm not trying to (re)open a debate. I've read the
>"math markup" page on meta.wikipedia (and music markup as well), and I'm
>aware of (some of) the drawbacks of the approach I followed. I mention
>what I did just in conformance with the GPL: if there is any interest in this
>small piece of code (which I doubt, it's rather trivial!), just say so.

I'd like to see the diff to see just what you took away from texvc.


-- Toby
RE: Using TeX [ In reply to ]
This mail is quite long: I have 'included' some PHP code, and a small
'article' at the end. I did it that way rather than attaching files...

> -----Original Message-----
> From: Toby Bartels [SMTP:toby+wikipedia@math.ucr.edu]
> Date: Monday, 31 March 2003 07:02
> To: wikitech-l@wikipedia.org
> Subject: Re: [Wikitech-l] Using TeX
>
> Michel Mouly wrote:
>
> >However, when I looked at the details, I found (to my taste) too many
> >limitations in the approach followed. So I wrote an extension to
> >outputpage.php that replaces the call to texvc with calls to pdflatex and
> >then ImageMagick convert. This has reached the stage of minimal
> >functionality (translation and caching). It gives me full maths, without
> >having to remember what is supported, unsupported, or modified compared to
> >LaTeX. In addition, we intend to use it for small music scores (using
> >MusiXTeX or possibly PMX), and I have some other uses of LaTeX in mind.
>
> Is your version safe against DoS attacks with long scripts?

I confess my ignorance on the topic. But I'm ready to learn. Maybe it is
relevant to mention that, unlike with texvc, the text to compile is not
passed in the DOS calls: the script writes it to a file, and the DOS command
lines are pretty standard.

> Is it safe against running TeX commands that access files?

Safety is a real problem, I agree. I have not looked at the question in any
detail for LaTeX. The small application I'm trying to set up with some
friends is (or will be; I still have this problem with a blank page
returned after submit), I hope, sufficiently safe for reasons independent
of the PHP scripts. Maybe that is naive...
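For what it is worth, a first line of defence against both concerns (very
long input, and TeX commands that touch files) could be sketched as below.
This is only an illustration, assuming a current PHP (7 or later): the
function name checkTexInput, the length limit and the list of commands are
all assumptions, not part of the posted script. A filter of this kind is no
guarantee, since TeX offers ways to disguise command names.

```php
<?php
// Hypothetical helper, not part of the posted outputpage.php.
// Returns null when the input passes the checks, otherwise a short reason.
function checkTexInput(string $tex, int $maxLength = 2000): ?string
{
    // Crude guard against very long input (the limit is an arbitrary assumption).
    if (strlen($tex) > $maxLength) {
        return "input too long";
    }
    // TeX primitives and LaTeX commands that read or write files, or that
    // can be used to build or redefine other command names.
    $forbidden = ['input', 'include', 'openin', 'openout', 'read',
                  'write', 'immediate', 'catcode', 'csname'];
    foreach ($forbidden as $name) {
        // Match \name only when no further letter follows, so that a
        // different command such as \includegraphics is not rejected.
        if (preg_match('/\\\\' . $name . '(?![A-Za-z])/', $tex)) {
            return "forbidden command: \\" . $name;
        }
    }
    return null;
}
```

Such a check would be called before the .tex file is written, with the
formula rejected whenever the result is not null.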

> OTOH, does it allow inclusion of additional TeX packages (like Xypic)
> with a simple modification to the code opening up the package?

Well, this can already be done just by modifying the 'header.tex' file
(included). That is what I will do for music. My idea (see the article) is
that different markups would select different header files.
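A minimal sketch of that idea, assuming a current PHP (7 or later). Only
'header.tex' exists so far; the function name headerFileForMarkup, the
markup names and the other file names are made up for illustration.

```php
<?php
// Hypothetical mapping from a markup name to a header file; only
// 'header.tex' comes from this message, the other names are invented.
function headerFileForMarkup(string $markup, string $mathDirectory): string
{
    $headers = [
        'math'  => 'header.tex',
        'music' => 'header-music.tex',
        'chess' => 'header-chess.tex',
    ];
    // Fall back to the basic header for unknown markups.
    $file = $headers[$markup] ?? $headers['math'];
    return $mathDirectory . '/' . $file;
}

echo headerFileForMarkup('music', '/var/wiki/math'), "\n";
// prints /var/wiki/math/header-music.tex
```

Opening up a new package would then only mean adding one header file and one
line to the table.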

BTW, using drawing packages like Xypic is also on my agenda, see article.

> If so, then some of us (me and AxelBoldt, I guess)
> might well prefer your code to the current texvc --
> at least when producing PNG output instead of HTML.

>
> >Please understand that I'm not trying to (re)open a debate. I've read the
> >"math markup" page on meta.wikipedia (and music markup as well), and I'm
> >aware of (some of) the drawbacks of the approach I followed. I mention
> >what I did just in conformance with the GPL: if there is any interest in
> >this small piece of code (which I doubt, it's rather trivial!), just say so.
>
> I'd like to see the diff to see just what you took away from texvc.
>

I include the relevant part of outputpage.php. It is
basically scratch code, to check if the idea is viable. At least error
handling requires further work. As you will see, I just 'mimicked' texvc
(same input format, same output format) and kept the rest of the code.

I include header.tex (the very basic and trivial one), for completeness.

I also include a text I prepared with the possibility in mind of putting it
somewhere in Wikipedia or meta Wikipedia; I'm too new to the business to
decide whether it is valuable, or where exactly to put it. Consider it
background information. It deals with the 'source' for images or sounds: one
of my problems in my small project is music, and allowing others to modify
scores is important. LaTeX provides those tools as well.

An important point, hinted at in the text, is that compiling the 'source'
on the Wikipedia site is not really necessary (though definitely useful).
Security aspects should then be less of a problem, as should computing load
(going through pdflatex takes quite long on my machine).



>
> -- Toby
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@wikipedia.org
> http://www.wikipedia.org/mailman/listinfo/wikitech-l

This is the beginning of outputpage.php; the rest is exactly as in the normal
script. The modifications are:

1) first step, encapsulating the call to texvc;

2) second step, function 'fullTeX', with the same input format and the same
output format as the function encapsulating texvc. The functionality is very
slightly different: the '$' are included in the call to fullTeX, so that it
can also be used for text in non-math mode.

First comes the 4-line 'header.tex'.

<header.tex>
\documentclass[12 pt]{article}
\pagestyle{empty}
\begin{document}
\LARGE
</header.tex>

<code>

# See design.doc

function linkToMathImage ( $tex, $outputhash )
{
global $wgMathPath;
return "<img src=\"".$wgMathPath."/".$outputhash.".png\" alt=\"".wfEscapeHTML($tex)."\">";
}

function texvc($tex)
{
global $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
$cmd = "./math/texvc ".escapeshellarg($wgTmpDirectory)." ".
escapeshellarg($wgMathDirectory)." ".escapeshellarg($tex)." ".
escapeshellarg($wgInputEncoding);
return(`$cmd`);
}

function fullTeX($tex)
# Same output syntax as texvc; generation done via pdfLaTeX and ImageMagick convert.
{
global $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
global $wgPdflatex, $wgConvert;

# $wgInputEncoding is not taken into account; assumed to be compatible with pdfLaTeX.

#if (!isset($wgPdflatex)) $wgPdflatex = "pdflatex -quiet -halt-on-error -interaction batchmode -output-directory $wgTmpDirectory";
# chdir() is OK, while -output-directory leads to a problem (pdflatex can't find its own .aux!!)
if (!isset($wgPdflatex)) $wgPdflatex = "pdflatex -quiet -halt-on-error -interaction batchmode";
if (!isset($wgConvert)) $wgConvert = 'C:\Programs\ImageMagick-5.5.6-Q16\convert';
$headerfilename = "$wgMathDirectory/header.tex";
#$headerfilename = 'header.tex';
$header = fopen($headerfilename, 'r');

# The md5 of the input names both the temporary files and the final image.
$md5 = md5($tex);
$filename = "$wgTmpDirectory/$md5";

# Write header + input + \end{document} to the temporary .tex file.
$fp = fopen($filename.".tex", 'w+');

fwrite($fp, fread($header, filesize ($headerfilename)));
fclose($header);

fwrite($fp, "$tex\r");
fwrite($fp, "\\end{document}");
fclose($fp);

# Run pdflatex from the temporary directory.
$backupcwd = getcwd();
chdir($wgTmpDirectory);
$cmd = "$wgPdflatex $filename.tex";
$res = `$cmd`;

#todo: test if error; OK if empty (thanks to option -quiet)

# Convert the PDF to a trimmed PNG in the math directory.
#$cmd = "$wgConvert $filename.pdf -trim -bordercolor white -border 5 x 5 $wgMathDirectory/$md5.png";
$cmd = "$wgConvert $filename.pdf -trim $wgMathDirectory/$md5.png";
$res = `$cmd`;

#todo: test if error; OK if empty

#todo : delete temporary files (should be kept for debug and error)
chdir($backupcwd); # don't know if needed, certainly cleaner
return ("+$md5");
}

function renderMath( $tex )
{
global $wgUser, $wgMathDirectory, $wgTmpDirectory, $wgInputEncoding;
$mf = wfMsg( "math_failure" );
$munk = wfMsg( "math_unknown_error" );

$fname = "renderMath";

$math = $wgUser->getOption("math");
if ($math == 3)
return ('$ '.wfEscapeHTML($tex).' $');

$md5 = md5($tex);
$md5_sql = mysql_escape_string(pack("H32", $md5));
if ($math == 0)
$sql = "SELECT math_outputhash FROM math WHERE math_inputhash = '".$md5_sql."'";
else
$sql = "SELECT math_outputhash,math_html_conservativeness,math_html FROM math WHERE math_inputhash = '".$md5_sql."'";

$res = wfQuery( $sql, $fname );
if ( wfNumRows( $res ) == 0 )
{
# $cmd = "./math/texvc ".escapeshellarg($wgTmpDirectory)." ".
# escapeshellarg($wgMathDirectory)." ".escapeshellarg($tex)." ".
# escapeshellarg($wgInputEncoding);
# $contents = `$cmd`;

### $contents = texvc($tex);
$contents = fullTeX("\$$tex\$");

if (strlen($contents) == 0)
return "<b>".$mf." (".$munk."): ".wfEscapeHTML($tex)."</b>";
$retval = substr ($contents, 0, 1);
if (($retval == "C") || ($retval == "M") || ($retval == "L")) {
if ($retval == "C")

<\code>
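The two 'todo: test if error' remarks in fullTeX could perhaps be handled
along the following lines. This is a sketch only, assuming a current PHP
(7 or later) and assuming that pdflatex and convert report failure through a
non-zero exit status; the helper name runStep is hypothetical.

```php
<?php
// Hypothetical helper: run one external command and report whether it
// succeeded, keeping whatever it printed for a later error message.
function runStep(string $cmd): array
{
    $output = [];
    $status = 0;
    // exec() fills $output with the lines printed on standard output
    // and $status with the exit status of the command.
    exec($cmd, $output, $status);
    return [
        'ok'     => ($status === 0),
        'status' => $status,
        'output' => implode("\n", $output),
    ];
}

// Possible use inside fullTeX (shown as comments, since it relies on the
// variables of that function):
//   $step = runStep("$wgPdflatex $filename.tex");
//   if (!$step['ok']) return "";  # renderMath treats "" as a failure
```

Returning an empty string fits the existing test on strlen($contents) in
renderMath, so the rest of the script would not need to change.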

<article Non text elements in Wikipedia>

This discusses how to handle non text elements in wikipedia pages,
such as images, sounds, or math formulae. More precisely, this
advocates the possibility to have the 'source code' of such
elements, so that they can be modified as easily (almost!) as the
text can be modified.

Similar ideas have been discussed in the past (math markup, SVG
support, chess talk page, ...). I have not looked everywhere (far
from it!), so the ideas propounded here are probably not original! If
they are, the key aspect is that the proposed scheme is general,
not specific to one domain, whether it be math formulae,
chessboards or vectorised images.

The present state

Documents can already include different types of material, namely
text, images and sounds.

For texts, a 'source file' written in a special syntax is
uploaded, and is 'compiled' (i.e., translated into HTML) by the
Wikipedia site.

Images and sounds are simply uploaded. They are either included in
the text (images) or available for links (images and sounds).

There is an intermediate case, that of mathematical formulae. They
are included in the visible page as images, but the 'source' is
uploaded and compiled by the site. Another peculiarity is that the
'source' is embedded in the text 'source'. Still another one
is that a special syntax has to be used (derived from TeX, but not
TeX).

That images and sounds are uploaded 'as is' is, IMHO, in
contradiction with the general goal of Wikipedia, in particular
ease of modification.

In many cases, sounds and images have been, or could be, generated
from a 'source'. Making this 'source' available would have many
advantages:

* it would allow free modification, in conformance with the general
spirit;
* it would allow a more or less automatic later conversion to another
format (e.g., extensions of HTML);
* it would provide ready-to-use examples for other images/sounds/maths.

Let us take an example: chess positions. At present these are done
with PNG images. They are quite nice, I agree, but how does one modify
them? How does one add new ones in the same style as the existing ones?
Simply because the images are difficult to reproduce, a set of
pages becomes difficult to extend. Either a different style of
drawing is used, and the result does not look professional, or somebody
becomes an unavoidable intermediary! The talk page of the chess
article shows such concerns.

Imagine now a simple source code to draw chess positions (this
exists in LaTeX). The recipe for creating new drawings is obvious
and the style is consistent. Nobody is a bottleneck.

(To complete the example, the source code for a chess position in
LaTeX could be (taken from The LaTeX Graphics Companion):

\usepackage{chess}
\board{B* * * KR}
{*r* * *R*}
{* b p p}
{ *P*k*P*}
{*p* P *p}
{ P *P* P}
{* *N*N* }

OK, this looks a bit esoteric, but it is a simple matrix, with
uppercase for white and lowercase for black, p for pawn, k for
king, n for knight, and so on. The result is a very nice and
professional-looking drawing. Don't tell me the source is more
esoteric or difficult to use than, say, HTML.)


How to upload the source?

The case of math formulae provides one approach: embedding the
source in the page text, with a special markup.

This then raises the issue of generating the 'compiled'
version. In the case of math, this is done by the site. This
offers users the advantage that they don't have to install
anything. On the other hand, it requires the generation
software to be installed on the site, which limits freedom, and
it consumes some site resources (who considers the response
time short enough??), in particular in the case of successive
corrections, e.g., to correct syntax errors.

The other possibility is to ask the user to upload both the source
and the result. This is more complex for the user, mainly because
it requires the software, but it allows checking prior to
upload (less load on the site, and possibly, all things
considered, fewer operations for the user).

In practice (for the user), this consists in extending the upload
page to include:

* the result;
* (optional) the source;
* when not obvious, a description of the 'compiling' method (e.g.,
texvc, pdflatex with such or such header then imagemagick convert,
povray 3.5).
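As an illustration, the check applied to such an extended upload form could
be sketched as follows, assuming a current PHP (7 or later). The field names
'result', 'source' and 'method' and the function validateUpload are
hypothetical, and for simplicity the sketch always asks for the compiling
method when a source is given, rather than only 'when not obvious'.

```php
<?php
// Hypothetical shape of one upload: 'result' is required, 'source' is
// optional, and 'method' describes how the source was compiled.
// Returns the list of problems found (empty when the upload is acceptable).
function validateUpload(array $upload): array
{
    $errors = [];
    if (empty($upload['result'])) {
        $errors[] = 'the result file is required';
    }
    // Simplifying assumption: a source always needs a compiling method.
    if (!empty($upload['source']) && empty($upload['method'])) {
        $errors[] = 'a source needs a description of the compiling method';
    }
    return $errors;
}

// A result with its source and the way the source was compiled.
$upload = [
    'result' => 'board.png',
    'source' => 'board.tex',
    'method' => 'pdflatex with header.tex, then imagemagick convert',
];
var_dump(validateUpload($upload) === []);  // bool(true)
```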

Conversely, clicking on a drawing (for instance) opens a page more
or less as the present one, extended with the source and the
compiling indications, plus the possibility to edit the source
code (exactly as for a text page).


Embedding in the page text can still be a possibility (better for
math than for images for instance), but either has to be limited
to what the site can compile, or has to be coupled with the upload
of the result.

Which formats are acceptable?

Ideally, the source format should be such that:

* it is in plain text;
* it is public, free of copyrights or other constraints;
* it is already in use;
* at least one free version of a 'compiler' is easily available,
easy to install, and easy to use for as many
platforms as possible;
* it must be as secure as possible (to prevent carrying nasty
code).

IMHO, texvc does not respect all the conditions.

Examples (in my limited knowledge) that do respect them include :

* music (scores): lilypond, musixtex;
* music (sound) : midi;
* math : LaTeX;
* images : povray (security??), drawing packages in LaTeX;

Browsing through the LaTeX drawing packages, one can see the potential
richness of such a scheme. One could mention, in no particular order,
board games, card games, graphs, Feynman diagrams, chemical diagrams,
electrical diagrams, ...


Should the list of formats be explicitly prescribed?

IMO, no. Wikipedia is assumed to be self-regulating. If a format
is considered wrong, somebody can transcribe it into something more
appropriate.

<\article>