Mailing List Archive

making python man pages looks hard
I'm having fun watching tchrist learn python. I learned perl from Tom
back in '90 or so. Anyway... I said my piece on perl vs. python back in '96,
and I don't have much to add since then:

Subject: Re: Python, Tcl and Perl, oh my! (was Re: tcl vs. perl)
Date: 1996/06/26
Newsgroups: comp.lang.perl.misc, comp.lang.tcl, comp.lang.perl.tk, comp.lang.python
http://www.egroups.com/group/python-list/11548.html?

But, having spent most of my career translating technical
documentation from one format to another, the gripe about a
lack of man pages for python got me thinking.
I happen to be a *big fan* of the python documentation[2] as is,
but I don't see any reason why it shouldn't be available as
man pages too. Or at least... I didn't.

[2] http://www.python.org/doc/

Then I looked at the source[3]

[3] http://www.python.org/ftp/python/doc/1.5.2p1/latex-1.5.2p1.tgz

The source format is LaTeX. Converting from LaTeX is hard/messy:

==============
For example, I conjecture that it is impossible to write a program that
will extract the third word from a TeX document. It would be an easy
task for 80% of the TeX documents out there -- just skip over some
formatting stuff and grab the third bunch of characters surrounded by
whitespace. But that "formatting stuff" might be a program that
generates 100 words from the hyphenation dictionary. So the simple
lexical scan of the TeX source would find a word that is not third
word of the document when printed.

This may seem like an obscure and unimportant problem, but I
assure you that the problem of converting TeX tables to FrameMaker
MIF is just as unsolvable.

So while "programmable" document formats have the advantage that
features can be added on a per-document basis, they suffer the
disadvantage that these features cannot be recovered by the
machine and translated in an automated fashion.

excerpted from Toward Closure on HTML
1994/04/07
http://www.w3.org/People/Connolly/drafts/html-essay
==============
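
To make the "third word" conjecture concrete, here is a minimal sketch of the simple lexical scan the excerpt describes, written in present-day Python 3. The function name and both sample inputs are made up for illustration; nothing here comes from the python doc tools. The second input defines a tiny macro, so what TeX would print ("spam spam eggs ham") is not what the scanner sees.

```python
import re


def naive_third_word(tex_source):
    """Lexical scan only: drop control sequences and braces, take token 3."""
    text = re.sub(r"\\[A-Za-z]+", " ", tex_source)  # strip \commands
    text = re.sub(r"[{}]", " ", text)                # strip grouping braces
    words = text.split()
    return words[2] if len(words) >= 3 else None


# Easy case: the markup only decorates words that appear in the source.
plain = r"\section{Making} man \emph{pages} looks hard"

# Hard case: \w is a macro. TeX would print "spam spam eggs ham", so the
# third printed word is "eggs", but the scanner never expands \w.
tricky = r"\def\w{spam} \w{} \w{} eggs ham"

print(naive_third_word(plain))
print(naive_third_word(tricky))
```

On the second input the scan returns "ham" rather than "eggs". Getting the right answer means expanding macros, which is to say running (part of) TeX.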

The author of one of the python doc tools seems to agree:

===============
# Why not start from LaTeX rather than HTML?
# I could hack latex2html itself to produce Texinfo instead, or fix up
# partparse.py (which already translates LaTeX to Texinfo).
# Pros:
# * has high-level information such as index entries, original formatting
# Cons:
# * those programs are complicated to read and understand
# * those programs try to handle arbitrary LaTeX input, track catcodes,
# and more: I don't want to go to that effort. HTML isn't as powerful
# as LaTeX, so there are fewer subtleties.
# * the result wouldn't work for arbitrary HTML documents; it would be
# nice to eventually extend this program to HTML produced from Docbook,
# Frame, and more.

excerpt from
# html2texi.pl -- Convert HTML documentation to Texinfo format
# Michael Ernst <mernst@cs.washington.edu>
# Time-stamp: <1999-01-12 21:34:27 mernst>
part of [3]
===============

The bewildering array of scripts, tools, and hacks used to
generate the HTML version of the python docs is frightening!
It suggests to me that *very few people* maintain the
python docs. That's good for consistency, but it's sort
of a cathedral[4] approach: there's a sharp line between
the "blessed" modules and Everything Else.

[4] http://www.tuxedo.org/~esr/writings/cathedral-bazaar/

It would be fairly easy to convert the HTML to nroff... I think there
are tools that do that... rosettaman or something? Yes... it
seems to have an option to convert back to roff format.
http://thsun1.jinr.dubna.su/cgi-bin/rman.pl?topic=rman&section=all
ftp://ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z

The trick would be dividing up the sections. The python HTML docs
aren't self-contained like the perl man pages.
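
A rough sketch of the "dividing up" step, in present-day Python 3 with only the standard html.parser module. It assumes, purely for illustration, that every module starts at an h2 heading holding a plain-text module name and that body text sits in p elements; the real python HTML docs are not laid out that simply. PageSplitter, split_pages and escape_roff are hypothetical names.

```python
from html.parser import HTMLParser


def escape_roff(text):
    """Protect text that roff would otherwise read as an escape or request."""
    text = text.replace("\\", "\\e")
    if text.startswith((".", "'")):
        text = "\\&" + text
    return text


class PageSplitter(HTMLParser):
    """Start a new man page at every split_tag heading (assumed layout)."""

    def __init__(self, split_tag="h2"):
        super().__init__()
        self.split_tag = split_tag
        self.pages = {}       # page name -> list of roff lines
        self.current = None   # lines of the page being built
        self.in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == self.split_tag:
            self.in_title = True
        elif tag == "p" and self.current is not None:
            self.current.append(".PP")

    def handle_endtag(self, tag):
        if tag == self.split_tag:
            self.in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        if self.in_title:
            header = [".TH %s 3" % text.upper(), ".SH NAME", text]
            self.current = self.pages.setdefault(text, header)
        elif self.current is not None:
            self.current.append(escape_roff(text))


def split_pages(html):
    splitter = PageSplitter()
    splitter.feed(html)
    splitter.close()
    return {name: "\n".join(lines) for name, lines in splitter.pages.items()}


sample = ("<h2>string</h2><p>Common string operations.</p>"
          "<h2>re</h2><p>Regular expressions.</p>")
pages = split_pages(sample)
print(pages["string"])
```

Cross-references, tables and nested headings are ignored here, and those are exactly the parts that make the real docs hard to cut into self-contained pages.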

I took a quick look at the python doc-sig, but I didn't find much
relevant info... they seem to be focussed on a javadoc
work-alike... hmm... maybe that is relevant; is the python
library reference source expected to move into docstrings?
That would be cool.

Anyway... I had hoped to contribute more, but this looks harder
than I expected, and I'm done for the day.

parting shot: CPAN is cool, but I find it frustrating that I can't
read the documentation for a module without downloading
and unpacking the module. For example, I can browse
the list of modules,

http://www.perl.com/CPAN/modules/00modlist.long.html

but say I find one I'm interested in:

Mac::Comm::
::OT_PPP RdpO Control Open Transport PPP / Remote Access CNANDOR

the only link goes to the Author contact info. Gee thanks.

"Uniform Resource Identifiers (URIs, aka URLs) are short strings that identify
resources in the web: documents, images, downloadable files, services,
electronic mailboxes, and other resources. They make resources available under
a variety of naming schemes and access methods such as HTTP, FTP, and
Internet mail addressable in the same simple way. They reduce the tedium of "log
in to this server, then issue this magic command ..." down to a single click."
-- http://www.w3.org/Addressing/

--
Dan Connolly
http://www.w3.org/People/Connolly/
making python man pages looks hard [ In reply to ]
In article <1103_935134343@shoal>,
Dan Connolly <connolly@w3.org> wrote:
[snip]
! parting shot: CPAN is cool, but I find it frustrating that I can't
! read the documentation for a module without downloading
! and unpacking the module. For example, I can browse
! the list of modules,
!
! http://www.perl.com/CPAN/modules/00modlist.long.html
!
! but say I find one I'm interested in:
!
! Mac::Comm::
! ::OT_PPP RdpO Control Open Transport PPP / Remote Access CNANDOR
!
! the only link goes to the Author contact info. Gee thanks.

that is usually no longer a valid complaint --- try the CPAN
front end at 'theory.uwinnipeg.ca', in particular, the search
engine:

http://theory.uwinnipeg.ca/search/cpan-search.html

I entered Mac::Com in the box and got back:

Mac-Comm-OT_PPP-1.20.tar.gz [readme / module list] 3.6 1998-01-06 CNANDOR
Mac::Comm::OT_PPP [ documentation]: Interface to Open Transport PPP

(and yes, those are links to the .tgz itself, the readme, a list of
modules in the package; and, surprise, "[documentation]" links to the
html'ified manpage).

regards
andrew

--
They're not soaking, they're rusting!
-- my wife on my dishwashing habits
making python man pages looks hard [ In reply to ]
[courtesy cc of this posting mailed to cited author]

In comp.lang.python,
connolly@w3.org (Dan Connolly) writes:
:parting shot: CPAN is cool, but I find it frustrating that I can't
:read the documentation for a module without downloading
:and unpacking the module. For example, I can browse
:the list of modules,
:
:http://www.perl.com/CPAN/modules/00modlist.long.html

Try

http://search.cpan.org/

--tom
--
You have to admit that it's difficult to misplace the Perl sources. :-)
--Larry Wall in <1992Aug26.184221.29627@netlabs.com>
making python man pages looks hard [ In reply to ]
(posted and mailed)

Dan Connolly <connolly@w3.org> wrote:
> I happen to be a *big fan* of the python documentation[2] as is
> but I don't see any reason why it shouldn't be available as
> man pages too. Or at least... I didn't.

> The source format is LaTeX. Converting from LaTeX is hard/messy:
>
> ...
> For example, I conjecture that it is impossible to write a program that
> will extract the third word from a TeX document. It would be an easy
> task for 80% of the TeX documents out there -- just skip over some
> formatting stuff and grab the third bunch of characters surrounded by
> whitespace.
> ...

luckily, Fred Drake has already done most of the hard work
here -- check the Doc/tools/sgmlconv directory:

...

$ more Doc/tools/sgmlconv/README
These scripts and Makefile fragment are used to convert the Python
documentation in LaTeX format to SGML. XML is also supported as a
target, but is unlikely to be used.

This material is preliminary and incomplete.

...

not sure how incomplete, though. I don't have things set up
so I can try the current release of this (from the CVS archive),
but maybe Fred can give us a status update?

(maybe he has, I cannot say I've bothered to read *all* the
messages on the group today... sigh...)

anyway, if this stuff works, writing an esis2man converter
cannot be *that* hard.

</F>
making python man pages looks hard [ In reply to ]
Dan Connolly <connolly@w3.org> wrote:
> The source format is LaTeX. Converting from LaTeX is hard/messy:

Dan,
It sure is! You'll get no argument from me on that one.

> For example, I conjecture that it is impossible to write a program that
> will extract the third word from a TeX document. It would be an easy
> task for 80% of the TeX documents out there -- just skip over some
> formatting stuff and grab the third bunch of characters surrounded by

Not impossible, but painful enough in the general case that I have
no plans to write the code to do it (in any language!).

Fredrik Lundh writes:
> luckily, Fred Drake has already done most of the hard work
> here -- check the Doc/tools/sgmlconv directory:

Dang, my secret has been found! ;-)
Yes, there's a good bit of potentially interesting material there.

> not sure how incomplete, though. I don't have things set up
> so I can try the current release of this (from the CVS archive),
> but maybe Fred can give us a status update?

I don't have time today, but I'll try to send a status report to the
Python doc-sig next week. For anyone interested but not subscribed to
the doc-sig mailing list, see http://www.python.org/sigs/doc-sig/ for
information.

> anyway, if this stuff works, writing an esis2man converter
> cannot be *that* hard.

Parsing esis isn't hard (it's in the XML package!), and generating
man pages isn't hard ("print" is a language statement in Python,
right?). The difficulty is the semantic mapping; what do you want on
your manpages, how do you want them organized, etc. The existing
structure may not be trivial to transform into manpages, regardless of
syntax. Having a DOM instance for the library reference doesn't make
manpages a one-liner. ;-)
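
For illustration only, the mechanical half of an esis2man might look like the sketch below, in present-day Python 3. It reads ESIS-style lines ("(" opens an element, ")" closes it, "-" carries character data) and prints man macros. The element names section, title and para, the START_MACROS table and the function name are all assumptions made up for the example; they are not taken from the sgmlconv output.

```python
# Hypothetical mapping from element names to man macros. Deciding what
# belongs in this table is the semantic-mapping problem.
START_MACROS = {"para": ".PP"}


def esis_to_man(esis_lines, title="EXAMPLE", section="3"):
    """Translate a small ESIS-style event stream into man page source."""
    out = [".TH %s %s" % (title, section)]
    stack = []  # currently open elements, innermost last
    for line in esis_lines:
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "(":
            stack.append(rest)
            if rest in START_MACROS:
                out.append(START_MACROS[rest])
        elif kind == ")":
            stack.pop()
        elif kind == "-":
            # Only the record-end escape is handled in this sketch.
            text = rest.replace("\\n", " ").strip()
            if not text:
                continue
            if stack and stack[-1] == "title":
                out.append(".SH " + text.upper())
            else:
                out.append(text)
        # Attribute ("A...") and other line types are ignored here.
    return "\n".join(out)


events = [
    "(section",
    "(title", "-string", ")title",
    "(para", "-Common string operations.", ")para",
    ")section",
]
print(esis_to_man(events))
```

The loop is short. What goes on each page and how pages are organized is still the open question.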


-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
making python man pages looks hard [ In reply to ]
>>>>> "DC" == Dan Connolly <connolly@w3.org> writes:

DC> The bewildering array of scripts, tools, and hacks used to
DC> generate the HTML version of the python docs is frightening!

I think that's the Quote of the Day. Wouldn't you agree Fred? :)

DC> It suggests to me that *very few people* maintain the
DC> python docs.

few == one. Fred Drake, a.k.a. Dr. Doc.

speaking-as-a-beta-tester-of-fred's-html-documentation-ly y'rs,
-Barry
making python man pages looks hard [ In reply to ]
Tom> Try

Tom> http://search.cpan.org/

Tom,

Just out of curiosity, how is CPAN populated and kept in sync? I went to
the Winnipeg site at

http://theory.uwinnipeg.ca/search/cpan-search.html

and searched for Frontier (Ken MacLeod's Perl XML-RPC package) and found a
couple links to a readme and a tgz. Clicking the readme link failed the
first time, but succeeded the second (guess it was choosing different
mirrors and got a hoser the first time).

I then tried the same search at search.cpan.org and got no relevant hits.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098
making python man pages looks hard [ In reply to ]
connolly@w3.org (Dan Connolly) writes:

>
> The source format is LaTeX. Converting from LaTeX is hard/messy:

Well - the docs aren't in standard LaTeX, are they? According to
www.python.org/doc/doc:

"With almost no basic TeX or LaTeX markup in use, [...] the markup
syntax is about the only evidence of LaTeX in the actual document
sources."

Since the format seems to be clearly and simply defined, it shouldn't
be *that* difficult... Anyway - a transition to XML seems to be on its
way, which will make all this trivial. (Actually, it wouldn't be that
hard to go from the HTML versions either, I guess, with a little
imagination. My point is - there is no reason to be spooked by the
complexity of LaTeX here...)

--

Magnus Making no sound / Yet smouldering with passion
Lie The firefly is still sadder / Than the moaning insect
Hetland : Minamoto Shigeyuki
making python man pages looks hard [ In reply to ]
I've contacted the maintainer who will get back to you on it.

--tom
making python man pages looks hard [ In reply to ]
"Magnus L. Hetland" wrote:
>
> connolly@w3.org (Dan Connolly) writes:
>
> >
> > The source format is LaTeX. Converting from LaTeX is hard/messy:
>
> Well - the docs aren't in standard LaTeX, are they? According to
> www.python.org/doc/doc:
>
> "With almost no basic TeX or LaTeX markup in use, [...] the markup
> syntax is about the only evidence of LaTeX in the actual document
> sources."
>
> Since the format seems to be clearly and simply defined, it shouldn't
> be *that* difficult... Anyway - a transition to XML seems to be on its
> way, which will make all this trivial. (Actually, it wouldn't be that
> hard to go from the HTML versions either, I guess, with a little
> imagination. My point is - there is no reason to be spooked by the
> complexity of LaTeX here...)

In I:\cvsroot\python\dist\src\Doc\tools\sgmlconv
I find some README and a couple of scripts.
latex2esis.py
esis2sgml.py

and it looks to be able to produce XML?

Then I'd take an XML parser and try to generate man pages :-)
I have just no idea of *that* format.

ciao - chris (starting a "who writes the shortest man.py")

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
making python man pages looks hard [ In reply to ]
Please contact me if you need help building the man pages.

ergowolf
ergowolf@mindspring.com

Tom Christiansen wrote:

> I've contacted the maintainer who will get back to you on it.
>
> --tom