Mailing List Archive

html to xml to html-pipeline
Hi,

Lately I read in a document from the Forrest site (a PDF I cannot find
again...) that the pipeline for native HTML input is
something like:
HTML -> JTidy and Cocoon -> html-to-document.xsl -> ...?? ...->
HTML output or PDF or ...

Is this right so far? Mainly the JTidy and Cocoon pipe: is that
configurable, for example to stop JTidy from cleaning certain parts out
of my HTML?
Can I catch the output just before it goes to html-to-document.xsl for
debugging?
My goal is to add an extra pipeline step in front, for example to add an
XInclude to the HTML that can possibly be used later in the process...
I am looking for ways to automatically create a section numbering in my
documents and other useful stuff like indexing and maybe a bibliography
framework.

Kind regards

Thomas
Re: html to xml to html-pipeline
EMMEL Thomas wrote:
> Hi,
>
> Lately I read in a document from the Forrest site (a PDF I cannot find
> again...) that the pipeline for native HTML input is
> something like:
> HTML -> JTidy and Cocoon -> html-to-document.xsl -> ...?? ...->
> HTML output or PDF or ...

The document you refer to is probably [1]

> Is this right so far? Mainly the JTidy and Cocoon pipe: is that
> configurable, for example to stop JTidy from cleaning certain parts out
> of my HTML?

JTidy is highly configurable (see the JTidy website); however, it can't
remove chunks of your HTML. Its job is to tidy up the existing HTML -
make it well-formed etc. If you want to remove chunks of your HTML you
need a custom transformation; this is documented in [1] (see
"Customizing the html pipeline").

Cocoon is the application framework Forrest is built on. There is no
"cocoon" pipeline; Cocoon is the pipeline "engine".

> Can I catch the output just before it goes to html-to-document.xsl for
> debugging?

Yes, override the match that does the transformations in your project
sitemap and remove the line that does the html-to-document transformation.
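
A rough sketch of what such an override could look like in a project
sitemap. The match pattern, the source path and the
{properties:content.xdocs} lookup are assumptions for illustration, not
copied from Forrest's own sitemaps, so compare them with the match you
are actually overriding. It also assumes the "html" generator (the one
that runs JTidy) is available to the project sitemap:

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipelines>
    <map:pipeline>
      <!-- Hypothetical pattern: adjust to the match you are overriding -->
      <map:match pattern="**.xml">
        <!-- Read the HTML source through the JTidy-based html generator -->
        <map:generate type="html" src="{properties:content.xdocs}{1}.html"/>
        <!-- The html-to-document transformation is left out on purpose,
             so the tidied markup is returned as it would enter that step -->
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

Requesting a matching .xml URL should then show the tidied XHTML rather
than XDoc.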

> My goal is to add an extra pipeline step in front, for example to add an
> XInclude to the HTML that can possibly be used later in the process...
> I am looking for ways to automatically create a section numbering in my
> documents and other useful stuff like indexing and maybe a bibliography
> framework.

Section numbering should be done at the skinning stage, not at the
transformation to XDoc. It is part of the rendering, not the content.
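
As an illustration only, numbering at rendering time can be done with
the standard xsl:number instruction. The sketch below assumes XDoc-style
nested section elements with a title child. The h2 output element and
the way the template is hooked into a skin are assumptions, not taken
from an existing Forrest skin:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Prefix every section title with a hierarchical number (1, 1.1, ...) -->
  <xsl:template match="section/title">
    <h2>
      <!-- level="multiple" counts the enclosing section elements at each
           nesting depth and joins the counts with dots -->
      <xsl:number level="multiple" count="section" format="1.1 "/>
      <xsl:value-of select="."/>
    </h2>
  </xsl:template>
</xsl:stylesheet>
```

The source documents stay free of hard-coded numbers, so reordering
sections renumbers them automatically.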

As for bibliography, there is a plugin in the whiteboard that goes
some way towards this. Documentation is non-existent (well, it's the code)
and more work is needed, but it is a good starting point.

Ross

[1] http://forrest.apache.org/docs_0_80/howto/howto-custom-html-source.html