Mailing List Archive

Code to convert MS Word documents
Hello Ross,

you wrote that you are willing to donate the code of your plugin
that converts an MS Word document to an OOo document
(http://issues.apache.org/jira/browse/FOR-746).

Could you send the code to me? That would be wonderful.

Or are there other ways to import an MS Word document
into Forrest?


Martin Gmuer

--
Software development

User Interface Design GmbH, Ludwigsburg, Germany
Phone +49 7141 37700-87, Fax +49 7141 377 00-99
E-mail martin.gmuer@uidesign.de * www.uidesign.de

Offices:
Teinacher Strasse 38, D-71634 Ludwigsburg
Truderinger Strasse 330, D-81825 Muenchen
Friedrichsring 46, D-68161 Mannheim

Legal information according to EHUG:
User Interface Design GmbH; Managing Directors: Dr. Claus Goerner,
Franz Koller; Head office: Ludwigsburg; Commercial register of the
local court of Stuttgart, Germany, HRB 205519
Re: Code to convert MS Word documents
Martin Gmür wrote:
> Hello Ross,
>
> you wrote that you are willing to donate the code of your plugin
> that converts an MS Word document to an OOo document
> (http://issues.apache.org/jira/browse/FOR-746).
>
> Could you send the code to me? That would be wonderful.

Thanks for sending the plugin!
We got it all deployed, but both our own document and the sample document
(http://localhost:8888/sample/wordSample.html) produce an error that we
cannot resolve:

URL seems to be an unsupported one.

Any idea why this happens?

Thanks in advance
Johannes and Martin


Internal Server Error
Message: null
Description: No details available.
Sender: org.apache.cocoon.servlet.CocoonServlet
Source: Cocoon Servlet
Request URI
sample/wordSample.html

cause
URL seems to be an unsupported one.

request-uri
/sample/wordSample.html
Apache Cocoon 2.2.0-dev




--
User Interface Design GmbH, Ludwigsburg, Germany
Phone/Fax +49 7141 37700-46/-99, Mobile +49 170 4914567
E-mail johannes.schaefer@uidesign.de * www.uidesign.de

Re: Code to convert MS Word documents
On 14/05/07, Johannes Schaefer <johannes.schaefer@uidesign.de> wrote:
> Thanks for sending the plugin!
> We got it all deployed, but both our own document and the sample document
> (http://localhost:8888/sample/wordSample.html) produce an error that we
> cannot resolve:
>
> URL seems to be an unsupported one.
>
> Any idea why this happens?

I've never seen this before. Just a couple of guesses for now, as I've not
looked at the code. Can you confirm the following?

- you are using Forrest 0.7
- you are using OOo 1.0 (this was not written against 2.0)
- you start OOo as a server for Forrest to use before starting Forrest
- Does core.log indicate which URL is the problem? (There are some
internal requests, and it is likely that one of those URLs is
unsupported.)
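The third point in the list above (OOo running as a server before Forrest starts) can be checked with a small script. This is only a sketch under assumptions: the thread does not say which host or port the plugin connects to, so `localhost` and port `8100` below are illustrative values, and the `soffice -accept=...` line in the comment is the usual way to start OOo in listening mode rather than something confirmed here.

```python
import socket

# Assumption: OOo was started in listening mode on localhost:8100, e.g. with
#   soffice -accept="socket,host=localhost,port=8100;urp;"
# Host and port are illustrative; use whatever your plugin setup expects.
OOO_HOST = "localhost"
OOO_PORT = 8100


def ooo_is_listening(host=OOO_HOST, port=OOO_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    state = "reachable" if ooo_is_listening() else "NOT reachable"
    print(f"OOo server at {OOO_HOST}:{OOO_PORT} is {state}")
```

Running this before starting Forrest only shows whether something accepts connections on that port; it does not prove that the listener is OOo or that the OOo version is the right one.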

I'll try and have a look at the code sometime soon.

Ross
Re: Code to convert MS Word documents
Ross Gardler wrote:
> Never seen this before. Just a couple of guesses for now, I've not
> looked at the code:
>
> - you are using Forrest 0.7

yes

> - you are using OOo 1.0 (this was not written against 2.0)

No, sorry. We will try the "legacy build" 1.1.5 of OOo.

> - you start OOo as a server for forrest to use before starting forrest

yes

> - Does core.log indicate what URL is the problem (there is some
> internal requests and it is likely one of those URLS that are
> unsupported)
>
> I'll try and have a look at the code sometime soon.

Don't do anything until I complain (or report success!)
Johannes

>
> Ross
>

Re: Code to convert MS Word documents
Ross Gardler wrote:
> On 14/05/07, Johannes Schaefer <johannes.schaefer@uidesign.de> wrote:

<snip/>

> Never seen this before. Just a couple of guesses for now, I've not
> looked at the code:
>
> - you are using Forrest 0.7
> - you are using OOo 1.0 (this was not written against 2.0)

OK, using OOo 1.1.5 [1] works way better!
Thanks Ross, the plug-in works (almost) out of the box.

I can't get graphics displayed (neither embedded nor linked;
Forrest shows the ALT text instead), e.g.
<img alt="Graphic1"
src="openOfficeEmbeddedImage/zip-wordSample-bild.sxw/file-Pictures/2000000F000040CE0000309A05877442.svm">

Did this work before?
(Images are displayed correctly with the OOo plugin.)
Is there a way to get images working?

I get special classes for bold or italics, e.g.
you can make a word appear <span class="T1">strong</span>,
or <span class="T2">emphasised.</span>
I could format these "correctly", but I don't know whether "T1" is constant.
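Names such as "T1" and "T2" look like OOo automatic styles, which are generated per document, so relying on them being constant seems unsafe. One way to find out what they mean in a given file is to read the style definitions from its content.xml. The sketch below does that under assumptions: the namespace URIs and the `style:properties` element are what OOo 1.x (.sxw) files normally use, and the sample XML is a hand-written excerpt for illustration, not taken from the plugin output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as normally used by OOo 1.x (.sxw) files.
# Assumption: check them against your own content.xml.
NS = {
    "office": "http://openoffice.org/2000/office",
    "style": "http://openoffice.org/2000/style",
    "fo": "http://www.w3.org/1999/XSL/Format",
}
STYLE = "{%s}" % NS["style"]
FO = "{%s}" % NS["fo"]


def automatic_text_styles(content_xml):
    """Map automatic text style names (e.g. "T1") to "strong", "em" or None."""
    root = ET.fromstring(content_xml)
    result = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        if style.get(STYLE + "family") != "text":
            continue  # skip paragraph styles and other families
        props = style.find("style:properties", NS)
        meaning = None
        if props is not None:
            if props.get(FO + "font-weight") == "bold":
                meaning = "strong"
            elif props.get(FO + "font-style") == "italic":
                meaning = "em"
        result[style.get(STYLE + "name")] = meaning
    return result


# Hypothetical, hand-written excerpt of a content.xml, for illustration only.
SAMPLE = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:properties fo:font-style="italic"/>
    </style:style>
    <style:style style:name="P1" style:family="paragraph"/>
  </office:automatic-styles>
</office:document-content>"""

if __name__ == "__main__":
    print(automatic_text_styles(SAMPLE))
```

A mapping like this could be used to decide per document which class stands for bold and which for italics, instead of hard-coding "T1" in a stylesheet.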

Some paragraphs get class="instruction" (not only inside <ul>), e.g.
<ul>
<li>
<p class="instruction">Sub lists</p>
</li>
</ul>
which inserts an arrow at the beginning (this is from screen.css).

Do you have formatting guidelines for Word? For example: use only
standard formats, nest headings properly, insert images as links,
and so on.
We have a big (>250 pages) document to import (in pieces), and
this will involve a lot of manual work anyway.

You used some code from [2].
Do you know if there are any (relevant) updates?

Thanks a lot for your support!
Johannes


[1] http://download.openoffice.org/1.1.5/index.html
[2] http://api.openoffice.org/

Re: Code to convert MS Word documents
from a chat with Ross:
<snip/>
[16:22:51] Johannes Schäfer :
one quick question: did graphics work from MSWord?
[16:23:39] Ross Gardler :
To be honest I really can't remember. I know I got them to work in the
OOo plugin and therefore they should work. But whether they were tested or
not...
[16:24:13] Johannes Schäfer :
there are none in the sample pages, so I assumed they don't work (the OOo
plugin does work)
[16:25:39] Ross Gardler :
I'd suggest looking to see if you can get the interim document that is
output from the MSWord generator and seeing what there is in there. They
should be in the produced zip (at least I think the generator produces a
zip - it really has been a long time)
[16:29:51] Ross Gardler :
To see what the MS generator creates try adding the following to
input.xml in the plugin:
[16:30:34] Ross Gardler :
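<map:match pattern="**.OOo.xml">
  <!-- map:when needs an enclosing map:select; type "exists" is an assumption -->
  <map:select type="exists">
    <map:when test="{project:content.xdocs}{uri}.doc">
      <map:match type="regexp" pattern="^(.*?)([^/]*).xml$">
        <map:generate src="{project:content.xdocs}{1}{2}.doc"/>
        <map:serialize/>
      </map:match>
    </map:when>
  </map:select>
</map:match>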
[16:30:51] Ross Gardler :
You may need to use a binary serialiser if it does produce a zip

<snip/>

[16:34:17] Ross Gardler :
yes - the only reason it is not there is the license issue. Perhaps you
could dump what you have in SVN on the SourceForge project; then anyone who
wants to get at what we are discussing can. I don't care about the license
issues at the moment (it is all open source stuff)
[16:35:32] Ross Gardler :
http://sourceforge.net/projects/forrestplugins/

<snip/>

Ross dumped "his" plugins there. Thanks!
See

http://forrestplugins.svn.sourceforge.net/viewvc/forrestplugins/trunk/forrestPlugins/

Cheers
Johannes

