Mailing List Archive

getting attachments out of MS Office files
Has anyone had any success in finding a consistent, reliable way of locating and
extracting attachments out of MS Office files?

I realise that this doesn't really fall within the true purpose of ripMIME, but
embedding files in Office documents is perhaps the most common way of sneaking
them past most filtering agents.

I have thus far located a consistent 'header' byte-sequence before each embedded
file from the second embedded file onwards, but I cannot as yet find a consistent
header for the first embedded file (beats me why!).

Furthermore, due to the BSD licence of ripMIME, it's not really possible to call
on GPL-licensed code, mostly because I wish to embed the code and compile it
directly.

Ideas? Anyone with their hands near a MS OLE specification which is actually
human-readable?

Regards.

--
Paul L Daniels http://www.pldaniels.com
Linux/Unix systems Internet Development
ICQ#103642862,AOL:pldsoftware,IRC:inflex irc.freenode.net
A.B.N. 19 500 721 806
RE: getting attachments out of MS Office files
> Ideas? Anyone with their hands near a MS OLE specification which is actually
> human-readable?

I found this website:
http://user.cs.tu-berlin.de/~schwartz/pmh/
To quote it, it "is a collection of documentations and perl programs dealing
with binary file formats of Windows program documents".

There is also a description of the Compound File Binary Format (a Microsoft document):
http://www.aafassociation.org/html/specs/ssspecification/stgfmt2.doc
"This document describes the on-disk format of the Compound File, used as
the underpinnings of the structured storage support for OLE 2.0."

I tried reading it, but got a headache.

Hope this helps.
Chris
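
For anyone following along, here is a minimal sketch of what the start of a
compound file looks like, based on the publicly documented header layout as I
understand it: an 8-byte signature, then little-endian fields at fixed offsets
within the first 512 bytes. The function name and dictionary keys are my own
invention for illustration, and this only decodes the header; it does not walk
the FAT or the directory, which is where embedded objects would actually have
to be located.

```python
import struct

# Signature that opens an OLE2 compound file (first 8 bytes of the header).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Decode a few fixed-offset fields from a compound file header.

    Hypothetical helper for illustration only; not part of ripMIME.
    """
    if len(data) < 512:
        raise ValueError("need at least 512 bytes of header")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE2 compound file")

    # Offsets 24..33: minor version, major version, byte-order mark,
    # sector shift, mini-sector shift (all 16-bit little-endian).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    # Offsets 44..51: number of FAT sectors, first directory sector.
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", data, 44)
    # Offset 56: streams smaller than this cutoff live in the mini stream.
    (mini_stream_cutoff,) = struct.unpack_from("<I", data, 56)

    return {
        "version": (major, minor),
        "byte_order": byte_order,  # 0xFFFE indicates little-endian
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "mini_stream_cutoff": mini_stream_cutoff,
        # Sector N starts at (N + 1) * sector_size, because the header
        # occupies the space before sector 0.
        "directory_offset": (first_dir_sector + 1) << sector_shift,
    }
```

Usage would be along the lines of reading the first 512 bytes of a .doc or .xls
file in binary mode and passing them to parse_cfb_header(). From
directory_offset onwards the directory entries name the storages and streams in
the file, which is probably a more reliable route to embedded objects than
hunting for a marker byte-sequence.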