On Mon, Jun 21, 1999 at 12:56:07AM +0900, Matt Gushee wrote:
> KP <terocr@mysolution.com> writes:
>
> > Here's my dilemma: a directory filled (200+) with small emails. My goal
> > is to strip all the headers and combine them into one file. I can read
> > all the files just fine and write them all to one file, but I cannot
> > discern how to strip the headers.
>
> I have no expertise in this area, but I've been reading the "Internet
> Data Handling" section of the Library Reference (Ch. 12 of the 1.5.2
> edition), and it seems like there are several modules that might help
> you. In particular, check out 'rfc822.'
>
> Hope this helps.
>
> Matt Gushee
> Portland, Maine, USA
> mgushee@havenrock.com
>
I wrote a small piece of code that does *exactly* what you are describing.
It doesn't strip the headers as such. Instead, it parses each message with
rfc822 and works with the header fields it needs (From, Date and Subject).
You'll find it attached to this message. If for some reason it doesn't come
through, let me know and I'll resend it.
Regards,
Jeff
--
|| visit gfd <http://quark.newimage.com/>
|| psa member #293 <http://www.python.org/>
|| New Image Systems & Services, Inc. <http://www.newimage.com/>
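The rfc822 module was deprecated in Python 2.3 and removed in Python 3. Its role is now filled by the email package. The sketch below covers the original task: strip the headers from every message file in a directory and concatenate the bodies into one file. It makes three assumptions:

- each file holds a single non-multipart message;
- the output file lives outside the source directory;
- `combine_bodies` is a hypothetical name, not something from this thread.

```python
import os
from email.parser import Parser


def combine_bodies(src_dir, out_path):
    """Append the body (headers stripped) of each message in src_dir to out_path.

    Assumes every regular file in src_dir holds one RFC 822 message and that
    out_path is outside src_dir. Returns the number of bodies written.
    """
    parser = Parser()
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            with open(path, encoding="utf-8", errors="replace") as fp:
                # The header block ends at the first blank line; the rest is the body.
                msg = parser.parse(fp)
            if msg.is_multipart():
                continue  # payload is a list of parts, not plain text; skipped here
            body = msg.get_payload()  # raw body text, transfer encoding not decoded
            out.write(body)
            if not body.endswith("\n"):
                out.write("\n")  # keep consecutive bodies on separate lines
            written += 1
    return written
```

Multipart messages are skipped rather than flattened. Quoted-printable or base64 bodies are copied undecoded; `msg.get_payload(decode=True)` returns the decoded bytes if that matters.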
Attachment: importcola.py
#!/usr/bin/env python
# importcola.py: walk the cola archive and record each article's
# From, Date and Subject headers (Python 1.5.2 era code).
import os
import dircache
import colacanister   # not in the standard library; not included here
import getdate        # not in the standard library; not included here
from rfc822 import Message

_COLAROOT = "/home/jam/projects/cola/cola.archive"
_COLABASEHREF = "http://www.cs.helsinki.fi/%7Emjrauhal/linux/cola.archive/"

if __name__ == "__main__":
    l = dircache.listdir(_COLAROOT)
    print len(l)   # number of entries in the archive root
    for item in l:
        p = os.path.join(_COLAROOT, item)
        if os.path.isdir(p):
            articles = dircache.listdir(p)
            for a in articles:
                # only files named cola.* or mjr.* are articles
                if a[:5] != "cola." and a[:4] != "mjr.":
                    continue
                fp = open(os.path.join(p, a), "r")
                # Message() reads the header block, up to the first blank
                # line; seekable=0 stops it from using tell()/seek() on fp
                m = Message(fp, seekable=0)
                fp.close()
                if not m.has_key("subject"):
                    print "** message does not have subject line. skipped."
                    continue
                url = os.path.join(item, a)
                print "processing '%s'" % (url),
                # store the article only if this archive URL is not known yet
                if colacanister.get_cola_by_archiveurl(url) is None:
                    c = colacanister.colacanister()
                    c["cola_from"] = m["from"]
                    if m.has_key("date"):
                        c["cola_dateposted"] = getdate.getdate(m["date"])
                    c["cola_subject"] = m["subject"]
                    c["cola_archiveurl"] = url
                    c.insert()
                    print "added."
                else:
                    print "already archived."
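For readers on Python 3, where rfc822 and dircache have both been removed, here is a rough counterpart of the header handling in importcola.py. It is a sketch under these assumptions:

- colacanister and getdate are the author's own modules and are not reproduced, so the function yields the extracted fields instead of storing them;
- `read_summaries` is a hypothetical name;
- `email.utils.parsedate_to_datetime` stands in for `getdate.getdate`, whose behaviour isn't shown.

```python
import os
from email.parser import Parser
from email.utils import parsedate_to_datetime

# Filename prefixes that importcola.py treats as articles
PREFIXES = ("cola.", "mjr.")


def read_summaries(root):
    """Yield (archive_url, from, date, subject) for each article under root.

    Mirrors importcola.py: only files whose names start with one of PREFIXES
    are read, and messages without a Subject header are skipped.
    """
    parser = Parser()
    for item in sorted(os.listdir(root)):
        p = os.path.join(root, item)
        if not os.path.isdir(p):
            continue
        for a in sorted(os.listdir(p)):
            if not a.startswith(PREFIXES):
                continue
            with open(os.path.join(p, a), encoding="utf-8", errors="replace") as fp:
                # headersonly=True skips MIME parsing of the body; only headers are used
                m = parser.parse(fp, headersonly=True)
            if m["subject"] is None:
                continue  # importcola.py prints a warning and skips these
            posted = None
            if m["date"]:
                try:
                    posted = parsedate_to_datetime(m["date"])
                except (TypeError, ValueError):
                    posted = None  # unparseable Date header
            # m["from"] is None when the header is missing (the original raises KeyError)
            yield os.path.join(item, a), m["from"], posted, m["subject"]
```

Keeping storage out of the function leaves the duplicate check and insert, which importcola.py does via colacanister, to whatever backend the caller uses.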