Mailing List Archive

Mailbox, tell and read
From: "Gordon McMillan" <gmcm@hypernet.com>

Paul Prescod writes:

> The mailbox message objects simulate a file interface by seeking
> through a mailbox. A read() returns the entire message. It does this
> by seeking to the start, reading to the message end, and returning
> the data. The problem is that message offsets are derived by
> file.tell() which I presume is in terms of bytes. But then it does a
> read() based on the offsets and read() is in terms of characters.
>
> These don't match up on Windows because of CR/LF pairs. I think that
> the right fix is to never use tell() and instead keep track of the
> location by counting characters. Does that make sense? --

How 'bout opening in binary, and munging line endings yourself just
before returning the message? If you can't use tell(), you can't
trust seek(), either.

- Gordon
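
A minimal sketch of Gordon's suggestion, written in present-day Python 3
rather than the Python of the thread. It assumes an mbox-style file in which
each message starts with a line beginning "From " (the thread does not say
which mailbox format is involved), and the names index_messages and
read_message are made up for illustration. Because the file is opened in
binary mode, tell(), seek() and read() all count bytes, so the offsets line
up; line endings are converted only after the bytes have been read.

```python
def index_messages(path):
    """Return (start, end) byte offsets for each message in an mbox-style file."""
    starts = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()            # byte offset; reliable in binary mode
            line = f.readline()
            if not line:              # end of file
                end = pos
                break
            if line.startswith(b"From "):   # assumed message separator
                starts.append(pos)
    return list(zip(starts, starts[1:] + [end]))


def read_message(path, start, end):
    """Read one message by byte offsets, then convert CRLF to LF."""
    with open(path, "rb") as f:
        f.seek(start)
        raw = f.read(end - start)
    return raw.replace(b"\r\n", b"\n")
```

Typical use would be to call index_messages() once and then pass each
(start, end) pair to read_message() as messages are requested.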
Mailbox, tell and read [ In reply to ]
From: Paul Prescod <paul@prescod.net>

The mailbox message objects simulate a file interface by seeking through a
mailbox. A read() returns the entire message. It does this by seeking to
the start, reading to the message end, and returning the data. The problem
is that message offsets are derived by file.tell() which I presume is in
terms of bytes. But then it does a read() based on the offsets and read()
is in terms of characters.

These don't match up on Windows because of CR/LF pairs. I think that the
right fix is to never use tell() and instead keep track of the location by
counting characters. Does that make sense?
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"I don't want you to describe to me -- not ever -- what you were doing
to that poor boy to make him sound like that; but if you ever do it
again, please cover his mouth with your hand," Grandmother said.
-- John Irving, "A Prayer for Owen Meany"
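
The mismatch Paul describes can be reproduced with a small sketch. This is
Python 3, where text mode translates CR/LF to LF on every platform by
default, so the effect shows up even away from Windows; in the Python of the
thread the translation came from the platform's C library. The helper name
count_bytes_and_chars is hypothetical.

```python
import os
import tempfile


def count_bytes_and_chars(raw):
    """Write raw bytes to a temp file; return (bytes read in 'rb', chars read in 'r')."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
    path = tmp.name
    try:
        with open(path, "rb") as f:
            n_bytes = len(f.read())
        # Default newline handling turns each "\r\n" into a single "\n".
        with open(path, "r", encoding="ascii") as f:
            n_chars = len(f.read())
    finally:
        os.remove(path)
    return n_bytes, n_chars
```

For a file holding two CR/LF-terminated lines the two counts differ by two,
which is why a length computed from byte offsets reads past the end of the
message when applied to translated text.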
Mailbox, tell and read [ In reply to ]
From: Paul Prescod <paul@prescod.net>

Gordon McMillan wrote:
>
> How 'bout opening in binary, and munging line endings yourself just
> before returning the message?

First, are there platforms other than DOS/Windows/Win32 where there is a
difference between text and binary mode? I didn't want to munge things
myself because I'm not sure what is the "right" munge on the Mac, for
instance. Can I just do a search and replace for CR/LF -> CR on all
platforms, all of the time?

> If you can't use tell(), you can't trust seek(), either.

Good point.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"I don't want you to describe to me -- not ever -- what you were doing
to that poor boy to make him sound like that; but if you ever do it
again, please cover his mouth with your hand," Grandmother said.
-- John Irving, "A Prayer for Owen Meany"
Mailbox, tell and read [ In reply to ]
From: "Gordon McMillan" <gmcm@hypernet.com>

Paul Prescod wrote:

> Gordon McMillan wrote:
> >
> > How 'bout opening in binary, and munging line endings yourself just
> > before returning the message?
>
> First, are there platforms other than DOS/Windows/Win32 where there
> is a difference between text and binary mode?

There's a theoretical difference on *nix - otherwise, they wouldn't
have made up the distinction!

> I didn't want to munge
> things myself because I'm not sure what is the "right" munge on
> the Mac, for instance. Can I just do a search and replace for CR/LF
> -> CR on all platforms, all of the time?

On Windows, Notepad (and the underlying default text widget) require
\r\n, but newer widgets / editors will be happy with \n. On *nix,
most tools will show a \r as a noise character. No idea how
forgiving Mac tools are.

AFAIK, we have
*nix -> \n
Windows -> \r\n
Mac -> \r

However, since you're dealing with messages created who-knows-where
and run through who-knows-what transformations, you may well have to
munge no matter what.


- Gordon
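
Given the three conventions in Gordon's table, a message of unknown origin
can be brought to a single convention in two passes. The sketch below picks
LF as the target, which is an assumption, and the name normalize_newlines is
made up. The order of the two replacements matters: CR/LF pairs must be
handled first, or each pair would turn into two newlines.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    data = data.replace(b"\r\n", b"\n")   # pairs first
    return data.replace(b"\r", b"\n")     # then any remaining lone CR
```

This is only safe for data known to be text; it treats every CR byte as a
line ending.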
Mailbox, tell and read [ In reply to ]
From: aahz@netcom.com (Aahz Maruch)

In article <377B7742.F39CB0A4@prescod.net>,
Paul Prescod <paul@prescod.net> wrote:
>
>First, are there platforms other than DOS/Windows/Win32 where there is a
>difference between text and binary mode? I didn't want to munge things
>myself because I'm not sure what is the "right" munge on the Mac, for
>instance. Can I just do a search and replace for CR/LF -> CR on all
>platforms, all of the time?

I ran into this last year (while I was still a Perl heretic).
Unfortunately, you *have* to use binary mode in opening a mail file
because one of the legal MIME attachment types is non-encoded binary --
and that means you can't just blindly do a search/replace, either.
Never did come to a completely satisfactory answer. :-(

I think CR will break on Unix, but I'm not sure.
--
--- Aahz (@netcom.com)

Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het
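
One way to picture Aahz's caveat: a blind search and replace also rewrites
any CR/LF byte pair that happens to occur inside binary data. The sketch
below, in Python 3, converts line endings in the header block only and passes
the body through untouched. It is an illustration under simplifying
assumptions, not a MIME parser: it treats everything after the first blank
line as opaque, and the name normalize_headers_only is hypothetical.

```python
def normalize_headers_only(raw: bytes) -> bytes:
    """Convert CRLF to LF in the header block; leave the body bytes unchanged."""
    # Locate the first blank line, whichever line-ending style it uses.
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, sep) for pos, sep in hits if pos != -1]
    if not hits:
        # No blank line found: treat the whole input as headers.
        return raw.replace(b"\r\n", b"\n")
    pos, sep = min(hits)                  # earliest separator wins
    head, body = raw[:pos], raw[pos + len(sep):]
    return head.replace(b"\r\n", b"\n") + b"\n\n" + body
```

A multipart message would need the same decision made per part, based on each
part's declared transfer encoding.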
Mailbox, tell and read [ In reply to ]
From: "Fredrik Lundh" <fredrik@pythonware.com>

Gordon McMillan wrote:
> > First, are there platforms other than DOS/Windows/Win32 where there
> > is a difference between text and binary mode?
>
> There's a theoretical difference on *nix - otherwise, they wouldn't
> have made up the distinction!

who are "they?"

fwiw, the ANSI C rationale mentions Unix as the
special case here... go figure ;-)

</F>
Mailbox, tell and read [ In reply to ]
From: "Gordon McMillan" <gmcm@hypernet.com>

Fredrik Lundh wrote:
> Gordon McMillan wrote:
> > > First, are there platforms other than DOS/Windows/Win32 where there
> > > is a difference between text and binary mode?
> >
> > There's a theoretical difference on *nix - otherwise, they wouldn't
> > have made up the distinction!
>
> who are "they?"

Whatever AT&T employee it was. AFAIK, it predates any glimmer of an
idea of a "PC".

> fwiw, the ANSI C rationale mentions Unix as the
> special case here... go figure ;-)

I have a vague recollection that it can make a difference on AIX. At
least I remember fixing something by changing someone's "r" to "rb",
but I've completely forgotten the context.

- Gordon
Mailbox, tell and read [ In reply to ]
From: "Phil Mayes" <nospam@bitbucket.com>

Paul Prescod wrote in message <377AAE02.E8591727@prescod.net>...
>The mailbox message objects simulate a file interface by seeking through a
>mailbox. A read() returns the entire message. It does this by seeking to
>the start, reading to the message end, and returning the data. The problem
>is that message offsets are derived by file.tell() which I presume is in
>terms of bytes. But then it does a read() based on the offsets and read()
>is in terms of characters.
>
>These don't match up on Windows because of CR/LF pairs. I think that the
>right fix is to never use tell() and instead keep track of the location by
>counting characters. Does that make sense?


I am working with multiple mails in a single file, and found that using
seek/tell on the file in text mode didn't allow me to extract a single mail,
so I use binary, but I have to convert internally else quopri leaves =CRLF
inside the string. Code fragment:
    # read the body from the mbx file with 'rb' to get the correct
    # amount of data in, then convert CRLF to LF else quopri fails.
    fmbx = open(mbxfile, 'rb')
    fmbx.seek(self._offbody)
    s = fmbx.read(self._offend - self._offbody)
    # string.replace runs in 2/3 the time of joinfield/splitfield
    return string.replace(s, '\r\n', '\n')
--
Phil Mayes pmayes AT olivebr DOT com
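
For readers on Python 3, the fragment above might look roughly like the
following. This is a sketch, not Phil's code: the function name read_body and
its parameters stand in for the attributes in the fragment, and it goes one
step further by actually calling quopri.decodestring on the result. The
CR/LF to LF conversion is kept as in the original; whether current quopri
still needs it is not something this sketch tries to settle.

```python
import quopri


def read_body(path, off_body, off_end):
    """Read bytes [off_body, off_end) in binary mode and decode quoted-printable."""
    with open(path, "rb") as f:           # binary mode: offsets are byte counts
        f.seek(off_body)
        raw = f.read(off_end - off_body)
    unix = raw.replace(b"\r\n", b"\n")    # same CRLF -> LF step as the fragment
    return quopri.decodestring(unix)
```

The return value is bytes; decoding to text would need the charset from the
message headers, which is outside the scope of this sketch.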