Mailing List Archive

[Image-SIG] Proper application of the buffer interface
> The context is this: the t1python package (an interface to a Type1
> font renderer) has, in the past, been passing bitmaps back to Python
> as string objects. In the next release, I may create a new type in
> the C layer that provides all the interesting stuff, and would like to
> be able to convert these "glyph" objects to PIL images and GTK+/GNOME
> compatible images. Does it make the most sense for the glyph objects
> to offer the buffer interface to make more conversions possible
> without increasing the number of memory copies, or do I misunderstand
> the application of the interface?

I believe you are correct, although _all_ these packages need to support
the interface. The good news is that due to tricks inside Python, they
already may.

As you mention, it is common to use Python string objects to move chunks of
binary data around. The buffer interfaces now allow us to use _any_
object, and as long as it conforms to the buffer interface, you don't lose
anything by not using strings (and obviously gain whatever functionality
your object has).

The general idea is that PIL and other such frameworks can think in terms
of buffers. Rather than PIL saying "give me a Python string with the raw
binary data", it can say "give me an object from which I can obtain a
buffer with the raw binary data". Strings obviously still fit the bill.

The good news is that it is most common for extension modules to spell
"give me a string with the raw binary data" as "PyArg_ParseTuple("s#",
...);". PyArg_ParseTuple has been upgraded to use the buffer interfaces,
and so have Python string objects. Thus, whenever code uses
PyArg_ParseTuple in that way, it already supports the buffer
interface.

Thus, you could implement your new object, and define the buffer
interfaces. This object could then be passed to any C extension function
that uses PyArg_ParseTuple, and the extension module will still think it has
a "char *" pointer from an in-place string object. In practice, this means
that automatically people will be able to say "file.write(your_object)"
etc. with your new object.
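A rough sketch of the "file.write(your_object)" case, assuming Python 3.
Here array.array stands in for a new C type that implements the buffer
interface, and io.BytesIO stands in for a file; both choices are
assumptions made for illustration, and the name glyph_bits is
hypothetical.

```python
import array
import io

# Stand-in for a custom "glyph" object: array.array exposes its memory
# through the buffer protocol, much as a new C type could.
glyph_bits = array.array("B", [0x00, 0xFF, 0x10, 0x00])

sink = io.BytesIO()               # in-memory binary file
written = sink.write(glyph_bits)  # the caller converts nothing

print(written)
print(sink.getvalue())
```

The write() call accepts the array directly because it asks for a buffer
rather than for one specific type.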

The problem remains, of course, for extensions that use the
PyString_Check(), PyString_AsString() etc. functions. If they were upgraded
to use the buffer interfaces, then the transition would be complete.

Just to extend my guesswork somewhat, there is a new built-in "buffer()"
function. This returns a "buffer" object. I speculate this should be used
in preference to Python strings when you have binary data. As this buffer
object supports the buffer interfaces, it is basically as functional as
strings for this purpose, but clearly indicates the data is not really a
string! This appears to be more a matter of style, and also paves the road
to Unicode - e.g., it makes sense to convert any Python string object to
Unicode, but not necessarily a binary buffer.

> I'd appreciate some input on this matter. I hope to get the next
> release of t1python done before too much longer, and this is probably
> the biggest remaining question that I need to deal with.

Probably Greg and Guido are the only two with the real insight, as they
thrashed out the details. But I'm pretty happy with my understanding (as
detailed above) of the issues.

Mark.
[Image-SIG] Proper application of the buffer interface
On Thu, 5 Aug 1999, Mark Hammond wrote:

> Just to extend my guesswork somewhat, there is a new built-in "buffer()"
> function. This returns a "buffer" object. I speculate this should be used
> in preference to Python strings when you have binary data.

Would it also make sense to make struct.pack() return a buffer object
instead of a string?

--david ascher
[Image-SIG] Proper application of the buffer interface
Mark Hammond writes:
> The good news is that it is most common for extension modules to spell
> "give me a string with the raw binary data" as "PyArg_ParseTuple("s#",
> ...);". PyArg_ParseTuple has been upgraded to use the buffer interfaces,

Mark,
This is really cool. Buffers are probably what I want, then; the
question is starting to become whether to use a buffer object or to
create a new object that implements the buffer interface. I think I
can handle that! (And it's looking like implementing a new C type,
for those following t1python.)

> The problem remains, of course, for extensions that use the
> PyString_Check(), PyString_AsString() etc functions. If they were upgraded
> to use the buffer interfaces, then the transition would be complete.

Understandable; this is the price of using the lowest-level concrete
object interfaces.

> strings for this purpose, but clearly indicate the data is not really a
> string! This appears to be more a matter of style, and also paves the road
> to Unicode - eg, it makes sense to convert any Python string object to
> Unicode, but not necessarily a binary buffer.

This is a compelling argument for the buffer type/interface in my
book. I won't be happy until Unicode strings are in the core! (No,
I'm not holding out for Unicode strings to replace the current string
type, just that they be in the core and well-supported.)

> Probably Greg and Guido are the only 2 with the real insight, as they
> thrashed out the details. But I'm pretty happy with my understanding (as
> detailed above) of the issues.

Your elucidation on the topic is excellent; if you'd like to write a
section for the "Extending & Embedding" manual regarding when to
implement the buffer interface and when to use it, I'd certainly be
glad to mark it up and integrate it. ;-)
(Greg: No, this wouldn't get you off the hook for reference material!)


-Fred

--
Fred L. Drake, Jr. <fdrake@cnri.reston.va.us>
Corporation for National Research Initiatives
[Image-SIG] Proper application of the buffer interface
David Ascher writes:
> Would it also make sense to make struct.pack() return a buffer object
> instead of a string?

Sounds like it to me. If Guido agrees, I can make the changes.


-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
[Image-SIG] Proper application of the buffer interface
Fred L. Drake, Jr. wrote:
>
> David Ascher writes:
> > Would it also make sense to make struct.pack() return a buffer object
> > instead of a string?
>
> Sounds like it to me. If Guido agrees, I can make the changes.

Wait... what's so bad about buffer(struct.pack()) ? Strings already
know the buffer interface, so this works just fine already.
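As a rough modern analogue of buffer(struct.pack()), assuming Python 3
(where struct.pack() returns bytes and memoryview replaces the buffer()
built-in mentioned in this thread; both points are assumptions of the
sketch, not claims about the 1999 interpreter):

```python
import struct

packed = struct.pack("<HH", 1, 2)  # bytes in Python 3
view = memoryview(packed)          # wraps the same memory, no copy

print(view.obj is packed)          # the view refers back to the bytes
print(view.nbytes)
# Consumers that accept buffers can read straight from the view.
print(struct.unpack("<HH", view))
```

Wrapping the result leaves struct.pack() itself unchanged, so existing
callers that expect its current return type are unaffected.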

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 147 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[Image-SIG] Proper application of the buffer interface
Mark Hammond wrote:
>
> M.-A. Lemburg <mal@lemburg.com> wrote in message
> <37AAA7FE.FFE0FD2@lemburg.com>...
> > Fred L. Drake, Jr. wrote:
> > >
> > > David Ascher writes:
> > > > Would it also make sense to make struct.pack() return a buffer object
> > > > instead of a string?
> > >
> > > Sounds like it to me. If Guido agrees, I can make the changes.
> >
> > Wait... what's so bad about buffer(struct.pack()) ? Strings already
> > know the buffer interface, so this works just fine already.
>
> Well, struct.pack() does not return a string - it returns a chunk of
> memory.

The docs say:

    pack (fmt, v1, v2, ...)
        Return a string containing the values v1, v2, ... packed according
        to the given format. The arguments must match the values required
        by the format exactly.

> So IMO it makes more sense to say "str(struct.pack())" if you really want a
> string.

While it's arguable whether returning a string is the right
thing to do, simply returning a buffer object instead of
a string will certainly break code expecting a string -- you can't
rely on all C APIs using "s#" to parse the return value of
struct.pack(), even though most of them probably will.
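A small sketch of the kind of breakage meant here, assuming Python 3 with
bytes and memoryview standing in for the string and buffer objects under
discussion. The function legacy_consumer is hypothetical; it represents
any caller that checks for the concrete type instead of asking for a
buffer.

```python
import struct


def legacy_consumer(value):
    # Hypothetical caller that insists on the concrete type.
    if not isinstance(value, bytes):
        raise TypeError("expected bytes, got %s" % type(value).__name__)
    return len(value)


packed = struct.pack("<I", 42)
wrapped = memoryview(packed)

print(legacy_consumer(packed))      # accepted: the type the caller expects
try:
    legacy_consumer(wrapped)        # same data, different type
except TypeError as exc:
    print("rejected:", exc)

# An explicit conversion restores the concrete type, at the cost of a copy.
print(legacy_consumer(bytes(wrapped)))
```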

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 147 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[Image-SIG] Proper application of the buffer interface
M.-A. Lemburg writes:
> Wait... what's so bad about buffer(struct.pack()) ? Strings already
> know the buffer interface, so this works just fine already.

I think the issue is that if strings ever change to Unicode, we
don't want non-character data to ever be exposed as a string. Using a
buffer takes care of this and works now, allowing preparations for
further integration of Unicode to be tested gradually, rather than
having a complete meltdown from a large cutover.
Functions which need to receive binary data will always use s#,
because s will raise TypeError if '\0' is in the string.
The other way around is done less consistently now. The S format
character might not raise an exception if '\0' is in the data, and many
standard functions and methods probably need to be adjusted to finish
up a Unicode integration, but I think using a buffer object for binary
data will only avoid problems, not create them.


-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
[Image-SIG] Proper application of the buffer interface
On Fri, 6 Aug 1999, M.-A. Lemburg wrote:

> Fred L. Drake, Jr. wrote:
> >
> > David Ascher writes:
> > > Would it also make sense to make struct.pack() return a buffer object
> > > instead of a string?
> >
> > Sounds like it to me. If Guido agrees, I can make the changes.
>
> Wait... what's so bad about buffer(struct.pack()) ? Strings already
> know the buffer interface, so this works just fine already.

It's not terrible, but it seems suboptimal. It strikes me that
struct.pack() explicitly shouldn't create strings, it should create
arrays of bytes. Let's ask the question the other way -- what's wrong
with having struct.pack() return buffers instead of strings? The only
thing I can think of is that type(struct.pack()) is no longer StringType.
Maybe that's enough to push the change to 2.0, I'm not sure either way.

--david