Mailing List Archive: buffer interface considered harmful

Aug 15, 1999, 3:32 AM

Post #1 of 20

> Fredrik Lundh wrote:
> >...
> > besides, what about buffers and threads? if you
> > return a pointer from getreadbuf, wouldn't it be
> > good to know exactly when Python doesn't need
> > that pointer any more? explicit initbuffer/exitbuffer
> > calls around each sequence of buffer operations
> > would make that a lot safer...
>
> This is a pretty obvious one, I think: it lasts only as long as the
> object. PyString_AS_STRING is similar. Nothing new or funny here.

well, I think the buffer behaviour is both
new and pretty funny:

from array import array

a = array("f", [0]*8192)

b = buffer(a)

for i in range(1000):
    a.append(1234)

print b

in other words, the buffer interface should
be redesigned, or removed.

(though I'm sure AOL would find some interesting use for this ;-)

</F>

"Confusing? Yes, but this is a lot better than
allowing arbitrary pointers!"
-- GvR on assignment operators, November 91

Re: buffer interface considered harmful

gstein at lyra

Aug 15, 1999, 1:35 PM

Post #2 of 20

Fredrik Lundh wrote:
>...
> well, I think the buffer behaviour is both
> new and pretty funny:

I think the buffer interface was introduced in 1.5 (by Jack?). I added
the 8-bit character buffer slot and buffer objects in 1.5.2.

> from array import array
>
> a = array("f", [0]*8192)
>
> b = buffer(a)
>
> for i in range(1000):
>     a.append(1234)
>
> print b
>
> in other words, the buffer interface should
> be redesigned, or removed.

I don't understand what you believe is weird here. Also, are you saying
the buffer *interface* is weird, or the buffer *object* ?

thx,
-g

--
Greg Stein, http://www.lyra.org/

Re: buffer interface considered harmful

fredrik at pythonware

Aug 16, 1999, 12:06 AM

Post #3 of 20

> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.
>
> > from array import array
> >
> > a = array("f", [0]*8192)
> >
> > b = buffer(a)
> >
> > for i in range(1000):
> >     a.append(1234)
> >
> > print b
> >
> > in other words, the buffer interface should
> > be redesigned, or removed.
>
> I don't understand what you believe is weird here.

did you run that code?

it may work, it may bomb, or it may generate bogus
output. all depending on your memory allocator, the
phase of the moon, etc. just like back in the C/C++
days...

imo, that's not good enough for a core feature.

</F>

Re: buffer interface considered harmful

gstein at lyra

Aug 16, 1999, 12:15 AM

Post #4 of 20

Fredrik Lundh wrote:
>
> > I think the buffer interface was introduced in 1.5 (by Jack?). I added
> > the 8-bit character buffer slot and buffer objects in 1.5.2.
> >
> > > from array import array
> > >
> > > a = array("f", [0]*8192)
> > >
> > > b = buffer(a)
> > >
> > > for i in range(1000):
> > >     a.append(1234)
> > >
> > > print b
> > >
> > > in other words, the buffer interface should
> > > be redesigned, or removed.
> >
> > I don't understand what you believe is weird here.
>
> did you run that code?

Yup. It printed nothing.

> it may work, it may bomb, or it may generate bogus
> output. all depending on your memory allocator, the
> phase of the moon, etc. just like back in the C/C++
> days...

It probably appeared as an empty string because the construction of the
array filled it with zeroes (at least the first byte).

Regardless, I'd be surprised if it crashed the interpreter. The print
command is supposed to do a str() on the object, which creates a
PyStringObject from the buffer contents. Shouldn't be a crash there.

> imo, that's not good enough for a core feature.

If it crashed, then sure. But I'd say that indicates a bug rather than a
design problem. Do you have a stack trace from a crash?

Ah. I just worked through, in my head, what is happening here. The
buffer object caches the pointer returned by the array object. The
append on the array does a realloc() somewhere, thereby invalidating the
pointer inside the buffer object.

Icky. Gotta think on this one... As an initial thought, it would seem
that the buffer would have to re-query the pointer for each operation.
There are performance implications there, of course, but that would
certainly fix the problem.
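Greg's diagnosis, and the re-query fix he floats, can be modelled in a few lines of modern Python. This is a hypothetical sketch: `Storage`, `CachingView` and `RequeryingView` are invented names, and swapping in a new list stands in for realloc(). Python cannot produce a real dangling pointer, so the caching view returns stale data here rather than crashing.

```python
class Storage:
    """Hypothetical growable container; append() mimics realloc() moving data."""

    def __init__(self, items):
        self._block = list(items)

    def get_block(self):
        # Analogous to getreadbuffer: hand out the *current* backing block.
        return self._block

    def append(self, item):
        # Simulate realloc(): build a new block and abandon the old one.
        self._block = self._block + [item]


class CachingView:
    """Grabs the block once at construction, like the 1.5.2 buffer object."""

    def __init__(self, storage):
        self._block = storage.get_block()

    def snapshot(self):
        return list(self._block)


class RequeryingView:
    """Asks the storage for its block on every operation, as Greg suggests."""

    def __init__(self, storage):
        self._storage = storage

    def snapshot(self):
        return list(self._storage.get_block())


s = Storage([0, 0, 0])
cached, fresh = CachingView(s), RequeryingView(s)
s.append(1234)
print(len(cached.snapshot()))  # 3: still looking at the abandoned block
print(len(fresh.snapshot()))   # 4: sees the reallocated block
```

The cost Greg mentions is visible in the design: `RequeryingView` pays one extra call per operation in exchange for never holding on to a stale block.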

Cheers,
-g

Re: buffer interface considered harmful

jack at oratrix

Aug 16, 1999, 2:49 AM

Post #5 of 20

> >...
> > well, I think the buffer behaviour is both
> > new and pretty funny:
>
> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.

Ah, now I understand why I didn't understand some of the previous
conversation: I had never come across the buffer *objects* (as opposed to
the buffer *interface*) until Fredrik's example.

I've just looked at it, and I'm not sure I understand the full intentions of the
buffer object. Buffer objects can either behave as the "buffer-aspect" of the
object behind them (without the rest of their functionality) or as array
objects, and if they start out life as the first they can evolve into the
second, is that right?

Is there a rationale behind this design, or is it just something that
happened?
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

Re: buffer interface considered harmful

gstein at lyra

Aug 16, 1999, 2:56 AM

Post #6 of 20

Jack Jansen wrote:
>...
> I've just looked at it, and I'm not sure I understand the full intentions of the
> buffer object. Buffer objects can either behave as the "buffer-aspect" of the
> object behind them (without the rest of their functionality) or as array
> objects, and if they start out life as the first they can evolve into the
> second, is that right?
>
> Is there a rationale behind this design, or is it just something that
> happened?

The object doesn't change. You create it as a reference to an existing
object's buffer (as exported via the buffer interface), or you create it
as a reference to some arbitrary memory.

The buffer object provides (optionally read/write) string-like behavior
to any object that supports buffer behavior. It can also be used to make
lightweight slices of another object. For example:

>>> a = "abcdefghi"
>>> b = buffer(a, 3, 3)
>>> print b
def
>>>

In the above example, there is only one copy of "def" (the portion
inside of the string object referenced by <a>).
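On a current Python, where the `buffer()` builtin no longer exists, `memoryview` is the nearest equivalent (an assumption about the reader's interpreter, not something discussed in this thread). A sketch of the same zero-copy slice:

```python
a = b"abcdefghi"
b = memoryview(a)[3:6]  # view of bytes 3..5; the data itself is not copied
print(bytes(b))         # b'def' (bytes() makes a copy, only for display)
print(b.obj is a)       # True: the view still refers to the original object
```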

The string-like behavior can be quite nice for memory-mapped files.
Andrew's mmapfile module's file objects export the buffer interface.
This means that you can open a file, wrap a buffer around it, and
perform quick and easy random-access on the thing. You could even select
slices of the file and pass them around as if they were strings, without
loading anything into the process heap. (I want to try mmap'ing a .pyc
and create code objects that have buffer-based bytecode streams; it will
be interesting to see if this significantly reduces memory consumption
(in terms of the heap size; the mmap'd .pyc can be shared across
processes)).
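Greg's memory-mapped file idea can be sketched with today's standard `mmap` module plus `memoryview`. The temporary file and its contents are made up for the example; the point is that slicing the view touches only the mapped pages, and only the bytes explicitly copied out land on the heap.

```python
import mmap
import os
import tempfile

# A small throwaway file stands in for a large data file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header:" + b"x" * 100 + b":trailer")

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm)
    tail = view[-7:]       # zero-copy slice of the mapped pages
    result = bytes(tail)   # copy out only the 7 bytes actually needed
    tail.release()         # drop the exports before unmapping:
    view.release()         # mmap.close() refuses while views are alive
    mm.close()
os.remove(path)

print(result)  # b'trailer'
```

Note the explicit `release()` calls: a modern mmap will not unmap while a view still points into it, which is one answer to the lifetime question raised at the top of the thread.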

Cheers,
-g

Re: buffer interface considered harmful

jim at digicool

Aug 16, 1999, 5:30 AM

Post #7 of 20

Fredrik Lundh wrote:
>
> > Fredrik Lundh wrote:
> > >...
> > > besides, what about buffers and threads? if you
> > > return a pointer from getreadbuf, wouldn't it be
> > > good to know exactly when Python doesn't need
> > > that pointer any more? explicit initbuffer/exitbuffer
> > > calls around each sequence of buffer operations
> > > would make that a lot safer...
> >
> > This is a pretty obvious one, I think: it lasts only as long as the
> > object. PyString_AS_STRING is similar. Nothing new or funny here.
>
> well, I think the buffer behaviour is both
> new and pretty funny:
>
> from array import array
>
> a = array("f", [0]*8192)
>
> b = buffer(a)
>
> for i in range(1000):
>     a.append(1234)
>
> print b
>
> in other words, the buffer interface should
> be redesigned, or removed.

A while ago I asked for some documentation on the Buffer
interface. I basically got silence. At this point, I
don't have a good idea what buffers are for and I don't see a lot
of evidence that there *is* a design. I assume that there was
a design, but I can't see it. This whole discussion makes me
very queasy.

I'm probably just out of it, since I don't have
time to read the Python list anymore. Presumably the buffer
interface was proposed and discussed there at some distant
point in the past.

(I can't pay as much attention to this discussion as I suspect
I should, due to time constraints and due to a basic understanding
of the rationale for the buffer interface. Just now I caught a sniff
of something I find kinda repulsive. I think I hear you all talking about
beasties that hold a reference to some object's internal storage and that
have write operations so you can write directly to the object's storage,
bypassing the object interfaces. I probably just imagined it.)

</whine>

Jim

--
Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org
Digital Creations http://www.digicool.com http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

Re: buffer interface considered harmful

gstein at lyra

Aug 16, 1999, 5:41 AM

Post #8 of 20

Jim Fulton wrote:
>...
> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence. At this point, I

I think the silence was caused by the simple fact that the documentation
does not (yet) exist. That's all... nothing nefarious.

Cheers,
-g

Re: buffer interface considered harmful

mal at lemburg

Aug 16, 1999, 5:41 AM

Post #9 of 20

Greg Stein wrote:
>
> Fredrik Lundh wrote:
> >
> > > I think the buffer interface was introduced in 1.5 (by Jack?). I added
> > > the 8-bit character buffer slot and buffer objects in 1.5.2.
> > >
> > > > from array import array
> > > >
> > > > a = array("f", [0]*8192)
> > > >
> > > > b = buffer(a)
> > > >
> > > > for i in range(1000):
> > > >     a.append(1234)
> > > >
> > > > print b
> > > >
> > > > in other words, the buffer interface should
> > > > be redesigned, or removed.
> > >
> > > I don't understand what you believe is weird here.
> >
> > did you run that code?
>
> Yup. It printed nothing.
>
> > it may work, it may bomb, or it may generate bogus
> > output. all depending on your memory allocator, the
> > phase of the moon, etc. just like back in the C/C++
> > days...
>
> It probably appeared as an empty string because the construction of the
> array filled it with zeroes (at least the first byte).
>
> Regardless, I'd be surprised if it crashed the interpreter. The print
> command is supposed to do a str() on the object, which creates a
> PyStringObject from the buffer contents. Shouldn't be a crash there.
>
> > imo, that's not good enough for a core feature.
>
> If it crashed, then sure. But I'd say that indicates a bug rather than a
> design problem. Do you have a stack trace from a crash?
>
> Ah. I just worked through, in my head, what is happening here. The
> buffer object caches the pointer returned by the array object. The
> append on the array does a realloc() somewhere, thereby invalidating the
> pointer inside the buffer object.
>
> Icky. Gotta think on this one... As an initial thought, it would seem
> that the buffer would have to re-query the pointer for each operation.
> There are performance implications there, of course, but that would
> certainly fix the problem.

I guess that's the way to go. I wouldn't want to think
about those details when using buffer objects and a function call
is still better than a copy... it would do the init/exit
wrapping implicitly: init at the time the getreadbuffer
call is made and exit next time a thread switch is done -
provided that the functions using the memory pointer also
keep a reference to the buffer object alive (but that should
be natural as this is always done when dealing with references
in a safe way).
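Fredrik's explicit initbuffer/exitbuffer pair and Marc-Andre's implicit variant both amount to bracketing each use of the pointer. Python 3 later adopted an explicit acquire/release buffer protocol (PEP 3118); the sketch below uses that later machinery, not anything available in 1.5.2, to show a `bytearray` refusing to resize while a view of it is held, which is exactly the situation in Fredrik's array example.

```python
data = bytearray(b"abc")
refused = False

with memoryview(data) as view:  # acquire: the bytearray now has an export
    first = view[0]
    try:
        data.append(ord("d"))   # resizing could move the memory...
    except BufferError:
        refused = True          # ...so it is refused while the view exists

data.append(ord("d"))           # view released on leaving the block: resize is fine
print(refused, first, data)     # True 97 bytearray(b'abcd')
```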

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 138 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Re: buffer interface considered harmful

jim at digicool

Aug 16, 1999, 6:26 AM

Post #10 of 20

Greg Stein wrote:
>
> Jim Fulton wrote:
> >...
> > A while ago I asked for some documentation on the Buffer
> > interface. I basically got silence. At this point, I
>
> I think the silence was caused by the simple fact that the documentation
> does not (yet) exist. That's all... nothing nefarious.

I didn't mean to suggest anything nefarious. I do think that a change that
affects something as basic as the standard object type layout and that
generates this much discussion really should be documented before it
becomes part of the core. I'd especially like to see some kind of document
that includes information like:

- A problem statement that describes the problem the change is
solving,

- How does the solution solve the problem,

- When and how should people writing new types support the new
interfaces?

We're not talking about a new library module here. There's been
a change to the core object interface.

Jim

Re: buffer interface considered harmful

jack at oratrix

Aug 16, 1999, 6:45 AM

Post #11 of 20

> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence. At this point, I
> don't have a good idea what buffers are for and I don't see a lot
> of evidence that there *is* a design. I assume that there was
> a design, but I can't see it. This whole discussion makes me
> very queasy.

Okay, as I'm apparently not the only one who is queasy let's start from
scratch.

First, there is the old buffer _interface_. This is a C interface that gives
extension (and builtin) modules and functions a unified way to access objects
if they want to write the object to a file, and similar things. It is also what
the PyArg_ParseTuple "s#" returns. This is, in C, the
getreadbuffer/getwritebuffer interface.

Second, there's the extension to the buffer interface as of 1.5.2. This is
again only available in C, and it allows C programmers to get an object _as an
ASCII string_. This is meant for things like regexp modules, to access any
"textual" object as an ASCII string. This is the getcharbuffer interface, and
bound to the "t#" specifier in PyArg_ParseTuple.

Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports
the functionality of the buffer interface to Python, but it does a bit more as
well, because the buffer objects have a sort of copy-on-write semantics that
means they may or may not be "attached" to a python object through the buffer
interface.

<personal opinion>
I think that the C interface and the object should be treated completely
separately. I definitely want the C interface, but I personally don't use the
Python buffer objects, so I don't really care all that much about those. Also,
I think that the buffer objects might become easier to understand if we don't
think of it as "the buffer interface exported to python", but as "Python
buffer objects, that may share memory with other Python objects as an
optimization".
</personal opinion>
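Jack's framing, a view object that may share memory with another object, is easy to see with a writable view in today's Python (again with `memoryview` standing in for the 1.5.2 buffer object, an assumption of this sketch). Writes through the view go straight into the other object's storage, which is the kind of bypass Jim is uneasy about above; read-only exporters such as `bytes` refuse them.

```python
from array import array

a = array("h", [1, 2, 3, 4])  # signed shorts
view = memoryview(a)          # shares a's memory; nothing is copied
view[1] = 200                 # writes straight into a's storage
print(a)                      # array('h', [1, 200, 3, 4])

frozen = memoryview(b"text")
print(frozen.readonly)        # True: bytes only exports a read-only buffer
try:
    frozen[0] = ord("T")
except TypeError:
    print("write refused")
```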

Re: buffer interface considered harmful

jim at digicool

Aug 16, 1999, 9:03 AM

Post #12 of 20

Jack Jansen wrote:
>
> > A while ago I asked for some documentation on the Buffer
> > interface. I basically got silence. At this point, I
> > don't have a good idea what buffers are for and I don't see alot
> > of evidence that there *is* a design. I assume that there was
> > a design, but I can't see it. This whole discussion makes me
> > very queasy.
>
> Okay, as I'm apparently not the only one who is queasy let's start from
> scratch.

Yee ha!

> First, there is the old buffer _interface_. This is a C interface that gives
> extension (and builtin) modules and functions a unified way to access objects
> if they want to write the object to a file, and similar things.

Is this serialization? What does this achieve that, say, the pickling
protocols don't achieve? What other problems does it solve?

> It is also what
> the PyArg_ParseTuple "s#" returns. This is, in C, the
> getreadbuffer/getwritebuffer interface.

Huh? "s#" doesn't return a string? Or are you saying that you can
pass a non-string object to a C function that uses "s#" and have it
bufferized and then stringized? In either case, this is not
consistent with the documentation (interface) of PyArg_ParseTuple.

> Second, there's the extension to the buffer interface as of 1.5.2. This is
> again only available in C, and it allows C programmers to get an object _as an
> ASCII string_. This is meant for things like regexp modules, to access any
> "textual" object as an ASCII string. This is the getcharbuffer interface, and
> bound to the "t#" specifier in PyArg_ParseTuple.

Hm. So this is making a little more sense. So, there is a notion that
there are "textual" objects that want to provide a method for getting
their "text". How does this text differ from what you get from __str__
or __repr__?

> Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports
> the functionality of the buffer interface to Python,

How so? Maybe I'm at sea because I still don't get what the
C buffer interface is for.

> but it does a bit more as
> well, because the buffer objects have a sort of copy-on-write semantics that
> means they may or may not be "attached" to a python object through the buffer
> interface.

What is this thing used for?

Where does the slot in tp_as_buffer come into all of this?

Why does this need to be a slot in the first place?
Are these "textual" objects really common? Is the presence of this
slot a flag for "textualness"?

It would help a lot, at least for me, if there was a clearer
description of what motivates these things. What problems are
they trying to solve?

Jim

Re: buffer interface considered harmful

da at ski

Aug 16, 1999, 9:45 AM

Post #13 of 20

On Mon, 16 Aug 1999, Jim Fulton wrote:

> > Second, there's the extension to the buffer interface as of 1.5.2. This is
> > again only available in C, and it allows C programmers to get an object _as an
> > ASCII string_. This is meant for things like regexp modules, to access any
> > "textual" object as an ASCII string. This is the getcharbuffer interface, and
> > bound to the "t#" specifier in PyArg_ParseTuple.
>
> Hm. So this is making a little more sense. So, there is a notion that
> there are "textual" objects that want to provide a method for getting
> their "text". How does this text differ from what you get from __str__
> or __repr__?

I'll let others give a well thought out rationale. Here are some examples
of use which I think worthwhile:

* Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile
objects fit this aspect of the buffer interface allows you to do regexp
searches on it w/o ever building a twelve gigabyte PyString.

* Consider a non-contiguous NumPy array. If the array type supported the
multi-segment buffer interface, extension module writers could
manipulate the data within this array w/o having to worry about the
non-contiguous nature of the data. They'd still have to worry about
the multi-byte nature of the data, but it's still a win. In other
words, I think that the buffer interface could be useful even w/
non-textual data.

* If NumPy was modified to have arrays with data stored in buffer objects
as opposed to the current "char *", and if PIL was modified to have
images stored in buffer objects as opposed to whatever it uses, one
could have arrays and images which shared data.
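David's first example, a regular-expression search over a mapped file without building one giant string, works directly with today's `re` and `mmap` modules, since both speak the buffer protocol. In this sketch a small made-up temporary file stands in for the twelve-gigabyte one.

```python
import mmap
import os
import re
import tempfile

# Throwaway stand-in for a huge sequence file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ACGT" * 1000 + b"GATTACA" + b"TTGC" * 1000)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # re scans the mapped pages directly; no bytes object of the
        # whole file is ever constructed.
        offset = re.search(rb"GATTACA", mm).start()
os.remove(path)

print(offset)  # 4000
```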

I think all of these provide examples of motivations which are appealing
to at least some Python users. I make no claim that they motivate the
specific interface. In all the cases I can think of, one or both of two
features are the key asset:

- access to subset of huge data regions w/o creation of huge temporary
variables.

- sharing of data space.

Yes, it's a power tool, and as such should come with safety goggles.
But then again, the same is true for ExtensionClasses =).

leaving-out-the-regexp-on-NumPy-arrays-example,

--david

PS: I take back the implicit suggestion that buffer() return read-write
buffers when possible.

Re: buffer interface considered harmful

jim at digicool

Aug 16, 1999, 10:06 AM

Post #14 of 20

David Ascher wrote:
>
> On Mon, 16 Aug 1999, Jim Fulton wrote:
>
> > > Second, there's the extension to the buffer interface as of 1.5.2. This is
> > > again only available in C, and it allows C programmers to get an object _as an
> > > ASCII string_. This is meant for things like regexp modules, to access any
> > > "textual" object as an ASCII string. This is the getcharbuffer interface, and
> > > bound to the "t#" specifier in PyArg_ParseTuple.
> >
> > Hm. So this is making a little more sense. So, there is a notion that
> > there are "textual" objects that want to provide a method for getting
> > their "text". How does this text differ from what you get from __str__
> > or __repr__?
>
> I'll let others give a well thought out rationale.

I eagerly await this. :)

> Here are some examples
> of use which I think worthwhile:
>
> * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile
> objects fit this aspect of the buffer interface allows you to do regexp
> searches on it w/o ever building a twelve gigabyte PyString.

This seems reasonable, if a bit exotic. :)

> * Consider a non-contiguous NumPy array. If the array type supported the
> multi-segment buffer interface, extension module writers could
> manipulate the data within this array w/o having to worry about the
> non-contiguous nature of the data. They'd still have to worry about
> the multi-byte nature of the data, but it's still a win. In other
> words, I think that the buffer interface could be useful even w/
> non-textual data.

Why is this a good thing? Why should extension module writers
worry about the non-contiguous nature of the data now? Does the NumPy
C API somehow expose this now? Will multi-segment buffers make it
go away somehow?

> * If NumPy was modified to have arrays with data stored in buffer objects
> as opposed to the current "char *", and if PIL was modified to have
> images stored in buffer objects as opposed to whatever it uses, one
> could have arrays and images which shared data.

Uh, and this would be a good thing? Maybe PIL should just be modified
to use NumPy arrays.

> I think all of these provide examples of motivations which are appealing
> to at least some Python users.

Perhaps, although Guido knows how they'd find out about them. ;)

Jim

Re: buffer interface considered harmful

da at ski

Aug 16, 1999, 10:18 AM

Post #15 of 20

On Mon, 16 Aug 1999, Jim Fulton wrote:

>> [regexps on gigabyte files]
>
> This seems reasonable, if a bit exotic. :)

In the bioinformatics world, I think it's everyday stuff.

> Why is this a good thing? Why should extension module writers worry
> about the non-contiguous nature of the data now? Does the NumPy C API
> somehow expose this now? Will multi-segment buffers make it go away
> somehow?

A NumPy extension module writer needs to create and modify NumPy arrays.
These arrays may be non-contiguous (if e.g. they are the result of
slicing). The NumPy C API exposes the non-contiguous nature, but it's
hard enough to deal with it that I suspect most extension writers require
contiguous arrays, which means unnecessary copies.

Multi-segment buffers won't make the API go away necessarily (backwards
compatibility and all that), but it could make it unnecessary for many
extension writers.
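The contiguity problem David describes shows up in miniature with a strided view in today's Python. This sketch uses only the standard `memoryview`, not NumPy's C API: a step slice is non-contiguous, and getting plain contiguous bytes out of it takes exactly the kind of copy he calls unnecessary.

```python
data = bytearray(b"a1b2c3")
view = memoryview(data)
evens = view[::2]            # every other byte: a strided, non-contiguous view
print(evens.contiguous)      # False
print(evens.strides)         # (2,)
packed = evens.tobytes()     # contiguous copy for code that can't handle strides
print(packed)                # b'abc'
```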

> > * If NumPy was modified to have arrays with data stored in buffer objects
> > as opposed to the current "char *", and if PIL was modified to have
> > images stored in buffer objects as opposed to whatever it uses, one
> > could have arrays and images which shared data.
>
> Uh, and this would be a good thing? Maybe PIL should just be modified
> to use NumPy arrays.

Why? PIL was designed for image processing, and made design decisions
appropriate to that domain. NumPy was designed for multidimensional
numeric array processing, and made design decisions appropriate to that
domain. The intersection of interests exists (e.g. in the medical imaging
world), and I know people who spend a lot of their CPU time moving data
between images and arrays with "stupid" tostring/fromstring operations.
Given the size of the images, it's a prodigious waste of time, and kills
the use of Python in many a project.

> Perhaps, although Guido knows how they'd find out about them. ;)

Uh? These issues have been discussed in the NumPy/PIL world for a while,
with no solution in sight. Recently, I and others saw mentions of buffers
in the source, and they seemed like a reasonable approach, which could be
done w/o a rewrite of either PIL or NumPy.

Don't get me wrong -- I'm all for better documentation of the buffer
stuff, design guidelines, warnings and protocols. I stated as much on
June 15:

http://www.python.org/pipermail/python-dev/1999-June/000338.html

--david

Re: buffer interface considered harmful

jim at digicool

Aug 16, 1999, 10:38 AM

Post #16 of 20

David Ascher wrote:
>
> On Mon, 16 Aug 1999, Jim Fulton wrote:
>
> >> [regexps on gigabyte files]
> >
> > This seems reasonable, if a bit exotic. :)
>
> In the bioinformatics world, I think it's everyday stuff.

Right, in some (exotic ;) domains it's not exotic at all.

> > Why is this a good thing? Why should extension module writers worry
> > about the non-contiguous nature of the data now? Does the NumPy C API
> > somehow expose this now? Will multi-segment buffers make it go away
> > somehow?
>
> A NumPy extension module writer needs to create and modify NumPy arrays.
> These arrays may be non-contiguous (if e.g. they are the result of
> slicing). The NumPy C API exposes the non-contiguous nature, but it's
> hard enough to deal with it that I suspect most extension writers require
> contiguous arrays, which means unnecessary copies.

Hm. This sounds like an API problem to me.

> Multi-segment buffers won't make the API go away necessarily (backwards
> compatibility and all that), but it could make it unnecessary for many
> extension writers.

Multi-segment buffers don't make the multi-segmented nature of the
memory go away. Do they really simplify the API that much?

They seem to strip away an awful lot of information hiding.

> > > * If NumPy was modified to have arrays with data stored in buffer objects
> > > as opposed to the current "char *", and if PIL was modified to have
> > > images stored in buffer objects as opposed to whatever it uses, one
> > > could have arrays and images which shared data.
> >
> > Uh, and this would be a good thing? Maybe PIL should just be modified
> > to use NumPy arrays.
>
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

It seems to me that NumPy is broad enough to encompass
image processing.

My main concern is having two systems rely on some low-level "shared
memory" mechanism to achieve efficient communication.

> > Perhaps, although Guido knows how they'd find out about them. ;)
>
> Uh? These issues have been discussed in the NumPy/PIL world for a while,
> with no solution in sight. Recently, I and others saw mentions of buffers
> in the source, and they seemed like a reasonable approach, which could be
> done w/o a rewrite of either PIL or NumPy.

My point was that people would be lucky to find out about buffers or
about how to use them as things stand.

> Don't get me wrong -- I'm all for better documentation of the buffer
> stuff, design guidelines, warnings and protocols. I stated as much on
> June 15:
>
> http://www.python.org/pipermail/python-dev/1999-June/000338.html

Yes, that was quite a jihad you launched. ;)

Jim

Re: buffer interface considered harmful

da at ski

Aug 16, 1999, 11:25 AM

Post #17 of 20

On Mon, 16 Aug 1999, Jim Fulton wrote:

[ Aside:

> It seems to me that NumPy is broad enough to encompass
> image processing.

Well, I'll just say that you could have been right, but w/ the current
NumPy, I don't blame F/ for having developed his own data structures.
NumPy is messy, and some of its design decisions are wrong for image
things (memory handling, casting rules, etc.). It's all water under the
bridge at this point.
]

Back to the main topic:

You say:

> [Multi-segment buffers] seem to strip away an awful lot of information
> hiding.

My impression of the buffer notion was that it is intended to *provide*
information hiding, by giving a simple API to byte arrays which could be
stored in various ways. I do agree that whether those bytes should be
shared or not is a decision which should be weighed carefully.

> My main concern is having two systems rely on some low-level "shared
> memory" mechanism to achieve efficient communication.

I don't particularly care about the specific buffer interface (the
low-level nature of which is what I think you object to). I do care about
having a well-defined mechanism for sharing memory between objects, and I
think there is value in defining such an interface generically. Maybe the
notion of segmented arrays of bytes is too low-level, and instead we
should think of the data spaces as segmented arrays of chunks, where a
chunk can be one or more bytes? Or do you object to any 'generic'
interface?
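To make the "segmented arrays of chunks" idea a little more concrete, here is a minimal sketch in modern Python. The class `SegmentedChunks` and its methods are hypothetical, invented for illustration and not part of any Python release. It assumes each segment is a `bytearray` holding a whole number of chunks of `itemsize` bytes, and callers address chunks by a global index instead of receiving raw pointers.

```python
class SegmentedChunks:
    """Hypothetical r/w view over several segments of fixed-size chunks.

    Each segment must hold a whole number of chunks of `itemsize` bytes.
    Callers address chunks by one global index and never see raw memory.
    """

    def __init__(self, segments, itemsize=1):
        for seg in segments:
            if len(seg) % itemsize:
                raise ValueError("segment length must be a multiple of itemsize")
        self.segments = list(segments)  # shares the same bytearray objects
        self.itemsize = itemsize

    def segment_count(self):
        return len(self.segments)

    def __len__(self):
        # total number of chunks across all segments
        return sum(len(seg) // self.itemsize for seg in self.segments)

    def _locate(self, index):
        # map a global chunk index to (segment, byte offset in that segment)
        if index < 0:
            index += len(self)
        for seg in self.segments:
            n = len(seg) // self.itemsize
            if 0 <= index < n:
                return seg, index * self.itemsize
            index -= n
        raise IndexError("chunk index out of range")

    def __getitem__(self, index):
        seg, off = self._locate(index)
        return bytes(seg[off:off + self.itemsize])

    def __setitem__(self, index, value):
        if len(value) != self.itemsize:
            raise ValueError("value must be exactly one chunk")
        seg, off = self._locate(index)
        seg[off:off + self.itemsize] = value  # writes into the shared segment


rows = [bytearray(b"\x00\x01\x02\x03"), bytearray(b"\x04\x05\x06\x07")]
buf = SegmentedChunks(rows, itemsize=2)
buf[1] = b"\xff\xff"  # lands in the first segment, bytes 2..3
```

The design choice here is that the consumer only ever sees whole chunks, so the same API works whether the data lives in one block or in many.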

Just for fun, here's a list of things that either currently support some
sort of buffer interface or have been mentioned as possibly supporting one
in the future, with my guesses as to chunk size, segmented status and
writeability:

- strings (1 byte, single-segment, r/o)
- unicode strings (2 bytes, single-segment, r/o)
- struct.pack() things (1 byte, single-segment, r/o)
- arrays (1-4? bytes, single-segment, r/w)
- NumPy arrays (1-8 bytes, multi-segment, r/w)
- PIL images (1-? bytes, multi-segment, r/w)
- CObjects (1 byte, single-segment, r/?)
- mmapfiles (1 byte, multi-segment?, r/w)
- non-python-owned memory (1 byte, single-segment, r/w)
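For comparison only: Python 3's redesigned buffer protocol (PEP 3118), which postdates this thread, makes several of the properties guessed at above queryable through `memoryview`. `itemsize` plays the role of the chunk size and `readonly` the r/o versus r/w flag, while `contiguous` is only a rough stand-in for "single-segment", since PEP 3118 describes non-flat memory with strides and suboffsets rather than segments. The helper name `describe` is made up for this sketch, which only uses standard-library types that export buffers today.

```python
from array import array


def describe(obj):
    """Report chunk size, contiguity and writeability of a buffer exporter."""
    with memoryview(obj) as view:  # release the export when done
        return {
            "itemsize": view.itemsize,      # bytes per chunk
            "contiguous": view.contiguous,  # one flat block of memory?
            "readonly": view.readonly,      # r/o versus r/w
        }


for obj in (b"spam", bytearray(b"spam"), array("f", [0.0] * 4), array("d", [0.0] * 4)):
    print(type(obj).__name__, describe(obj))
```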

--david

Re: buffer interface considered harmful

fredrik at pythonware

Aug 17, 1999, 12:23 AM

Post #18 of 20 (5711 views)

David Ascher <da@ski.org> wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which
are, as I mentioned in an earlier post, accessed via an API rather
than via direct pointers. it'll also include an adapter allowing you
to use NumPy objects as image memories.

unfortunately, the buffer interface is not good enough to use
on top of the virtual image memory interface...

</F>

*) 1.1 is our current development thread, which will be
released to plus customers in a number of weeks...
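Fredrik doesn't spell out the PIL 1.1 "virtual image memory" design here, so the following is only a hypothetical illustration of the general idea of image memory reached through an API rather than through direct pointers. The class `VirtualImageMemory` and its methods are invented for this sketch and are not PIL's API. Because rows are allocated lazily and readers get copies, the object can hand out pixel data but never a long-lived pointer, which is the kind of storage a pointer-returning buffer interface can't sit on top of.

```python
class VirtualImageMemory:
    """Hypothetical 8-bit image memory reached only through accessor calls.

    Rows are allocated lazily on first write, so there is no single stable
    block of memory that a pointer-based interface could expose.
    """

    def __init__(self, width, height, fill=0):
        self.width = width
        self.height = height
        self._fill = fill  # value (0..255) reported for never-written rows
        self._rows = {}    # y -> bytearray, created on first write

    def _check_row(self, y):
        if not 0 <= y < self.height:
            raise IndexError("row out of range")

    def get_line(self, y):
        # Return a copy: callers never hold a reference into our storage.
        self._check_row(y)
        row = self._rows.get(y)
        if row is None:
            return bytes([self._fill]) * self.width
        return bytes(row)

    def put_line(self, y, data):
        self._check_row(y)
        if len(data) != self.width:
            raise ValueError("row has the wrong width")
        self._rows[y] = bytearray(data)

    def allocated_rows(self):
        return len(self._rows)


img = VirtualImageMemory(4, 3)
img.put_line(1, b"\x01\x02\x03\x04")
```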

RE: buffer interface considered harmful

mhammond at skippinet

Aug 17, 1999, 9:05 AM

Post #19 of 20 (5719 views)

Fredrik,
Care to elaborate? Statements like "buffer interface needs a redesign" or
"the buffer interface is not good enough to use on top of the virtual image
memory interface" really only give me the impression you have a bee in your
bonnet over these buffer interfaces.

If you could actually stretch these statements out to provide even _some_
background, problem statement or potential solution it would help. All I
know is "Fredrik doesn't like it for some unexplained reason". You found an
issue with array reallocation - great - but that's a bug rather than a
design flaw.

Can you tell us why it's not good enough, and an off-the-cuff design that
would solve it? Or are you suggesting it is unsolvable? I really don't
have a clue what your issue is. Jim (for example) has made his position
and reasoning clear. You have only made your position clear, but your
reasoning is still a mystery.

Mark.

>
> unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

Re: buffer interface considered harmful

fredrik at pythonware

Aug 17, 1999, 9:48 AM

Post #20 of 20 (5709 views)

> Care to elaborate? Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html
and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said:

"At this point, I don't have a good idea what buffers are
for and I don't see a lot of evidence that there *is* a design.
I assume that there was a design, but I can't see it".

which pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the
design isn't as crystal-clear as it needs to be, for such a
fundamental feature. otherwise, Greg would never have
made that mistake, and Guido would have spotted it when
he added the "buffer" built-in...
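The array reallocation problem from the start of the thread is that `buffer(a)` kept pointing into the array's storage while `a.append()` was free to reallocate it. As one later answer (not something proposed in this thread), Python 3's `memoryview` has the exporting object refuse to resize while any view is outstanding; `array.array` raises `BufferError` in that case. A minimal sketch of that behaviour:

```python
from array import array

a = array("f", [0.0] * 8192)
view = memoryview(a)      # exporting a's storage pins it in place

refused = False
try:
    a.append(1234.0)      # growing would mean reallocating the storage
except BufferError as exc:
    refused = True
    print("refused:", exc)

view.release()            # end the export explicitly
a.append(1234.0)          # with no views outstanding, resizing is allowed
print(len(a))             # 8193
```

The explicit `release()` plays much the same role as the "exitbuffer" call Fredrik asked for earlier: it tells the exporter exactly when the pointer is no longer needed.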

so what are you folks waiting for? could someone who
thinks he understands exactly what this thing is spend
an hour on writing that design document, so Jim and I
can put this entire thing behind us?

</F>

PS. btw, was it luck or careful analysis behind the decision
to make buffer() always return read-only buffers, even for
objects implementing the read/write protocol?
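For comparison with the behaviour Fredrik describes for `buffer()`, Python 3's `memoryview` (a later design, not the 1.5.2 built-in) hands out a writable view when the exporter allows writes and a read-only one otherwise, and a read-only view of a writable object has to be requested explicitly. A small sketch; note that `memoryview.toreadonly()` assumes Python 3.8 or newer.

```python
data = bytearray(b"spam")
rw = memoryview(data)        # bytearray exports writable memory
rw[0] = ord("S")             # writes straight through to `data`
print(data)                  # bytearray(b'Spam')

ro = memoryview(b"spam")     # bytes only exports read-only memory
try:
    ro[0] = ord("S")
except TypeError as exc:
    print("read-only:", exc)

# Read-only access to a writable object, roughly what buffer() gave
# for every object; toreadonly() requires Python 3.8+.
frozen = memoryview(data).toreadonly()
```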