Mailing List Archive

mmap
Another topic: what are the chances of adding the mmap module to the core
distribution? It's restricted to a smallish set of platforms (modern
Unices and Win32, I think), but it's quite small, and would be a nice
thing to have available in the core, IMHO.

(btw, the buffer object needs more documentation)

--david
Re: mmap
> Another topic: what are the chances of adding the mmap module to the core
> distribution? It's restricted to a smallish set of platforms (modern
> Unices and Win32, I think), but it's quite small, and would be a nice
> thing to have available in the core, IMHO.

If it works on Linux, Solaris, Irix and Windows, and is reasonably
clean, I'll take it. Please send it.

> (btw, the buffer object needs more documentation)

That's for Jack & Greg...

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: mmap
On Tue, 15 Jun 1999, Guido van Rossum wrote:
> > Another topic: what are the chances of adding the mmap module to the core
> > distribution? It's restricted to a smallish set of platforms (modern
> > Unices and Win32, I think), but it's quite small, and would be a nice
> > thing to have available in the core, IMHO.
>
> If it works on Linux, Solaris, Irix and Windows, and is reasonably
> clean, I'll take it. Please send it.

Actually, my preference is to see a change to open() rather than a whole
new module. For example, let's say that you open a file, specifying
memory-mapping. Then you create a buffer against that file:

f = open('foo','rm') # 'm' means mem-map
b = buffer(f)
print b[100:200]

Disclaimer: I haven't looked at the mmap modules (AMK's and Mark's) to see
what capabilities are in there. They may not be expressible solely as open()
changes. (adding add'l params for mmap flags might be another way to
handle this)
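
For comparison, a minimal sketch of the same "map the file, then slice it" idea. It assumes the mmap module as it exists in present-day Python 3 (mmap.mmap() with a file descriptor), not the 'rm' open mode or buffer() call proposed above; the helper name read_slice_mapped and the scratch file are made up for illustration.

```python
import mmap
import os
import tempfile

def read_slice_mapped(path, start, stop):
    """Return bytes [start:stop] of the file at `path` via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies only the requested bytes out of the mapping.
            return m[start:stop]

# Usage: build a small scratch file, then slice it through the mapping.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(bytes(range(256)))
chunk = read_slice_mapped(demo_path, 100, 200)
os.remove(demo_path)
```

Only the pages that the slice touches need to be brought in from the file, which is the property the working-set argument below relies on.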

I'd like to see mmap native in Python. I won't push, though, until I can
run a test to see what kind of savings will occur when you mmap a .pyc
file and open PyBuffer objects against the thing for the code bytes. My
hypothesis is that you can reduce the working set of Python (i.e. amortize
the cost of a .pyc's code over several processes by mmap'ing it); this
depends on the proportion of code in the pyc relative to "other" stuff.

> > (btw, the buffer object needs more documentation)
>
> That's for Jack & Greg...

Quite true. My bad :-( ... That would go into the API doc, I guess... I'll
put this on a todo list, but it could be a little while.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: mmap
Greg wrote:
> Actually, my preference is to see a change to open() rather than a whole
> new module. For example, let's say that you open a file, specifying
> memory-mapping. Then you create a buffer against that file:
>
> f = open('foo','rm') # 'm' means mem-map
> b = buffer(f)
> print b[100:200]
>
> Disclaimer: I haven't looked at the mmap modules (AMK's and Mark's) to see
> what capabilities are in there. They may not be expressible solely as open()
> changes. (adding add'l params for mmap flags might be another way to
> handle this)
>
> I'd like to see mmap native in Python. I won't push, though, until I can
> run a test to see what kind of savings will occur when you mmap a .pyc
> file and open PyBuffer objects against the thing for the code bytes. My
> hypothesis is that you can reduce the working set of Python (i.e. amortize
> the cost of a .pyc's code over several processes by mmap'ing it); this
> depends on the proportion of code in the pyc relative to "other" stuff.

yes, yes, yes!

my good friend the mad scientist (the guy who writes code,
not the flaming cult-ridden brainwashed script kiddie) has
considered writing a whole new "abstract file" backend, to
entirely get rid of stdio in the Python core. some potential
advantages:

-- performance (some stdio implementations are slow)
-- portability (stdio doesn't exist on some platforms!)
-- opens up for cool extensions (memory mapping,
pluggable file handlers, etc).

should I tell him to start hacking?

or is this the same thing as PyBuffer/buffer (I've implemented
PyBuffer support for the unicode class, but that doesn't mean
that I understand how it works...)

</F>

PS. someone once told me that Perl goes "below" the standard
file I/O system. does anyone here know if that's true, and perhaps
even explain how they're doing that...
Re: mmap
[me]
> > If it works on Linux, Solaris, Irix and Windows, and is reasonably
> > clean, I'll take it. Please send it.

[Greg]
> Actually, my preference is to see a change to open() rather than a whole
> new module. For example, let's say that you open a file, specifying
> memory-mapping. Then you create a buffer against that file:
>
> f = open('foo','rm') # 'm' means mem-map
> b = buffer(f)
> print b[100:200]

Buh. Changes of this kind to builtins are painful, especially since
we expect that this feature may or may not be supported. And imagine
the poor reader who comes across this for the first time...

What's wrong with

import mmap
f = mmap.open('foo', 'r')

???

> I'd like to see mmap native in Python. I won't push, though, until I can
> run a test to see what kind of savings will occur when you mmap a .pyc
> file and open PyBuffer objects against the thing for the code bytes. My
> hypothesis is that you can reduce the working set of Python (i.e. amortize
> the cost of a .pyc's code over several processes by mmap'ing it); this
> depends on the proportion of code in the pyc relative to "other" stuff.

We've been through this before. I still doubt it will help much.
Anyway, it's a completely independent feature from making the mmap
module (any mmap module) available to users.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: mmap
> my good friend the mad scientist (the guy who writes code,
> not the flaming cult-ridden brainwashed script kiddie) has
> considered writing a whole new "abstract file" backend, to
> entirely get rid of stdio in the Python core. some potential
> advantages:
>
> -- performance (some stdio implementations are slow)
> -- portability (stdio doesn't exist on some platforms!)

You have this backwards -- you'd have to port the abstract backend
first! Also don't forget that a *good* stdio might be using all sorts
of platform-specific tricks that you'd have to copy to match its
performance.

> -- opens up for cool extensions (memory mapping,
> pluggable file handlers, etc).
>
> should I tell him to start hacking?

Tcl/Tk does this. I see some advantages (e.g. you have more control
over and knowledge of how much data is buffered) but also some
disadvantages (more work to port, harder to use from C), plus tons of
changes needed in the rest of Python. I'd say wait until Python 2.0
and let's keep stdio for 1.6.

> PS. someone once told me that Perl goes "below" the standard
> file I/O system. does anyone here know if that's true, and perhaps
> even explain how they're doing that...

Probably just means that they use the C equivalent of os.open() and
friends.
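
A minimal sketch of what going through os.open() and friends looks like from Python, assuming Python 3's os module; the helper name read_unbuffered and the scratch file are hypothetical, and this says nothing about what Perl actually does internally.

```python
import os
import tempfile

def read_unbuffered(path, chunk_size=4096):
    """Read a whole file with os.open()/os.read(), bypassing the buffered file object."""
    # O_BINARY exists only on Windows; elsewhere the getattr default of 0 is a no-op.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        parts = []
        while True:
            block = os.read(fd, chunk_size)
            if not block:  # an empty bytes object signals end of file
                break
            parts.append(block)
        return b"".join(parts)
    finally:
        os.close(fd)

# Usage: write a scratch file, then read it back descriptor-style.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as out:
    out.write(b"hello, stdio-free world\n" * 1000)
data = read_unbuffered(demo_path)
os.remove(demo_path)
```

The caller decides how much to read per call, which is the "more control over how much data is buffered" trade-off mentioned above.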

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: mmap
On 16 June 1999, Fredrik Lundh said:
> my good friend the mad scientist (the guy who writes code,
> not the flaming cult-ridden brainwashed script kiddie) has
> considered writing a whole new "abstract file" backend, to
> entirely get rid of stdio in the Python core. some potential
> advantages:
[...]
> PS. someone once told me that Perl goes "below" the standard
> file I/O system. does anyone here know if that's true, and perhaps
> even explain how they're doing that...

My understanding (mainly from folklore -- peeking into the Perl source
has been known to turn otherwise staid, solid programmers into raving
lunatics) is that yes, Perl does grovel around in the internals of stdio
implementations to wring a few extra cycles out.

However, what's probably of more interest to you -- I mean your mad
scientist alter ego -- is Perl's I/O abstraction layer: a couple of
years ago, somebody hacked up Perl's guts to do basically what you're
proposing for Python. The main result was a half-baked, unfinished (at
least as of last summer, when I actually asked an expert in person at
the Perl Conference) way of building Perl with AT&T's sfio library
instead of stdio. I think the other things you mentioned, e.g. more
natural support for memory-mapped files, have also been bandied about as
advantages of this scheme.

The main problem with Perl's I/O abstraction layer is that extension
modules now have to call e.g. PerlIO_open(), PerlIO_printf(), etc. in
place of their stdio counterparts. Surprise surprise, many extension
modules have not adapted to the new way of doing things, even though
it's been in Perl since version 5.003 (I think). Even more
surprisingly, the fourth-party C libraries that those extension modules
often interface to haven't switched to using Perl's I/O abstraction
layer. This doesn't make a whit of difference if Perl is built in
either the "standard way" (no abstraction layer, just direct stdio) or
with the abstraction layer on top of stdio. But as soon as some poor
fool decides Perl on top of sfio would be neat, lots of extension
modules break -- their I/O calls go nowhere.

I'm sure there is some sneaky way to make it all work using sfio's
binary compatibility layer and some clever macros. This might even have
been done. However, AFAIK it's not been documented anywhere.

This is not merely to bitch about unfinished business in the Perl core;
it's to warn you that others have walked down the road you propose to
tread, and there may be potholes. Now if the Python source really does
get even more modularized for 1.6, you might have a much easier job of
it. ("Modular" is not the word that jumps to mind when one looks at the
Perl source code.)

Greg

/*
* "Far below them they saw the white waters pour into a foaming bowl, and
* then swirl darkly about a deep oval basin in the rocks, until they found
* their way out again through a narrow gate, and flowed away, fuming and
* chattering, into calmer and more level reaches."
*/
-- Tolkien, by way of perl/doio.c

--
Greg Ward - software developer gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive voice: +1-703-620-8990
Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913
Re: mmap
Fredrik Lundh writes:
>
> my good friend the mad scientist (the guy who writes code,
> not the flaming cult-ridden brainwashed script kiddie) has
> considered writing a whole new "abstract file" backend, to
> entirely get rid of stdio in the Python core. some potential
> advantages:
>
> -- performance (some stdio implementations are slow)
> -- portability (stdio doesn't exist on some platforms!)
> -- opens up for cool extensions (memory mapping,
> pluggable file handlers, etc).
>
> should I tell him to start hacking?
>

I am not in favor of obscuring Python's I/O model too much. When
working with C extensions, it is critical to have access to normal I/O
mechanisms such as 'FILE *' or integer file descriptors. If you hide
all of this behind some sort of abstract I/O layer, it's going to make
life hell for extension writers unless you also provide a way to get
access to the raw underlying data structures. This is a major gripe
I have with the Tcl channel model--namely, there seems to be no easy
way to unravel a Tcl channel into a raw file-descriptor for use in C
(unless I'm being dense and have missed some simple way to do it).

Also, what platforms are we talking about here? I've never come
across any normal machine that had a C compiler, but did not have stdio.
Is this really a serious problem?

Cheers,

Dave
RE: mmap
[Greg writes]
> The main problem with Perl's I/O abstraction layer is that extension
> modules now have to call e.g. PerlIO_open(), PerlIO_printf(), etc. in
> place of their stdio counterparts. Surprise surprise, many extension

Interestingly, Python _nearly_ suffers this problem now. Although Python
does use native FILE pointers, this scheme still assumes that Python and
the extensions all use the same stdio.

I understand that on most Unix systems this can be taken for granted.
However, to be truly cross-platform, this assumption may not be valid. A
case in point is (surprise surprise :-) Windows. Windows has a number of C
RTL options, and Python and its extensions must be careful to select the
one that shares FILE * and the heap across separately compiled and linked
modules. In fact, Windows comes with an excellent debug version of the C
RTL, but this gets in Python's way - if even one (but not all) Python
extension attempts to use these debugging features, we die in a big way.

and-dont-even-talk-to-me-about-Windows-CE ly,

Mark.
Re: mmap
Greg Ward wrote:
> This is not merely to bitch about unfinished business in the Perl core;
> it's to warn you that others have walked down the road you propose to
> tread, and there may be potholes.

oh, the mad scientist has rushed down that road a few
times before. we'll see if he's prepared to do that again;
it sure won't happen before the unicode stuff is in place...

</F>
Re: mmap
> > -- performance (some stdio implementations are slow)
> > -- portability (stdio doesn't exist on some platforms!)
>
> You have this backwards -- you'd have to port the abstract backend
> first! Also don't forget that a *good* stdio might be using all sorts
> of platform-specific tricks that you'd have to copy to match its
> performance.

well, if the backend layer is good enough, I don't
think a stdio-based standard version will be much
slower than today's stdio-only implementation.

> > PS. someone once told me that Perl goes "below" the standard
> > file I/O system. does anyone here know if that's true, and perhaps
> > even explain how they're doing that...
>
> Probably just means that they use the C equivalent of os.open() and
> friends.

hopefully. my original source described this as
"digging around in the innards of the stdio package"
(and so did greg). and the same source claimed it
wasn't yet ported to Linux. sounds weird, to say
the least, but maybe he referred to that sfio
package greg mentioned. I'll do some digging,
but not today.

</F>
Re: mmap
David Beazley wrote:
> I am not in favor of obscuring Python's I/O model too much. When
> working with C extensions, it is critical to have access to normal I/O
> mechanisms such as 'FILE *' or integer file descriptors. If you hide
> all of this behind some sort of abstract I/O layer, it's going to make
> life hell for extension writers unless you also provide a way to get
> access to the raw underlying data structures. This is a major gripe
> I have with the Tcl channel model--namely, there seems to be no easy
> way to unravel a Tcl channel into a raw file-descriptor for use in C
> (unless I'm being dense and have missed some simple way to do it).
>
> Also, what platforms are we talking about here? I've never come
> across any normal machine that had a C compiler, but did not have stdio.
> Is this really a serious problem?

in a way, it is a problem today under Windows (in other
words, on most of the machines where Python is used
today). it's very easy to end up with different DLLs using
different stdio implementations, resulting in all kinds of
strange errors. a rewrite could use OS-level handles
instead, and get rid of that problem.

not to mention Windows CE (iirc, Mark had to write his
own stdio-ish package for the CE port), maybe PalmOS,
BeOS's BFile's, and all the other upcoming platforms which
will make Windows look like a fairly decent Unix clone ;-)

...

and in Python, any decent extension writer should write
code that works with arbitrary file objects, right? "if it
cannot deal with StringIO objects, it's broken"...
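
a minimal sketch of that rule of thumb, assuming present-day Python 3 where io.StringIO plays the role of the StringIO module; the function name count_lines is made up for illustration.

```python
import io

def count_lines(fileobj):
    """Count lines using only iteration, which any file-like object supports."""
    return sum(1 for _ in fileobj)

# The same function accepts an in-memory object with no OS file behind it.
in_memory = io.StringIO("spam\neggs\nham\n")
n = count_lines(in_memory)
```

the function never asks for a descriptor or a 'FILE *', so it cannot tell a real file from an in-memory one.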

</F>
Re: mmap
Fredrik Lundh writes:
>
> and in Python, any decent extension writer should write
> code that works with arbitrary file objects, right? "if it
> cannot deal with StringIO objects, it's broken"...

I disagree. Given that a lot of people use Python as a glue language
for interfacing with legacy codes, it is unacceptable for extensions
to be forced to use some sort of funky non-standard I/O abstraction.
Unless you are volunteering to rewrite all of these codes to use the
new I/O model, you are always going to need access (in one way or
another) to plain old 'FILE *' and integer file descriptors. Of
course, one can always just provide a function like

FILE *PyFile_AsFile(PyObject *o)

That takes an I/O object and returns a 'FILE *' where supported. (Of
course, if it's not supported, then it doesn't matter if this function
is missing since any extension that needs a 'FILE *' wouldn't work
anyways).

Cheers,

Dave
Re: mmap
> > and in Python, any decent extension writer should write
> > code that works with arbitrary file objects, right? "if it
> > cannot deal with StringIO objects, it's broken"...
>
> I disagree. Given that a lot of people use Python as a glue language
> for interfacing with legacy codes, it is unacceptable for extensions
> to be forced to use some sort of funky non-standard I/O abstraction.

oh, you're right, of course. should have added that extra smiley
to that last line. cut and paste from this mail if necessary: ;-)

> Unless you are volunteering to rewrite all of these codes to use the
> new I/O model, you are always going to need access (in one way or
> another) to plain old 'FILE *' and integer file descriptors. Of
> course, one can always just provide a function like
>
> FILE *PyFile_AsFile(PyObject *o)
>
> That takes an I/O object and returns a 'FILE *' where supported.

exactly my idea. when scanning the code, PyFile_AsFile immediately
popped up as a potential pothole (if you need the fileno, there's
already a method for that in the "standard file object interface").
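
a Python-level sketch of the same "where supported" contract, built on that fileno() method. it assumes present-day Python 3, where in-memory objects such as io.StringIO raise io.UnsupportedOperation from fileno(); the helper name raw_descriptor is hypothetical.

```python
import io
import tempfile

def raw_descriptor(fileobj):
    """Return the OS-level descriptor behind fileobj, or None if it has none."""
    try:
        return fileobj.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # No fileno() at all, or an object with no OS file behind it.
        return None

# A real temporary file has a descriptor; an in-memory object does not.
with tempfile.TemporaryFile() as real_file:
    real_fd = raw_descriptor(real_file)
memory_fd = raw_descriptor(io.StringIO("no descriptor here"))
```

code that truly needs the descriptor can check for None and refuse, while everything else keeps working with arbitrary file objects.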

btw, an "abstract file object" could actually make it much easier
to support arbitrary file objects from C/C++ extensions. just map
the calls back to Python. or add a tp_file slot, and things get
really interesting...

> (Of course, if it's not supported, then it doesn't matter if this
> function is missing since any extension that needs a 'FILE *' wouldn't
> work anyways).

yup. I suspect some legacy code may have a hard time running
under CE et al. but of course, with a little macro trickery, nothing
stops you from recompiling such code so it uses Python's
new "abstract file... okay, okay, I'll stop now ;-)

</F>
Re: mmap
Fredrik Lundh writes:
> > > and in Python, any decent extension writer should write
> > > code that works with arbitrary file objects, right? "if it
> > > cannot deal with StringIO objects, it's broken"...
> >
> > I disagree. Given that a lot of people use Python as a glue language
> > for interfacing with legacy codes, it is unacceptable for extensions
> > to be forced to use some sort of funky non-standard I/O abstraction.
>
> oh, you're right, of course. should have added that extra smiley
> to that last line. cut and paste from this mail if necessary: ;-)
>

Good. You had me worried there for a second :-).
>
> yup. I suspect some legacy code may have a hard time running
> under CE et al. but of course, with a little macro trickery, nothing
> stops you from recompiling such code so it uses Python's
> new "abstract file... okay, okay, I'll stop now ;-)

Macro trickery? Oh yes, we could use that too... (one can never
have too much macro trickery if you ask me :-)

Cheers,

Dave