Mailing List Archive

Towards a Python based import scheme
Back again...

To get a little more constructive I've started hacking away
on Greg Stein's imputil.py to make it work with my DateTime
package. The DateTime package does a lot of from...import...
and intra-package imports, plus it loads a shared lib as
extension.

The original version of imputil I fetched from Greg's page
did work out of the box (from...import... hassles) and
obviously did not support in-package shared libs. I've added
both features so that the test script in DateTime can run
successfully.

Things that remain are:
· the win32 registry stuff (needs C code)
· the Mac fork stuff (needs C code)
· a working __path__ implementation (is anyone using this attribute
which only is available in packages ?)
· probably a whole bunch of other quirks
· some speedups (there currently are too many stat()s)

Please give it a try:

http://starship.skyport.net/~lemburg/imputil.py

in color:

http://starship.skyport.net/~lemburg/imputil.py.html

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 107 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: Towards a Python based import scheme
Marc-Andre wrote:

> To get a little more constructive I've started hacking away on
> Greg Stein's imputil.py to make it work with my DateTime package.

> http://starship.skyport.net/~lemburg/imputil.py

You (and Greg) are missing a rather important patch I
submitted to Greg a long time ago (around line 62):

def _reload_hook(self, module):
    # gmcm - Hmmm, reloading of a module may or may not be impossible,
    # (depending on the importer), but at least we can
    # look to see if it's ours to reload:
    if hasattr(module, '__importer__'):
        if getattr(module, '__importer__') == self:
            raise SystemError, "reload not yet implemented"
    return self.__chain_reload(module)



- Gordon
Re: Towards a Python based import scheme
Gordon McMillan wrote:
>...
> def _reload_hook(self, module):
>     # gmcm - Hmmm, reloading of a module may or may not be impossible,
>     # (depending on the importer), but at least we can
>     # look to see if it's ours to reload:
>     if hasattr(module, '__importer__'):
>         if getattr(module, '__importer__') == self:
>             raise SystemError, "reload not yet implemented"
>     return self.__chain_reload(module)

I've folded this in (finally).

New imputil.py to be published in a bit...

thx!
-g

--
Greg Stein, http://www.lyra.org/
Re: Towards a Python based import scheme
M.-A. Lemburg wrote:
>...
> The original version of imputil I fetched from Greg's page
> did work out of the box (from...import... hassles) and

"did not work", I presume. From my original testing, I thought
from...import worked. With more testing, I found that something of the
form "from xml.dom import builder" did not work.

I discovered why it failed (xml.dom was imported by Importer instance I1
but I2 thought it could handle the from...import, and this barfed a
check). I've fixed this by delegating to the proper importer (I1 in my
example) to complete the import. Your solution to check the __importer__
variable in the globals is probably incorrect. If I read/eval it
correctly, that would mean that a module imported by IMP1 could not use
modules imported by IMP2. In other words, a package module could not
import a top-level module defined by a different importer.
(note also that your globals.get() could fail if globals is None)

> obviously did not support in-package shared libs. I've added

I did not fold this in. Your change isn't "in the spirit" of the
Importer mechanism. The "Right Way" to do this is to create a
BuiltinImporter and add that to the chain of importers. The
DirectoryImporter should only import from directories -- no reason for
it to know about builtin stuff. As a result, I did not accept the new
methods on Importer for handling builtins/special modules -- those would
go in the BuiltinImporter.
[ BuiltinImporter should be written and included in imputil.py; I don't
really have the time at the moment to write the thing... 7am and time
for sleep...]

However, your change here did raise a very important design issue:
get_code() needs to be able to return a loaded module, rather than just
a code object. I've folded in your patches for that.
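
As a rough illustration of that design issue (a sketch only, written for
present-day Python 3 and not taken from imputil.py): the class names
SketchImporter, SourceImporter and BuiltinImporter and the method
import_module are hypothetical. The sketch assumes get_code() may return
None, a code object, or an already-loaded module, and that the caller has
to treat the last two differently.

```python
import sys
import types


class SketchImporter:
    """Hypothetical base class; not the real imputil.Importer."""

    def get_code(self, fqname):
        # Subclasses return None (not mine), a code object, or a module.
        raise NotImplementedError

    def import_module(self, fqname):
        result = self.get_code(fqname)
        if result is None:
            return None
        if isinstance(result, types.ModuleType):
            # Already loaded (builtin or extension module): use it as is.
            return result
        # Otherwise it is a code object: create a module and run the code.
        module = types.ModuleType(fqname)
        exec(result, module.__dict__)
        return module


class SourceImporter(SketchImporter):
    def __init__(self, sources):
        self.sources = sources  # dict: module name -> source text

    def get_code(self, fqname):
        text = self.sources.get(fqname)
        if text is None:
            return None
        return compile(text, "<%s>" % fqname, "exec")


class BuiltinImporter(SketchImporter):
    def get_code(self, fqname):
        if fqname not in sys.builtin_module_names:
            return None
        return __import__(fqname)  # a loaded module, not a code object
```

With this split, the source importer never needs to know about builtins;
it simply returns None for names it does not own.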

I also folded in many of your extended doc/comments (at least in
concept; not necessarily verbatim). You and Gordon are recognized in the
header now, and I've added a "proper" author notice and licensing
(public domain).

I did not include the "misses" feature that you added to the
DirectoryImporter. I would hate to see a miss-cache get loaded, a module
dropped into the filesystem, and the user never being able to import the
thing.
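
The failure mode can be reproduced with a small sketch. MissCachingFinder
is a hypothetical class, not the "misses" code from the patch; it only
assumes that failed lookups are remembered and answered from memory
afterwards.

```python
import os
import tempfile


class MissCachingFinder:
    """Hypothetical finder that remembers names it failed to find."""

    def __init__(self, directory):
        self.directory = directory
        self.misses = set()

    def find(self, name):
        if name in self.misses:
            return None  # answered from the cache, no filesystem access
        path = os.path.join(self.directory, name + ".py")
        if os.path.isfile(path):
            return path
        self.misses.add(name)
        return None


def demo():
    with tempfile.TemporaryDirectory() as d:
        finder = MissCachingFinder(d)
        first = finder.find("newmod")       # miss, and the miss is cached
        with open(os.path.join(d, "newmod.py"), "w") as f:
            f.write("x = 1\n")              # module dropped in afterwards
        second = finder.find("newmod")      # still a miss: stale cache
        finder.misses.clear()               # manual invalidation
        third = finder.find("newmod")       # found now
        return first, second, third is not None
```

The second lookup fails even though the file exists, which is the case
described above; some form of invalidation would be needed to make a miss
cache safe.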

I didn't fold in your indentation changes or name changes. I liked mine
:-). The __main__ thing at the bottom didn't make much sense to me,
though, since the call to _test_dir() followed by an exit doesn't really
do anything. And yes, I recognize that you can use "python -i
imputil.py" but I'd rather just see "python" followed by "import imputil
; imputil._test_dir()".

Of course, please feel free to generate a new patch if I've missed
something (thinking about it, I missed the OSError thing).

> both features so that the test script in DateTime can run
> successfully.
>
> Things that remain are:
> · the win32 registry stuff (needs C code)

And a new Importer to use it.

> · the Mac fork stuff (needs C code)

Ditto.

> · a working __path__ implementation (is anyone using this attribute
> which only is available in packages ?)

Per the private mail that I sent to you: I explicitly punted on the
__path__ attribute. It can lead to *way* too much confusion. It is also
unavailable for frozen packages (boy oh boy did the win32com get some
ugliness in there to compensate for being frozen w.r.t. its use of
__path__).

The DirectoryImporter can insert the attribute, but it definitely
wouldn't go into the Importer itself. The __path__ attribute is specific
to loading from a filesystem, yet Importer is generic.

> · probably a whole bunch of other quirks
> · some speedups (there currently are too many stat()s)

Yes. I recognize that the "misses" feature was intended to remedy this.
I don't have an immediate answer to the stat() issue. Does the Importer
mechanism actually perform more stats on an import than Python itself?
(it looks like it does one for the isdir() plus two for fetching file
timestamps)

And a big thanx: I appreciate the patches to imputil! The new module is
now available in its "official" location at
http://www.lyra.org/greg/python/

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: Towards a Python based import scheme
Greg Stein wrote:
>
> Gordon McMillan wrote:
> >...
> > def _reload_hook(self, module):
> >     # gmcm - Hmmm, reloading of a module may or may not be impossible,
> >     # (depending on the importer), but at least we can
> >     # look to see if it's ours to reload:
> >     if hasattr(module, '__importer__'):
> >         if getattr(module, '__importer__') == self:
> >             raise SystemError, "reload not yet implemented"
> >     return self.__chain_reload(module)
>
> I've folded this in (finally).
>
> New imputil.py to be published in a bit...

As a result of all this import discussion I am a bit worried that
the python library *.pyl file format may not be powerful enough.
I have always thought in terms of unique top-level names and a
format which supports import of modules and packages. But this
does not support the full functionality of PYTHONPATH. For example,
PYTHONPATH can be (and is) used to select the correct plat-* directory
files. And the format may not support Jim Fulton's fancy local
import scheme. And what if someone invents a third thing to import
besides a module or a package? PYTHONPATH is not going away nor
should it.

How about if the *.pyl file format is exactly a directory structure?
I mean that the table of contents is limited to paths starting with
a directory name only, and that the separator is '/' instead of '.'.
So a listing would be identical to the output of 'ls -R'. So:
Lib/string.pyc
Lib/exceptions.pyc
Lib/plat-sunos4/...
mx/__init__.pyc
mx/...
package2/...
dir3/...
...

The implied PYTHONPATH for this file is ["Lib", "."]. Since the
format is exactly a directory tree, it is guaranteed that whatever
PYTHONPATH or imports can do now or in the future with a directory
tree, it can still do it with a *.pyl file.
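
A minimal sketch of how a lookup against such a listing could work. The
function find_in_toc, the sample entries marked hypothetical, and the
"plat-demo" directory are illustrations only and not part of any existing
*.pyl implementation.

```python
def find_in_toc(toc, search_path, modname):
    """Return the TOC entry for a dotted module name, or None.

    toc          set of '/'-separated paths stored in the archive
    search_path  directory prefixes inside the archive; "." is the root
    """
    rel = modname.replace(".", "/")
    for prefix in search_path:
        base = rel if prefix == "." else prefix + "/" + rel
        # A name is either a plain module or a package with __init__.
        for candidate in (base + ".pyc", base + "/__init__.pyc"):
            if candidate in toc:
                return candidate
    return None


TOC = {
    "Lib/string.pyc",
    "Lib/exceptions.pyc",
    "Lib/plat-demo/osdeps.pyc",   # hypothetical platform-specific entry
    "mx/__init__.pyc",
    "mx/DateTime.pyc",            # hypothetical submodule entry
}
```

With the implied path ["Lib", "."], "string" resolves to Lib/string.pyc
and "mx" to mx/__init__.pyc; adding "Lib/plat-demo" in front of the path
selects the platform files the same way PYTHONPATH does for a directory
tree.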

Jim Ahlstrom
Re: Towards a Python based import scheme
James C. Ahlstrom wrote:
>...
> As a result of all this import discussion I am a bit worried that
> the python library *.pyl file format may not be powerful enough.

Background for the readers:

.pyl is an extension that I used in my "small" distribution. I think
Gordon uses it, too. In any case, it is effectively a concatenation of
.pyc files along with a TOC mapping fully-qualified dotted module names
to seek-positions within the file.
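
To make the layout concrete, here is a simplified sketch in present-day
Python 3. It is not the actual .pyl format: it stores marshalled code
objects rather than complete .pyc files, and the trailer holding the TOC
offset, as well as the names build_archive and load_module, are
assumptions made for the example.

```python
import io
import marshal
import struct
import types


def build_archive(sources):
    """sources: dict mapping dotted module name -> source text."""
    body = io.BytesIO()
    toc = {}
    for name, text in sources.items():
        blob = marshal.dumps(compile(text, "<%s>" % name, "exec"))
        toc[name] = (body.tell(), len(blob))   # seek position and size
        body.write(blob)
    # Assumed layout: code blobs, then the TOC, then an 8-byte TOC offset.
    return body.getvalue() + marshal.dumps(toc) + struct.pack("<Q", body.tell())


def load_module(archive, name):
    f = io.BytesIO(archive)
    f.seek(-8, io.SEEK_END)
    (toc_pos,) = struct.unpack("<Q", f.read(8))
    f.seek(toc_pos)
    toc = marshal.loads(f.read(len(archive) - 8 - toc_pos))
    if name not in toc:
        return None
    pos, size = toc[name]
    f.seek(pos)                                # one seek, no stat() calls
    code = marshal.loads(f.read(size))
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module
```

A lookup is a dictionary access on the dotted name followed by one seek
and one read, with no directory probing at all.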

[ speaking of stat() calls: using a .pyl eliminates them quite nicely --
this may be part of Gordon's observed speed increase when using an
archive]

The .pyl format was discussed a bit on the distutils-sig list and "sort
of" accepted as an okay format for jamming a bunch of modules into a
single file.
[ by "sort of", I mean that the small group who participated in the
discussion were okay with it :-); it is a great, minimalist format, so
it probably won't please people who like a ton of features in a file
format :-) ]

>...
> How about if the *.pyl file format is exactly a directory structure?
> I mean that the table of contents is limited to paths starting with
> a directory name only, and that the separator is '/' instead of '.'.
> So a listing would be identical to the output of 'ls -R'. So:
> Lib/string.pyc
> Lib/exceptions.pyc
> Lib/plat-sunos4/...
> mx/__init__.pyc
> mx/...
> package2/...
> dir3/...
> ...
>
> The implied PYTHONPATH for this file is ["Lib", "."]. Since the
> format is exactly a directory tree, it is guaranteed that whatever
> PYTHONPATH or imports can do now or in the future with a directory
> tree, it can still do it with a *.pyl file.

People import things using a dotted name. Therefore, I think it makes
the most sense to map that straight to the resulting .pyc file. No
reason to put directories into the file... they make no sense to the end
user. During construction of the .pyl, you would walk the tree finding
all the available modules (and their corresponding dotted name) and
insert them.

Note that you can distribute multiple .pyl files. There could be the
Python standard lib in one file, the mx package in another, etc. As a
module is searched for, the system just peeks into each .pyl in turn,
looking for the module.

Search order is currently defined by order of install() on the Importer
instances. I believe the Right Way to do things is to create
sys.importers (as a list of Importers) and deprecate the sys.path
variable. Python could start up with an Importer that simply scanned
sys.path as a backwards compat measure; it could also leave sys.path
empty and create DirectoryImporters for each path component (this could
cause problems, though, for some apps that believe sys.path shouldn't be
empty, or that use it for magic-munging). I've searched the standard lib
in the past -- there are only a couple real uses of sys.path if I
remember rightly (test package and the traceback module).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: Towards a Python based import scheme
"M.-A. Lemburg" wrote:
>
> · a working __path__ implementation (is anyone using this attribute
> which only is available in packages ?)

Yes. I use it for two things:

- I modify it to allow a (logical) package to be spread over
multiple physical locations. (In Zope, products can be installed
in the Zope installation area or in Zope "instance" homes.)

- I use it to determine the location(s) of a package.
Our packages usually contain many files, such as DTML
source files, images, data files, etc., that are not Python
modules. We have standard utilities for getting at these files
in packages. This is extremely useful.
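
Both uses can be sketched in a few lines of present-day Python 3. The
helper package_file, the throwaway package "sketchpkg" and its files are
hypothetical and only stand in for the utilities mentioned above; they
are not the Zope code.

```python
import importlib
import os
import sys
import tempfile


def package_file(package, filename):
    """Return the first path to filename found on package.__path__, or None."""
    for directory in package.__path__:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


def demo():
    with tempfile.TemporaryDirectory() as home, \
            tempfile.TemporaryDirectory() as extra:
        pkg_dir = os.path.join(home, "sketchpkg")
        os.mkdir(pkg_dir)
        for path, text in [
            (os.path.join(pkg_dir, "__init__.py"), ""),
            (os.path.join(extra, "plugin.py"), "answer = 42\n"),
            (os.path.join(extra, "logo.txt"), "not a Python module\n"),
        ]:
            with open(path, "w") as f:
                f.write(text)
        sys.path.insert(0, home)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("sketchpkg")
            pkg.__path__.append(extra)      # second physical location
            plugin = importlib.import_module("sketchpkg.plugin")
            data = package_file(pkg, "logo.txt")
            return plugin.answer, os.path.dirname(data) == extra
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(home)
            sys.modules.pop("sketchpkg.plugin", None)
            sys.modules.pop("sketchpkg", None)
```

The submodule and the data file both live outside the directory holding
__init__.py, yet both are reached through the one logical package.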

Jim

--
Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org
Digital Creations http://www.digicool.com http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.
Re: Towards a Python based import scheme
Jim> As a result of all this import discussion I am a bit worried that
Jim> the python library *.pyl file format may not be powerful enough.

Not to rain on anyone's parade, but I want to remind the folks having this
discussion that there are people reading this thread who, while fairly well
versed in Python, have little idea what anyone is talking about anymore. (I
don't know. Maybe I'm the only one.)

python-dev is clearly the best place to discuss this in the short-term
(anyone for an import SIG?), but whatever is implemented will have to be
understood by lots of people on c.l.py to be of broad applicability.
Perhaps I'm way off base and there are more than a handful of people who
will ever run into the problems being solved here, but if we can partition
the Python programming community into the package wizards and the mere
import mortals, I worry that the potions concocted by the wizards will send
a few of us import mortals to the hospital...

The Java package scheme, while odious to some perhaps, is extremely easy to
understand for anyone who's ever used Windows Explorer or executed "ls -R".

just-a-cautionary-peanut-thrown-in-from-the-bleachers-ly y'rs

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098 | Python: Programming the way Guido indented...
Re: Towards a Python based import scheme
Jim Ahlstrom wrote:

> As a result of all this import discussion I am a bit worried that
> the python library *.pyl file format may not be powerful enough.
> I have always thought in terms of unique top-level names and a
> format which supports import of modules and packages. But this
> does not support the full functionality of PYTHONPATH. For
> example, PYTHONPATH can be (and is) used to select the correct
> plat-* directory files. And the format may not support Jim
> Fulton's fancy local import scheme. And what if someone invents
> a third thing to import besides a module or a package?
> PYTHONPATH is not going away nor should it.

The central idea of imputil is that an importer is responsible for
one little chunk of turf. If the desired module / package isn't
"his", he just passes the request on to the next element in the
chain.

So I don't think there's a need for one canonical do-everything
importer (or archive format). PYTHONPATH is outside any
particular importer. Effectively, you can use a chain of
importers to replace PYTHONPATH. So the platform specific
modules might be found by one particular importer. In other
words, I think it's more effective to specialize individual
importers and chain them up than it is to try to create an
overly-generalized importer.
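
A toy sketch of the chaining idea, not imputil itself: ChainedImporter,
its find() method and the module name "sketch_platform_mod" are all
hypothetical, and "handling" a module is reduced to reporting which
importer claimed it.

```python
class ChainedImporter:
    """Hypothetical importer that owns one set of names."""

    def __init__(self, label, names, next_importer=None):
        self.label = label
        self.names = set(names)
        self.next = next_importer

    def find(self, fqname):
        if fqname in self.names:
            return self.label                 # "mine": handle it here
        if self.next is not None:
            return self.next.find(fqname)     # not mine: pass it on
        raise ImportError("no importer claims %r" % fqname)


# Each importer covers its own turf and falls through to the next one.
stdlib = ChainedImporter("stdlib", ["string", "os"])
platform_specific = ChainedImporter("platform", ["sketch_platform_mod"], stdlib)
packages = ChainedImporter("packages", ["mx", "mx.DateTime"], platform_specific)
```

Calling packages.find() with any name walks the chain in order, which is
the role the directory order in PYTHONPATH plays today.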



- Gordon
Re: Towards a Python based import scheme
Gordon McMillan wrote:

> So I don't think there's a need for one canonical do-everything
> importer (or archive format). PYTHONPATH is outside any
> particular importer. Effectively, you can use a chain of
> importers to replace PYTHONPATH. So the platform specific
> modules might be found by one particular importer. In other
> words, I think it's more effective to specialize individual
> importers and chain them up than it is to try to create an
> overly-generalized importer.

Greg agrees with you so I defer to the experts on importers.
The feature is meant to support a chain.

Greg wrote:

> The .pyl format was discussed a bit on the distutils-sig list and "sort
> of" accepted as an okay format for jamming a bunch of modules into a
> single file.
> [ by "sort of", I mean that the small group who participated in the
> discussion were okay with it :-); it is a great, minimalist format, so
> it probably won't please people who like a ton of features in a file
> format :-) ]

But I still disagree on the .pyl file format. If there is no Standard
Format and everyone is linking in his own importer, then we will have
exactly the same situation we have now with PYTHONPATH and novel
import hooks. There should be a Standard Format to fix this
problem. In particular, package authors should be able to publish
packages as PYL files and expect them to be usable as is with
no further effort. Sysadmins should be able to manage everything
PYTHONPATH does with a small (one?) number of PYL files and in
a standard way.

Jim Ahlstrom