Mailing List Archive

Python for large projects?
Hi, I've been doing mostly desktop and Client/Server development work
but I'm considering how best to develop web-based and distributed
projects.

I read a paper on the Python.org site comparing Java and Python and
was impressed. I'm interested in other people's views and experiences
of using Python for: large projects; team developments; distributed
objects; intranet development etc.

TIA
Python for large projects? [ In reply to ]
In article <3774b5d1.690922@news.force9.net>,
joechan25@hotmail.com (Joe Chan) wrote:
> Hi, I've been doing mostly desktop and Client/Server development work
> but I'm considering how best to develop web-based and distributed
> projects.
>
> I read a paper on the Python.org site comparing Java and Python and
> was impressed. I'm interested in other people's views and experiences
> of using Python for: large projects; team developments; distributed
> objects; intranet development etc.
>
> TIA

Hello Joe

I am using Python for a large web-enabled simulation/planning model.
The project is going wonderfully. Keeping it brief, here are the
features that my model would not have if I had chosen other tools I was
considering:

Multiple Interfaces: Microsoft Access, Web, command line, Tk

Multiple Platforms: WinNT and SGI Origin (IRIX).
There will also be a smaller version that will run on WinCE.

Numerous Options: The productivity boost allows me to focus on
business logic. This allows me to add more features and options to my
model.

Increased Accuracy: The reduced development time allows my co-workers
and me to do a great deal of testing without disrupting our schedule.

I have also found that using Python softens some of the blows during
the requirements-gathering phase. It is nice to be able to put together
a quick mock-up of an application component to help your customers
clarify what their requirements are. Also, when you find that you have
perhaps erred in your direction, it hurts a lot less to throw away two
hours of work than twenty.


Hope this helps,

Bill Wilkinson


Sent via Deja.com http://www.deja.com/
Python for large projects? [ In reply to ]
Python is great fun, with positive qualities that go on and on.

Large projects come with special problems of their own.
For C++ there are books on this subject, such as:
Large-Scale C++ Software Design by John Lakos
Effective C++ by Scott Meyers

I can recommend Python for large projects, but watch out for the following
(IMHO):

1. Memory management. Search for the keywords GC, cycles and reference
counting in this newsgroup.
One problem we had was reading a very large file in and passing the buffer
down to be ripped apart. We got stuck holding the original buffer for this
file until the program exited. At least I think we did; it's hard to know
sometimes. This was a problem because, as we ripped it apart, our memory
needs grew very fast. We should have kept track of this from the start; it
was a pain to fix later.

This isn't a memory cycle, but it is still a problem.
Don't:
    buf = open(file).read()
    doTheBigJob(buf)
    # buf won't be freed until doTheBigJob returns, which might be the
    # entire run of the program.
    # The idea is: don't leave a reference at the top level.
    # Even worse is if, at a lower level, you want to change buf:
    # strings are immutable, so "changing" it builds a second full copy
    # while the caller's reference keeps the first one alive.
    # Try this and monitor memory usage:
    >>> s = " "*999999
    >>> b = s
    >>> b = b + " "

Do something like this:
    parm = {}
    parm['buf'] = open(file).read()
    doTheBigJob(parm)
    # Now doTheBigJob can "del parm['buf']" to free the memory.
    # Or pass the buffer within an object of some kind.

Look out for:
    dict1 = {}
    dict1['dict'] = dict1
    # This, and variations on this theme, become immortal unless you
    # kill them off carefully.

A useful tool for finding memory leaks, though it won't find the above
example unless dict1 is in an object:
http://www.stud.ifi.uio.no/~larsga/download/python/plumbo.py
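For comparison with today's interpreters: CPython 2.0 and later add a cycle detector (the gc module) on top of reference counting, so a cycle like dict1 above is eventually reclaimed. A minimal sketch, assuming a modern CPython: the hypothetical Node class has the same self-referencing shape as dict1 (an instance is used because plain dicts cannot be weakly referenced), and automatic collection is switched off briefly so that only reference counting runs.

```python
import gc
import weakref


class Node:
    """Hypothetical object that ends up referring to itself."""


def cycle_survives_refcounting():
    """Return (alive_after_del, alive_after_collect) for a self-referencing object."""
    was_enabled = gc.isenabled()
    gc.disable()                  # let only reference counting act for now
    try:
        node = Node()
        node.me = node            # same shape as dict1['dict'] = dict1
        probe = weakref.ref(node) # observes the object without keeping it alive
        del node                  # refcount never reaches zero: the cycle holds it
        alive_after_del = probe() is not None
        gc.collect()              # explicit collection finds and frees the cycle
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_survives_refcounting())  # (True, False) on CPython
```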

2. To use packages or not.
We had multiple directories for the different components of our system,
so we had to set up sys.path and/or use packages. I like packages, but get
everyone to accept them up front. We fought about it because no one wanted
to deal with it. Using just the Python path is simple, but sooner or later
you end up with a module that has the same name as another one on your
system. A user calls up describing a screenful of traceback that makes no
sense, because you've just grabbed someone else's config.py file.
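A sketch of that clash, assuming a hypothetical layout in which two components (reports and billing) each ship a config.py; everything is created in a temporary directory. The resolve() helper is also hypothetical: it puts directories at the front of sys.path, imports a module, and then undoes both changes so the two runs don't interfere.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write(root, relpath, text=""):
    """Create a file (and any missing parent directories) under root."""
    path = Path(root, relpath)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def resolve(module_name, search_dirs):
    """Import module_name with search_dirs first on sys.path; return its NAME."""
    top = module_name.split(".")[0]

    def same_family(name):
        return name == top or name.startswith(top + ".")

    # Temporarily hide any already-imported module with the same top-level name.
    hidden = {name: sys.modules.pop(name) for name in list(sys.modules) if same_family(name)}
    saved_path = sys.path[:]
    sys.path[:0] = [str(d) for d in search_dirs]
    importlib.invalidate_caches()  # the files were created after start-up
    try:
        return importlib.import_module(module_name).NAME
    finally:
        sys.path[:] = saved_path
        for name in [m for m in sys.modules if same_family(m)]:
            del sys.modules[name]
        sys.modules.update(hidden)


with tempfile.TemporaryDirectory() as root:
    # Flat layout: both component directories go on the path.
    write(root, "flat/reports/config.py", "NAME = 'reports config'\n")
    write(root, "flat/billing/config.py", "NAME = 'billing config'\n")
    flat = [Path(root, "flat/reports"), Path(root, "flat/billing")]
    print(resolve("config", flat))  # reports config (billing's file is unreachable)

    # Package layout: the same files, now with unique dotted names.
    for pkg in ("myproject", "myproject/reports", "myproject/billing"):
        write(root, f"pkg/{pkg}/__init__.py")
    write(root, "pkg/myproject/reports/config.py", "NAME = 'reports config'\n")
    write(root, "pkg/myproject/billing/config.py", "NAME = 'billing config'\n")
    print(resolve("myproject.billing.config", [Path(root, "pkg")]))  # billing config
```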

3. Decide on an exception-handling strategy up front.
Don't mindlessly catch all errors unless it's your last line of defense:
    try:
        xxx()
    except:
        yyyy
This hides errors of all kinds under a single description or handler.
Check out http://lwn.net/1999/0610/devel.phtml "The Python Way" by Tim
Peters

Do something like this:
    import sys, traceback
    import projectExceptions
    # debug and log() are project-specific.
    try:
        xxx()
    except projectExceptions.category1, msg:
        # Our users don't want to see huge tracebacks.
        if debug:
            traceback.print_exc()
            log(sys.exc_type, msg)
        else:
            log(msg)
    except:
        # Unknown error
        if debug:
            log(sys.exc_type)
        else:
            log(projectExceptions.defaultMsg())
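For comparison, a sketch of the same strategy in modern Python 3 syntax (except ... as ..., and the logging module instead of sys.exc_type). ProjectError, DataFileError, run_step() and load_catalog() are hypothetical names standing in for projectExceptions and xxx(). Catching Exception rather than using a bare except lets KeyboardInterrupt and SystemExit through.

```python
import logging

log = logging.getLogger("project")  # hypothetical project-wide logger


class ProjectError(Exception):
    """Hypothetical base class for the errors this project raises on purpose."""
    user_message = "The operation could not be completed."


class DataFileError(ProjectError):
    """One hypothetical category of expected failure."""


def run_step(step, debug=False):
    """Call step(); return its result, or None after logging a failure."""
    try:
        return step()
    except ProjectError as err:
        # Expected failure: users get one line, developers also get the traceback.
        log.error("%s: %s", type(err).__name__, err, exc_info=debug)
    except Exception:
        # Unknown failure (last line of defense): never show users a traceback.
        log.error("unexpected error" if debug else ProjectError.user_message,
                  exc_info=debug)
    return None


def load_catalog():
    raise DataFileError("catalog.dat is truncated")  # hypothetical failure


print(run_step(load_catalog))  # logs the one-line message to stderr, prints None
```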

4. How should you handle internationalization?
I don't know.

5. Python doesn't care about types.
This is both good and bad. Good: you pass me an object, and all I care
about is that it has a write() method. So much nicer than C++.

Bad: you pass me an object without a write() method, and it's not
detected until runtime.
I'm not sure, but I think freeze can detect this. But don't write a
boatload of code and then try a tool like freeze on it; you'll probably
do something it can't handle.
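A sketch of keeping the duck typing while failing early, assuming Python 3.8+ (typing.Protocol postdates this thread by many years). A static checker such as mypy can flag emit_report(42, ...) before the program runs; the isinstance() check is the runtime fallback and only verifies that a write attribute exists. Writer and emit_report() are hypothetical names.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Writer(Protocol):
    """Anything with a write(str) method: open files, sys.stdout, io.StringIO..."""

    def write(self, text: str) -> int: ...


def emit_report(out: Writer, lines):
    """Hypothetical helper: reject a bad argument at the call, not halfway through."""
    if not isinstance(out, Writer):  # checks only that a write attribute exists
        raise TypeError(f"{type(out).__name__} object has no write() method")
    for line in lines:
        out.write(line + "\n")


buffer = io.StringIO()
emit_report(buffer, ["total: 3", "ok"])
print(buffer.getvalue(), end="")  # two lines: "total: 3" and "ok"

try:
    emit_report(42, ["never written"])
except TypeError as err:
    print(err)  # int object has no write() method
```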

6. No const or private.
You have to trust everyone.

Had enough?
Do you think I have some opinions on this?
--
--Darrell
Python for large projects? [ In reply to ]
On Sat, 26 Jun 1999 16:52:44 -0400, Darrell <news@dorb.com> wrote:
>This isn't a memory cycle but a problem still.
>Don't:
> buf=open(file).read()
> doTheBigJob(buf)
> # buf won't be freed until this function exits. Which might be the
> # entire run of the program.
> # The idea is don't leave a reference at the top level
> # Even worse is if at a lower level you want to change buf. Try
> # this and monitor memory usage.
>>>> s=" "*999999
>>>> b=s
>>>> b=b+" "
>
>Do something like this:
> parm={}
> parm['buf']=open(file).read()
> doTheBigJob(parm)
> # Now doTheBigJob can "del parm['buf']" to free the memory.
> # Or pass the buffer with in an object of some kind.

Actually, this isn't fully necessary. Borrowing from your first
example above, all you need is:

    buf = open(file).read()
    doTheBigJob(buf)
    del buf

No need to introduce a new passing scheme or wrap the buffer object in
another object at all; this will clean it up for you.

Mind you, ideally (for a number of reasons) you want to avoid opening
a file at the top level of a module /anyway/; it's better to embed the
opening and data extraction in its own function, which you then exit,
returning the space used by its local variables to the pool
naturally.
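A minimal sketch of that advice, assuming a modern Python. count_lines() is a hypothetical example of "data extraction": the large buffer lives only in a local variable, and the with block closes the file promptly instead of relying on the file object being freed.

```python
import os
import tempfile


def count_lines(path):
    """Hypothetical extractor: read the whole file, keep only the small result.

    `data` is local, so under CPython's reference counting the large buffer
    is released as soon as this function returns.
    """
    with open(path, "rb") as f:
        data = f.read()
    return data.count(b"\n")


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line\n" * 1000)
print(count_lines(path))  # 1000
os.remove(path)
```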

If the only thing you're doing with the content of the file is passing
it to doTheBigJob, then it's even /better/ to simply call it as:

    doTheBigJob(open(file).read())

... since the content then lives only as long as the local it's bound
to within the function.

>Look out for:
> dict1={}
> dict1['dict']=dict1
> # This and variations on this theme become immortal. Unless you kill it
> # off carefully.

Cycles in any structure are dangerous and usually indicate a failure
of the author to understand the actual structure of the data he's
trying to model. On the other hand, some data really /do/ require
recursive representations and, as such, probably bear watching closely
whether you're in a strictly reference-counting environment or in a
language implementation with GC.

>6. No const or private.
> You have to trust everyone.

This always struck me as a really /odd/ concern. If you don't trust
the people on your programming team, why are they on your team? If
you're afraid the hooks into your inner code are visible, then rethink
why you're concerned. Methods/slots on Python objects can be prefixed
with _ or __ to make them harder to casually examine, but I've always
equated 'reference hiding' with 'security through obscurity': it
doesn't protect you, and it lends a false sense of hope.
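A small sketch of what the two prefixes actually do, assuming a modern Python. A double underscore triggers name mangling: the attribute is renamed to _ClassName__attr, not protected, which supports the point above. Account is a hypothetical class.

```python
class Account:
    """Hypothetical class illustrating Python's two naming conventions."""

    def __init__(self, balance):
        self._audit_log = []      # one underscore: "internal" by convention only
        self.__balance = balance  # two underscores: stored as _Account__balance


acct = Account(100)
print(acct._audit_log)             # [] (nothing stops outside access)
print(hasattr(acct, "__balance"))  # False: the unmangled name does not exist
print(acct._Account__balance)      # 100: the mangled name is easy to reach
```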

--
Alexander Williams (thantos@gw.total-web.net)
"In the end ... Oblivion Always Wins."
Python for large projects? [ In reply to ]
>
> buf = open(file).read()
> doTheBigJob(buf)
> del buf
>
> No need to introduce new passing schema or wrap the buffer object in
> another object at all; this will clean it up for you.
>
My point was that this isn't cleaned up until you return from doTheBigJob.

>
> If the only thing you're doing with the content of file is passing it
> to doTheBigJob, then its even /better/ to simply call it as:
>
> doTheBigJob(open(file).read())
>
No doubt this is a clean way to open the file, but it doesn't work if you
want to act on the buffer before passing it on. I'm not attacking Python
or reference counting here, and I agree that for most programs these aren't
even concerns. But when your shiny new application comes together and eats
200 MB, you start to wonder. Tracking through everyone's code trying to
understand who holds all this memory is real fun. My point was to think
about this stuff from the start and keep an eye on memory usage. Reference
counting can give a false sense of security.
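One way to keep an eye on memory usage today is the standard tracemalloc module (Python 3.4+). A minimal sketch, assuming peak traced allocation is a useful proxy; peak_memory_of() is a hypothetical helper, and append_to_big_string() repeats the b = b + " " example from the first post, where two ~1 MB strings are alive at the same time.

```python
import tracemalloc


def peak_memory_of(func):
    """Hypothetical helper: run func() and return the peak bytes traced meanwhile."""
    tracemalloc.start()
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def append_to_big_string():
    # b = b + " " builds a second ~1 MB string while s still refers to the first.
    s = " " * 999999
    b = s
    b = b + " "
    return len(b)


print(f"peak: {peak_memory_of(append_to_big_string) / 1e6:.1f} MB")
```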


> >6. No const or private.
> > You have to trust everyone.
>
> This always struck me as a really /odd/ concern. If you don't trust
> the people on your programming team, why are they on your team? If
> you're afraid the hooks into your inner-code are visible, then rethink
> why you're concerned. Methods/slots on Python objects can be prefixed
> with _ and __ to make them harder to casually examine, but I've always
> equated 'reference hiding' with 'security through obscurity,' it
> doesn't protect you and it lends a false sense of hope.
>
Good or bad, this is one that C++ guys aren't comfortable with. The project
I'm on now will allow people with minimal programming experience to write a
little Python to control the larger application. One of my primary concerns
is performance, and an optimization relies on them not changing a large
buffer, only looking at it. So the people I don't trust aren't on my team.
What was it Murphy said? "If it can go wrong..."
I'm tempted to do this object in C++.
Agreed, using __ to hide a name doesn't seem worthwhile.

I appreciate your reply because I'm still seeking the truth of these
matters.
--Darrell
Python for large projects? [ In reply to ]
"Darrell" <news@dorb.com> writes:

> Good or bad, this is one C++ guys aren't comfortable with. The project I'm
> on now will allow people with minimal programming experience to write a
> little python to control the larger application. One of my primary concerns
> is performance and an optimization relies on them not changing a large
> buffer, only look at it. So the people I don't trust aren't on my team. What
> was it Murphy said ? "If it can go wrong"
> I'm tempted to do this object in C++.
> Agreed using __ to hide a name doesn't seem worth while.
>

If it is only one object, one can try to wrap it in a class and
override __setattr__, __setitem__ etc.
This would prevent anyone from changing the buffer data accidentally.
Also, put the data into an immutable sequence type (string, tuple).
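A sketch of this suggestion, assuming the buffer can be held as bytes (an immutable type, as recommended above). ReadOnlyBuffer is a hypothetical name. This guards against accidents rather than determined tampering: object.__setattr__ can still bypass it, which is the worry raised in the replies.

```python
class ReadOnlyBuffer:
    """Hypothetical wrapper: allows reading the data, refuses any change."""

    __slots__ = ("_data",)

    def __init__(self, data):
        # bytes() gives an immutable copy if a mutable bytearray is passed in.
        object.__setattr__(self, "_data", bytes(data))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        raise TypeError("buffer is read-only")

    def __delitem__(self, index):
        raise TypeError("buffer is read-only")

    def __setattr__(self, name, value):
        raise AttributeError("buffer is read-only")

    def __delattr__(self, name):
        raise AttributeError("buffer is read-only")


buf = ReadOnlyBuffer(bytearray(b"header|payload"))
print(buf[:6])      # b'header'
try:
    buf[0] = 0
except TypeError as err:
    print(err)      # buffer is read-only
```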

__Janko

--
Institut fuer Meereskunde phone: 49-431-597 3989
Dept. Theoretical Oceanography fax : 49-431-565876
Duesternbrooker Weg 20 email: jhauser@ifm.uni-kiel.de
24105 Kiel, Germany
Python for large projects? [ In reply to ]
I've used Python for three large distributed applications (each > 100k
lines of Python). The only other recommendation that I would add is
to define interfaces to the code. If this is done with a product like
ILU or FNORB, you can then easily call the Python code from other
languages. In addition, you can replace critical sections with other
languages if needed.
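ILU and FNORB are CORBA-era tools. As a rough modern stand-in (an assumption, not what was used here), Python's standard xmlrpc modules, which come up later in this thread, can publish an explicit interface that clients in other languages can call. PlanningModel and serve_in_background() are hypothetical. SimpleXMLRPCServer.register_instance() refuses remote calls to names starting with an underscore, so internal state stays unpublished.

```python
import threading
from xmlrpc.client import Fault, ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class PlanningModel:
    """Hypothetical service; its public methods are the published interface."""

    def __init__(self):
        self._scenarios = {}  # internal state, not callable remotely

    def add_scenario(self, name, demand):
        self._scenarios[name] = demand
        return True

    def total_demand(self):
        return sum(self._scenarios.values())


def serve_in_background(model):
    """Start an XML-RPC server for model on a free local port; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(model)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


server, url = serve_in_background(PlanningModel())
with ServerProxy(url) as client:  # could equally be a client in another language
    client.add_scenario("base", 120)
    client.add_scenario("peak", 80)
    print(client.total_demand())  # 200
    try:
        client._scenarios()       # not part of the published interface
    except Fault:
        print("call to _scenarios refused")
server.shutdown()
server.server_close()
```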

--
Jody Winston
Lamont-Doherty Earth Observatory
RT 9W, Palisades, NY 10964
jody@ldeo.columbia.edu, 914 365 8526, Fax 914 359 1631

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.
Python for large projects? [ In reply to ]
> If it is only one object, one can try to wrap it in a class and
> overwrite setattr, setitem etc.
> This would inhibit that one changes the buffer data accidentally. And
> put the data into an immutable sequence type (string, tuple).
>
> __Janko
>
I'll give that a look, but they can always get to it if they want. Maybe
that's OK; it just worries me.

--Darrell
Python for large projects? [ In reply to ]
Jody Winston <jody@ldgo.columbia.edu> writes:

> I've used Python for three large distributed applications (each > 100k
> lines of Python). The only other recommendation that I would add is
> to define interfaces to the code. If this is done with a product like
> ILU or FNORB, you can then easily call the Python code from other
> languages. In addition, you can replace critical sections with other
> languages if needed.
>
I have thought about this as the perfect way to do ``data hiding'':
publish only special interfaces for your data. Does this have significant
overhead when used on one machine, or is it overkill just for this
purpose :-)?

__Janko

Python for large projects? [ In reply to ]
"Darrell" <news@dorb.com> writes:

> > If it is only one object, one can try to wrap it in a class and
> > overwrite setattr, setitem etc.
> > This would inhibit that one changes the buffer data accidentally. And
> > put the data into an immutable sequence type (string, tuple).
> >
> > __Janko
> >
> I'll give that a look, but they can always get to it if they want. Maybe
> that's ok, just worries me.
>
> --Darrell

If you are really, really worried about code you let the user write
mucking up your internals, use RExec. I don't think there's any other
way, and I believe it's pretty configurable, although I haven't used it
myself.

HTH
Michael
Python for large projects? [ In reply to ]
>>>>> "Janko" == Janko Hauser <jhauser@ifm.uni-kiel.de> writes:

Janko> Jody Winston <jody@ldgo.columbia.edu> writes:
>> I've used Python for three large distributed applications (each
>> > 100k lines of Python). The only other recommendation that I
>> would add is to define interfaces to the code. If this is done
>> with a product like ILU or FNORB, you can then easily call the
>> Python code from other languages. In addition, you can replace
>> critical sections with other languages if needed.
>>
Janko> I have thought about this as the perfect way to do ``data
Janko> hiding'', publish only special interfaces for your
Janko> data. Has this a significant overhead, if used on one
Janko> machine, or is it overkill just for this purpose :-)?

For my data, which are large arrays read from disk, the ILU program is
50% slower on a simple benchmark that just transfers the arrays from
one address space to another on the same machine. The xml-rpc
program, which transfers the same data from a client to a server, is
twice as slow as the ILU program[1].

However, for my case, neither of these benchmarks really means
anything, since the real code is compute-bound.

To understand the impact a distributed object system will have on
your design, you'll need to understand how you will access your data
and where the possible bottlenecks are.
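The footnote notes that XML-RPC sends everything as ASCII text. A rough sketch that measures only that encoding cost, locally, with no network and no ILU comparison, assuming a list of floats stands in for one of the arrays; serialization_cost() is a hypothetical helper and pickle serves only as a binary baseline. Timings will vary by machine.

```python
import pickle
import timeit
from xmlrpc.client import dumps, loads


def serialization_cost(values, repeat=3):
    """Hypothetical helper: payload size (bytes) and best round-trip time (s)
    for XML-RPC's text encoding versus pickle's binary encoding."""
    xml_payload = dumps((values,), "transfer").encode("utf-8")
    pickle_payload = pickle.dumps(values)

    def xml_round_trip():
        params, _method = loads(dumps((values,), "transfer"))
        return params[0]

    def pickle_round_trip():
        return pickle.loads(pickle.dumps(values))

    xml_time = min(timeit.repeat(xml_round_trip, number=1, repeat=repeat))
    pickle_time = min(timeit.repeat(pickle_round_trip, number=1, repeat=repeat))
    return {
        "xmlrpc": (len(xml_payload), xml_time),
        "pickle": (len(pickle_payload), pickle_time),
    }


data = [i * 0.5 for i in range(20_000)]  # stand-in for one array read from disk
for name, (size, seconds) in serialization_cost(data).items():
    print(f"{name:7s} {size:>9,d} bytes  {seconds * 1000:8.1f} ms")
```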

Footnotes:
[1] That's really amazing in my book since all of the xml-rpc data
is ASCII.

--
Jody Winston
Python for large projects? [ In reply to ]
Darrell wrote:
>
> I'll give that a look, but they can always get to it if they want. Maybe
> that's ok, just worries me.

If you clearly warn them in the docs not to do
it, and they do it anyway and get into trouble,
it's not your fault.

Greg
Python for large projects? [ In reply to ]
In article <3776EF65.A07CBEFC@compaq.com>, Greg Ewing wrote:
>Darrell wrote:
>>
>> I'll give that a look, but they can always get to it if they want. Maybe
>> that's ok, just worries me.
>
>If you clearly warn them in the docs not to do
>it, and they do it anyway and get into trouble,
>it's not your fault.
>
>Greg

But what does management think?? If the user screws something up, how
much downtime or loss of data will result? I remember people coming back
from Desert Storm and telling me how people were playing with
unexploded submunitions, i.e. small bombs with flaky fuses; I am not
kidding about this. So the question is: how much damage can a moron
with a screwdriver do to your program, and is that an acceptable
tradeoff for the benefits of Python? If you are really paranoid,
present it to your boss and ask him to decide.

marc

PS: Just because you are paranoid does not mean you are wrong.
Python for large projects? [ In reply to ]
> >Darrell wrote:
> >>
> > >> I'll give that a look, but they can always get to it if they want. Maybe
> > >> that's ok, just worries me.
> >
> >If you clearly warn them in the docs not to do
> >it, and they do it anyway and get into trouble,
> >it's not your fault.
> >
> >Greg
>
> But what does management think?? If the user screws something up how
> much down time/loss of data will result? I remember people coming back
> from desert storm and telling me about how people were playing with
> unexploded submunitions, aka small bombs with flaky fuses, I am not
> kidding on this. So the question is how much damage can a moron
> do with a screwdriver to your program and is that an acceptable
> tradeoff for the benefits of python? If you are really paranoid
> present it to your boss and ask him to decide.
>
> marc
>
> ps just because you are paranoid does not mean you are wrong.

Yes, I'm paranoid. I got trained that way in the medical instrument business.
But now I'm in the publishing business, where I'm sure they wonder about me.

I'm glad I brought this up, because Michael Hudson suggested using RExec.
This sounds like a perfect solution to my concerns, even better than C++
const.

--Darrell
Python for large projects? [ In reply to ]
> I'm glad I brought this up because Michael Hudson suggested using RExec.
> This sounds like a perfect solution to my concerns. Even better than C++
> const.
>
> --Darrell
>

I read up on RExec and don't think it's the solution. Thanks anyway; I had
been meaning to learn about RExec.
--Darrell
Python for large projects? [ In reply to ]
Darrell wrote:
>
> Read up on RExec and don't think it's the solution.

I think the only way to make an exposed data object
completely tamper-proof in Python is, as was suggested,
to implement it as a C extension.

Maybe it could be a thin C layer that has Python on
the other side. Would it be feasible to design a
general-purpose "rinstance" object that does for
data structures what rexec does for namespaces?
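Modern CPython ships a couple of C-implemented read-only views along these lines, though nothing as general as the proposed "rinstance": types.MappingProxyType (3.3+) for mappings and memoryview.toreadonly() (3.8+) for buffers. A sketch follows; try_write() is a hypothetical helper. Whoever holds the underlying dict or bytearray can still change it, and readers of the view see the change.

```python
from types import MappingProxyType

settings = {"pages": 200, "dpi": 300}
frozen_settings = MappingProxyType(settings)  # read-only view, no copy

raw = bytearray(b"large shared page buffer")
frozen_raw = memoryview(raw).toreadonly()     # read-only view, no copy


def try_write(view, key, value):
    """Hypothetical helper: attempt an item assignment, report whether it worked."""
    try:
        view[key] = value
    except TypeError:
        return False
    return True


print(try_write(frozen_settings, "pages", 1))  # False
print(try_write(frozen_raw, 0, 76))            # False

settings["pages"] = 250          # the owner can still change the data...
print(frozen_settings["pages"])  # 250 (...and the view reflects it)
```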

Greg
Python for large projects? [ In reply to ]
Excerpts from ext.python: 27-Jun-99 Re: Python for large projects? Janko Hauser@ifm.uni-kie (881)

> Jody Winston <jody@ldgo.columbia.edu> writes:

> > I've used Python for three large distributed applications (each > 100k
> > lines of Python). The only other recommendation that I would add is
> > to define interfaces to the code. If this is done with a product like
> > ILU or FNORB, you can then easily call the Python code from other
> > languages. In addition, you can replace critical sections with other
> > languages if needed.
> >
> I have thought about this as the perfect way to do ``data hiding'',
> publish only special interfaces for your data. Has this a significant
> overhead, if used on one machine, or is it overkill just for this
> purpose :-)?

I've done this a fair amount with Python and ILU. I specify the
`public' ILU interface, which anyone can call, then also specify various
other interfaces, private to one more specific management concern or
another. Typically the private interfaces contain subtypes of the
object types declared in the public interface, with additional methods
that allow more access. Then I implement only the more-derived object
types in my Python code. So there's no real overhead in the code or in
the method calls. The users of the public interface only get to do the
public operations; the more specialized clients get to do the fancier
operations.
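The same layering can be sketched in plain Python with abstract base classes, assuming a static type checker such as mypy provides the enforcement that ILU's interface types provide here: code written against PublicCatalog is flagged if it calls store(), although nothing stops the call at runtime. As described above, only the most-derived type is implemented, so there is no wrapper and no extra call overhead. All class and function names are hypothetical.

```python
from abc import ABC, abstractmethod


class PublicCatalog(ABC):
    """Hypothetical public interface: operations anyone may call."""

    @abstractmethod
    def lookup(self, key): ...


class AdminCatalog(PublicCatalog):
    """Hypothetical private interface: a subtype that adds privileged methods."""

    @abstractmethod
    def store(self, key, value): ...


class Catalog(AdminCatalog):
    """The only concrete class: implements the most-derived interface."""

    def __init__(self):
        self._items = {}

    def lookup(self, key):
        return self._items.get(key)

    def store(self, key, value):
        self._items[key] = value


def public_client(catalog: PublicCatalog, key):
    # A type checker rejects catalog.store(...) here; only lookup() is declared.
    return catalog.lookup(key)


def admin_client(catalog: AdminCatalog):
    catalog.store("issue-42", "ready to print")


catalog = Catalog()                        # one object, seen through two interfaces
admin_client(catalog)
print(public_client(catalog, "issue-42"))  # ready to print
```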

Bill
Python for large projects? [ In reply to ]
Looks like Marc-André Lemburg already did this.
http://starship.python.net/crew/lemburg/

--
--Darrell
Greg Ewing <greg.ewing@compaq.com> wrote in message
news:377829FE.EA457FDB@compaq.com...
> Darrell wrote:
> >
> > Read up on RExec and don't think it's the solution.
>
> I think the only way to make an exposed data object
> completely tamper-proof in Python is, as was suggested,
> implement it as a C extension.
>
> Maybe it could be a thin C layer that has Python on
> the other side. Would it be feasible to design a
> general-purpose "rinstance" object that does for
> data structures what rexec does for namespaces?
>
> Greg