Mailing List Archive

Bug report: memory leak in python 1.5.2
Environment: Solaris 2.5.1, gcc

The following script chews up cycles (as expected) as well as memory
(megabytes every few seconds).

Does anyone know why?

======================================================================
import os
while 1:
os.environ['FOO'] = 'BAR'
======================================================================

--
Greg McFarlane: INMS Telstra Australia (gregm@iname.com)
Today's forecast: Sunny, with occasional cloudy periods and a chance
of rain in some areas.
Bug report: memory leak in python 1.5.2 [ In reply to ]
On Wed, 26 May 1999, Greg McFarlane wrote:

> Environment: Solaris 2.5.1, gcc
>
> The following script chews up cycles (as expected) as well as memory
> (megabytes every few seconds).
>
> Does anyone know why?
>
> ======================================================================
> import os
> while 1:
> os.environ['FOO'] = 'BAR'
> ======================================================================

No, but just to keep you sane:
Mine does exactly the same,

Python:
Python 1.5.2 (#1, May 23 1999, 07:05:44) [GCC 2.8.1] on sunos4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

uname -a:
SunOS sunset 4.1.4 9 sun4m

(I'll check it later on my linux box)
--
Moshe Zadka <mzadka@geocities.com>.
QOTD: My own exit is more likely to be horizontal then perpendicular.
Bug report: memory leak in python 1.5.2 [ In reply to ]
Greg McFarlane wrote:
>
> Environment: Solaris 2.5.1, gcc
>
> The following script chews up cycles (as expected) as well as memory
> (megabytes every few seconds).
>
> Does anyone know why?
>
> ======================================================================
> import os
> while 1:
> os.environ['FOO'] = 'BAR'
> ======================================================================

The code says:

/* XXX This leaks memory -- not easy to fix :-( */
if ((new = malloc(strlen(s1) + strlen(s2) + 2)) == NULL)
return PyErr_NoMemory();
(void) sprintf(new, "%s=%s", s1, s2);
if (putenv(new)) {
posix_error();
return NULL;
}

Don't why the free(new); is missing though... maybe some putenv()
implementation expect a malloced string (the putenv implementation
in posixmodule.c for NeXT does) while other make a copy first.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 219 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Bug report: memory leak in python 1.5.2 [ In reply to ]
Moshe Zadka <moshez@math.huji.ac.il> writes:
> On Wed, 26 May 1999, Greg McFarlane wrote:
>
> > Environment: Solaris 2.5.1, gcc
> >
> > The following script chews up cycles (as expected) as well as memory
> > (megabytes every few seconds).
> >
> > Does anyone know why?
> >
> > ======================================================================
> > import os
> > while 1:
> > os.environ['FOO'] = 'BAR'
> > ======================================================================
>
> No, but just to keep you sane:
> Mine does exactly the same,
>
> Python:
> Python 1.5.2 (#1, May 23 1999, 07:05:44) [GCC 2.8.1] on sunos4
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>
> uname -a:
> SunOS sunset 4.1.4 9 sun4m
>
> (I'll check it later on my linux box)

Well poking around the source to posixmodule.c one finds this:

/* XXX This leaks memory -- not easy to fix :-( */
if ((new = malloc(strlen(s1) + strlen(s2) + 2)) == NULL)
return PyErr_NoMemory();
(void) sprintf(new, "%s=%s", s1, s2);
if (putenv(new)) {
posix_error();
return NULL;
}

so at least someone knows about the problem.

Yours,
Michael
Bug report: memory leak in python 1.5.2 [ In reply to ]
M.-A. Lemburg writes:
> Don't why the free(new); is missing though... maybe some putenv()
> implementation expect a malloced string (the putenv implementation
> in posixmodule.c for NeXT does) while other make a copy first.

At least on Solaris, the malloc()'ed memory must survive until the
same key is re-used.
I think using a Python string for the malloc()'ed memory is
acceptable (it's just a buffer, after all). I took a stab at using a
dictionary to "cache" the last string passed to putenv() for each key;
when using the same key again, the old value is removed from the
dictionary and deleted. Greg's code doesn't appear to leak with my
patch, at any rate (running for over 7 minutes on a 200Mhz
UltraSPARC).
I've appended the patch below for others to try out and verify. If
others get positive results and can't think of reasons not to accept
this, perhaps I can knock Guido over the head with it when he gets
back to town. ;-)


-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives

Index: posixmodule.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v
retrieving revision 2.110
diff -c -c -r2.110 posixmodule.c
*** posixmodule.c 1999/04/07 15:49:41 2.110
--- posixmodule.c 1999/05/26 14:29:08
***************
*** 2776,2781 ****
--- 2776,2785 ----
int putenv( const char *str );
#endif

+ /* Save putenv() parameters as values here, so we can collect them when they
+ * get re-set with another call for the same key. */
+ static PyObject *posix_putenv_garbage;
+
static PyObject *
posix_putenv(self, args)
PyObject *self;
***************
*** 2783,2788 ****
--- 2787,2793 ----
{
char *s1, *s2;
char *new;
+ PyObject *newstr;

if (!PyArg_ParseTuple(args, "ss", &s1, &s2))
return NULL;
***************
*** 2810,2823 ****
} else {
#endif

! /* XXX This leaks memory -- not easy to fix :-( */
! if ((new = malloc(strlen(s1) + strlen(s2) + 2)) == NULL)
! return PyErr_NoMemory();
(void) sprintf(new, "%s=%s", s1, s2);
if (putenv(new)) {
posix_error();
return NULL;
}

#if defined(PYOS_OS2)
}
--- 2815,2842 ----
} else {
#endif

! /* XXX This can leak memory -- not easy to fix :-( */
! newstr = PyString_FromStringAndSize(NULL, strlen(s1) + strlen(s2) + 2);
! if (newstr == NULL)
! return PyErr_NoMemory();
! new = PyString_AS_STRING(newstr);
(void) sprintf(new, "%s=%s", s1, s2);
if (putenv(new)) {
posix_error();
return NULL;
}
+ /* Install the first arg and newstr in posix_putenv_garbage;
+ * this will cause previous value to be collected. This has to
+ * happen after the real putenv() call because the old value
+ * was still accessible until then. */
+ if (PyDict_SetItem(posix_putenv_garbage,
+ PyTuple_GET_ITEM(args, 0), newstr)) {
+ /* really not much we can do; just leak */
+ PyErr_Clear();
+ }
+ else {
+ Py_DECREF(newstr);
+ }

#if defined(PYOS_OS2)
}
***************
*** 3489,3492 ****
--- 3508,3513 ----
return;

PyDict_SetItemString(d, "error", PyExc_OSError);
+
+ posix_putenv_garbage = PyDict_New();
}
Bug report: memory leak in python 1.5.2 [ In reply to ]
Fred L. Drake <fdrake@cnri.reston.va.us> wrote:

: M.-A. Lemburg writes:
: > Don't why the free(new); is missing though... maybe some putenv()
: > implementation expect a malloced string (the putenv implementation
: > in posixmodule.c for NeXT does) while other make a copy first.

: At least on Solaris, the malloc()'ed memory must survive until the
: same key is re-used.
: I think using a Python string for the malloc()'ed memory is
: acceptable (it's just a buffer, after all). I took a stab at using a
: dictionary to "cache" the last string passed to putenv() for each key;
: when using the same key again, the old value is removed from the
: dictionary and deleted. Greg's code doesn't appear to leak with my
: patch, at any rate (running for over 7 minutes on a 200Mhz
: UltraSPARC).
: I've appended the patch below for others to try out and verify. If
: others get positive results and can't think of reasons not to accept
: this, perhaps I can knock Guido over the head with it when he gets
: back to town. ;-)

There is also the situation where some UNIX systems put the environment
initially in the u area, and it is difficult to programmatically determine
where different runtime segments are (where is the heap vs. where is the
u area).

Fred, your solution should work because it takes the problem case: what
to do with the string initially, but I think it might be better to copy
the values at module initialization time. I've included an addition to
Fred's patch to be called instead of the PyDict_New() function (in the
module init function).

Also, how should we deal with this in terms of C applications who might
change the environment? (Embedders beware!)

From a programming standpoint, I don't think that it should be "proper"
to be changing the environment all that much. It's purpose is to
propragate values to child processes, not to store runtime values.

-Arcege

Example environment initialization schema (based on Fred Drake's patch).

PyObject *
system_environ_initialize(environ)
char **environ;
{ int i;
char **p;
PyObject *putenv_garbage;
PyObject *key, *env;

if ((posix_putenv_garbage = PyDict_New()) == NULL)
return NULL;
/* populate the dictionary with the values from the environment */
for (p = environ; *p; p++)
/* get the name of the environment variable */
for (i = 0; (*p)[i] && (*p)[i] != '='; i++) ;
key = PyString_FromStringAndSize(*p, i);
env = PyString_FromString(*p);
PyDict_SetItem(posix_putenv_garbage, key, env);
Py_DECREF(key);
Py_DECREF(env);
}
i = 0;
/* rewrite the entire environment with Python strings */
/* we rewrite the entire environment after reading it, not while
reading the address locations */
while (PyDict_Next(posix_putenv_garbage, &i, &key, &env))
putenv(PyString_AS_STRING(env));
return posix_putenv_garbage;
}
Bug report: memory leak in python 1.5.2 [ In reply to ]
Michael P. Reilly writes:
> There is also the situation where some UNIX systems put the environment
> initially in the u area, and it is difficult to programmatically determine
> where different runtime segments are (where is the heap vs. where is the
> u area).
>
> Fred, your solution should work because it takes the problem case: what
> to do with the string initially, but I think it might be better to copy
> the values at module initialization time. I've included an addition to
> Fred's patch to be called instead of the PyDict_New() function (in the
> module init function).

If I understand correctly, your patch avoids the problem of memory
leaked from the initial environment (a static size). Is this correct?
If so, I'm not sure it's worth the extra code. My intention was to
avoid the Python-induced leak that would allow a long-running Python
script that occaisionally created a subprocess to become a MemoryError
traceback. ;-) In the case of systems without a lot of memory
available, the environment should be kept small to begin with (making
the additional data structures created by the startup code more of a
problem).
I don't think I've ever checked the size of the "typical" UNIX
environment; "printenv | wc -c" tells me I'm running under 2Kb in a
fresh shell. Is that enough to worry about, and slow down
initialization?

> Also, how should we deal with this in terms of C applications who might
> change the environment? (Embedders beware!)

In this case, we may not clear all the possible garbage, but we only
leak for keys that are:

1. Changed from Python at least once, then
2. Changed from C, and
3. Never changed from Python again.

Note that only one copy of the variable gets leaked, not an infinite
succession.
In the case of two Python putenv() calls with C putenv() calls
inbetween, we don't introduce any new leaks; the effect is that the
data from the first Python putenv() isn't collected until the second
Python putenv(). This is acceptable.

> From a programming standpoint, I don't think that it should be "proper"
> to be changing the environment all that much. It's purpose is to
> propragate values to child processes, not to store runtime values.

I agree. Processes that run a lot of children, like HTTP servers
running CGI scripts, won't be using their own environment to do this,
but will create the desired environments on the fly. (Especially if
they're threaded!)


-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
Bug report: memory leak in python 1.5.2 [ In reply to ]
Fred L. Drake <fdrake@cnri.reston.va.us> wrote:

: Michael P. Reilly writes:
: > There is also the situation where some UNIX systems put the environment
: > initially in the u area, and it is difficult to programmatically determine
: > where different runtime segments are (where is the heap vs. where is the
: > u area).
: >
: > Fred, your solution should work because it takes the problem case: what
: > to do with the string initially, but I think it might be better to copy
: > the values at module initialization time. I've included an addition to
: > Fred's patch to be called instead of the PyDict_New() function (in the
: > module init function).
:
: If I understand correctly, your patch avoids the problem of memory
: leaked from the initial environment (a static size). Is this correct?

My initializer only deals with the possible UNIX implimentations of the
environment, by copying the environment (supposedly) before it is
used. There are no guarantees how putenv will modify the environment.
AIX 4.2.1 states that the environment is expanded as needed for new
values, but with some tests, it still shows that putenv/getenv is using
passed (borrowed) values. If we mix Python-changed environment
variables with some that are not as a blanket system-independant
implimentation, I foresee problems. If it is going to be treated as
volitile memory, we should initialize it in a more appropriate manner
(in my thinking).

Remember that the original definition of UNIX put the environment (and
argument list) in the u area, in-accessible to the process without the
passed copies (in main()) and using putenv/getenv. I've never used
Linux, but I wouldn't be surprised if it used something along those
lines. Mixing dynamic string with static could get dangerous (I
recently debugged a library that would return malloc'd string or a
static string indiscriminantly).

Regardless, it was just a suggestion for consistancy, your patch would
work except on a few, very odd systems where the environment isn't
quite what people expect, I was trying to handle that case. But.. read
on, MacDuff.

: If so, I'm not sure it's worth the extra code. My intention was to
: avoid the Python-induced leak that would allow a long-running Python
: script that occaisionally created a subprocess to become a MemoryError
: traceback. ;-) In the case of systems without a lot of memory
: available, the environment should be kept small to begin with (making
: the additional data structures created by the startup code more of a
: problem).
: I don't think I've ever checked the size of the "typical" UNIX
: environment; "printenv | wc -c" tells me I'm running under 2Kb in a
: fresh shell. Is that enough to worry about, and slow down
: initialization?

A common bare minimum environment is: HOME, USER, SHELL, MAIL. That's
not all that much (should be less then 256 on virtually every system).

Most systems have an upper bound based on a limit of the argument list
and the environment (ARG_MAX). POSIX limits this at 4k, but most
systems have it larger (SunOS is at 1Mb). This means that there's a
limit to how much we would be initializing anyway (with my addition).

: > Also, how should we deal with this in terms of C applications who might
: > change the environment? (Embedders beware!)
:
: In this case, we may not clear all the possible garbage, but we only
: leak for keys that are:
:
: 1. Changed from Python at least once, then
: 2. Changed from C, and
: 3. Never changed from Python again.
:
: Note that only one copy of the variable gets leaked, not an infinite
: succession.
: In the case of two Python putenv() calls with C putenv() calls
: inbetween, we don't introduce any new leaks; the effect is that the
: data from the first Python putenv() isn't collected until the second
: Python putenv(). This is acceptable.

True; my thought was in some system where you get a cowboy programmer
who decides to take control of the environment making everything
borrowed in his library. Then you get:

* cowboy initializes
* python changes (with borrowed memory)
* cowboy frees python-borrowed memory and makes change
* python changes again attempting to free already freed memory
...

My point wasn't that we should handle this case, just to make it known
somewhere that Python is now attempting to borrow some aspects of the
environment (depending on the platform). We can't babysit everything,
but taking control of a system managed facility does takes
responsibility.

(I try not to go so far as making these types of statements anymore,
however hypothetical - too many "people" take it personally. It's
better to let intelligent people read and make there own conclusions.)

: > From a programming standpoint, I don't think that it should be "proper"
: > to be changing the environment all that much. It's purpose is to
: > propragate values to child processes, not to store runtime values.
:
: I agree. Processes that run a lot of children, like HTTP servers
: running CGI scripts, won't be using their own environment to do this,
: but will create the desired environments on the fly. (Especially if
: they're threaded!)

Yes, my point to the statement was that people shouldn't be leaking
much from using the current implimentation we have - assuming they use
it "properly". Maybe there should be better education in the docs of
its purpose.

-Arcege
Bug report: memory leak in python 1.5.2 [ In reply to ]
I like the solutions Fred and Michael have proposed. I don't think we
need worry about a small amount of initial "memory leak", only about
the continual loss due to repeated calls to putenv.

FYI, the reason this became a problem was a long running program
(running in Sydney) that continuously received time-tagged data from a
host in another timezone (Perth). The data was in Perth time and the
program needed to convert it GMT time. The routine called was:

def convertLocalTimeToCalendarTime(timeTuple, timeZone):
saveTimeZone = os.environ['TZ']
os.environ['TZ'] = timeZone
calendarTime = time.mktime(timeTuple)
os.environ['TZ'] = saveTimeZone
return calendarTime

In the current configuration <timeZone> is 'Australia/West' and
<saveTimeZone> is 'Australia/NSW'.

The solution is probably to get the mxDateTime package, but I haven't
looked at that yet :)


On 26 May, Fred L. Drake wrote:
>
> Michael P. Reilly writes:
> > There is also the situation where some UNIX systems put the environment
> > initially in the u area, and it is difficult to programmatically determine
> > where different runtime segments are (where is the heap vs. where is the
> > u area).
> >
> > Fred, your solution should work because it takes the problem case: what
> > to do with the string initially, but I think it might be better to copy
> > the values at module initialization time. I've included an addition to
> > Fred's patch to be called instead of the PyDict_New() function (in the
> > module init function).
>
> If I understand correctly, your patch avoids the problem of memory
> leaked from the initial environment (a static size). Is this correct?
> If so, I'm not sure it's worth the extra code. My intention was to
> avoid the Python-induced leak that would allow a long-running Python
> script that occaisionally created a subprocess to become a MemoryError
> traceback. ;-) In the case of systems without a lot of memory
> available, the environment should be kept small to begin with (making
> the additional data structures created by the startup code more of a
> problem).
> I don't think I've ever checked the size of the "typical" UNIX
> environment; "printenv | wc -c" tells me I'm running under 2Kb in a
> fresh shell. Is that enough to worry about, and slow down
> initialization?
>
> > Also, how should we deal with this in terms of C applications who might
> > change the environment? (Embedders beware!)
>
> In this case, we may not clear all the possible garbage, but we only
> leak for keys that are:
>
> 1. Changed from Python at least once, then
> 2. Changed from C, and
> 3. Never changed from Python again.
>
> Note that only one copy of the variable gets leaked, not an infinite
> succession.
> In the case of two Python putenv() calls with C putenv() calls
> inbetween, we don't introduce any new leaks; the effect is that the
> data from the first Python putenv() isn't collected until the second
> Python putenv(). This is acceptable.
>
> > From a programming standpoint, I don't think that it should be "proper"
> > to be changing the environment all that much. It's purpose is to
> > propragate values to child processes, not to store runtime values.
>
> I agree. Processes that run a lot of children, like HTTP servers
> running CGI scripts, won't be using their own environment to do this,
> but will create the desired environments on the fly. (Especially if
> they're threaded!)
>
>
> -Fred
>
> --
> Fred L. Drake, Jr. <fdrake@acm.org>
> Corporation for National Research Initiatives
>

--
Greg McFarlane: INMS Telstra Australia (gregm@iname.com)
Today's forecast: Sunny, with occasional cloudy periods and a chance
of rain in some areas.