Mailing List Archive

"High water" Memory fragmentation still a thing?
Question:

Does Python, in general terms (apart from extensions or gc manipulation),
exhibit a "high water" type leak of allocated memory in recent Python
versions (2.7+)?

Background:


From the post:
http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html

Begin quote:

Long running Python jobs that consume a lot of memory while running may not
return that memory to the operating system until the process actually
terminates, even if everything is garbage collected properly. That was news
to me, but it's true. What this means is that processes that do need to use
a lot of memory will exhibit a "high water" behavior, where they remain
forever at the level of memory usage that they required at their peak.

Note: this behavior may be Linux specific; there are anecdotal reports that
Python on Windows does not have this problem.

This problem arises from the fact that the Python VM does its own internal
memory management. It's commonly known as memory fragmentation.
Unfortunately, there doesn't seem to be any fool-proof method of avoiding
it.

End Quote

However, this paper seems to indicate that it is not a modern problem:
http://www.evanjones.ca/memoryallocator/


--Thanks
Dave Butler
Re: "High water" Memory fragmentation still a thing?
Hello,

Croepha <croepha <at> gmail.com> writes:
>
> Question:
>
> Does python in general terms (apart from extensions or gc manipulation),
> exhibit a "high water" type leak of allocated memory in recent python
> versions (2.7+)?

It is not a leak. It is a quite common pattern of memory fragmentation.
The article is wrong in assuming that Python doesn't return the memory to
the OS. Python does return its empty memory pools to the OS, however the OS
itself may not be able to release that memory, because of heap fragmentation.

As the article mentions, this was improved (mostly fixed?) in 3.3.

Regards

Antoine.


--
https://mail.python.org/mailman/listinfo/python-list
Re: "High water" Memory fragmentation still a thing?
On 03.10.2014 21:16, Antoine Pitrou wrote:
> It is not a leak. It is a quite common pattern of memory fragmentation.
> The article is wrong in assuming that Python doesn't return the memory to
> the OS. Python does return its empty memory pools to the OS, however the OS
> itself may not be able to release that memory, because of heap fragmentation.

The article doesn't state if the writer is referring to virtual memory
or resident set size. For long-running 32bit processes it is quite
common to run out of virtual address space. But that's something totally
different than running out of memory. A 64bit process can have a virtual
address size of several GB but only occupy a few hundred MB of physical
memory. People tend to confuse the meaning of VSZ and RSS and forget
about paging.
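On Linux the two numbers are easy to compare directly from
/proc/self/status (a Linux-only sketch; the field names are the ones
documented in proc(5)):

```python
# Linux-only: read VmSize (virtual size) and VmRSS (resident set
# size) for the current process from /proc/self/status, in kB.
def vm_sizes():
    sizes = {}
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                field, value = line.split(":")
                sizes[field] = int(value.split()[0])  # value is "NNN kB"
    return sizes

sizes = vm_sizes()
# VmSize (address space) is typically much larger than VmRSS
# (pages actually resident in physical memory).
```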


Re: "High water" Memory fragmentation still a thing?
On Fri, Oct 3, 2014 at 1:36 PM, Croepha <croepha@gmail.com>
wrote:

> Long running Python jobs that consume a lot of memory while
> running may not return that memory to the operating system
> until the process actually terminates, even if everything is
> garbage collected properly.

(I see Antoine replied that this is mostly fixed starting in
3.3. My response below was written with my Unix/Python 2
rose-colored glasses firmly affixed to my face.)

Unless you are blessed with a long-running program which does
everything exactly right, I'd be surprised if any such program
behaved the way you want. Python may be extreme in this regard,
since it often allocates small objects, and its special object
allocator (obmalloc - is it still used?) might mitigate the
problem by collecting allocations of similar size together
(malloc implementations probably do some of this as well), but
it's still going to have these issues.

The problem boils down to how the program dynamically allocates
and frees memory, and how the malloc subsystem interacts with
the kernel through the brk and sbrk system calls. (Anywhere I
mention "brk", you can mentally replace it with "sbrk". They do
the same thing - ask for memory from or return memory to the
kernel - using a different set of units, memory addresses or
bytes.) In the general case, programmers call malloc (or
calloc, or realloc) to allocate a chunk of storage from the
heap. (I'm ignoring anything special which Python has layered
on top of malloc. It can mitigate problems, but I don't think
it will fundamentally change the way malloc interacts with the
kernel.) The malloc subsystem maintains a free list (recently
freed bits of memory) from which it can allocate memory without
traipsing off to the kernel. If it can't return a chunk of
memory from the free list, it will (in the most simpleminded
malloc implementation) call brk to grab a new (large) chunk of
memory. The system simply moves the end of the program's
"break", effectively increasing or decreasing the (virtual) size
of the running program. That memory is then doled out to the
user by malloc. If, and only if, every chunk of memory in the
last chunk allocated by a call to brk is placed on malloc's free
list, *and* if the particular malloc implementation on your box
is smart enough to coalesce adjacent chunks of freed memory back
into brk-sized memory chunks, can brk be called once again to
reduce the program's footprint.

Example... I don't know how much memory malloc requests from
brk, so let's make things easy, and make the following
assumptions:

* Assume malloc calls brk to allocate 1MB chunks.

* Assume the program only ever calls malloc(1024).

* Assume malloc's own overhead (free list, etc) is zero.

So, starting afresh, having no free memory, the first time we
call malloc(1024), it calls brk asking for 1MB, then returns its
caller a pointer to that 1024-byte chunk. Now call malloc(1024)
1023 more times. We have used up that entire 1MB chunk of
memory. Now free each of those chunks by calling free() 1024
times. We are left, once again, with a 1MB chunk of free memory.
It might be stitched together by malloc into one single block of
memory, or it might appear on the free list as 1024 chunks of
memory. Should malloc return it to the system with a call to
brk? Maybe. Maybe not. Maybe the malloc subsystem goes
through all the necessary bookkeeping to hand that memory back
to the system, only to find that the next thing the program does
is make another malloc(1024) call. Whoops. Now you have to call
brk again. And system calls are expensive.

Now, consider a similar case. You make 1024 calls to
malloc(1024). Then you free all of them except the 512th
one. Now you have a 1MB chunk of memory on malloc's free list
which is entirely free, except for a small chunk in the middle.
That 1MB block is now broken into three fragments: two free
fragments separated by one chunk which is still in use. Can't free that.
Well, perhaps you could, but only the top half of it. And you'd
have the same dilemma as before. Should you return it or not?
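As an illustration only, the bookkeeping of that 1MB example can be
modelled in a few lines of Python (a toy sketch; no real malloc is
this simpleminded):

```python
CHUNKS_PER_ARENA = 1024  # pretend brk hands out 1MB = 1024 x 1KB chunks

class Arena:
    """Toy arena: tracks which 1KB chunks are handed out."""
    def __init__(self):
        self.in_use = set()

    def malloc(self):
        for i in range(CHUNKS_PER_ARENA):
            if i not in self.in_use:
                self.in_use.add(i)
                return i
        raise MemoryError("arena full, would need another brk")

    def free(self, chunk):
        self.in_use.discard(chunk)

    def returnable(self):
        # brk can only shrink the break if the entire arena is free
        return not self.in_use

arena = Arena()
chunks = [arena.malloc() for _ in range(CHUNKS_PER_ARENA)]
for c in chunks:
    if c != 512:            # keep just the 512th chunk alive
        arena.free(c)
print(arena.returnable())   # False: one live chunk pins the whole 1MB
```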

Final consideration. Suppose your program makes 1023 calls to
malloc(1024) and frees them all, but sometime during that work,
a low-level library far from the programmer's view also calls
malloc to get a 1KB chunk. Even if your program (e.g., the
Python runtime) was perfectly well-behaved and returned all of
the 1023 chunks of memory it had requested, it has no control
over this low-level library. You're still stuck with a
fragmented chunk of memory, free except for a hole somewhere in
the middle.

Long story short, there's only so much you can do to try to
return memory to the system. I am not up-to-date on the latest
malloc intelligence, but it's a challenging enough problem that
it was a long time before any malloc implementation attempted to
automatically return memory pages to the system.

Finally, as this is not really Python-specific, you might want
to do some reading on how malloc is implemented. It can be
pretty arcane, but I'm sure it would make for fascinating
cocktail party conversation. <wink>

Skip
Re: "High water" Memory fragmentation still a thing?
On Sat, Oct 4, 2014 at 4:36 AM, Croepha <croepha@gmail.com> wrote:
> What this means is that processes that do need to use a lot of memory will
> exhibit a "high water" behavior, where they remain forever at the level of
> memory usage that they required at their peak.

This is almost never true. What you will see, though, is something
like Christian described; pages get allocated, and then partially
used, and you don't always get all that memory back. In theory, a high
level language like Python would be allowed to move objects around to
compact memory, but CPython doesn't do this, and there's no proof that
it'd really help anything anyway. (Look at Skip's comments about
"Should you return it or not?" and the cost of system calls... now
consider the orders-of-magnitude worse cost of actually moving memory
around.)

This is why a lot of long-duration processes are built to be restarted
periodically. It's not strictly necessary, but it can be the most
effective way of solving a problem. I tend to ignore that, though, and
let my processes just keep on running... for 88 wk 4d 23:56:27 so far,
on one of those processes. It's consuming less than half a gig of
virtual memory, quarter gig resident, and it's been doing a fair bit
(it keeps all sorts of things in memory to avoid re-decoding from
disk). So don't worry too much about memory usage until you see that
there's actually a problem; with most Python processes, you'll restart
them to deploy new code sooner than you'll restart them to fix memory
problems. (The above example isn't a Python process, and code updates
happen live.) In fact, I'd advise that as a general policy: Don't
panic about any Python limitation until you've proven that it's
actually a problem. :)

ChrisA
Re: "High water" Memory fragmentation still a thing?
Chris Angelico wrote:

> In theory, a high
> level language like Python would be allowed to move objects around to
> compact memory, but CPython doesn't do this, and there's no proof that
> it'd really help anything anyway.

I welcome correction, but I understand that both the JVM and the CLR memory
managers move memory around. That's why Jython and IronPython use
sequential integers as object IDs, since memory locations are not fixed.

Way back in the mid 1980s, Apple Macintoshes used a memory manager which
could move memory around. Given that the Macs of the day had 128K of RAM,
of which something like a third or a half was used for the screen, being
able to move blocks of memory around to avoid fragmentation was critical,
so I guess that proves that it would help at least one thing.


> (Look at Skip's comments about
> "Should you return it or not?" and the cost of system calls... now
> consider the orders-of-magnitude worse cost of actually moving memory
> around.)
>
> This is why a lot of long-duration processes are built to be restarted
> periodically.

Ironically, the cost of restarting the process periodically is likely to be
orders of magnitude more expensive than that of moving a few blocks of
memory around from time to time. Especially on Windows, where starting
processes is expensive, but even on Linux you have to shut the running
application down, then start it up again and rebuild all the internal data
structures that you just tore down...


--
Steven

Re: "High water" Memory fragmentation still a thing?
On Sat, Oct 4, 2014 at 11:02 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Chris Angelico wrote:
>
>> In theory, a high
>> level language like Python would be allowed to move objects around to
>> compact memory, but CPython doesn't do this, and there's no proof that
>> it'd really help anything anyway.
>
> I welcome correction, but I understand that both the JVM and the CLR memory
> managers move memory around. That's why Jython and IronPython use
> sequential integers as object IDs, since memory locations are not fixed.

Right; I should have made it clearer that there's no proof that it'd
help anything *in CPython*. Removing the GIL is periodically proposed,
too, but there's no proof that its removal would benefit CPython; it's
not just that nobody's gotten around to writing a memory compactor for
CPython.

> Ironically, the cost of restarting the process periodically is likely to be
> orders of magnitude more expensive than that of moving a few blocks of
> memory around from time to time. Especially on Windows, where starting
> processes is expensive, but even on Linux you have to shut the running
> application down, then start it up again and rebuild all the internal data
> structures that you just tore down...

Maybe. But you deal with a number of things all at once:

1) Code updates (including interpreter updates)
2) Compaction of Python objects
3) Disposal of anything that got "high level leaked" - unintended
longevity caused by a global reference of some sort
4) Cleanup of low-level allocations that don't go through the Python
memory manager
etc etc etc.

So, yes, it's expensive. And sometimes it's not even possible (there's
no way to retain socket connections across a restart, AFAIK). But it's
there if you want it.

Personally, I like to keep processes running, but that's me. :)

ChrisA
Re: "High water" Memory fragmentation still a thing?
Chris Angelico wrote:

> On Sat, Oct 4, 2014 at 11:02 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> Chris Angelico wrote:
>>
>>> In theory, a high
>>> level language like Python would be allowed to move objects around to
>>> compact memory, but CPython doesn't do this, and there's no proof that
>>> it'd really help anything anyway.
>>
>> I welcome correction, but I understand that both the JVM and the CLR
>> memory managers move memory around. That's why Jython and IronPython use
>> sequential integers as object IDs, since memory locations are not fixed.
>
> Right; I should have made it clearer that there's no proof that it'd
> help anything *in CPython*. Removing the GIL is periodically proposed,
> too, but there's no proof that its removal would benefit CPython; it's
> not just that nobody's gotten around to writing a memory compactor for
> CPython.

I think that you're conflating a couple of different issues, although I
welcome correction.

I don't think that removing the GIL is a requirement for a memory compactor,
or vice versa. I think that PyPy is capable of plugging in various
different garbage collectors, including some without the GIL, which may or
may not include memory compactors. So as far as I can tell, the two are
independent.

As far as the GIL in CPython goes, there have been at least two attempts to
remove it, and they do show strong improvements for multi-threaded code
running on multi-core machines. Alas, they also show significant *slowdown*
for single-core machines, and very little improvement on dual-core
machines.

[Aside: The thing that people fail to understand is that the GIL is not in
fact something which *prevents* multi-tasking, but it *enables* cooperative
multi-tasking:

http://www.dabeaz.com/python/GIL.pdf

although that's not to say that there aren't some horrible performance
characteristics of the GIL. David Beazley has identified issues with the
GIL which suggest room for improving the GIL and avoiding "GIL battles"
which are responsible for much of the overhead of CPU-bound threads. Any C
programmers who want to hack on the interpreter core?]


Nevertheless, you're right that since nobody has actually built a version of
CPython with memory compactor, there's no *proof* that it would help
anything.


--
Steven

Re: "High water" Memory fragmentation still a thing?
On Sat, Oct 4, 2014 at 3:48 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> I think that you're conflating a couple of different issues, although I
> welcome correction.
>
> I don't think that removing the GIL is a requirement for a memory compactor,
> or vice versa. I think that PyPy is capable of plugging in various
> different garbage collectors, including some without the GIL, which may or
> may not include memory compactors. So as far as I can tell, the two are
> independent.
>
> As far as the GIL in CPython goes, there have been at least two attempts to
> remove it, and they do show strong improvements for multi-threaded code
> running on multi-core machines. Alas, they also show significant *slowdown*
> for single-core machines, and very little improvement on dual-core
> machines.

Not conflating, comparing. In both cases, it's perfectly possible *in
theory* to build a CPython with this change. (GIL removal is clearly
possible in theory, as it's been done in practice.) And *in theory*,
there's some benefit to be gained by doing so. But it's not a case of
"why doesn't python-dev just knuckle down and do it already", as
there's no evidence that it's a real improvement. (A "compile to
machine code" feature might well be purely beneficial, and that's
simply a matter of work - how much work, for how many CPU
architectures, to get how much benefit. But at least that's going to
have a fairly good expectation of performance improvement.) A memory
compactor might well help a narrow set of Python programs (namely,
those which allocate heaps and heaps of memory, then throw away most
of it, and keep running for a long time), but the complexity cost will
make it unlikely to be of benefit.

ChrisA
Re: "High water" Memory fragmentation still a thing?
Croepha <croepha@gmail.com> writes:
> Does python in general terms (apart from extensions or gc manipulation),
> exhibit a "high water" type leak of allocated memory in recent python
> versions (2.7+)?

Likely, because it is very difficult to avoid. What you would need to
definitely prevent it is "memory compaction": not only is garbage
collected and freed but, in addition, the used memory is also
relocated to form contiguous blocks of used and free memory.

Without memory compaction, long running processes tend to suffer
from "memory fragmentation": while sufficient free memory is available,
it is available only in small blocks, not large enough for some memory
requests and those requests then call for more memory from the operating
system. When this memory is later released, it may become split up in
smaller blocks and when another large memory request arrives, new memory
may be requested from the operating system (even though the original
block would have been large enough).

Python tries hard to limit the effect of fragmentation (maintaining the
free blocks in bins of given size) but cannot eliminate it completely.


In order to support memory compaction, all C extensions must adhere
to a strict memory access protocol: they cannot freely use C pointers
to access the memory (as the compaction may invalidate those pointers
at any time) but must beforehand announce "I will now be using this pointer"
(such that the compaction does not move the memory) and afterwards
announce "I am no longer using this pointer" (such that memory
compaction becomes again possible for this memory).

As you see from the description, memory compaction presents a heavy burden
for all extension writers.
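A toy sketch of that announce/release protocol, with all names here
invented for illustration (JNI's GetPrimitiveArrayCritical /
ReleasePrimitiveArrayCritical pair works along the same lines):

```python
class MovableBlock:
    """Toy block in a compacting heap: may be moved unless pinned."""
    def __init__(self, data):
        self.data = data
        self.pin_count = 0

    def pin(self):
        # "I will now be using this pointer": block must not move
        self.pin_count += 1
        return self.data

    def unpin(self):
        # "I am no longer using this pointer": moving is allowed again
        self.pin_count -= 1

def compactor_may_move(block):
    # the compactor may only relocate unpinned blocks
    return block.pin_count == 0

block = MovableBlock(b"extension buffer")
raw = block.pin()            # safe to hand raw data to C code now
assert not compactor_may_move(block)
block.unpin()                # raw must not be used past this point
assert compactor_may_move(block)
```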

Re: "High water" Memory fragmentation still a thing?
dieter <dieter@handshake.de>:

> Without memory compaction, long running processes tend to suffer from
> "memory fragmentation": while sufficient free memory is available, it
> is available only in small blocks, not large enough for some memory
> requests

Now this is a question for the computer scientists. The problem is quite
amenable to purely mathematical/statistical treatment. No doubt they've
been at it for decades.

My personal hunch is that GC in general works best with an ample amount
of RAM, where "ample" means, say, ten times the minimum amount needed.
As a bonus, I'm guessing the ample room would also all but remove the
memory fragmentation issue.


Marko
Re: "High water" Memory fragmentation still a thing?
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> [Aside: The thing that people fail to understand is that the GIL is not in
> fact something which *prevents* multi-tasking, but it *enables* cooperative
> multi-tasking:
>
> http://www.dabeaz.com/python/GIL.pdf
>
> although that's not to say that there aren't some horrible performance
> characteristics of the GIL. David Beazley has identified issues with the
> GIL which suggest room for improving the GIL and avoiding "GIL battles"
> which are responsible for much of the overhead of CPU-bound threads. Any C
> programmers who want to hack on the interpreter core?]

Didn't the "new GIL" fix some of these problems?


Sturla

Re: "High water" Memory fragmentation still a thing?
On Fri, Oct 3, 2014 at 1:01 PM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
> On Fri, Oct 3, 2014 at 1:36 PM, Croepha <croepha@gmail.com>
> wrote:
>
>> Long running Python jobs that consume a lot of memory while
>> running may not return that memory to the operating system
>> until the process actually terminates, even if everything is
>> garbage collected properly.

> The problem boils down to how the program dynamically allocates
> and frees memory, and how the malloc subsystem interacts with
> the kernel through the brk and sbrk system calls. (Anywhere I
> mention "brk", you can mentally replace it with "sbrk". They do
> the same thing - ask for memory from or return memory to the
> kernel - using a different set of units, memory addresses or
> bytes.) In the general case, programmers call malloc (or
> calloc, or realloc) to allocate a chunk of storage from the
> heap. (I'm ignoring anything special which Python has layered
> on top of malloc. It can mitigate problems, but I don't think
> it will fundamentally change the way malloc interacts with the
> kernel.) The malloc subsystem maintains a free list (recently
> freed bits of memory) from which it can allocate memory without
> traipsing off to the kernel. If it can't return a chunk of
> memory from the free list, it will (in the most simpleminded
> malloc implementation) call brk to grab a new (large) chunk of
> memory. The system simply moves the end of the program's
> "break", effectively increasing or decreasing the (virtual) size
> of the running program. That memory is then doled out to the
> user by malloc. If, and only if, every chunk of memory in the
> last chunk allocated by a call to brk is placed on malloc's free
> list, *and* if the particular malloc implementation on your box
> is smart enough to coalesce adjacent chunks of freed memory back
> into brk-sized memory chunks, can brk be called once again to
> reduce the program's footprint.

Actually, ISTR hearing that glibc's malloc+free will use mmap+munmap
to allocate and release chunks of memory, to avoid fragmentation.

Digging around on the 'net a bit, it appears that glibc's malloc does
do this (so on most Linux systems), but only for contiguous chunks of
memory above 128K in size.

Here's a pair of demonstration programs (one in C, one in CPython
3.4), which when run under strace on a Linux system, appear to show
that mmap and munmap are being used:
http://stromberg.dnsalias.org/~strombrg/malloc-and-sbrk.html

HTH
Re: "High water" Memory fragmentation still a thing?
Christian Heimes <christian <at> python.org> writes:
>
> The article doesn't state if the writer is referring to virtual memory
> or resident set size.

Actually the article mentions the following recipe:

resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

which means the author is probably looking at resident set size.
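For anyone following along, ru_maxrss is the peak ("high water") RSS
and can only ever grow, which makes for a quick Unix-only check
(kilobytes on Linux, bytes on OS X):

```python
import resource

def peak_rss_kb():
    # ru_maxrss is the high-water resident set size: kilobytes on
    # Linux, bytes on OS X, and monotonically non-decreasing.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
blob = bytearray(50 * 1024 * 1024)  # touch roughly 50 MB
after = peak_rss_kb()
del blob                            # freeing cannot lower the peak
assert peak_rss_kb() >= after >= before
```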

Regards

Antoine.


Re: "High water" Memory fragmentation still a thing?
On 04/10/2014 02:02, Steven D'Aprano wrote:
> Way back in the mid 1980s, Apple Macintoshes used a memory manager which
> could move memory around.

But the memory manager didn't return a pointer to memory the way malloc
does. It returned a pointer to the pointer and you had to double
dereference it to get the heap address (ISTR, 30 years ago now). The
advantage being the memory manager could shuffle the memory about and
update the pointers. Your pointer to a pointer would still point to the
same block after a shuffle. Of course you couldn't hold on to a partial
dereference across system calls... can you guess why? :-)

Linux has (had) an allocation scheme where blocks came from different
sized areas depending on the size requested. So all requests below 4k
came from one heap area, and so on for 16k, 64k, 256k, 1M etc. Meaning
that code requesting and freeing up small amounts fragmented the small
allocation zone, and so a big allocation would die due to fragmentation
of the small zones. That was in kernel 2.4 days; sorry, I'm off the
bleeding edge now with how the allocator works in modern kernels.

Andy
Re: "High water" Memory fragmentation still a thing?
On Sun, 05 Oct 2014 20:23:42 +0100, mm0fmf wrote:

> On 04/10/2014 02:02, Steven D'Aprano wrote:
>> Way back in the mid 1980s, Apple Macintoshes used a memory manager
>> which could move memory around.
>
> But the memory manager didn't return a pointer to memory the way malloc
> does. It returned a pointer to the pointer and you had to double
> dereference it to get the heap address (ISTR, 30 years ago now).

Correct.


> The
> advantage being the memory manager could shuffle the memory about and
> update the pointers. Your pointer to a pointer would still point to the
> same block after a shuffle. Of course you couldn't hold on to a partial
> dereference across system calls... can you guess why? :-)

Because system calls might trigger a memory compaction or move.


Before the move, you have a managed "pointer to pointer" (handle)
pointing to a managed pointer which points to a block of memory:


handle -----> pointer -----> "Some stuff here"

Grab a copy of the pointer with a partial deref:

myPtr := handle^; (*I'm an old Pascal guy.*)


So we have this:

handle -----> pointer -----> "Some stuff here"
myPtr -----------------------^


Then you call a system routine that moves memory, and the memory manager
moves the block, updating the pointer, but leaving myPtr pointing at
garbage:


handle -----> pointer ----------------------------> "Some stuff here"
myPtr -----------------------^


and as soon as you try using myPtr, you likely get the dreaded Bomb
dialog box:

http://www.macobserver.com/tmo/article/happy-birthday-mac-how-to-recover-from-the-dreaded-bomb-box-error-message

I'm not suggesting that a 1984 memory manager is the solution to all our
problems. I'm just pointing it out as proof that the concept works. If I
knew more about Java and .Net, I could use them as examples instead :-)
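The double dereference trick is easy to sketch in Python (a toy model
with made-up offsets and block sizes, not the actual Mac Memory
Manager API):

```python
class CompactingHeap:
    """Toy model of a handle-based heap that can move blocks."""
    BLOCK = 100                  # pretend every block is 100 bytes

    def __init__(self):
        self.storage = {}        # offset -> data ("the heap")
        self.table = {}          # handle -> offset (master pointers)
        self.next_handle = 0
        self.next_offset = 0

    def new_handle(self, data):
        h = self.next_handle
        self.next_handle += 1
        self.table[h] = self.next_offset
        self.storage[self.next_offset] = data
        self.next_offset += self.BLOCK
        return h

    def deref(self, h):
        # the double dereference: handle -> master pointer -> data
        return self.storage[self.table[h]]

    def free(self, h):
        del self.storage[self.table.pop(h)]

    def compact(self):
        # slide live blocks down and fix up the master pointers;
        # handles stay valid, but raw offsets saved earlier do not
        new_storage, offset = {}, 0
        for h in sorted(self.table, key=self.table.get):
            new_storage[offset] = self.storage[self.table[h]]
            self.table[h] = offset
            offset += self.BLOCK
        self.storage, self.next_offset = new_storage, offset

heap = CompactingHeap()
a, b, c = (heap.new_handle(s) for s in ("one", "two", "three"))
stale = heap.table[b]          # a "partial dereference": a raw offset
heap.free(a)
heap.compact()
print(heap.deref(b))           # handles survive the move: "two"
print(stale == heap.table[b])  # the raw offset is now wrong: False
```

Holding on to `stale` across compact() is exactly the "partial
dereference across system calls" mistake: the handle is still good,
the raw offset is not.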



--
Steven
Re: "High water" Memory fragmentation still a thing?
Chris Angelico wrote:
> This is why a lot of long-duration processes are built to be restarted
> periodically. It's not strictly necessary, but it can be the most
> effective way of solving a problem. I tend to ignore that, though, and
> let my processes just keep on running... for 88 wk 4d 23:56:27 so far,
> on one of those processes. It's consuming less than half a gig of
> virtual memory, quarter gig resident, and it's been doing a fair bit [...]

A shipping product has to meet a higher standard. Planned process mortality is a reasonably simple strategy for building robust services from tools that have flaws in resource management. It assumes only that the operating system reliably reclaims resources from dead processes.

The usual pattern is to have one or two parent processes that keep several worker processes running but do not themselves directly serve clients. The workers do the heavy lifting and are programmed to eventually die, letting younger workers take over.

For an example see the Apache HTTP daemon, particularly the classic pre-forking server. There's a configuration parameter, "MaxRequestsPerChild", that sets how many requests a process should answer before terminating.
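In Python this pattern comes almost for free with
multiprocessing.Pool's maxtasksperchild argument (a real parameter;
the worker counts and task numbers below are arbitrary):

```python
import multiprocessing
import os

def handle_request(n):
    # stand-in for real request handling; report which worker ran it
    return os.getpid()

def serve(requests=6):
    # maxtasksperchild=1 is the analogue of MaxRequestsPerChild=1:
    # each worker exits after one task and is replaced by a fresh
    # process, so any leaked or fragmented memory dies with it
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=1)
    try:
        return pool.map(handle_request, range(requests), chunksize=1)
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    pids = serve()
    # expect several distinct pids: each worker died after one task
    print(sorted(set(pids)))
```

Left at the default (maxtasksperchild=None), workers live as long as
the pool, which is the classic high-water behaviour again.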

Re: "High water" Memory fragmentation still a thing?
On Tue, Oct 7, 2014 at 3:21 AM,
<bryanjugglercryptographer@yahoo.com.dmarc.invalid> wrote:
> Chris Angelico wrote:
>> This is why a lot of long-duration processes are built to be restarted
>> periodically. It's not strictly necessary, but it can be the most
>> effective way of solving a problem. I tend to ignore that, though, and
>> let my processes just keep on running... for 88 wk 4d 23:56:27 so far,
>> on one of those processes. It's consuming less than half a gig of
>> virtual memory, quarter gig resident, and it's been doing a fair bit [...]
>
> A shipping product has to meet a higher standard. Planned process mortality is a reasonably simple strategy for building robust services from tools that have flaws in resource management. It assumes only that the operating system reliably reclaims resources from dead processes.
>

Sure, and that's all well and good. But what I just cited there *is* a
shipping product. That's a live server that runs a game that I'm admin
of. So it's possible to do without the resource safety net of periodic
restarts.

> For an example see the Apache HTTP daemon, particularly the classic pre-forking server. There's a configuration parameter, "MaxRequestsPerChild", that sets how many requests a process should answer before terminating.
>

That assumes that requests can be handled equally by any server
process - and more so, that there are such things as discrete
requests. That's true of HTTP, but not of everything. And even with
HTTP, if you do "long polls" [1] then clients might remain connected
for arbitrary lengths of time; either you have to cut them off when
you terminate the server process (hopefully that isn't too often, or
you lose the benefit of long polling), or you retain processes for
much longer.

Restarting isn't necessary. It's like rebooting a computer: people get
into the habit of doing it, because it "fixes problems", but all that
means is that it allows you to get sloppy with resource management.
Working under the constraint that your one process will remain running
for at least a year forces you to be careful, and IMO results in
better code overall.

ChrisA

[1] https://en.wikipedia.org/wiki/Push_technology#Long_polling
Re: "High water" Memory fragmentation still a thing?
dieter wrote:
> As you see from the description, memory compaction presents a heavy burden
> for all extension writers.

Particularly because many CPython extensions are actually interfaces to pre-existing libraries. To leverage the system's facilities CPython has to follow the system's conventions, which memory compaction would break.
Re: "High water" Memory fragmentation still a thing?
Chris Angelico wrote:
> Sure, and that's all well and good. But what I just cited there *is* a
> shipping product. That's a live server that runs a game that I'm admin
> of. So it's possible to do without the resource safety net of periodic
> restarts.

Nice that the non-Python server you administer stayed up for 88 weeks, but that doesn't really have anything to do with the issue here. The answer to the OP's title question is "yes": high-water memory fragmentation is a real thing on most platforms, including CPython.

The cited article tells of Celery hitting the problem, and the working solution was to "roll the celery worker processes". That doesn't mean to tell a human administrator to regularly restart the server. It's programmatic and it's a reasonably simple and well-established design pattern.

> > For an example see the Apache HTTP daemon, particularly the classic pre-forking server. There's a configuration parameter, "MaxRequestsPerChild", that sets how many requests a process should answer before terminating.
>
> That assumes that requests can be handled equally by any server
> process - and more so, that there are such things as discrete
> requests. That's true of HTTP, but not of everything.

It's true of HTTP and many other protocols because they were designed to support robust operation even as individual components may fail.

> And even with
> HTTP, if you do "long polls" [1] then clients might remain connected
> for arbitrary lengths of time; either you have to cut them off when
> you terminate the server process (hopefully that isn't too often, or
> you lose the benefit of long polling), or you retain processes for
> much longer.

If you look at actual long-polling protocols, you'll see that the server occasionally closing connections is no problem at all. They're actually designed to be robust even against connections that drop without proper shutdown.

> Restarting isn't necessary. It's like rebooting a computer: people get
> into the habit of doing it, because it "fixes problems", but all that
> means is that it allows you to get sloppy with resource management.

CPython, and for that matter malloc/free, have known problems in resource management, such as the fragmentation issue noted here. There are more. Try a Google site search for "memory leak" on http://bugs.python.org/. Do you think the last memory leak is fixed now?

From what I've seen, planned process replacement is the primary technique for supporting long-lived mission-critical services in the face of resource management flaws. Upon process termination the OS recovers the resources. I love CPython, but on this point I trust the Linux kernel much more.

--
--Bryan
Re: "High water" Memory fragmentation still a thing? [ In reply to ]
On 10/8/2014 10:28 AM, bryanjugglercryptographer@yahoo.com.dmarc.invalid
wrote:

> That doesn't mean to tell a human administrator to regularly restart the server. It's programmatic and it's a reasonably simple and well-established design pattern.

I'd call it more a compensation technique than a design pattern*. You
know, like rebooting windows routinely. :)

Emile


*) Alternately known as a workaround or kludge.

Re: "High water" Memory fragmentation still a thing? [ In reply to ]
On Fri, Oct 10, 2014 at 8:39 AM, Emile van Sebille <emile@fenx.com> wrote:
> On 10/8/2014 10:28 AM, bryanjugglercryptographer@yahoo.com.dmarc.invalid
> wrote:
>
>> That doesn't mean to tell a human administrator to regularly restart the
>> server. It's programmatic and it's a reasonably simple and well-established
>> design pattern.
>
> I'd call it more a compensation technique than a design pattern*. You know,
> like rebooting windows routinely. :)
>
> *) Alternately known as a workaround or kludge.

That sounds about right to me.

ChrisA
Re: "High water" Memory fragmentation still a thing? [ In reply to ]
Emile van Sebille <emile@fenx.com> writes:

> On 10/8/2014 10:28 AM,
> bryanjugglercryptographer@yahoo.com.dmarc.invalid wrote:
>
>> That doesn't mean to tell a human administrator to regularly restart the server. It's programmatic and it's a reasonably simple and well-established design pattern.
>
> I'd call it more a compensation technique than a design pattern*. You
> know, like rebooting windows routinely. :)
>
> Emile
>
>
> *) Alternately known as a workaround or kludge.

Well, everything has a price.

Python does not use memory compaction; it places many objects on the
heap, and therefore long-running processes are usually subject to
memory fragmentation. As a consequence, those processes need to be
restarted from time to time.

If Python used memory compaction, implementing C extensions would
be much more tedious and error-prone, and there would be fewer such
extensions - limiting the domains where Python is used.

One has to choose.
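The "high water" behavior under discussion can be observed at the interpreter level with the standard tracemalloc module. Note this tracks Python-level allocations rather than OS-level RSS, so it illustrates the current-versus-peak distinction, not heap fragmentation itself:

```python
import tracemalloc

tracemalloc.start()

# Allocate a large number of objects, then release them all.
big = [bytearray(1024) for _ in range(10_000)]
current_full, peak_full = tracemalloc.get_traced_memory()
del big
current_after, peak_after = tracemalloc.get_traced_memory()

# Current usage drops after the delete, but the recorded peak
# (the "high water" mark) does not.
assert current_after < current_full
assert peak_after >= current_full

tracemalloc.stop()
```

Whether the freed memory actually goes back to the OS is then up to the allocator and the degree of fragmentation, which is the point of this thread.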


People who do not like this choice of the "CPython" implementation
(no memory compaction) may look at "Jython" (based on Java, with
memory compaction). If they are lucky, all the extensions they need
are available there.

Re: "High water" Memory fragmentation still a thing? [ In reply to ]
On Fri, Oct 10, 2014 at 5:09 PM, dieter <dieter@handshake.de> wrote:
> Python does not use memory compaction but places many objects on the
> heap and therefore, long running processes
> usually are subject to memory fragmentation. As a consequence, those
> processes need to be restarted from time to time.

Pike doesn't use memory compaction either, and also places pretty much
everything on the heap. Yet its processes don't need to be restarted;
in fact, everything's designed around keeping stuff running
permanently. As Emile said, it's not a requirement, it's a
compensation technique. I've yet to build a Python process that runs
for an entire year (as I'm not confident that I can get it so right
that I don't need to update any code), so I can't say for sure that it
would work, but I do have Python processes running for multiple months
without suffering serious fragmentation.

ChrisA