Mailing List Archive

Python script seems to stop running when handling very large dataset
My Python script works well, but seems to stop running at a certain point when
handling a very large dataset.

Can anyone shed light on this?

Regards, David
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python script seems to stop running when handling very large dataset
On 30/10/2021 11.42, Shaozhong SHI wrote:
> Python script works well, but seems to stop running at a certain point when
> handling very large dataset.
>
> Can anyone shed light on this?

Storage space?
Taking time to load/format/process data-set?

--
Regards,
=dn

Re: Python script seems to stop running when handling very large dataset
On Fri, Oct 29, 2021 at 4:04 PM dn via Python-list <python-list@python.org>
wrote:

> On 30/10/2021 11.42, Shaozhong SHI wrote:
> > Python script works well, but seems to stop running at a certain point when
> > handling very large dataset.
> >
> > Can anyone shed light on this?
>
> Storage space?
> Taking time to load/format/process data-set?
>

It could be many things.

What operating system are you on?

If you're on Linux, you can use strace to attach to a running process to
see what it's up to. Check out the -p option. See
https://stromberg.dnsalias.org/~strombrg/debugging-with-syscall-tracers.html

macOS has dtruss, but it's a little hard to enable. dtruss is similar to
strace.

Both of these tools are better for processes doing system calls (kernel
interactions). They do not help nearly as much with CPU-bound processes.

It could also be that you're running out of virtual memory, and the
system's virtual memory system is thrashing.

Does the load average on the system go up significantly when the process
seems to get stuck?

You could try attaching to the process with a debugger, too, e.g. with pudb:
https://github.com/inducer/pudb/issues/31
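
If attaching a debugger is awkward, a lighter-weight option (not mentioned in the thread, so treat it as a suggestion) is the standard library's faulthandler module. The sketch below registers a handler so that sending SIGUSR1 to the running script dumps the Python traceback of every thread, which shows where it is currently sitting. The helper name `enable_traceback_dump` is hypothetical, and `faulthandler.register` is only available on Unix-like systems.

```python
import faulthandler
import signal
import sys


def enable_traceback_dump(file=sys.stderr):
    """Dump every thread's traceback to `file` when SIGUSR1 arrives (Unix only).

    `file` must be a real file object with a file descriptor.
    """
    faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
```

Call `enable_traceback_dump()` once near the start of the script; when it appears stuck, run `kill -USR1 <pid>` from another terminal and read the traceback on stderr.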

Barring those, you could sprinkle some print statements in your code, to
see where it's getting stuck. This tends to be an iterative process, where
you add some prints, run, observe the result, and repeat.
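
As a minimal sketch of the print-statement approach: the loop below prints a progress line every so often, so the last line printed shows roughly how far the script got before stalling. `process_record`, `run` and the reporting interval are hypothetical placeholders, not anything from the original poster's script.

```python
import sys
import time


def process_record(record):
    """Hypothetical stand-in for the real per-record work."""
    return record * 2


def run(records, report_every=1000, out=sys.stderr):
    """Process records, printing a progress line every `report_every` items."""
    start = time.monotonic()
    results = []
    for count, record in enumerate(records, start=1):
        results.append(process_record(record))
        if count % report_every == 0:
            elapsed = time.monotonic() - start
            # flush=True so the line is visible even if the process hangs later
            print(f"{count} records done after {elapsed:.1f}s", file=out, flush=True)
    return results
```

If the gaps between progress lines grow steadily, the script is probably slowing down rather than stopping.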

HTH.

Re: Python script seems to stop running when handling very large dataset
With so little information provided, not much light will be shed. When
it stops running, are there any errors? How is the dataset being
processed? How large is the dataset? How large a dataset can be
successfully processed? What libraries are being used? What version of
Python are you using? On what operating system? With how much memory?
How much disk space is used? How much is free? Are you processing
files or using a database? If the latter, what database? Does it write
intermediate files during processing? Can you monitor memory usage
during processing (e.g. with a system monitor) to see how much memory
is consumed?
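
Besides an external system monitor, memory can also be watched from inside the script with the standard library's tracemalloc module. This is only a sketch: `build_chunk` and `report_memory` are hypothetical names, and tracemalloc only sees allocations made through Python's memory allocator, so memory allocated directly by C extensions may not be counted.

```python
import tracemalloc


def build_chunk(n):
    """Hypothetical stand-in for one step of the real processing."""
    return [str(i) for i in range(n)]


def report_memory(label):
    """Print current and peak traced memory in MiB; return both in bytes."""
    current, peak = tracemalloc.get_traced_memory()
    print(f"{label}: current={current / 2**20:.1f} MiB, "
          f"peak={peak / 2**20:.1f} MiB", flush=True)
    return current, peak


tracemalloc.start()
kept = []
for step in range(3):
    kept.append(build_chunk(100_000))  # results are kept, so usage keeps growing
    report_memory(f"after step {step}")
tracemalloc.stop()
```

A "current" figure that climbs on every step and never falls suggests the script is holding on to data it no longer needs.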


On Fri, 2021-10-29 at 23:42 +0100, Shaozhong SHI wrote:
> Python script works well, but seems to stop running at a certain
> point when
> handling very large dataset.
>
> Can anyone shed light on this?
>
> Regards, David

Re: Python script seems to stop running when handling very large dataset
On 2021-10-29, Shaozhong SHI <shishaozhong@gmail.com> wrote:
> Python script works well, but seems to stop running at a certain point when
> handling very large dataset.
>
> Can anyone shed light on this?

No.

Nobody can help you with the amount of information you have provided.

--
Grant

Re: Python script seems to stop running when handling very large dataset
Shaozhong SHI wrote at 2021-10-29 23:42 +0100:
>Python script works well, but seems to stop running at a certain point when
>handling very large dataset.
>
>Can anyone shed light on this?

Some algorithms have nonlinear runtime.

For example, it is quite easy to write code with
quadratic runtime in Python:

    s = ""
    for x in ...: s += f(x)

You will see the problem only for large data sets.
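
To make this concrete, here is a sketch comparing repeated `+=` with `str.join`. `f` is a hypothetical placeholder for whatever produces each piece. Note that CPython can often extend a string in place when nothing else references it, so `+=` on a plain local variable may not look quadratic in practice; that is an implementation detail rather than a language guarantee, and `join` does not depend on it.

```python
def f(x):
    """Hypothetical stand-in for the function that produces each piece."""
    return f"{x},"


def concat_with_plus(items):
    """Build the result by repeated +=, which may copy the string each time."""
    s = ""
    for x in items:
        s += f(x)
    return s


def concat_with_join(items):
    """Generate the pieces and join them once, in linear time."""
    return "".join(f(x) for x in items)
```

Both functions return the same string; timing them with `timeit` on inputs of increasing size is a quick way to check how a given loop scales.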

Re: Python script seems to stop running when handling very large dataset
On Saturday, 30 October 2021, Dieter Maurer <dieter@handshake.de> wrote:

> Shaozhong SHI wrote at 2021-10-29 23:42 +0100:
> >Python script works well, but seems to stop running at a certain point when
> >handling very large dataset.
> >
> >Can anyone shed light on this?
>
> Some algorithms have non linear runtime.
>
>
> For example, it is quite easy to write code with
> quadratic runtime in Python:
> s = ""
> for x in ...: s += f(x)
> You will see the problem only for large data sets.
>

Has anyone compared this with iterrows? Which looping option is faster?
Regards, David