Mailing List Archive: parallel processes fail at startup when clamd is running

parallel processes fail at startup when clamd is running

Nov 28, 2022, 5:48 AM

Post #1 of 9

We are seeing a large number of MPI jobs fail with errors indicating that the fabric is unavailable while the scans are running. We are early in the investigation, so we are not sure whether locking, timing, response time or other factors are involved, but I wanted to ask a quick general question to see whether this is a known issue with easy answers. If not, we will post more detailed information as it is determined.

Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 28, 2022, 6:53 AM

Post #2 of 9

Hi there,

On Mon, 28 Nov 2022, JOHN URBAN via clamav-users wrote:

> We are experiencing a large number of MPI jobs failing indicating
> the fabric is unavailable when the scans are running. Early in the
> investigation so not sure if locking, timing, response time or other
> factors are involved, but I wanted to ask a quick general question to
> see if this is a known issue with easy answers. If not, we will post
> more detailed information as it is determined.

More information would probably help. Please could you clarify why in
your subject you write "when clamd is running", yet in the message you
write "when the scans are running"? Even if it's running, clamd might
not be scanning anything but if it's loaded the official signatures it
will still probably be using a gigabyte or so of RAM, while it's doing
nothing but wait for a client connection.

MPI doesn't figure large in the ClamAV mailing list archives, and
searching for MPI together with ClamAV was equally unrewarding. The old
ClamAV Bugzilla seems to be broken (at least for searches) and a search
of the Cisco/Talos ClamAV GitHub issues

https://github.com/Cisco-Talos/clamav/search?q=MPI&type=

gave me no results. The closest I could get in my searching was [*]:

https://marc.info/?l=clamav-users&m=128309131408757&w=2

I found it by grepping my local mail archive directory, then perusing
my favourite mail archiver. It's a very old post but even so it might
be helpful.

What are you actually doing with ClamAV?

[*] Sorry for those who don't care for the MARC archive, but it seems
that Pipermail goes back only as far as February 2014. :/

--

73,
Ged.
_______________________________________________

Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users

Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat

Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 28, 2022, 8:23 AM

Post #3 of 9

On Mon, 28 Nov 2022, JOHN URBAN via clamav-users wrote:

> We are experiencing a large number of MPI jobs failing indicating
> the fabric is unavailable when the scans are running. Early in the
> investigation so not sure if locking, timing, response time or other
> factors are involved, but I wanted to ask a quick general question to
> see if this is a known issue with easy answers. If not, we will post
> more detailed information as it is determined.

Not an issue that I am familiar with.

Are the MPI jobs related to ClamAV, or just running
on a system with ClamAV?

Is ClamAV doing on-access analysis? If so, I wonder whether it is
attempting to access the same file, or worse the same file handle, for
each MPI thread simultaneously.

If I remember correctly, "fabric" can be a technical term to do with
message passing, parallelism and networking.
Is that how you are using it?

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk

Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 28, 2022, 6:24 PM

Post #4 of 9

> On 11/28/2022 11:23 AM Andrew C Aitchison via clamav-users <clamav-users@lists.clamav.net> wrote:
>
>
> If I remember correctly "fabric" can be a technical term to do with
> message passing, parallelism and networking.
> Is that how you are using it ?

Yes. It covers all the parts needed to create a network connection; the simplest definition is probably "the communications network MPI constructs, either by itself or using a daemon". So it covers whether you are using IB, Ethernet, and so on. In this case they are InfiniBand connections using an OFA layer. In the most general usage, a connection can include the authentication method and process, the hardware used to pass messages, the protocol used, which libraries, and even which compiler was used.

Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 28, 2022, 6:30 PM

Post #5 of 9

We are doing a scan of the entire locally attached storage on Linux nodes, including /tmp and /var, and the problem is basically that MPI programs trying to launch while that full scan is running fail to start up. Once the programs start they do not commonly fail, but a very high number of jobs trying to start up while the scan is in progress fail to start properly. Memory is not a problem; all nodes have >128GB of memory.

> On 11/28/2022 9:53 AM G.W. Haywood via clamav-users <clamav-users@lists.clamav.net> wrote:
>
>
> What are you actually doing with ClamAV?

Re: [ext] Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 28, 2022, 11:31 PM

Post #6 of 9

* JOHN URBAN via clamav-users <clamav-users@lists.clamav.net>:

> Doing a scan of the entire locally attached storage on Linux nodes,
> including /tmp and /var; and the problem is basically that MPI
> programs trying to launch while that full scan is running fail to
> start up. Once the programs start they do not commonly fail; but a
> very high number of jobs trying to start up when the scan is in progress
> fail to start properly. Memory is not a problem; all nodes have >128GB
> of memory.

Since it's so easy to reproduce, why not start those programs using
strace to see which syscalls are failing:

strace --failed-only $program
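
A minimal sketch of how the resulting trace might be summarized, assuming
the program was traced with something like
"strace -f --failed-only -o trace.log $program" (-f follows child
processes and -o writes the trace to a file; the file name trace.log and
the helper name summarize_failures are made up for illustration). It
counts failed calls per syscall and errno so the most frequent failure
stands out. Note that tracing an MPI launcher this way only follows
processes started on the local node.

```shell
#!/bin/sh
# Hypothetical helper, not part of strace or ClamAV.
# Reads strace output (files given as arguments, or stdin) and prints
# "<count> <syscall> <errno>" for every failed call, most frequent first.
# Assumes the "-f -o FILE" line format without timestamps (no -t/-r options).
summarize_failures() {
    awk '
        / = -1 E[A-Z0-9]+/ {
            line = $0
            sub(/^[0-9]+ +/, "", line)          # drop the PID column that -f adds with -o
            if (line ~ /^<\.\.\. /) {           # "<... openat resumed>" continuation line
                split(line, f, " ")
                name = f[2]
            } else {
                name = line
                sub(/\(.*/, "", name)           # syscall name is everything before "("
            }
            err = line
            sub(/.* = -1 /, "", err)            # keep the text after "= -1 "
            sub(/ .*/, "", err)                 # errno symbol is its first word
            count[name " " err]++
        }
        END { for (k in count) print count[k], k }
    ' "$@" | sort -k1,1nr -k2
}

# Example (run by hand once a trace exists):
#   strace -f --failed-only -o trace.log $program
#   summarize_failures trace.log
```

If one syscall and errno pair dominates the output, that is probably the
place to look first.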

--
Ralf Hildebrandt
Charité - Universitätsmedizin Berlin
Geschäftsbereich IT | Abteilung Netzwerk

Campus Benjamin Franklin (CBF)
Haus I | 1. OG | Raum 105
Hindenburgdamm 30 | D-12203 Berlin

Tel. +49 30 450 570 155
ralf.hildebrandt@charite.de
https://www.charite.de

Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 29, 2022, 1:58 AM

Post #7 of 9

Hi there,

On Mon, 28 Nov 2022, JOHN URBAN via clamav-users wrote:

> Doing a scan of the entire locally attached storage on Linux nodes,

Seems likely that this is just a resource exhaustion problem.

> including /tmp and /var; ...

Probably a bad idea. Recursion in /tmp? Try it without these two,
then, er, maybe bisect.
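
One way to try that, as a sketch only: clamscan accepts
--exclude-dir=REGEX, so a small helper can build scan commands with
different exclusion sets and print them for review before anything is
run. The helper name build_scan_cmd is made up for illustration, and this
assumes the scan is started with clamscan; if it goes through clamd
instead, the ExcludePath option in clamd.conf plays a similar role.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a clamscan command that scans /
# recursively while skipping every directory passed as an argument.
build_scan_cmd() {
    cmd="clamscan --recursive --infected"
    for dir in "$@"; do
        # --exclude-dir takes a regular expression; anchor it at the path start
        cmd="$cmd --exclude-dir=^$dir"
    done
    printf '%s /\n' "$cmd"
}

# First try without the two suspects, then bisect by excluding one at a time.
build_scan_cmd /tmp /var
build_scan_cmd /tmp
build_scan_cmd /var
```

If MPI startup only fails with one of the three variants, that narrows
down which directory is involved.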

FWIW I never scan the local filesystem. If it's compromised, what's
the point? If it isn't, what's the point? That doesn't mean that I
won't scan local files of course - but that's different and I'd have
very specific reasons for doing it.

--

73,
Ged.

Re: [ext] Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 29, 2022, 5:13 AM

Post #8 of 9

It is not quite as easy to set up as I made it sound, as there are lots of pieces and people involved, but that is exactly one of the tests we hope to run today; thanks!

> On 11/29/2022 2:31 AM Ralf Hildebrandt via clamav-users <clamav-users@lists.clamav.net> wrote:
>
>
> * JOHN URBAN via clamav-users <clamav-users@lists.clamav.net>:
>
> > Doing a scan of the entire locally attached storage on Linux nodes,
> > including /tmp and /var; and the problem is basically that MPI
> > programs trying to launch while that full scan is running fail to
> > start up. Once the programs start they do not commonly fail; but a
> > very high number of jobs trying to start up when the scan is in progress
> > fail to start properly. Memory is not a problem; all nodes have >128GB
> > of memory.
>
> Since it's so easy to reproduce, why not start those programs using
> strace to see which syscalls are failing:
>
> strace --failed-only $program

Re: [ext] Re: parallel processes fail at startup when clamd is running

clamav-users at lists

Nov 29, 2022, 5:29 AM

Post #9 of 9

* JOHN URBAN <urbanjost@comcast.net>:
> Not quite as easy to set up as I made it sound, as lots of pieces and people involved but that is exactly one of the tests we hope to run today; thanks!

Yes, this sounds like hours of fun :/
But the insight gained will be rewarding :)
--
Ralf Hildebrandt
Charité - Universitätsmedizin Berlin
Geschäftsbereich IT | Abteilung Netzwerk

Campus Benjamin Franklin (CBF)
Haus I | 1. OG | Raum 105
Hindenburgdamm 30 | D-12203 Berlin

Tel. +49 30 450 570 155
ralf.hildebrandt@charite.de
https://www.charite.de