pthreads
> The current problem:
> If a clamd thread runs into a timeout, it is/should be cancelled by the
> watcher thread using pthread_cancel(3).
> It really is cancelled, but any resources allocated before (open
> directories, strdup()ed pathnames,

Check that a "malloc" is necessary in all cases.

For example, I noticed that "fname" (the full path name) is malloc()ed
(and freed) for each file, in all directory tree walks. This is
unnecessary and expensive. POSIX defines the macro PATH_MAX (in
<limits.h>), which is the maximum path length for that platform.

So, if you define fname as "char fname[PATH_MAX];", the malloc() and
free() become unnecessary, and hence the risk of a leak is reduced.

Also, in the same code, the full pathname is built up using:-

sprintf(fname,"%s/%s"....

all "printf" style calls are expensive. If you define "fname" as
PATH_MAX, then (before you start the walking loop) you can prepare the
full path using :-

strcpy(fname,dirname);
strcat(fname,"/");
cp = fname + strlen(fname);

(where cp is a "char *"). Then, for each file, all you have to do is:-

strcpy(cp, d->d_name);

to make "fname" into the full path name. This is much less expensive
than the previous technique and reduces the chances of a leak.

(see my alternative "rmdirs" code from a previous posting).
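
Put together, the whole walking loop looks roughly like this (a minimal
sketch only: walk() and the printf() standing in for the real per-file
scan are placeholders, and bounds checks are elided on the assumption
that dirname plus any entry name fits within PATH_MAX):-

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

void walk(const char *dirname)
{
    char fname[PATH_MAX];               /* no malloc()/free() per file */
    char *cp;
    DIR *dir;
    struct dirent *d;

    strcpy(fname, dirname);             /* assumes dirname fits */
    strcat(fname, "/");
    cp = fname + strlen(fname);         /* cp points just past "dir/" */

    if ((dir = opendir(dirname)) == NULL)
        return;
    while ((d = readdir(dir)) != NULL) {
        if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
            continue;
        strcpy(cp, d->d_name);          /* fname is now the full path */
        printf("%s\n", fname);          /* the real scan goes here */
    }
    closedir(dir);
}

One caveat: a few platforms leave PATH_MAX undefined in <limits.h>
(POSIX allows the limit to be indeterminate, available only at run time
via pathconf()), so a fallback #define may be needed.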

> file buffers etc) are not free()d,
> resulting in more or less big memleaks.
> So one can now either manage a list of resources to be free()d/close()d
> and install a cleanup handler for that, or set a thread-global which is
> honored in all the scanning loops to break out and clean up as usual.

This is exactly why I think (and have expressed in this mailing list)
that using threads for this application is not such a good idea.

(see my previous posting for making a more robust "clamd").
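
For completeness, the cleanup-handler option mentioned in the quote
above would look roughly like this (a sketch only: scanner() and the
4096-byte buffer are placeholders, and error handling is abbreviated):-

#include <dirent.h>
#include <pthread.h>
#include <stdlib.h>

/* Cleanup handlers run when the thread is pthread_cancel()ed
 * (or calls pthread_exit()), so the resources are not leaked. */
static void close_dir(void *arg) { if (arg) closedir(arg); }
static void free_mem(void *arg)  { free(arg); }

void *scanner(void *arg)
{
    DIR *dir = opendir((const char *)arg);
    char *fname = malloc(4096);

    pthread_cleanup_push(close_dir, dir);   /* runs even on cancellation */
    pthread_cleanup_push(free_mem, fname);

    if (dir != NULL && fname != NULL) {
        /* ... scanning loop; a pthread_cancel() from the watcher
         * thread now runs both handlers on the way out ... */
    }

    pthread_cleanup_pop(1);                 /* pop and run free_mem */
    pthread_cleanup_pop(1);                 /* pop and run close_dir */
    return NULL;
}

The catch is exactly the bookkeeping described above: every resource
acquired inside the loop has to be registered with a handler before the
next cancellation point is reached.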

> All I have heard about threads (from a programmer's pov) is that it is
> mainly equal to multiple processes, but global variables are to be
> avoided. And I don't know about variable scope in threads, either :-(

Basic theory (put simply):-

Unix programs are split into three segments: program (binary + libs),
data/heap (global variables), and stack. When you fork(), the parent and
child share the program segment, but each process has its own stack and
data segment. However, at the time of the call, the stack and data
segments of the child appear to be a copy of the parent's; i.e. their
function call stacks and global data are independent copies, and they
share the same code.
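
A minimal sketch of that independence (the global "counter" name is
just for illustration):-

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;                    /* global: lives in the data segment */

int main(void)
{
    pid_t pid = fork();             /* error check elided for brevity */

    if (pid == 0) {                 /* child: writes to its own copy */
        counter = 42;
        printf("child:  counter = %d\n", counter);    /* prints 42 */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: counter = %d\n", counter);        /* still prints 0 */
    return 0;
}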

A thread is almost exactly the same, except that the data segments
(globals and heap) are also shared between the parent and child. Hence
your problem: if a child thread leaks memory and dies, the leak is
inherited by the parent, unlike a fork(), where the O/S cleans up the
whole address space when the child dies.
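
The same sketch with threads shows the difference (compile with
-lpthread; the 1MB buffer size and 100 iterations are arbitrary):-

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Each worker malloc()s 1MB and exits without free()ing it. Because
 * the heap is shared, the memory stays allocated in the process after
 * the thread is gone -- unlike a fork()ed child, whose whole address
 * space is reclaimed by the O/S when it exits. */
static void *worker(void *arg)
{
    char *buf = malloc(1024 * 1024);    /* never free()d: leaked */
    (void)arg;
    if (buf)
        buf[0] = '\0';                  /* touch it so it really gets a page */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int i;

    for (i = 0; i < 100; i++) {
        if (pthread_create(&tid, NULL, worker, NULL) != 0)
            break;
        pthread_join(tid, NULL);        /* thread is gone, leak remains */
    }
    /* roughly 100MB is still allocated in this process at this point */
    return 0;
}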

However, note that most modern Unices have a very clever fork()
algorithm. They apply a "copy on write" technique to the actual memory
blocks, meaning that for a parent and child process, although their
global variables appear to be independent, no memory blocks are actually
copied until either process makes a change. This means that a fork() has
become a much lighter weight call than it used to be. It also means
that, in system terms, a process fork() isn't really any more expensive
than creating a thread, although an exec() is still quite expensive.

At least that's my understanding of it.
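
A rough way to check it on a given system is a quick micro-benchmark
like this sketch (ITERS is arbitrary, error checks are elided, and the
absolute numbers vary a lot between platforms):-

#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define ITERS 1000                      /* arbitrary */

static void *noop(void *arg) { (void)arg; return NULL; }

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    double t;
    int i;

    t = now();
    for (i = 0; i < ITERS; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                   /* child exits immediately */
        waitpid(pid, NULL, 0);
    }
    printf("fork():           %.3fs for %d iterations\n", now() - t, ITERS);

    t = now();
    for (i = 0; i < ITERS; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, noop, NULL);
        pthread_join(tid, NULL);
    }
    printf("pthread_create(): %.3fs for %d iterations\n", now() - t, ITERS);
    return 0;
}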


James
RE: pthreads
> For example, I noticed that "fname" (the full path name) is malloc()ed
> (and freed) for each file, in all directory tree walks. This is
> unnecessary and expensive. POSIX defines the macro PATH_MAX, which is
> the maximum path length for that platform.

It always used to be that way. I noticed someone had changed it, I don't
know who or why. Anyway, I changed it back a week or two ago. Do you
have the latest development release?

-Nigel