Mailing List Archive

Messy children.
Hey,

I'm being a pollutant, sorry :)

Anyway, in our backhand directory (the one containing the arriba file, the
parent domain socket, and child domain sockets), we end up with 24,000
files hanging around, which steadily increases. Does this affect operation
at all? Should backhand be smart enough to ignore these files, or perhaps
remove them if it runs over them?

We're getting thousands of [Mon Apr 23 17:50:33 2001] [error]
mod_backhand: could not get valid connection -- forced local

errors, too. When things seem fine. Anyone else have a magic fix for this
kind of thing?

thanks, sorry for more mail...
-Alan
Messy children. [ In reply to ]
> I'm being a pollutant, sorry :)

Hardly...

> Anyway, in our backhand directory (the one containing the arriba file,
> the parent domain socket, and child domain sockets), we end up with
> 24,000 files hanging around, which steadily increases. Does this affect
> operation at all? Should backhand be smart enough to ignore these
> files, or perhaps remove them if it runs over them?

Mod_backhand _should_ clean them up. And it should have no problem
overwriting them. And it definitely won't be confused by them. Any
directory with 24000 files in it will have performance issues on
standard unix filesystems. mod_backhand never does a readdir on this
directory, so if you are using a fast tree/btree/btree* based
filesystem, then you should see no noticeable degradation in performance.

I just fixed the "cleaning" up part in CVS.
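
For anyone who wants to inspect the directory before picking up the CVS fix, here is a minimal Python sketch that lists socket files untouched for a while. It is not part of mod_backhand: the function name, the age threshold, and the idea that an old modification time means "stale" are all assumptions, and it only reports candidates rather than deleting anything. Pass the names of the files you must keep (for example the bparent socket) in `keep`.

```python
import os
import stat
import time


def find_stale_sockets(directory, max_age_seconds, keep=(), now=None):
    """Return sorted names of socket files in `directory` older than the cutoff.

    Hypothetical helper, not mod_backhand code. "Old" is judged by mtime,
    which is only a heuristic: a live child may still own an old socket.
    """
    now = time.time() if now is None else now
    keep = set(keep)
    stale = []
    # scandir streams entries, so a 24,000-entry directory is not loaded at once.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name in keep:
                continue
            info = entry.stat(follow_symlinks=False)
            if not stat.S_ISSOCK(info.st_mode):
                continue  # leave regular files such as the arriba file alone
            if now - info.st_mtime > max_age_seconds:
                stale.append(entry.name)
    return sorted(stale)
```

Review the list first. It is probably safest to unlink entries only while Apache is stopped, since a running child may still be using its socket.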

> We're getting thousands of [Mon Apr 23 17:50:33 2001] [error]
> mod_backhand: could not get valid connection -- forced local

This could be due to a lot of things. It is hard to tell where the
problem is occurring. There could be a client/server communication
issue (between the moderator and the children), but on what level that
is occurring is very hard to tell from that error message. What it
means is that the child has selected another server to forward the
request to but can't get the connection it desires.

Is it because it (1) can't ask the question -- doesn't have a connection
to the moderator, (2) gets a gibberish answer from the moderator -- this
"shouldn't" happen too often, or (3) gets a connection from the moderator
that is dead?

(1) is a critical failure, (2) and (3) on the other hand will have to
happen several times consecutively in order to induce the behaviour
described. This is bad -- perhaps the applications are advertising
keep-alive sessions and the server is closing the connection? Very very
odd.

> errors, too. When things seem fine. Anyone else have a magic fix for
> this kind of thing?

No magic here. BSD is missing a fundamentally useful tool --
strace/truss. ktrace is horrible for poking around to see what is going
on. It requires two stages. It would be interesting to see a trace
from one of the "uncooperative" apache children and an lsof and trace of
the moderator as well.

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
Messy children. [ In reply to ]
> Mod_backhand _should_ clean them up. And it should have no problem
> overwriting them. And it definitely won't be confused by them. Any
> directory with 24000 files in it will have performance issues on
> standard unix filesystems. mod_backhand never does a readdir on this
> directory, so if you are using a fast tree/btree/btree* based
> filesystem, then you should see no noticeable degradation in performance.
>
> I just fixed the "cleaning" up part in CVS.

Alright. It was weird so I thought I'd mention it.

> This could be due to a lot of things. It is hard to tell where the
> problem is occurring. There could be a client/server communication
> issue (between the moderator and the children), but on what level that
> is occurring is very hard to tell from that error message. What it
> means is that the child has selected another server to forward the
> request to but can't get the connection it desires.
>
> Is it because it (1) can't ask the question -- doesn't have a connection
> to the moderator, (2) gets a gibberish answer from the moderator -- this
> "shouldn't" happen too often, or (3) gets a connection from the moderator
> that is dead?
>
> (1) is a critical failure, (2) and (3) on the other hand will have to
> happen several times consecutively in order to induce the behaviour
> described. This is bad -- perhaps the applications are advertising
> keep-alive sessions and the server is closing the connection? Very very
> odd.

[Mon Apr 23 17:46:21 2001] [error] (9)Bad file descriptor: mod_backhand:
MBCSP error (making request)

[dormando@krelian logs]$ grep -c "Bad file descriptor" error_log.backhand
11592

That help too? Also, is there any reason why the moderator isn't logging
anything? It seems to always be running, too, since I can see the pid it's
supposed to be on and see it process things. It just doesn't get anything
particularly useful done...

> No magic here. BSD is missing a fundamentally useful tool --
> strace/truss. ktrace is horrible for poking around to see what is going
> on. It requires two stages. It would be interesting to see a trace
> from one of the "uncooperative" apache children and an lsof and trace of
> the moderator as well.

We have truss. No strace. I kept trying to watch a process die with truss,
but the one I picked to watch would never die, so I got bored.

-Alan
Messy children. [ In reply to ]
> [Mon Apr 23 17:46:21 2001] [error] (9)Bad file descriptor: mod_backhand:
> MBCSP error (making request)
>
> [dormando@krelian logs]$ grep -c "Bad file descriptor" error_log.backhand
> 11592
>
> That help too? Also, is there any reason why the moderator isn't logging
> anything? It seems to always be running, too, since I can see the pid
> it's supposed to be on and see it process things. It just doesn't get
> anything particularly useful done...

The moderator usually doesn't log much (and the PID is in the
error_log)... I will work on a PID file for the moderator -- that makes
perfect sense.

It looks like the Apache child is unable to make a request to the
moderator because the file descriptor it has is bad. This is a _bad_
_bad_ thing. And the processes you are tracing aren't dying -- they are
alive and printing these errors :-)

The code should attempt to reconnect to the bparent socket on that error
condition. If you deleted the bparent socket file (in UnixSocketDir) or
it has the wrong permissions (I don't see how that could happen), then
the connection would fail and you could see an error like this.
However, nestled somewhere in the error_log should be a hint as to _why_
the file descriptor is bad. You are seeing the effects of a broken
connection to the moderator, but not looking at the error_log section
from when the connection actually broke -- I certainly hope there are
error messages there.
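
One way to hunt for that moment is to pull out the log lines just before the first "Bad file descriptor" entry. The Python sketch below does that. The function name and the default of 20 context lines are illustrative choices, and it assumes the first matching line sits close to where the connection broke, which may not hold if the log has been rotated.

```python
from collections import deque


def context_before_first_match(lines, needle="Bad file descriptor", before=20):
    """Return up to `before` lines preceding the first line containing `needle`,
    followed by that line itself. Returns an empty list if nothing matches."""
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    for line in lines:
        if needle in line:
            return list(recent) + [line]
        recent.append(line)
    return []
```

To try it on the log from this thread, pass the open file object for error_log.backhand as `lines`. The file is read lazily, so a 150 meg log does not need to fit in memory.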

>> No magic here. BSD is missing a fundamentally useful tool --
>> strace/truss. ktrace is horrible for poking around to see what is going
>> on. It requires two stages. It would be interesting to see a trace
>> from one of the "uncooperative" apache children and an lsof and trace
>> of the moderator as well.
>
> We have truss. No strace. I kept trying to watch a process die with
> truss, but the one I picked to watch would never die, so I got bored.

truss, on BSD? cool. Where do I get that?

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
Messy children. [ In reply to ]
> truss, on BSD? cool. Where do I get that?

I believe that it's a part of a stock FreeBSD system:

/usr/bin/truss

-sc

--
Sean Chittenden

Messy children. [ In reply to ]
Whee,

Alright... If you would like, there is a large error log, made with
backhand logging turned up to max, that we could try to give you. It's
about 150 megs uncompressed, 10 megs compressed, and took thirty minutes
to make...

Now, I know that the moderator doesn't log much, but when we have +netall
(or maybe +net1,+net2) it should log at least two or three things (sending
fd, receiving fd, received fd) per backhanded request, but it's not
logging anything at all.

The bparent socket looks to stick around fine.... and it _does_ backhand
correctly sometimes. I think.

I know it logs the pid to the error log, and that's where I've been
getting it from. I wanted to automate the process of knocking its priority
up a bit, but grepping a growing error log isn't a very good way to do
that. I guess it doesn't matter now though, since that's not even our
problem, and the machine should always be at least 50% idle...
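
Until the moderator writes a PID file, something along these lines could automate it. This is a rough Python sketch: the format of the log line carrying the moderator's PID is not shown in this thread, so the regular expression is left as a parameter and the one in the note below is made up. It reads the log in one pass and keeps the last match, on the assumption that the most recent PID line belongs to the running moderator.

```python
import os
import re


def last_logged_pid(lines, pattern):
    """Return the PID captured by group 1 of `pattern` on the last matching line.

    `pattern` must be supplied by the caller because the real log format
    is not known here. Returns None if no line matches.
    """
    regex = re.compile(pattern)
    pid = None
    for line in lines:
        match = regex.search(line)
        if match:
            pid = int(match.group(1))  # later matches overwrite earlier ones
    return pid


def raise_priority(pid, niceness=-5):
    """Set the nice value of `pid`; a negative value usually requires root."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

With a placeholder pattern such as `r"moderator.*pid (\d+)"` (hypothetical, adjust it to whatever the error_log really prints), the result of `last_logged_pid` on the open log file is the value to hand to `raise_priority`.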

mlee,
-Alan
Messy children. [ In reply to ]
--On Tuesday, 24 April 2001 01:04 -0400 Theo Schlossnagle <jesus@omniti.com>
wrote:

> BSD is missing a fundamentally useful tool --
> strace/truss. ktrace is horrible for poking around to see what is going on.

My friend John Hughes has made strace work on FreeBSD. I believe his
patches made it into the version available at
http://sourceforge.net/projects/strace/
Otherwise let me know and I'll ask him what the status is. It's certainly
running fine here on my FreeBSD 4.x systems.

--
Eric