Mailing List Archive

Log rotation problems
Hello!

When log file is rotated by newsyslog (FreeBSD) Quagga 0.96.2 stops
writing new messages to file until restart. Same problem is with
original zebra and IPInfusion's ZebOS. I think this is a bug and it
should be fixed.

Yours truly,
Boris Kovalenko
Log rotation problems [ In reply to ]
Hello!

When log file is rotated by newsyslog (FreeBSD) Quagga 0.96.2 stops
writing new messages to file until restart. Same problem is with
original zebra and IPInfusion's ZebOS. I think this is a bug and it
should be fixed.

Yours truly,
Boris Kovalenko
Log
Re: Log rotation problems [ In reply to ]
On Thu, 2003-08-28 at 01:50, Boris Kovalenko wrote:
> When log file is rotated by newsyslog (FreeBSD) Quagga 0.96.2 stops
> writing new messages to file until restart. Same problem is with
> original zebra and IPInfusion's ZebOS. I think this is a bug and it
> should be fixed.

It might surprise you to learn that zebra is indeed still writing to the
log file, and that is the cause of your trouble.

You see, when files on unix are renamed, and running processes have file
descriptors attached to those files, the file descriptors continue to
access the same file, which now simply has a new name. File descriptors
are attached to inodes on the filesystem, and the inodes are linked into
the directory tree via dentries. So the impact of newsyslog's rename(2)
call is to add a new dentry called something like zebra.log.yesterday,
and remove the old dentry zebra.log. Each inode on a filesystem has a
unique number, and you can use `ls -i` or `ls --inode` to view them.

bordercore1:~# ls -li /var/log/zebra/zebra.log
32344 -rw-r----- 1 root root 119 Apr 11 16:49
/var/log/zebra/zebra.log
bordercore1:~# mv /var/log/zebra/zebra.log /var/log/zebra/demonstration
bordercore1:~# ls -li /var/log/zebra/demonstration
32344 -rw-r----- 1 root root 119 Apr 11 16:49
/var/log/zebra/demonstration

As you can see above, the file that is called /var/log/zebra/zebra.log
has inode number 32344. However, I renamed it, and it retained that very
same inode number, yet had a different name. This will always be the
case when you rename files -within the same filesystem-.

What if the files are on different filesystems, you ask? Well, in that
case, the user-space program typically creates new.file on the target
filesystem; reads data from old.file; writes it into new.file; and
finally, unlink(2) is called to remove the dentry for old.file.

But, wait! Each inode has a -reference count- as well! You see, the same
file, or inode, can have multiple dentries in a filesystem, meaning the
same file actually has more than one name!

bordercore1:~# ls -li /var/log/zebra/zebra.log
32344 -rw-r----- 1 root root 119 Apr 11 16:49
bordercore1:~# ln /var/log/zebra/zebra.log /var/log/zebra/hard.link
bordercore1:~# ls -li /var/log/zebra/hard.link
32344 -rw-r----- 2 root root 119 Apr 11 16:49
/var/log/zebra/hard.link
bordercore1:~# ls -li /var/log/zebra/zebra.log
32344 -rw-r----- 2 root root 119 Apr 11 16:49
/var/log/zebra/zebra.log
bordercore1:~# rm /var/log/zebra/hard.link

Above you can see that I create an additional dentry for inode 32344
called hard.link, and view its new dentry using our old friend, ls(1).
Notice the numeric value in the third column of ls(1) output? In our
earlier example, and for the huge majority of dentries you examine, you
will see the numeral 1 in that column, indicating that the underlying
inode has a dentry reference count of 1. Once we add the new "hard link"
(see the ln(1) man page for "hard link", which is quite different than a
symbolic link), the displayed reference count for that inode is 2. This
is true for both dentries, since the reference count has to do with the
underlying inode.

Why is the reference count important? Well, the kernel maintains another
reference count for all open inodes (or "open files"). If you delete all
the filesystem dentries to inode 32344, /var/log/zebra/zebra.log, but
zebra is still running with that file opened, the filesystem will not
"delete" that file until the zebra process has closed the file. You can
see here that my zebra process still has inode 32344 opened. lsof(8)
even looks into the kernel's dcache (let's not talk about that) to find
out the name of the file for you. Zebra can still write to the file even
if you unlink(2) all its dentries, because the underlying file, at inode
32344, won't be free'd until zebra closes the file. If you "deleted" the
dentry and re-ran the below lsof command, you would still receive an
inode number and device number (3,7 is my /var filesystem); but you
would not see any pathname information because the dentries are gone!

bordercore1:~# lsof -p `cat /var/run/zebra.pid` |grep 32344
zebra 163 root 20w REG 3,7 4466 32344
/var/log/zebra/zebra.log

Even if newsyslog rotates the file to a different filesystem, it doesn't
give you the desired effect.

Why have I written such a long-winded email explaining the process in
intimate detail? I have some experience in web access.log rotation and
analysis, and I can tell you, it is desirable to "rotate" your log files
without sending your web server a costly SIGHUP / SIGUSR1 / SIGWHATEVER.
Our solution to this is to rely on the kernel's guarantee that write(2)
operations are atomic, and that our web server only writes whole "lines"
to the files. Given all this, you can copy all the data from the log
file into a new file (or a pipe to gzip, etc.), then truncate(2) the
access.log file to 0 length. It's possible that you can lose a tiny
number of lines by doing this (due to delays between your last read(2)
and your truncate(2), as the web server just *might* get a write(2) in),
but we treat this as an acceptable loss. The whole operation takes place
using a mechanism that keeps the same inode number for access.log, which
means the web server (I could just start saying "apache") never needs to
close and re-open the file. Amazingly, if the kernel had a *truncate*(2)
function which allowed you to specify a quantity of bytes you wanted to
eliminate from the beginning of the file, and took care of re-organizing
the file in-kernel, you would have a 100% safe mechanism. There are ISPs
who have done this to their open-source kernels, all to make `mv
access.log access.log.yesterday; kill -USR1 `cat web.server.pid` clean!

I'd be willing to bet that your zebra log files are less sensitive, in a
monetary respect, than web server log files; so you might find the same
"log rotation" solution useful in your environment.

Of course, if you prefer to simply force zebra to close-and-reopen all
its log files, that is an option as well.

I hope this post has been informative, and perhaps even a nice change
from the "zebra vs zebra" posts of late :-)

--
Jeff S Wheeler <jsw@five-elements.com>

P.S. Please make the mailing list set a Reply-To: header.
Re: Log rotation problems [ In reply to ]
<cut>
> --
> Jeff S Wheeler <jsw@five-elements.com>
>
> P.S. Please make the mailing list set a Reply-To: header.
<cut>

Umm... I would welcome this as well.

--
Theodor Milkov Head Network Administrator
Davidov Net Phone: +359 (2) 730158
Re: Log rotation problems [ In reply to ]
Hello!

Very nice and informative post :) But I know all You wrote. Also I
know that the same problem are present in apache for example and I
should killhup it to restart logs. But with apache killhup is
_documented_ and I have not found what happens if I killhup zebra. Also,
why to keep log files open permanenly? Why to not do open write close
sequence? So, if killhup is sufficient to restart logs please document
this. And what should I killhup? Only zebra daemon or each running?

Yours truly,
Boris Kovalenko
Re: Log rotation problems [ In reply to ]
On Thu, 28 Aug 2003, Boris Kovalenko wrote:

> Hello!
>
> Very nice and informative post :) But I know all You wrote. Also I
> know that the same problem are present in apache for example and I
> should killhup it to restart logs. But with apache killhup is
> _documented_ and I have not found what happens if I killhup zebra. Also,
> why to keep log files open permanenly? Why to not do open write close
> sequence? So, if killhup is sufficient to restart logs please document
> this. And what should I killhup? Only zebra daemon or each running?
>
> Yours truly,
> Boris Kovalenko

"open write close" would bring even a FAST machine to its knees if you
turned on debug logging and then brought up a peering session with say 30K
routes. It would be a sloppy, no-clue way to code it.




--
John Fraizer | High-Security Datacenter Services |
President | Dedicated circuits 64k - 155M OC3 |
EnterZone, Inc | Virtual, Dedicated, Colocation |
http://www.enterzone.net/ | Network Consulting Services |
Re: Log rotation problems [ In reply to ]
John Fraizer wrote:

>On Thu, 28 Aug 2003, Boris Kovalenko wrote:
>
>
>
>>Hello!
>>
>> Very nice and informative post :) But I know all You wrote. Also I
>>know that the same problem are present in apache for example and I
>>should killhup it to restart logs. But with apache killhup is
>>_documented_ and I have not found what happens if I killhup zebra. Also,
>>why to keep log files open permanenly? Why to not do open write close
>>sequence? So, if killhup is sufficient to restart logs please document
>>this. And what should I killhup? Only zebra daemon or each running?
>>
>>Yours truly,
>> Boris Kovalenko
>>
>>
>
>"open write close" would bring even a FAST machine to its knees if you
>turned on debug logging and then brought up a peering session with say 30K
>routes. It would be a sloppy, no-clue way to code it.
>
>
>
>
>
>
I haven't looked at the source in ages due to the reasons why this list
came into existance.

Looking at them again USR1 to every daemon should rotate logs and logs
only exactly as per the common BSD semantics.

If it doesn't - shout. I will look at it anyway tonight (in my so called
free time).

Cheers,
A.
Re: Log rotation problems [ In reply to ]
On Thu, 28 Aug 2003, Boris Kovalenko wrote:

> Hello!
>
> why to keep log files open permanenly? Why to not do open write close
> sequence?

yes, that's the best way to write log files.

> So, if killhup is sufficient to restart logs please document this.
> And what should I killhup? Only zebra daemon or each running?

each daemon (you wish to have reopen its log file).

> Yours truly,
> Boris Kovalenko

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
"I'm not stupid, I'm not expendable, and I'M NOT GOING!"
Re: Log rotation problems [ In reply to ]
On Thu, 28 Aug 2003, Paul Jakma wrote:

> each daemon (you wish to have reopen its log file).

urg. SIGUSR1 as someone else pointed out.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Overflow on /dev/null, please empty the bit bucket.
Re: Log rotation problems [ In reply to ]
Paul Jakma wrote:

>On Thu, 28 Aug 2003, Boris Kovalenko wrote:
>
>
>
>>Hello!
>>
>>why to keep log files open permanenly? Why to not do open write close
>>sequence?
>>
>>
>
>yes, that's the best way to write log files.
>
>
>
>>So, if killhup is sufficient to restart logs please document this.
>>And what should I killhup? Only zebra daemon or each running?
>>
>>
>
>each daemon (you wish to have reopen its log file).
>
1. USR1, killhup will reload configs and you do not really want to do
this :-)
2. SIGPIPE is set to ignore and I do not see any checks for file
permissions or links in the log file opening and reopening bit, which
should allow you to put a fifo in place of the log file and rotate logs
in whatever reads off that log file at leasure. On whatever single
signal you would like to use to hit the thing that will read off the fifo.

Cheers,

>
>
>
>>Yours truly,
>> Boris Kovalenko
>>
>>
>
>regards,
>
>


--

A. R. Ivanov
E-mail mailto:arivanov@sigsegv.cx
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov <arivanov@sigsegv.cx>
Fingerprint: C824 CBD7 EE4B D7F8 5331 89D5 FCDA 572E DDE5 E715
pub 2048R/DB33CE33 1999-01-05 Anton Ivanov <anton.ivanov@level3.com>
Fingerprint: 11 2C 68 F3 79 58 58 58 56 F5 94 E6 F3 2B 3B 7A
Re: Log rotation problems [ In reply to ]
On Thu, 28 Aug 2003, Anton Ivanov wrote:

> 1. USR1, killhup will reload configs and you do not really want to do
> this :-)

yes USR1 :) - and most (all?) daemons do not support conf file
rereading.

> 2. SIGPIPE is set to ignore and I do not see any checks for file
> permissions or links in the log file opening and reopening bit,
> which should allow you to put a fifo in place of the log file and
> rotate logs in whatever reads off that log file at leasure. On
> whatever single signal you would like to use to hit the thing that
> will read off the fifo.

neat.

> Cheers,

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Divorce is a game played by lawyers.
-- Cary Grant