drbd-0.6.1-pre3.tar.gz
Hi,

While development moved at a slower pace this week, we got even more
contributions on the documentation front.

*) Using tiobench one could observe that DRBD dropped the connection
for no obvious reason. -- It turned out that this was caused by
signals delivered to the application while it sleeps in DRBD's
send_msg call. -- This is fixed now for all but the SIGXCPU signal.
(A sketch of the signal-masking technique follows below this list.)

*) Brazilian Portuguese translation of the manual pages,
by Cleber Rodrigues Rosa Junior.
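
For the curious, here is a minimal sketch of the idea behind the
signal fix, written against 2.4-era kernel APIs. This is NOT the
actual DRBD code -- the function name and surrounding details are
made up -- it only illustrates masking signals around the sleep in
sock_sendmsg(). Why SIGXCPU still needs separate handling is the
subject of the first TODO item below.

/* Sketch only -- not the actual drbd_send(). Block signal delivery
 * for the current task while it may sleep in sock_sendmsg(), so a
 * stray signal cannot abort the send and drop the connection.
 * Signals that arrive meanwhile stay pending and are delivered
 * after the mask is restored. */
#include <linux/sched.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <asm/uaccess.h>

static int send_unsignalled(struct socket *sock, void *buf, int size)
{
    mm_segment_t oldfs;
    sigset_t oldset;
    struct msghdr msg;
    struct iovec iov;
    int rv;

    iov.iov_base = buf;
    iov.iov_len = size;
    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_flags = 0;

    /* block all signals for this task */
    spin_lock_irq(&current->sigmask_lock);
    oldset = current->blocked;
    sigfillset(&current->blocked);
    recalc_sigpending(current);
    spin_unlock_irq(&current->sigmask_lock);

    /* the iovec points into kernel space */
    oldfs = get_fs();
    set_fs(KERNEL_DS);
    rv = sock_sendmsg(sock, &msg, size);
    set_fs(oldfs);

    /* restore the caller's signal mask */
    spin_lock_irq(&current->sigmask_lock);
    current->blocked = oldset;
    recalc_sigpending(current);
    spin_unlock_irq(&current->sigmask_lock);

    return rv;
}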

The current TODO list:

*) Fix the SIGXCPU signal.
*) Have a close look at the postpone packets -- I got the impression that
they do not work at all currently.
*) Call fsync after writing the state files (see the sketch after this list).
*) Tests of primary <-> secondary transitions immediately followed by
write accesses. (Inspired by Eduard Guzovsky's bug reports/fixes.)
*) Ensure that it compiles/works on 2.2.x kernels.
*) Fix the build system:
.) Missing parts
.) Make documentation building a separate target
.) Maybe use the new build system by Cleber Rodrigues Rosa Junior
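
To illustrate the fsync item above: data written with write() may sit
in the page cache for a long time, so a crash before writeback can
leave a stale or truncated state file. A minimal userspace sketch --
the function name and layout are made up, this is not DRBD's actual
code:

/* Write a state file and force it to stable storage before
 * returning. Sketch only. */
#include <fcntl.h>
#include <unistd.h>

static int write_state_file(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);   /* close() can still report errors */
}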

PS: While working with protocol C it turned out that a tl-size (transfer
log size) of 4096 is needed with Linux 2.4.x ...

-Philipp
RE: drbd-0.6.1-pre3.tar.gz
>
> ...
> *) Have a close look at the postpone packets -- I got the
> impression that
> they do not work at all currently.
>
> ...

I am testing DRBD with protocol C on Linux 2.2.14. In my test
environment most of the server disk I/O is initiated by NFS
clients. I used to get a lot of "postpone packets" that significantly
reduced I/O performance. I was able to eliminate the
"postpone packets" completely with the following change:


*** drbd_receiver.c 2001/09/24 22:28:44 1.1
--- drbd_receiver.c 2001/09/24 23:55:34 1.2
***************
*** 592,598 ****
*/
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
if (drbd_conf[minor].conf.wire_protocol == DRBD_PROT_C) {
! if (drbd_conf[minor].unacked_cnt >= (NR_REQUEST / 4)) {
run_task_queue(&tq_disk);
}
}
--- 592,599 ----
*/
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
if (drbd_conf[minor].conf.wire_protocol == DRBD_PROT_C) {
! //if (drbd_conf[minor].unacked_cnt >= (NR_REQUEST / 4)) {
! if (drbd_conf[minor].unacked_cnt >= 1) {
run_task_queue(&tq_disk);
}
}

This is a "quick and dirty" fix - there must be a more elegant way
to do it, but it solved my immediate problem.
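
One slightly less drastic variant might be to make the threshold
configurable instead of hard-coding either NR_REQUEST/4 or 1. The
sketch below is only an illustration of that idea, as a drop-in
replacement for the hunk above; flush_threshold is a hypothetical
field, NOT an existing DRBD parameter:

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
    if (drbd_conf[minor].conf.wire_protocol == DRBD_PROT_C) {
        /* flush_threshold would be a new, hypothetical tunable;
           a default of 1 matches the patch above, larger values
           restore some batching for the disk scheduler */
        if (drbd_conf[minor].unacked_cnt >=
            drbd_conf[minor].conf.flush_threshold) {
            run_task_queue(&tq_disk);
        }
    }
#endif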

My gut feeling is that DRBD should not need postpone packets at all,
as they could mask some underlying problems. Why should disk I/O
take more than several seconds to complete?

Thanks,

-Ed

> -Philipp
Re: drbd-0.6.1-pre3.tar.gz
* Guzovsky, Eduard <EGuzovsky@example.com> [011002 02:17]:
>
> >
> > ...
> > *) Have a close look at the postpone packets -- I got the
> > impression that
> > they do not work at all currently.
> >
> > ...
>
> I am testing DRBD with protocol C on Linux 2.2.14. In my test
> environment most of the server disk I/O is initiated by NFS
> clients. I used to get a lot of "postpone packets" that significantly
> reduced I/O performance. I was able to eliminate the
> "postpone packets" completely with the following change:
>
> [patch snipped]
>
> This is a "quick and dirty" fix - there must be a more elegant way
> to do it, but it solved my immediate problem.

I like this :) -- Heh. This ensures best performance for protocol C, but
reduces the usefulness of the disk scheduler. -- Probably it improves
your situation overall, though. -- Nice.

> My gut feeling is that DRBD should not need postpone packets at all,
> as they could mask some underlying problems. Why should disk I/O
> take more than several seconds to complete?

Unfortunately it is possible. E.g. configure two DRBD devices on top
of two partitions which are on _ONE_ physical disk:


Node1:              Node2:
  drbd0 (on hda1)     drbd0 (on hda1)
  drbd1 (on hda2)     drbd1 (on hda2)

Scenario 1:
drbd0@Node1 is primary and the drbd0s are connected.
The drbd1s are not set up.
Now run tiobench or bonnie on drbd0@Node1 and
tiobench or bonnie on hda2@Node2, then try to run "ls" on
a third partition on hda. -- You get times of up to minutes for
a single block access.

Scenario 2:
drbd0@Node1 is primary and the drbd0s are connected.
drbd1@Node2 is primary and the drbd1s are connected.
Run a benchmark on both primary devices.

PS: Good to hear that the postpones at least somewhat work on 2.2.x;
on 2.4.x I have not seen the message about a postpone packet for quite
some time now.

-Philipp