Mailing List Archive

NIC cries when using testLVS
Hi!

I think this is a bit Off-Topic, but maybe someone experienced the
same problems using testLVS:

I am using a 3c905-TX-M on a Pentium 3. I connect a 100 Mbit/s
line to it and simulate 100.000 different clients with testLVS.
I'm using the newest kernel and LVS version 1.0.0.

The moment I start testLVS on the client, I can't use the
redirector any more. Running the command "ipvsadm" takes 5 minutes
(really!), but it shows that all 100.000 connections are
established. The kernel spits out warnings that the interrupt is
very busy. xload shows over 600% load.

Even with a new driver there wasn't a big difference.

On the cache however, we experience around 40.000 packets per
second.

This is poor, isn't it?

I didn't think it would be so easy to push a P3 to its limits.
But I think it isn't the processor; it is the NIC that puts
all the work on the processor.

Any comments?




Thomas

--
In a world without walls and fences - who needs windows and gates?
Re: NIC cries when using testLVS [ In reply to ]
> I didn't think that it's so easy to push a P3 to the limits.
> But I think it isn't the processor, it is the NIC that puts
> all the work on the processor.

What driver and card revision are you using? 3c59x, or the 3c90x from
3com? I've noticed that 3c59x chokes on 3c905C cards at high load -- you
end up getting a TON of collisions -- and that the 3c90x driver relieves
this problem. Maybe the problem is related?

Thanks,

Kyle Sparger
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> redirector any more. Typing the command "ipvsadm" takes 5 minutes
> (really!), but it shows that all 100.000 connections are
> established. The kernel spits warnings that the interrupt is
> very busy. xload shows over 600% load.

Oh, yes, I forgot: on a P1 with 120 MHz and some no-name 10 Mbit/s
card, I reached a limit of 75.000 established connections with
a processor load of 70%.



Thomas


--
In a world without walls and fences - who needs windows and gates?
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> What driver and card revision are you using? 3c59x,

I used that, but it seemed to behave unpredictably under high load (*).
It also produced the kernel messages.
So I searched for an update and found the 3Com site:

> or the 3c90x from

This is more stable, but the throughput is far below my
expectations.

> 3com? I've noticed that 3c59x chokes on 3c905C cards at high load -- you
> end up getting a TON of collisions -- and that the 3c90x driver relieves
> this problem. Maybe the problem is related?

I think so. How do you find out the number of collisions?

BTW (*): I seem to have problems (at least with the 3c59x driver)
where LVS sometimes doesn't forward ANYTHING. This error
doesn't occur very often, but when it does it seems as if LVS
isn't running at all.

I abort testLVS, wait 2 minutes, restart LVS, restart testLVS
and everything works fine. I have no idea whether it's a problem
with the card, the driver, ipchains or LVS :-(



Thomas


--
In a world without walls and fences - who needs windows and gates?
Re: NIC cries when using testLVS [ In reply to ]
> I think so. How do you find out the number of collisions?

ifconfig ethX will show collisions the driver noticed; sometimes, you
also have to look at it from the other end to see all of them.
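
The same counter can also be read without ifconfig. This is only a minimal sketch: it assumes the kernel exposes the usual /proc/net/dev layout, in which "colls" is the sixth transmit column, and the function name collisions_for is made up for this example.

```shell
# Print the collision counter for one interface.
# Reads /proc/net/dev-style text on stdin; $1 is the interface name.
collisions_for() {
    iface="$1"
    # Turn the colon after the interface name into a space so that
    # "eth0:123" and "eth0: 123" are split into fields the same way.
    sed 's/:/ /' | awk -v ifc="$iface" '$1 == ifc { print $15 }'
}

# Usage (not run here): collisions_for eth0 < /proc/net/dev
```

After the split, field 1 is the interface name, fields 2-9 are the receive counters, and fields 10-17 are the transmit counters, so collisions land in field 15.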

Kyle Sparger
Re: NIC cries when using testLVS [ In reply to ]
Kyle Sparger wrote:
>
> > I didn't think that it's so easy to push a P3 to the limits.
> > But I think it isn't the processor, it is the NIC that puts
> > all the work on the processor.
>
> What driver and card revision are you using? 3c59x, or the 3c90x from
> 3com? I've noticed that 3c59x chokes on 3c905C cards at high load -- you
> end up getting a TON of collisions -- and that the 3c90x driver relieves
> this problem. Maybe the problem is related?

Please try it again with this driver and see if it works better:

http://www.uow.edu.au/~andrewm/linux/#3c59x-bc

Or try to explain your problems (dmesg, kernlog excerpt, ...)
Best regards,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
Re: NIC cries when using testLVS [ In reply to ]
Hi Thomas,

I already tried twice to reply to the list, but it seems that
something has been changed on the mail server. My mails
don't get through.

Thomas Proell wrote:
>
> Hi!
>
> I think this is a bit Off-Topic, but maybe someone experienced the
> same problems using testLVS:
>
> I am using a 3c905 - TX - M on a Pentium 3. I connect a 100MBps
> line to it and simulate 100.000 different clients with testLVS.
> I'm using the newest kernel and LVS version 1.0.0.

Get the updated (for speed) driver from:
http://www.uow.edu.au/~andrewm/linux/#3c59x-bc

if this doesn't work, implement TCP zero-copy ;)

> At the moment I start testLVS on the client, I can't use the
> redirector any more. Typing the command "ipvsadm" takes 5 minutes
> (really!), but it shows that all 100.000 connections are
> established. The kernel spits warnings that the interrupt is
> very busy. xload shows over 600% load.

Oh, wait a minute, I had this too, what exactly does it say?

> This is poor, isn't it?

How big are the packets?

> I didn't think that it's so easy to push a P3 to the limits.

It shouldn't be. Let's investigate this further.

> But I think it isn't the processor, it is the NIC that puts
> all the work on the processor.

Mhh, please try it with the modified 3c59x driver from the
page above.

Best regards,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
RE: NIC cries when using testLVS [ In reply to ]
I had similar issues using the default loaded "3c59x" driver. After loading
the 3COM supplied "3c90x" driver, the symptoms vanished.

Dave
Re: NIC cries when using testLVS [ In reply to ]
Hello,

On Wed, 22 Nov 2000, Thomas Proell wrote:

> Hi!
>
> I think this is a bit Off-Topic, but maybe someone experienced the
> same problems using testLVS:
>
> I am using a 3c905 - TX - M on a Pentium 3. I connect a 100MBps
> line to it and simulate 100.000 different clients with testLVS.
> I'm using the newest kernel and LVS version 1.0.0.
>
> At the moment I start testLVS on the client, I can't use the
> redirector any more. Typing the command "ipvsadm" takes 5 minutes
> (really!), but it shows that all 100.000 connections are
> established. The kernel spits warnings that the interrupt is

Maybe 100,000 inactive connections; they can't be established.

> very busy. xload shows over 600% load.

Is the director with one NIC only?

> Even with a new driver there wasn't big difference.
>
> On the cache however, we experience around 40.000 packets per
> second.

>40,000
testlvs ---------->
eth0 director
real server <-----'
40,000

40,000 + 40,000 packets on director's eth0? Full duplex? Switched
hub?

Do we need to add replies from the real servers in this sum?

> This is poor, isn't it?
>
> I didn't think that it's so easy to push a P3 to the limits.
> But I think it isn't the processor, it is the NIC that puts
> all the work on the processor.
>
> Any comments?


Regards

--
Julian Anastasov <ja@ssi.bg>
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> 100,000 inactive connection may be, they can't be established.

sure, sorry.

> > very busy. xload shows over 600% load.
>
> Is the director with one NIC only?

Are you a fortune teller? Yes, only one NIC.


>
> >40,000
> testlvs ---------->
> eth0 director
> real server <-----'
> 40,000
>
> 40,000 + 40,000 packets on director's eth0? Full duplex? Switched
> hub?

How can I know? It's the network of our lab. I have to ask...

> Do we need to add replies from the real servers in this sum?

I guess not. I made a "route reject". With testLVS, only SYN-packets
are sent.


> > This is poor, isn't it?

Well, I slept on it, and maybe it isn't as poor as I thought in
the beginning. I got 2900 packets per second on a 10 Mbit network.
There, the network was the problem.
Now, I got more than 10 times as much, on a network that can do
10 times as much.
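
For comparison, a rough ceiling on packets per second can be worked out from the link speed. This sketch assumes each SYN travels as a minimum-size 64-byte Ethernet frame (an assumption; the thread does not give the packet size) and adds 20 bytes per frame for preamble and inter-frame gap. The function name max_pps is made up for this example.

```shell
# Upper bound on frames per second for a given link speed and frame size.
# $1 = link speed in bit/s, $2 = frame size in bytes (64 is the Ethernet minimum).
# The extra 20 bytes per frame are the preamble (8) and inter-frame gap (12).
max_pps() {
    echo $(( $1 / (($2 + 20) * 8) ))
}

max_pps 10000000 64     # 10 Mbit/s link
max_pps 100000000 64    # 100 Mbit/s link
```

Under those assumptions the ceilings are about 14,880 and 148,809 frames per second, so both measured figures sit well below wire speed.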

Only the busy redirector is a bit surprising.


Thomas
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> Get the updated (for speed) driver from:
> http://www.uow.edu.au/~andrewm/linux/#3c59x-bc

Will try!

> if this doesn't work, implement TCP zero-copy ;)

I didn't get this joke. Are you thinking of routing via
/dev/null?

> > very busy. xload shows over 600% load.
>
> Oh, wait a minute, I had this too, what exactly does it say?

Nice question. I was told that xload only measures the load
of user processes, not of the kernel. But maybe, when the kernel
is slow, your user processes can't be processed as fast as normal
and you experience a higher load?

> How big are the packets?

It is testLVS, that means you send only syn-packets, and the server
rejects them.

> > But I think it isn't the processor, it is the NIC that puts
> > all the work on the processor.
>
> Mhh, please try it with the modified 3c59x driver from the
> page above.

Well, I don't have much hope for it, since I have the newest
kernel (with the newest non-modified 3c59x) AND I tried the
driver for 3c90x from 3Com directly. And that one was better.


Thomas
RE: NIC cries when using testLVS [ In reply to ]
Hi!

> I had similar issues using the default loaded "3c59x" driver.
> After loading the 3COM supplied "3c90x" driver, the symptoms
> vanished.

Yes, the warnings vanished, but the throughput didn't improve.
Did you take measurements? What results did you get? Maybe
I'm only overestimating the power of the whole system.


BTW: Could you turn off html?

Thomas

--
In a world without walls and fences - who needs windows and gates?
Re: NIC cries when using testLVS [ In reply to ]
Bon jour, Thomas,

> > if this doesn't work, implement TCP zero-copy ;)
>
> I didn't get this joke. Do you thing about routing via
> /dev/null?

Somewhat in that direction :) No, seriously, I didn't mean it as a
joke. If you're interested, grab yourself a copy of:
http://www.digital.com/info/DTJS05/
and read the lkml thread about TCP zero-copy (and supported NICs):
http://boudicca.tux.org/hypermail/linux-kernel/2000week36/0979.html

> > > very busy. xload shows over 600% load.
> >
> > Oh, wait a minute, I had this too, what exactly does it say?
>
> Nice question. I was told that xload only meassures the load
> of user processes, not of the kernel. But maybe, when the kernel
> is slow, your user processes can't be processed as fast as normal
> and you experience a higher load?

Could you please do your tests without X-windows? X is such a memory
eater that performance tests at your level might be influenced by it.

The latter is a question of schedule(). I'd love to have access to a
lab like you do. We could make very cool stress tests.

> > How big are the packets?
>
> It is testLVS, that means you send only syn-packets, and the server
> rejects them.

Julian, does tcp_max_syn_backlog with enabled tcp_syncookies have any
impact on the timeout_synack, or are they handled differently?

> Well, I don't have too much hope on it, since I have the newest
> kernel (with the newes non-modified 3c59x) AND I tried the
> driver for 3c90x from 3Com directly. And that one was better.

What do you mean by the newest kernel with the newest driver? Could
you tell us the versions? I know about 4 different 3c59x driver versions
floating around.

> Thomas

Best regards,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> Could you please do your tests without X-windows? X is such a memory
> eater that performance tests at your level might be influenced by it.

Yes, I know. But even on the console I have problems, so I started
X-window to get the info of xload - just to give you additional
information!

> Tha latter is a question of schedule(). I'd love to have access to a
> lab like you do. We could make very cool stress tests.

Yes, this project is very well supported by the European Community
(why should I always pay and never win?) :-) :-)

> What do you mean by the newest kernel with the newest driver? Could
> you tell us the versions? I know about 4 different 3c59x driver versions
> floating around.

O.K:
Kernel 2.2.17 from www.kernel.org
LVS 1.0.0 from www.linux-vs.org
3c59x from this kernel (August, 16th 2000)
3c90x from www.support.3com.com v1.1

so the 3c59x is that driver that was shipped with the kernel.
I assume it is up to date.



Thomas
Re: NIC cries when using testLVS [ In reply to ]
Hi Thomas,

> Yes, I know. But even on the console I have problems, so I started
> X-window to get the info of xload - just to give you additional
> informations!

How many TX errors do you have under the SYN flood (maybe even per sec)?

> Yes, this project is very good supported by the European Comunity
> (why should I always pay and never win?) :-) :-)

Fair enough ;) So then I guess we take you as our test rabbit, or
I mean your lab.

> O.K:
> Kernel 2.2.17 from www.kernel.org
> LVS 1.0.0 from www.linux-vs.org
> 3c59x from this kernel (August, 16th 2000)
> 3c90x from www.support.3com.com v1.1

good!

> so the 3c59x is that driver that was shipped with the kernel.
> I assume it is up to date.

Nope, but never mind. The newer one from www.scyld.com doesn't have any
improvements for your problem, and neither does the one from
2.2.18-pre22.

See, I can't really believe that you can saturate the box. Wait,
I think about something ;)

Later,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> How many TX errors do you have under the SYN flood (maybe even per sec)?

None. On any machine.

> Fair enough ;) So then I guess we take you as our test rabbit, or
> I mean your lab.

Unfortunately they don't like it if I waste their money and then give
the results to everybody. But if it's necessary for the work... :-)



Thomas
Re: NIC cries when using testLVS [ In reply to ]
Hi

> 40,000 + 40,000 packets on director's eth0? Full duplex? Switched hub?

The guy arrived, I asked him, and that's what he told me:
100 Mbit/s
Full duplex
Connected by switch



Regards

Thomas
Re: NIC cries when using testLVS [ In reply to ]
Hi Thomas,

> The guy arrived, I asked him, and that's what he told me:
> 100 MBps
> Full duplex
> Connected by switch

I'm getting more and more curious about it, but maybe we should take
this off-list now, since we're getting more and more OT. And if we find
out something interesting, we'll post it then, ok?

Could you try with 100Mbit HD and auto-negotiation? Try resetting the
EEPROM settings to their factory values. There are some known issues
with the 3Com driver when running in FD mode, such as the mii-status FD
flag being disregarded and fun like that. Ok, just try to ftp a big file
(400MB) through your machine with the current settings and see if you
really get close to 200Mbit/s (should be 180Mbit).

Regards,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
Re: NIC cries when using testLVS [ In reply to ]
Hello,

On Thu, 23 Nov 2000, ratz wrote:

> > > How big are the packets?
> >
> > It is testLVS, that means you send only syn-packets, and the server
> > rejects them.
>
> Julian, does tcp_max_syn_backlog with enabled tcp_syncookies have any
> impact on the timeout_synack or are they handled differently.

Yes, it seems there is one trick here. If you enable rp_filter
for the indev in the real server, the packets with saddr from the rejected
network are treated as source martians. But with rp_filter=0, route -Cn
shows that an incoming route is created in the routing cache, so it seems
this packet reaches the inqueue. This can be a good reason for the syn
backlog to overflow. So the recommendation is: don't enable SYN cookies in
this test, unless of course you want to test SYN cookie support.
And rp_filter=1 can help to drop the packets faster. Then you don't need
to alter any tcp settings.

So,

{all,eth0}/rp_filter=1

or

rp_filter=0 => tcp_syncookies=0
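
The two alternatives above can be written as a small check. This is only a sketch of the rule as stated in this message; the function name flood_settings_ok is made up, and the two values are passed in as arguments rather than read from the system.

```shell
# Return 0 when the settings match the recommendation above, 1 otherwise.
# $1 = value of rp_filter, $2 = value of tcp_syncookies.
flood_settings_ok() {
    rp_filter="$1"
    syncookies="$2"
    if [ "$rp_filter" = "1" ]; then
        return 0    # spoofed packets are dropped as source martians
    fi
    # Without rp_filter, SYN cookies should be off for this test.
    [ "$syncookies" = "0" ]
}

# Usage on a real server (not run here):
#   flood_settings_ok "$(cat /proc/sys/net/ipv4/conf/all/rp_filter)" \
#                     "$(cat /proc/sys/net/ipv4/tcp_syncookies)" && echo ok
```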


> Best regards,
> Roberto Nibali, ratz


Regards

--
Julian Anastasov <ja@ssi.bg>
Re: NIC cries when using testLVS [ In reply to ]
Hello,

On Thu, 23 Nov 2000, Thomas Proell wrote:

> > > This is poor, isn't it?
>
> Well, I slept over it, and maybe it isn't as poor as I thought in
> the beginning. I got 2900 packets per second on a 10MBit network.
> There, the network was the problem.
> Now, I got more then 10 times as much, on a network that can do
> 10 times as much.

Yes, the results look correct. Can you try with
{all,ethX}/rp_filter=1 in the real server(s)?

> Only the busy redirector is a bit stunning.
>
>
> Thomas


Regards

--
Julian Anastasov <ja@ssi.bg>
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> Yes, the results look correct. Can you try with

??? I was getting a 400MB file via ftp and got only 16MBytes per sec.
with 100% CPU load. I still think there's something wrong...

> {all,ethX}/rp_filter=1 in the real server(s)?

what exactly does that mean?


Thomas
Re: NIC cries when using testLVS [ In reply to ]
Thomas Proell wrote:
>
> Hi!
>
> > Yes, the results look correct. Can you try with
>
> ??? I was getting a 400MB-file via ftp an got only 16MBytes per sec.
> with 100% CPU load. I still think there's something wrong...

Are you sure that you got 16Mbytes/s? Because this would be ok.

> > {all,ethX}/rp_filter=1 in the real server(s)?
>
> what exactly does that mean?

# As root: enable reverse-path filtering for "all" and every eth* interface
cd /proc/sys/net/ipv4/conf && for i in all eth*; do echo 1 > "$i/rp_filter"; done

regards,
Roberto Nibali, ratz

--
mailto: `echo NrOatSz@tPacA.cMh | sed 's/[NOSPAM]//g'`
Re: NIC cries when using testLVS [ In reply to ]
Hi!

> Are you sure that you got 16Mbytes/s? Because this would be ok.

Hm. It told me 2.1e+03 Kbytes/sec, that would be

2100 Kbytes/sec = 2 Mbytes/s = 16 MBit/sec. Sorry :-}
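
The conversion can be done mechanically to avoid this kind of slip. A minimal sketch, assuming the ftp client's "Kbytes" means 1000 bytes (it may mean 1024, which changes the result slightly); the function name kbytes_to_mbit is made up for this example.

```shell
# Convert a transfer rate in Kbytes/s to Mbit/s.
# $1 = rate in Kbytes/s, e.g. 2.1e+03 as printed by the ftp client.
kbytes_to_mbit() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb * 8 / 1000 }'
}

kbytes_to_mbit 2100
```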


Thomas
Re: NIC cries when using testLVS [ In reply to ]
Hi!

Together with Roberto Nibali <ratz> I found the main reasons for
the slow redirector. DMA wasn't activated. Then, those $#%@! who
installed the kernel on the other machine used an old version
of the driver with a new kernel ("downpatching" :)

O.K. FTP'ing a 300MB file takes 29sec now - 10 MByte per second,
80 Mbit/s on a 100 Mbit/s line. Is there still a problem, or is the
missing 20% normal? (Cards on autodetect/negotiate)
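
The arithmetic behind those figures, as a small sketch. It assumes 1 MB = 1,000,000 bytes and ignores protocol overhead; the function name transfer_rate is made up for this example.

```shell
# Average transfer rate for a file copy.
# $1 = file size in MB, $2 = transfer time in seconds.
transfer_rate() {
    awk -v mb="$1" -v s="$2" \
        'BEGIN { printf "%.1f MB/s, %.1f Mbit/s\n", mb / s, mb * 8 / s }'
}

transfer_rate 300 29
```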

rp_filter=1 didn't do much in the ftp scenario, but you can't tell
for sure, because it's always changing a bit - maybe 5%.

BUT:
When simulating 1 client (director has no work), up to 52,000 packets
per second arrive on the cache. This decreases with the increasing
number of simulated clients.
When simulating around 100,000 clients, the director no longer keeps
up in real time: ipvsadm shows only 99,934 inactive conns.
When simulating 150,000 clients, we get errors because it can't
allocate memory - we have 256MB I guess. But nevertheless, the
load shown by top is about 400%.


> Yes, it seems there is one trick here. If you enable the rp_filter
> for the indev in the real server the packets with saddr from the rejected
> network are treated as source martians. But with rp_filter=0 route -Cn
> shows that an incoming route is created in the routing cache, so it seems
> this packet reaches the inqueue. This can be a good reason the syn backlog
> to overflow. So, the recommendations is: don't enable the SYN cookies in
> this test, of course, if you don't want to test the SYN cookies support.
> And rp_filter=1 can help to drop the packets faster. Then you don't need
> to alter any tcp settings.

I'll have to do those settings and do further tests with testLVS.



Regards

Thomas



--
If a train station is where the train stops, what's a workstation?
Re: NIC cries when using testLVS [ In reply to ]
Hello,

On Fri, 24 Nov 2000, Thomas Proell wrote:

> Hi!
>
> Together with Roberto Nibali <ratz> I found the main reasons for
> the slow redirector. DMA wasn't activated. Then, those $#%@! who
> installed the kernel on the other machine used an old version
> of the driver with a new kernel ("downpatching" :)
>
> O.K. FTP'ing a 300MB file takes 29sec now - 10 MByte per second,
> 80 MBps on a 100 MBps line. Is the still problem or is the missing
> 20% normal? (Cards on autodetect/negotiate)

Try with 50-60MB file, use the result from the second test,
i.e. when the file is cached in the ftp server. You can achieve more
than 11MB/sec.

> rp_filter=1 didn't do much in the ftp scenario, but you can't tell
> for sure, because it's always changing a bit - maybe 5%.

No, the rp_filter usage is not related to the net performance.

Use this on the real server:

echo 1 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 1 > /proc/sys/net/ipv4/conf/eth0/rp_filter

For more info:
less /usr/src/linux/Documentation/proc.txt

When using testlvs with a reject rule in the real server, setting
rp_filter to 1 in the real server (if it is Linux) allows the packets
10/8 -> VIP to be dropped as source martians. This way they don't
reach the input SYN queue (no SYN cookies generated). So, use rp_filter
to help the real server survive the flood. The alternative is to use
ipchains -I input -s 10.0.0.0/8 -d VIP -j DENY


> BUT:
> When simulating 1 client (director has no work), up to 52,000 packets
> per second arrive on the cache. This decreases with the increasing
> number of simulated clients.
> When simulating around 100,000 clients, the director starts not
> to serve in real time: ipvsadm shows only 99,934 inactive conns.
> When simulating 150,000 clients, we get errors because he can't
> allocate memory - we have 256MB I guess. But nevertheless, the
> load shown by top is about 400%

Maybe you need to update to the recent procps 2.0.*:
ps --version

Maybe LVS is not the only thing using memory on the director.
150,000 entries take about 19MB, which is far too little to exhaust
a 256MB director.
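
A quick way to redo that estimate for other table sizes. This sketch assumes roughly 128 bytes per connection entry, which is the per-entry size the 19MB figure implies (19MB / 150,000); the real size depends on the LVS version. The function name conn_table_mb is made up for this example.

```shell
# Estimate the memory used by the connection table, in MB.
# $1 = number of entries, $2 = assumed bytes per entry.
conn_table_mb() {
    awk -v n="$1" -v sz="$2" 'BEGIN { printf "%.1f\n", n * sz / 1000000 }'
}

conn_table_mb 150000 128
```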

> Regards
>
> Thomas


Regards

--
Julian Anastasov <ja@ssi.bg>
