Mailing List Archive

[lvs-users] Port mapping with LVS-DR using fwmark
I've searched Google and this mailing list but haven't quite seen the same
configuration and/or setup as mine.

The ldirectord documentation states that port mapping is not possible when a
real server resides on the same machine as the director, except with masq
forwarding; however, that note refers to "non-fwmark" services. My setup uses
fwmark, yet when I try to map port 80 to another port, the client connection
hangs. Here are the exact details of my setup:

The VIP is on the same box as the director, which also has RIP 172.17.0.16.
This setup works fine when no port mapping is being done, but I need to move
the port to something higher than 1024.

virtual=172.17.0.24:80
real=172.17.0.16:50000 gate 100
real=172.17.0.17:50000 gate 100
service=http
scheduler=rr
protocol=tcp
checktype=connect
fwmark=100

iptables:
iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
--dport 80 -j MARK --set-xmark 0x64/0xffffffff
iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
--to-ports 50000
iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
--to-ports 50000
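
For what it's worth, the resulting state can be sanity-checked like this (a
diagnostic sketch, not from the original post; it needs root, and the output
format varies by iptables/ipvsadm version):

```shell
# Confirm the mark and redirect rules are loaded, and that IPVS shows
# the fwmark-based virtual service. These commands only print state.
iptables -t mangle -L PREROUTING -n -v   # expect the MARK 0x64 rule on dport 80
iptables -t nat -L -n -v                 # expect both REDIRECT --to-ports 50000 rules
ipvsadm -L -n                            # expect "FWM 100" with both real servers listed
```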

Issue:
curl -v 'http://172.17.0.24'
* About to connect() to 172.17.0.24 port 80 (#0)
* Trying 172.17.0.24...

00:41:44.503581 IP 172.17.0.2.46099 > 172.17.0.24.80: Flags [S], seq
1066084928, win 14600, options [mss 1460,sackOK,TS val 2520815062 ecr
0,nop,wscale 7], length 0
00:41:44.503581 IP 172.17.0.2.46099 > 172.17.0.24.80: Flags [S], seq
1066084928, win 14600, options [mss 1460,sackOK,TS val 2520815062 ecr
0,nop,wscale 7], length 0
00:41:44.503658 IP 172.17.0.16.50000 > 172.17.0.2.46099: Flags [S.], seq
824291086, ack 1066084929, win 14480, options [mss 1460,sackOK,TS val
9521949 ecr 2520815062,nop,wscale 7], length 0
00:41:44.503663 IP 172.17.0.16.50000 > 172.17.0.2.46099: Flags [S.], seq
824291086, ack 1066084929, win 14480, options [mss 1460,sackOK,TS val
9521949 ecr 2520815062,nop,wscale 7], length 0

So the problem I'm having is that the source IP is not being translated back
by iptables; the reply comes via LVS with the RIP as the source address. Is
there a kernel, iptables, or ipvsadm option that would change it back to the
VIP?

Any help would be very appreciated!

Jacoby
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Jacoby,

You could put the iptables rules on each real server instead, which
would do the same trick.
LVS hooks into the INPUT chain, so it's very hard to use iptables rules
like this on the director node.
There may be a way; I just don't know of it.



On 17 January 2014 01:27, Jacoby Hickerson <hickersonjl@gmail.com> wrote:
> [...]



--
Regards,

Malcolm Turnbull.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/

Re: [lvs-users] Port mapping with LVS-DR using fwmark
Thanks, Malcolm, for the response. That is how it is set up: for one of the
nodes, the real server is the same machine as the director. Even when
connecting only to the primary node while all the others are offline, it does
not work.

I noticed that it works if I use xinetd forwarding, but that's not what I
want to use. I wonder why iptables wouldn't work here; if there is a method
using iptables, that'd be great.

Jacoby


On Thu, Jan 16, 2014 at 11:41 PM, Malcolm Turnbull <malcolm@loadbalancer.org
> wrote:

> [...]
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Jacoby,

iptables will work on a different physical server, but it does not work
on the director node; this is due to the way that LVS interacts with
netfilter.

More discussion here:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.rewrite_ports.html

If you need port redirection you could use HAProxy instead (but it's
not transparent).



On 17 January 2014 18:54, Jacoby Hickerson <hickersonjl@gmail.com> wrote:
> [...]




Re: [lvs-users] Port mapping with LVS-DR using fwmark
Thanks! I also saw this discussion, which seems a bit closer to my case, but
I'm unfamiliar with policy routing:
http://archive.linuxvirtualserver.org/html/lvs-users/2003-10/msg00034.html

Jacoby


On Fri, Jan 17, 2014 at 11:17 AM, Malcolm Turnbull <malcolm@loadbalancer.org
> wrote:

> [...]
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Hello,

On Thu, 16 Jan 2014, Jacoby Hickerson wrote:

> I've searched Google and this mailing list but haven't quite seen the same
> configuration and/or setup as mine.
>
> The ldirectord documentation states that port mapping on the same server
> where the director resides is not possible other than masq, however it says
> "non-fwmark". My setup is using fwmark, however, when trying to port map
> from port 80 to another port, the client connection hangs. Here are the
> exact details of my setup:
>
> The VIP is on the same box as the director and RIP 172.17.0.16. This setup

VIPs are always on the director. You mean RIP 172.17.0.16 and
VIP 172.17.0.24 are on the same box? Then there are two cases:
a local and a non-local client?

Only the masq forwarding method can change daddr and dport in the
packet, and it should work even for a local client and a local real
server, unless some recent regression has crept in.

> works fine when no port mapping is being done, but I need to move the port
> to something higher than 1024.
>
> virtual=172.17.0.24:80
> real=172.17.0.16:50000 gate 100
> real=172.17.0.17:50000 gate 100
> service=http
> scheduler=rr
> protocol=tcp
> checktype=connect
> fwmark=100
>
> iptables:
> iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> --dport 80 -j MARK --set-xmark 0x64/0xffffffff
> iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000
> iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000

Do you have CONFIG_IP_VS_NFCT enabled and the sysctl var
/proc/sys/net/ipv4/vs/conntrack set to 1? IIRC, it is needed
when packets modified by IPVS must be inspected later by netfilter
in the POST_ROUTING or INPUT chain, as is the case with the LocalNode
feature. If "conntrack" is not enabled, IPVS marks the packets as
untracked to avoid consuming memory for conntracks; otherwise,
conntracks would be created and destroyed with every packet.

If consuming memory is not a problem, you may find that
conntrack=1 with a tuned nf_conntrack_max is faster than conntrack=0.

Let me know if conntrack=1 does not help, so that
we can track the problem further.
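
Concretely, enabling it looks like this (a sketch added for reference, not
part of the original message; the sysctl.d file name is an assumption and
distros differ, and the kernel must be built with CONFIG_IP_VS_NFCT=y):

```shell
# Enable IPVS/netfilter connection-tracking integration (needs root).
sysctl net.ipv4.vs.conntrack=1
# Equivalent: echo 1 > /proc/sys/net/ipv4/vs/conntrack

# Persist across reboots (file name is an assumption; adjust per distro):
echo 'net.ipv4.vs.conntrack = 1' > /etc/sysctl.d/90-ipvs-conntrack.conf
```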

> [...]

Regards

--
Julian Anastasov <ja@ssi.bg>

Re: [lvs-users] Port mapping with LVS-DR using fwmark
Thanks, Julian! Enabling CONFIG_IP_VS_NFCT and setting conntrack to 1
resolved the problem.

However, how leery should I be about it consuming memory? Is there a way to
monitor this consumption? Currently nf_conntrack_max is set to the
default: 65536

Also yes the RIP and VIP are on the same box with local and non-local
clients.

Jacoby


On Fri, Jan 17, 2014 at 1:39 PM, Julian Anastasov <ja@ssi.bg> wrote:

> [...]
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Hello,

On Fri, 17 Jan 2014, Jacoby Hickerson wrote:

> Thanks Julian!  After enabling CONFIG_IP_VS_NFCT and setting conntrack to 1
> that resolved the problem.  
> However, how leery should I be with it consuming memory?  Is there a test to
> monitor this consumption?  Currently the nf_conntrack_max is set to the
> default: 65536

cat /proc/slabinfo | grep nf_conntrack
or 'slabtop' can show the object size used by conntracks.
It should be 240+ bytes. You can expect one conntrack per
IPVS connection. You can also see conntracks with
cat /proc/net/nf_conntrack | less

cat /proc/sys/net/netfilter/nf_conntrack_count shows
the current number of conntracks. You can look for 'count'
at __nf_conntrack_alloc() and nf_conntrack_free() to see
how it is implemented.
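
As a rough worst-case estimate (added for reference; the ~320-byte object
size is an assumption in the spirit of the "240+ bytes" figure above, and the
real size should be read from slabinfo on your kernel):

```shell
# Worst-case conntrack memory at the default nf_conntrack_max of 65536.
# 320 bytes/object is an assumed size; check /proc/slabinfo (nf_conntrack
# line) for the true object size on your kernel.
max=65536
size=320
total=$((max * size))
echo "$total bytes (~$((total / 1024 / 1024)) MiB)"
```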

Regards

--
Julian Anastasov <ja@ssi.bg>
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Thanks again, Julian; that is very helpful information. So far, after
checking the figures you pointed me to, enabling IPVS nf_conntrack has had no
adverse effect on performance.


On Sat, Jan 18, 2014 at 12:44 AM, Julian Anastasov <ja@ssi.bg> wrote:

> [...]
Re: [lvs-users] Port mapping with LVS-DR using fwmark
I spoke too soon about this configuration working. The output of ipvsadm
led me to believe connections and packets were being load balanced;
however, they are now all coming from the real server which is also the
director. I'll call this the 'first node'.

Here is some more info in addition to the details in my previous emails:
First node routing:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.16     0.0.0.0         UG    0      0        0 bond0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 bond0

Second node routing:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.0.16     0.0.0.0         UG    0      0        0 bond0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 bond0

The second node also has the VIP on the loopback device

How I setup the director
ipvsadm -C
ipvsadm -A -f 100 -s rr
ipvsadm -a -f 100 -r 172.17.0.16 -w 100
ipvsadm -a -f 100 -r 172.17.0.17 -w 100

After conducting a few wget's here are the expected results from ipvsadm
while not port mapping:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port   Conns  InPkts OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
FWM  100                    12      61       0     5416        0
  -> 172.17.0.16:0           6      25       0     2422        0
  -> 172.17.0.17:0           6      36       0     2994        0

Actual results from ipvsadm when using port mapping:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port   Conns  InPkts OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
FWM  100                     2       8       4      778      216
  -> 172.17.0.16:0           1       4       4      389      216
  -> 172.17.0.17:0           1       4       0      389        0

Packets are being sent from the RIP of the first node only. From my
understanding, when using DR, OutPkts should always be zero.

Output from tcpdump is a bit strange (first pass):
02:10:51.986719 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [S], seq
1873056683, win 14600, options [mss 1460,sackOK,TS val 3044575792 ecr
0,nop,wscale 7], length 0
02:10:51.986719 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [S], seq
1873056683, win 14600, options [mss 1460,sackOK,TS val 3044575792 ecr
0,nop,wscale 7], length 0
02:10:51.986719 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [S], seq
1873056683, win 14600, options [mss 1460,sackOK,TS val 3044575792 ecr
0,nop,wscale 7], length 0
02:10:51.986814 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [S.], seq
2970678457, ack 1873056684, win 14480, options [mss 1460,sackOK,TS val
978483 ecr 3044575792,nop,wscale 7], length 0
02:10:51.986919 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [S.], seq
2970678457, ack 1873056684, win 14480, options [mss 1460,sackOK,TS val
978483 ecr 3044575792,nop,wscale 7], length 0
02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 3044575793 ecr 978483], length 0
02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 3044575793 ecr 978483], length 0
*02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
2970678458, win 115, options [nop,nop,TS val 3044575793 ecr 978483], length
0*
02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
length 173
02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
length 173
*02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [P.], seq
0:173, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
length 173*
02:10:51.987126 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [.], ack 174,
win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0
02:10:51.987128 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [.], ack 174,
win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0
02:10:51.987241 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [F.], seq 1,
ack 174, win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0
02:10:51.987245 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [F.], seq 1,
ack 174, win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0
02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
*02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack 2,
win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0*
02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
*02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [F.], seq
173, ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484], length
0*
02:10:51.987527 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [.], ack 175,
win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0
02:10:51.987529 IP 172.17.0.24.80 > 172.17.0.2.54276: Flags [.], ack 175,
win 122, options [nop,nop,TS val 978484 ecr 3044575793], length 0

Second pass:
02:10:59.360433 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [S], seq
2134630706, win 14600, options [mss 1460,sackOK,TS val 3044583166 ecr
0,nop,wscale 7], length 0
02:10:59.360433 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [S], seq
2134630706, win 14600, options [mss 1460,sackOK,TS val 3044583166 ecr
0,nop,wscale 7], length 0
02:10:59.360519 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [S.], seq
2952275478, ack 2134630707, win 14480, options [mss 1460,sackOK,TS val
985857 ecr 3044583166,nop,wscale 7], length 0
02:10:59.360524 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [S.], seq
2952275478, ack 2134630707, win 14480, options [mss 1460,sackOK,TS val
985857 ecr 3044583166,nop,wscale 7], length 0
02:10:59.360627 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 3044583166 ecr 985857], length 0
02:10:59.360627 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 3044583166 ecr 985857], length 0
02:10:59.360728 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 3044583166 ecr 985857],
length 173
02:10:59.360728 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 3044583166 ecr 985857],
length 173
02:10:59.360751 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [.], ack 174,
win 122, options [nop,nop,TS val 985857 ecr 3044583166], length 0
02:10:59.360753 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [.], ack 174,
win 122, options [nop,nop,TS val 985857 ecr 3044583166], length 0
02:10:59.361058 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [F.], seq 1,
ack 174, win 122, options [nop,nop,TS val 985858 ecr 3044583166], length 0
02:10:59.361062 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [F.], seq 1,
ack 174, win 122, options [nop,nop,TS val 985858 ecr 3044583166], length 0
02:10:59.361270 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 3044583167 ecr 985858], length 0
02:10:59.361270 IP 172.17.0.2.54277 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 3044583167 ecr 985858], length 0
02:10:59.361331 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [.], ack 175,
win 122, options [nop,nop,TS val 985858 ecr 3044583167], length 0
02:10:59.361334 IP 172.17.0.24.80 > 172.17.0.2.54277: Flags [.], ack 175,
win 122, options [nop,nop,TS val 985858 ecr 3044583167], length 0

The end result is that the packets are always coming from the first node
and are never balanced to the second node.

Thanks for any further help; it seems the solution is really close!

Jacoby


Re: [lvs-users] Port mapping with LVS-DR using fwmark
Just to clarify: the packets are going to the loopback of node 1, when they
should be going to node 2. This is shown in the tcpdump output from the lo
device of the first node:
02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
2970678458, win 115, options [nop,nop,TS val 3044575793 ecr 978483], length
0
02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [P.], seq
0:173, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
length 173
02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack 2,
win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [F.], seq
173, ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484], length
0

Thanks,
Jacoby


Re: [lvs-users] Port mapping with LVS-DR using fwmark
Hello,

On Thu, 23 Jan 2014, Jacoby Hickerson wrote:

> Just to clarify the packets are going to the loopback of node 1, when they
> should be going to node 2.  This is shown in the tcpdump output:Here is the
> output from the lo device of the first node:
> 02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
> 2970678458, win 115, options [nop,nop,TS val 3044575793 ecr 978483], length
> 0
> 02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [P.], seq
> 0:173, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
> length 173
> 02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack 2,
> win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
> 02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [F.], seq
> 173, ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484], length
> 0

...

> Packets are being sent from the RIP of the first node only.  From my
> understanding when using DR OutPkts should always be zero.

When LocalNode (a local RIP) is used, we can see
the local reply in the LOCAL_OUT hook. This happens for NAT
but also for DR, so it is normal. But we see these replies
after DNAT in LOCAL_OUT; see ip_vs_ops[] for reference.

> The end result is that the packets are always coming from the first
> node and never balanced to the second node.
>
> Thanks for any further help, seems the solution is really close!

Can you provide a more understandable description
of the test, for example:

- client box:
IP1: X.X.X.X/N dev DEV
IP2: ...

- director:
IP1: ...
VIP: XXX
are the client and director the same box?

- real server:
IP1: ...

and the iptables rules used. That way I can try to
duplicate the problem. Right now I see some IPs in the
tcpdump output, but I'm not sure what kind of traffic is
shown, or where tcpdump was started: on what box, on what
interface, external or internal...

Regards

--
Julian Anastasov <ja@ssi.bg>
Re: [lvs-users] Port mapping with LVS-DR using fwmark
Certainly, and that makes sense. I will consolidate what I've emailed before
with the additional information here.

# PC info: Linux 3.12.5 for real servers 1 and 2, and Linux 3.9.10 for the
client box.

There are 3 boxes total: the client box, the director/RIP1 (real server 1),
and RIP2 (real server 2):
- client box:
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth1 #CIP

- director, which is the same box as real server 1 (RIP1). The client is on
a separate box.
inet 172.17.0.16/16 brd 172.17.255.255 scope global bond0
#RIP1
inet 172.17.0.24/16 brd 172.17.255.255 scope global secondary bond0:2 #VIP

- real server 2 (RIP2)
inet 172.17.0.24/32 scope global lo:0 #VIP on loopback
inet 172.17.0.17/16 brd 172.17.255.255 scope global bond0 #RIP2

# ipvs setup on real server 1 (RIP1) only
ipvsadm -C
ipvsadm -A -f 100 -s rr
ipvsadm -a -f 100 -r 172.17.0.16 -w 100
ipvsadm -a -f 100 -r 172.17.0.17 -w 100

# iptable rules (these rules are set for both real server 1 and real server
2)
iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
--dport 80 -j MARK --set-xmark 0x64/0xffffffff
iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
--to-ports 50000
iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
--to-ports 50000
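One thing worth double-checking on real server 2 (not shown in the rules above): with LVS-DR and the VIP on lo, a real server usually needs ARP suppression so that it does not answer ARP for the VIP and steal traffic from the director. A sketch, assuming the standard arp_ignore/arp_announce sysctls:

```shell
# On the non-director real server (RIP2): keep the VIP on lo, but stop the
# box from advertising it via ARP, so only the director receives client
# traffic for 172.17.0.24.
sysctl -w net.ipv4.conf.all.arp_ignore=1    # reply only for addresses on the ingress interface
sysctl -w net.ipv4.conf.all.arp_announce=2  # use the best local source address in ARP requests
ip addr add 172.17.0.24/32 dev lo           # VIP on loopback, as already done in this setup
```

These are configuration fragments, not a complete script; adjust to match how the VIP is already managed on each node.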

The test I'm conducting is an http get from the client box connecting to
the VIP:
- Issue the following command on the client box:
curl -v 'http://172.17.0.24'

On both real servers there is an nginx webserver listening on port 50000

I also turned on debugging at level 12 and ran the curl command with port
mapping (this is the output when the no-load-balancing issue occurs).
Debug output on real server 1 after executing the curl command the first
time:

Jan 24 23:05:44 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
refcnt 1 weight 100
Jan 24 23:05:44 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37455 v:
172.17.0.16:50130 d:172.17.0.17:50130 fwd:R s:65276 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 24 23:05:44 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37455 v:
172.17.0.16:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->
172.17.0.2:37455 state: NONE->SYN_RECV conn->refcnt:2
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup service: fwm 0 TCP
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
172.17.0.16:50130 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
172.17.0.16:50130 hit
Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.17:50130->
172.17.0.2:37455 state: SYN_RECV->ESTABLISHED conn->refcnt:2
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
172.17.0.16:50130 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
172.17.0.16:50130 hit
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
172.17.0.2:37455 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
172.17.0.16:50130 not hit
Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
172.17.0.16:50130 hit
Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.17:50130->
172.17.0.2:37455 state: ESTABLISHED->FIN_WAIT conn->refcnt:2

Debug output on real server 1 after executing the curl command a second
time:

Jan 24 23:05:45 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 24 23:05:45 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
refcnt 1 weight 100
Jan 24 23:05:45 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37456 v:
172.17.0.16:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37456 v:
172.17.0.16:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50130->
172.17.0.2:37456 state: NONE->SYN_RECV conn->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37456 hit
Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
net/netfilter/ipvs/ip_vs_core.c line 1094
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
172.17.0.16:50130 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
172.17.0.16:50130 hit
Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50130->
172.17.0.2:37456 state: SYN_RECV->ESTABLISHED conn->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
172.17.0.16:50130 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
172.17.0.16:50130 hit
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37456 hit
Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
net/netfilter/ipvs/ip_vs_core.c line 1094
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37456 hit
Jan 24 23:05:45 pc01 kernel: IPVS: TCP output [.FA.] 172.17.0.16:50130->
172.17.0.2:37456 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
net/netfilter/ipvs/ip_vs_core.c line 1094
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
172.17.0.16:50130 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
172.17.0.16:50130 hit
Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.16:50130->
172.17.0.2:37456 state: FIN_WAIT->TIME_WAIT conn->refcnt:2
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
172.17.0.2:37456 hit
Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
net/netfilter/ipvs/ip_vs_core.c line 1094
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out UDP 172.17.0.16:50014->
239.192.0.1:50015 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: packet type=2 proto=17 daddr=239.192.0.1
ignored in hook 1
Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out UDP 127.0.0.1:45176->
127.0.0.1:53 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in UDP 127.0.0.1:45176->
127.0.0.1:53 not hit
Jan 24 23:05:45 pc01 kernel: IPVS: lookup service: fwm 0 UDP 127.0.0.1:53 not hit

Below is an example of good results when connecting directly to port 50000.
For this scenario I removed the port-80 rules and updated the iptables fwmark
rule to match port 50000:
iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
--dport 50000 -j MARK --set-xmark 0x64/0xffffffff

Debug output on real server 1 without port mapping, first test (curl -v
'http://172.17.0.24:50000'):

Jan 25 00:19:37 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 25 00:19:37 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
refcnt 1 weight 100
Jan 25 00:19:37 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42815 v:
172.17.0.24:50130 d:172.17.0.17:50130 fwd:R s:4 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 25 00:19:37 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42815 v:
172.17.0.24:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->
172.17.0.2:42815 state: NONE->SYN_RECV conn->refcnt:2
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: new dst 172.17.0.17, src 172.17.0.16,
refcnt=1
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
172.17.0.24:50130 not hit
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
172.17.0.24:50130 hit
Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.17:50130->
172.17.0.2:42815 state: SYN_RECV->ESTABLISHED conn->refcnt:2
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
172.17.0.24:50130 not hit
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
172.17.0.24:50130 hit
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
172.17.0.24:50130 not hit
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
172.17.0.24:50130 hit
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
172.17.0.24:50130 not hit
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
172.17.0.24:50130 hit
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
172.17.0.24:50130 not hit
Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
172.17.0.24:50130 hit
Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.17:50130->
172.17.0.2:42815 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116

Debug output on real server 1 without port mapping, second test (curl -v
'http://172.17.0.24:50000'):

Jan 25 00:19:39 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 25 00:19:39 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
refcnt 1 weight 100
Jan 25 00:19:39 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42816 v:
172.17.0.24:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 25 00:19:39 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42816 v:
172.17.0.24:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50130->
172.17.0.2:42816 state: NONE->SYN_RECV conn->refcnt:2
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:39 pc01 kernel: IPVS: new dst 172.17.0.16, src 172.17.0.16,
refcnt=1
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup service: fwm 0 TCP
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
172.17.0.24:50130 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
172.17.0.24:50130 hit
Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50130->
172.17.0.2:42816 state: SYN_RECV->ESTABLISHED conn->refcnt:2
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
172.17.0.24:50130 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
172.17.0.24:50130 hit
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
172.17.0.24:50130 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
172.17.0.24:50130 hit
Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.16:50130->
172.17.0.2:42816 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
172.17.0.2:42816 not hit
Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:39545->
172.17.0.16:3306 not hit
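
For wading through these debug dumps, a small helper like the following makes
the hit/miss pattern easier to compare between the port-mapped and
non-port-mapped runs. It is a quick sketch that only understands the exact
"IPVS: lookup/..." line format shown in this thread:

```python
import re
from collections import Counter

# Count, per lookup direction, how many connection lookups hit or missed.
# The \s* after "->" tolerates the line wrapping introduced by the archive.
LOOKUP = re.compile(r"IPVS: lookup/(out|in) (?:TCP|UDP) \S+->\s*\S+ (hit|not hit)")

def triage(log_text):
    counts = Counter()
    for direction, result in LOOKUP.findall(log_text):
        counts[(direction, result)] += 1
    return counts

sample = """\
IPVS: lookup/out TCP 172.17.0.2:42816->172.17.0.24:50000 not hit
IPVS: lookup/in TCP 172.17.0.2:42816->172.17.0.24:50000 hit
IPVS: lookup/out TCP 172.17.0.24:50000->172.17.0.2:42816 not hit
"""
print(triage(sample))
```

Running the full dumps through this shows at a glance that the failing run is
dominated by lookup/out hits (replies being treated as local responses), while
the working run is almost all lookup/in hits.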

The following tcpdump command was used on real server 1:
tcpdump -i any -nn 'port 80 or port 50000'

I realized later that capturing on the 'any' device isn't as helpful when
trying to pinpoint loopback traffic, which is what my follow-up email was
referring to.

Thanks again for the support; feel free to ask for any additional
information to help debug.

Jacoby


On Sat, Jan 25, 2014 at 6:25 AM, Julian Anastasov <ja@ssi.bg> wrote:

>
> Hello,
>
> On Thu, 23 Jan 2014, Jacoby Hickerson wrote:
>
> > Just to clarify the packets are going to the loopback of node 1, when
> they
> > should be going to node 2. This is shown in the tcpdump output: Here is
> the
> > output from the lo device of the first node:
> > 02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
> > 2970678458, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
> length
> > 0
> > 02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [P.], seq
> > 0:173, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
> > length 173
> > 02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
> 2,
> > win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
> > 02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [F.], seq
> > 173, ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484],
> length
> > 0
>
> ...
>
> > Packets are being sent from the RIP of the first node only. From my
> > understanding when using DR OutPkts should always be zero.
>
> When LocalNode (local RIP) is used, we can see
> the local reply in LOCAL_OUT hook. It happens for NAT but
> also for DR. So, it is normal. But we see these replies
> after DNAT in LOCAL_OUT, see ip_vs_ops[] for reference.
>
> > The end result is that the packets are always coming from the first
> > node and never balanced to the second node.
> >
> > Thanks for any further help, seems the solution is really close!
>
> Can you provide more understandable description
> for the test, for example:
>
> - client box:
> IP1: X.X.X.X/N dev DEV
> IP2: ...
>
> - director:
> IP1: ...
> VIP: XXX
> are client and director same box
>
> - real server:
> IP1: ...
>
> iptable rules used. By this way I can try to
> duplicate the problem. Now I see some IPs in tcpdump
> output but I'm not sure what kind of traffic is shown,
> where is started the tcpdump, on what box, on what
> interface, external, internal...
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
Re: [lvs-users] Port mapping with LVS-DR using fwmark [ In reply to ]
Apologies, the debug output showing port 50130 should read port 50000.

For example:
IPVS: lookup/in TCP 172.17.0.24:*50130*->172.17.0.2:42816 not hit
should be:
IPVS: lookup/in TCP 172.17.0.24:*50000*->172.17.0.2:42816 not hit

I have attached the file 'ipvs_debug_output' with the corrected debug output
so that this thread is not cluttered.

Thanks!

Jacoby


On Mon, Jan 27, 2014 at 4:00 PM, Jacoby Hickerson <hickersonjl@gmail.com> wrote:

> Certainly and that makes sense, I will consolidate what I've emailed
> before with the additional information here.
>
> # PC info: Linux 3.12.5 for real servers 1 and 2, and Linux 3.9.10 for the
> client box.
>
> There are 3 boxes total, client box, director/RIP1( real server 1) and
> RIP2 (real server 2):
> - client box:
> inet 172.17.0.2/16 brd 172.17.255.255 scope global eth1 #CIP
>
> - director which is the same as real server 1 (RIP1). The client is on a
> separate box.
> inet 172.17.0.16/16 brd 172.17.255.255 scope global bond0
> #RIP1
> inet 172.17.0.24/16 brd 172.17.255.255 scope global secondary bond0:2
> #VIP
>
> - real server 2 (RIP2)
> inet 172.17.0.24/32 scope global lo:0 #VIP on
> loopback
> inet 172.17.0.17/16 brd 172.17.255.255 scope global bond0 #RIP2
>
> # ipvs setup on real server 1 (RIP1) only
> ipvsadm -C
> ipvsadm -A -f 100 -s rr
> ipvsadm -a -f 100 -r 172.17.0.16 -w 100
> ipvsadm -a -f 100 -r 172.17.0.17 -w 100
>
> # iptable rules (these rules are set for both real server 1 and real
> server 2)
> iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> --dport 80 -j MARK --set-xmark 0x64/0xffffffff
> iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000
> iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000
>
> The test I'm conducting is an http get from the client box connecting to
> the VIP:
> - Issue the following command on the client box:
> curl -v 'http://172.17.0.24'
>
> On both real servers there is an nginx webserver listening on port 50000
>
> I also turned on debugging and ran the curl command with port mapping
> using level 12 debug (this is output when the issue occurs of no load
> balancing).
> Debug output on real server 1 after executing the curl command the first
> time:
>
> Jan 24 23:05:44 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
> refcnt 1 weight 100
> Jan 24 23:05:44 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37455 v:
> 172.17.0.16:50130 d:172.17.0.17:50130 fwd:R s:65276 conn->flags:183
> conn->refcnt:1 dest->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37455 v:
> 172.17.0.16:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->
> 172.17.0.2:37455 state: NONE->SYN_RECV conn->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup service: fwm 0 TCP
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
> 172.17.0.16:50130 hit
> Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.17:50130->
> 172.17.0.2:37455 state: SYN_RECV->ESTABLISHED conn->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
> 172.17.0.16:50130 hit
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:50130->
> 172.17.0.2:37455 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37455->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:44 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37455->
> 172.17.0.16:50130 hit
> Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.17:50130->
> 172.17.0.2:37455 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
>
> Debug output on real server 1 after executing the curl command a second
> time:
>
> Jan 24 23:05:45 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> Jan 24 23:05:45 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
> refcnt 1 weight 100
> Jan 24 23:05:45 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37456 v:
> 172.17.0.16:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
> conn->refcnt:1 dest->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37456 v:
> 172.17.0.16:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50130->
> 172.17.0.2:37456 state: NONE->SYN_RECV conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37456 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
> net/netfilter/ipvs/ip_vs_core.c line 1094
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
> 172.17.0.16:50130 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50130->
> 172.17.0.2:37456 state: SYN_RECV->ESTABLISHED conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
> 172.17.0.16:50130 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37456 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
> net/netfilter/ipvs/ip_vs_core.c line 1094
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37456 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: TCP output [.FA.] 172.17.0.16:50130->
> 172.17.0.2:37456 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
> net/netfilter/ipvs/ip_vs_core.c line 1094
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:37456->
> 172.17.0.16:50130 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:37456->
> 172.17.0.16:50130 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.16:50130->
> 172.17.0.2:37456 state: FIN_WAIT->TIME_WAIT conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->
> 172.17.0.2:37456 hit
> Jan 24 23:05:45 pc01 kernel: IPVS: Leave: handle_response,
> net/netfilter/ipvs/ip_vs_core.c line 1094
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out UDP 172.17.0.16:50014->
> 239.192.0.1:50015 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: packet type=2 proto=17
> daddr=239.192.0.1 ignored in hook 1
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out UDP 127.0.0.1:45176->
> 127.0.0.1:53 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/in UDP 127.0.0.1:45176->
> 127.0.0.1:53 not hit
> Jan 24 23:05:45 pc01 kernel: IPVS: lookup service: fwm 0 UDP 127.0.0.1:53 not hit
>
> Below is an example of good results when connecting directly to port
> 50000. For this scenario I removed port 80 and updated iptables with
> fwmark for port 50000:
> iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> --dport 50000 -j MARK --set-xmark 0x64/0xffffffff
>
> Debug output on real server 1 when not port mapping first test (curl -v
> 'http://172.17.0.24:50000'):
>
> Jan 25 00:19:37 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> Jan 25 00:19:37 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
> refcnt 1 weight 100
> Jan 25 00:19:37 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42815 v:
> 172.17.0.24:50130 d:172.17.0.17:50130 fwd:R s:4 conn->flags:183
> conn->refcnt:1 dest->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42815 v:
> 172.17.0.24:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->
> 172.17.0.2:42815 state: NONE->SYN_RECV conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: new dst 172.17.0.17, src 172.17.0.16,
> refcnt=1
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
> 172.17.0.24:50130 hit
> Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.17:50130->
> 172.17.0.2:42815 state: SYN_RECV->ESTABLISHED conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
> 172.17.0.24:50130 hit
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
> 172.17.0.24:50130 hit
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
> 172.17.0.24:50130 hit
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42815->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:37 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42815->
> 172.17.0.24:50130 hit
> Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.17:50130->
> 172.17.0.2:42815 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
>
> Debug output on real server 1 when not port mapping second test (curl -v
> 'http://172.17.0.24:50000'):
>
> Jan 25 00:19:39 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> Jan 25 00:19:39 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
> refcnt 1 weight 100
> Jan 25 00:19:39 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42816 v:
> 172.17.0.24:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
> conn->refcnt:1 dest->refcnt:2
> Jan 25 00:19:39 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42816 v:
> 172.17.0.24:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
> Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50130->
> 172.17.0.2:42816 state: NONE->SYN_RECV conn->refcnt:2
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:39 pc01 kernel: IPVS: new dst 172.17.0.16, src 172.17.0.16,
> refcnt=1
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup service: fwm 0 TCP
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
> 172.17.0.24:50130 hit
> Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50130->
> 172.17.0.2:42816 state: SYN_RECV->ESTABLISHED conn->refcnt:2
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
> 172.17.0.24:50130 hit
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:42816->
> 172.17.0.24:50130 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:42816->
> 172.17.0.24:50130 hit
> Jan 25 00:19:39 pc01 kernel: IPVS: TCP input [.FA.] 172.17.0.16:50130->
> 172.17.0.2:42816 state: ESTABLISHED->FIN_WAIT conn->refcnt:2
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50130->
> 172.17.0.2:42816 not hit
> Jan 25 00:19:39 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:39 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:39545->
> 172.17.0.16:3306 not hit
>
> The tcpdump command that was used was as follows on real server 1:
> tcpdump -iany -nn port 80 or port 50000
>
> I realized later that using 'any' device isn't as helpful when trying to
> pinpoint loopback traffic, so that's what my follow up email was referring
> to.
>
> Thanks again for the support, feel free to ask for any additional
> information to help debug.
>
> Jacoby
>
>
> On Sat, Jan 25, 2014 at 6:25 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
>>
>> Hello,
>>
>> On Thu, 23 Jan 2014, Jacoby Hickerson wrote:
>>
>> > Just to clarify the packets are going to the loopback of node 1, when
>> they
>> > should be going to node 2. This is shown in the tcpdump output: Here is
>> the
>> > output from the lo device of the first node:
>> > 02:10:51.987030 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
>> > 2970678458, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
>> length
>> > 0
>> > 02:10:51.987079 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [P.], seq
>> > 0:173, ack 1, win 115, options [nop,nop,TS val 3044575793 ecr 978483],
>> > length 173
>> > 02:10:51.987426 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [.], ack
>> 2,
>> > win 115, options [nop,nop,TS val 3044575793 ecr 978484], length 0
>> > 02:10:51.987480 IP 172.17.0.2.54276 > 172.17.0.16.50000: Flags [F.], seq
>> > 173, ack 2, win 115, options [nop,nop,TS val 3044575793 ecr 978484],
>> length
>> > 0
>>
>> ...
>>
>> > Packets are being sent from the RIP of the first node only. From my
>> > understanding when using DR OutPkts should always be zero.
>>
>> When LocalNode (local RIP) is used, we can see
>> the local reply in LOCAL_OUT hook. It happens for NAT but
>> also for DR. So, it is normal. But we see these replies
>> after DNAT in LOCAL_OUT, see ip_vs_ops[] for reference.
>>
>> > The end result is that the packets are always coming from the first
>> > node and never balanced to the second node.
>> >
>> > Thanks for any further help, seems the solution is really close!
>>
>> Can you provide more understandable description
>> for the test, for example:
>>
>> - client box:
>> IP1: X.X.X.X/N dev DEV
>> IP2: ...
>>
>> - director:
>> IP1: ...
>> VIP: XXX
>> are client and director same box
>>
>> - real server:
>> IP1: ...
>>
>> Also the iptables rules used. That way I can try to
>> duplicate the problem. Right now I see some IPs in the tcpdump
>> output, but I'm not sure what kind of traffic is shown,
>> where tcpdump was started, on what box, on what
>> interface, external or internal...
>>
>> Regards
>>
>> --
>> Julian Anastasov <ja@ssi.bg>
>>
>
>
Re: [lvs-users] Port mapping with LVS-DR using fwmark [ In reply to ]
Hello,

On Mon, 27 Jan 2014, Jacoby Hickerson wrote:

> Certainly and that makes sense, I will consolidate what I've emailed before
> with the additional information here.
>
> # PC info: Linux 3.12.5 for real servers 1 and 2, and Linux 3.9.10 for the
> client box. 
>
> There are 3 boxes total: the client box, the director/RIP1 (real server 1),
> and RIP2 (real server 2):
> - client box:
> inet 172.17.0.2/16 brd 172.17.255.255 scope global eth1   #CIP
>
> - director which is the same as real server 1 (RIP1).  The client is on a
> separate box.
> inet 172.17.0.16/16 brd 172.17.255.255 scope global bond0              
> #RIP1
> inet 172.17.0.24/16 brd 172.17.255.255 scope global secondary bond0:2   #VIP
>
> - real server 2 (RIP2)
> inet 172.17.0.24/32 scope global lo:0                      #VIP on loopback
> inet 172.17.0.17/16 brd 172.17.255.255 scope global bond0  #RIP2
>
> # ipvs setup on real server 1 (RIP1) only
> ipvsadm -C
> ipvsadm -A -f 100 -s rr
> ipvsadm -a -f 100 -r 172.17.0.16 -w 100
> ipvsadm -a -f 100 -r 172.17.0.17 -w 100
>
> # iptable rules (these rules are set for both real server 1 and real server
> 2)
> iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> --dport 80 -j MARK --set-xmark 0x64/0xffffffff
> iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000
> iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT
> --to-ports 50000
>
> The test I'm conducting is an http get from the client box connecting to the
> VIP:
> - Issue the following command on the client box:
> curl -v 'http://172.17.0.24'
>
> On both real servers there is an nginx webserver listening on port 50000
>
> I also turned on level 12 debugging and ran the curl command with port
> mapping (this is the output when the no-load-balancing issue occurs).
> Debug output on real server 1 after executing the curl command the first
> time:
>
> Jan 24 23:05:44 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
> refcnt 1 weight 100

The debug output was very helpful.

Looks like -j REDIRECT combined with DR is a bad idea.
When packet comes to IPVS the daddr is already 172.17.0.16,
see the "v:172.17.0.16" line below:

> Jan 24 23:05:44 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37455
> v:172.17.0.16:50130 d:172.17.0.17:50130 fwd:R s:65276 conn->flags:183
> conn->refcnt:1 dest->refcnt:2

The remote real server 2 is not configured for
such a VIP (172.17.0.16). I don't remember when
-j REDIRECT was used for IPVS setups, maybe for
transparent proxy setups.

Why not just use the NAT method for both servers,
without any REDIRECT rules?

Even -j DNAT --to-destination VIP:50000 has a better
chance of using the VIP instead of the first IP.
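For reference, the NAT-only variant Julian suggests might look roughly like this. This is a sketch only, assembled from the fwmark, weights, and addresses already shown in this thread, not a tested configuration:

```shell
# Sketch: masquerading (NAT) method for both real servers, no REDIRECT.
# IPVS itself rewrites VIP:80 -> RIP:50000, so nginx keeps listening on 50000.
ipvsadm -C
ipvsadm -A -f 100 -s rr
ipvsadm -a -f 100 -r 172.17.0.16:50000 -w 100 -m
ipvsadm -a -f 100 -r 172.17.0.17:50000 -w 100 -m

# The fwmark rule on the director stays as before:
iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp \
  -m tcp --dport 80 -j MARK --set-xmark 0x64/0xffffffff
```

Note that with the NAT method, replies from the remote real server must route back through the director (e.g. by making the director its gateway), which is exactly what DR avoids.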

> Jan 24 23:05:44 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37455
> v:172.17.0.16:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: TCP input  [S...]
> 172.17.0.17:50130->172.17.0.2:37455 state: NONE->SYN_RECV conn->refcnt:2
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031

The "ip_vs_xmit.c line 1031" above means the packet was
sent to the remote real server 2 (172.17.0.17), but due to
-j REDIRECT the daddr is 172.17.0.16.

...

> Debug output on real server 1 after executing the curl command a second
> time:
>
> Jan 24 23:05:45 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> Jan 24 23:05:45 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
> refcnt 1 weight 100
> Jan 24 23:05:45 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37456
> v:172.17.0.16:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183
> conn->refcnt:1 dest->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37456
> v:172.17.0.16:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: TCP input  [S...]
> 172.17.0.16:50130->172.17.0.2:37456 state: NONE->SYN_RECV conn->refcnt:2
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116

No "ip_vs_xmit.c line 1031" here; the packet was delivered
locally with NF_ACCEPT, so it goes to the local real server,
as the "d:172.17.0.16" info shows.

> Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP
> 172.17.0.16:50130->172.17.0.2:37456 hit

...

> Below is an example of good results when connecting directly to port 50000.

So, no -j REDIRECT => no problem?

>  For this scenario I removed port 80 and updated iptables with fwmark for
> port 50000:
> iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp
> --dport 50000 -j MARK --set-xmark 0x64/0xffffffff
>
> Debug output on real server 1 for the first test without port mapping
> (curl -v 'http://172.17.0.24:50000'):
>
> Jan 25 00:19:37 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> Jan 25 00:19:37 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
> refcnt 1 weight 100
> Jan 25 00:19:37 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42815
> v:172.17.0.24:50130 d:172.17.0.17:50130 fwd:R s:4 conn->flags:183
> conn->refcnt:1 dest->refcnt:2

Yep, "v:172.17.0.24" means no -j REDIRECT was used.

> Jan 25 00:19:37 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42815
> v:172.17.0.24:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: TCP input  [S...]
> 172.17.0.17:50130->172.17.0.2:42815 state: NONE->SYN_RECV conn->refcnt:2
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1009
> Jan 25 00:19:37 pc01 kernel: IPVS: new dst 172.17.0.17, src 172.17.0.16,
> refcnt=1
> Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out,
> net/netfilter/ipvs/ip_vs_core.c line 1116
> Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
> net/netfilter/ipvs/ip_vs_xmit.c line 1031

Regards

--
Julian Anastasov <ja@ssi.bg>
Re: [lvs-users] Port mapping with LVS-DR using fwmark [ In reply to ]
Thanks, Julian, this helps me understand it a lot better. Are you
suggesting using the masquerading method? That isn't an ideal option for
me, unless of course it is the only option.

To see how much further I could get using DR, I removed the redirect and
added the following to both real servers:
iptables -t nat -A PREROUTING -p tcp -m tcp --destination 172.17.0.24
--dport 80 -j DNAT --to-destination 172.17.0.24:50000

After the DNAT update it now sends packets to real server 2; however,
the port is not what the client expects.

The problem is that real server 2 receives packets on the mapped port
50000 instead of port 80.
Here is debug output when it connects to real server 2:

Jan 28 23:58:57 pc01 kernel: IPVS: lookup service: fwm 100 TCP
172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:57 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0
refcnt 1 weight 100
Jan 28 23:58:57 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38193 v:
172.17.0.24:50000 d:172.17.0.17:50000 fwd:R s:4 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38193 v:
172.17.0.24:50000 d:172.17.0.17:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50000->
172.17.0.2:38193 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:41024->
172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:41024->
172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38193->
172.17.0.24:50000 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38193->
172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116

So we see above that the virtual address is 172.17.0.24:50000; ideally
that would be port 80, or the destination 172.17.0.17 (RIP2) would be
reached on port 80.

The following tcpdump on real server 2 shows that it replies to the
client from the unexpected mapped port 50000 (so the connect hangs):
tcpdump -iany -nn port 80 or port 50000 # (nothing was on the loopback,
just bond0)
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq
1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr
0,nop,wscale 7], length 0
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq
1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr
0,nop,wscale 7], length 0
23:58:59.446484 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq
59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val
353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446487 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq
59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val
353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq
1458168691, win 0, length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq
1458168691, win 0, length 0

Here is debug output when it connects to real server 1:
Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 100 TCP
172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:47 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0
refcnt 1 weight 100
Jan 28 23:58:47 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38192 v:
172.17.0.24:50000 d:172.17.0.16:50000 fwd:R s:65276 conn->flags:183
conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38192 v:
172.17.0.24:50000 d:172.17.0.16:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50000->
172.17.0.2:38192 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit,
net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50000->
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50000->
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 0 TCP
172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out,
net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38192->
172.17.0.24:50000 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->
172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38192->
172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50000->
172.17.0.2:38192 state: SYN_RECV->ESTABLISHED conn->refcnt:2

The tcpdump output shows that the connection from real server 1 to the
client is good:
tcpdump -iany -nn port 80 or port 50000 # (nothing was on the loopback,
just bond0)
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq
2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr
0,nop,wscale 7], length 0
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq
2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr
0,nop,wscale 7], length 0
23:58:47.241128 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq
709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val
353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241131 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq
709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val
353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win
115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780],
length 173
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq
1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780],
length 173
23:58:47.241443 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174,
win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241446 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174,
win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241569 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1,
ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241573 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1,
ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win
115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174,
ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241944 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175,
win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0
23:58:47.241946 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175,
win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0

Thanks again for spending time debugging this.

Jacoby


On Tue, Jan 28, 2014 at 1:16 AM, Julian Anastasov <ja@ssi.bg> wrote:

> ...
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
Re: [lvs-users] Port mapping with LVS-DR using fwmark [ In reply to ]
Hello,

On Tue, 28 Jan 2014, Jacoby Hickerson wrote:

> Thanks Julian this helps me understand it a lot better.  Are you suggesting
> using masquerading method? That isn't an ideal option for me unless of
> course it is the only option.
> To see how much further I could get using DR, I removed the redirect and
> added the following to both real servers:
> iptables -t nat -A PREROUTING -p tcp -m tcp --destination 172.17.0.24
> --dport 80 -j DNAT --to-destination 172.17.0.24:50000

You do not need a REDIRECT rule on the director; use
the masquerading method for the local RIP1 and the DR method for
RIP2. Use REDIRECT on real server 2. For example:

# DNAT by IPVS: VIP:80 -> RIP1:50000
ipvsadm -a -f 100 -r 172.17.0.16:50000 -w 100 -m

# DR as before: VIP:80 sent as VIP:80 to nexthop 172.17.0.17
ipvsadm -a -f 100 -r 172.17.0.17 -w 100

Then add the REDIRECT (or the above DNAT) rule only on
real server 2 (172.17.0.17). This way, traffic from
real server 2 will not go back to the client via the director.

As you noticed, the problem you are now facing
is that, from the client's point of view, the remote port is 80,
and real server 2 does not alter it in its replies.
Real server 2 can only reply from the port on which it
received the packet. That is why the DNAT/REDIRECT of the port
should happen on the real server and not on the director.
Otherwise, we would have to send replies via the director for
proper reassignment of the port, but that is something we
try to avoid.
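Putting the suggestion together, the full hybrid setup might look like the sketch below. It is assembled from the addresses and fwmark used in this thread and is not a tested configuration:

```shell
# On the director / real server 1 (172.17.0.16):
ipvsadm -C
ipvsadm -A -f 100 -s rr
# Local RIP1 via masquerading: IPVS itself maps VIP:80 -> 172.17.0.16:50000.
ipvsadm -a -f 100 -r 172.17.0.16:50000 -w 100 -m
# Remote RIP2 via DR: the packet leaves still addressed to VIP:80.
ipvsadm -a -f 100 -r 172.17.0.17 -w 100

# Only on real server 2 (172.17.0.17), which holds the VIP on lo:0:
# remap VIP:80 -> local port 50000.  Conntrack reverses this NAT on the
# way out, so replies leave with source VIP:80, as the client expects.
iptables -t nat -A PREROUTING -p tcp -d 172.17.0.24 -m tcp --dport 80 \
  -j REDIRECT --to-ports 50000
```

No REDIRECT is needed on real server 1, since IPVS's masquerading already performs (and reverses) the port mapping for local connections there.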

> After the DNAT update it now sends packets to the real server 2, however the
> port is not what the client expects. 
>
> The problem is that the real server 2 receives packets on the port mapped
> port 50000 instead of port 80.

Regards

--
Julian Anastasov <ja@ssi.bg>
Re: [lvs-users] Port mapping with LVS-DR using fwmark [ In reply to ]
Ah, I see. The ideal solution would be to have a similar setup on both
servers, because either server could fail over, and dynamic
setup modifications would be more complex in a fail-over condition. It
would be interesting if the director maintained the original VIP port
assignment, or if there were a configuration item/utility to indicate it.
But perhaps that would be inefficient.

Thanks again for your help!

Jacoby


On Tue, Jan 28, 2014 at 11:59 PM, Julian Anastasov <ja@ssi.bg> wrote:

> ...
>