Mailing List Archive

2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Hello,

After reading the "Never happen" comment in the code, I thought it
wasn't too silly to mention that it apparently does happen. Never saw
the message before, hence this mail. This happened on a machine
doing SNAT for another pc, so conntrack may be involved.

The three errors happen within 1.5 seconds of each other:

[17834.377955] ipv4_get_l4proto: Frag of proto 17
[17835.358985] ipv4_get_l4proto: Frag of proto 17
[17835.872457] ipv4_get_l4proto: Frag of proto 17

As this seems to be an incident, I've no idea how to debug it, nor
whether it's worth debugging. If this can be caused by a peer
sending a bad packet I'd ignore this report.

If this is serious enough to be reported, perhaps it should be a BUG?

Greetings,

Indan


P.S. Netfilter's bugzilla gives an error when the search function is used.
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Indan Zupancic wrote:
> Hello,
>
> After reading the "Never happen" comment in the code, I thought it
> wasn't too silly to mention that it apparently does happen. Never saw
> the message before, hence this mail. This happened on a machine
> doing SNAT for another pc, so conntrack may be involved.
>
> The three errors happen within 1.5 seconds of each other:
>
> [17834.377955] ipv4_get_l4proto: Frag of proto 17
> [17835.358985] ipv4_get_l4proto: Frag of proto 17
> [17835.872457] ipv4_get_l4proto: Frag of proto 17
>
> As this seems to be an incident, I've no idea how to debug it, nor
> whether it's worth debugging. If this can be caused by a peer
> sending a bad packet I'd ignore this report.
>
> If this is serious enough to be reported, perhaps it should be a BUG?

It should not be serious, the packets are simply dropped.

Did it perhaps happen directly after nf_conntrack_ipv4 module load?
Otherwise I think it might happen on loopback if you manually send
too-large packets or possibly with NOTRACK. Any chance you're doing
anything of that?
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Wed, July 25, 2007 17:19, Patrick McHardy wrote:
> Indan Zupancic wrote:
>> Hello,
>>
>> After reading the "Never happen" comment in the code, I thought it
>> wasn't too silly to mention that it apparently does happen. Never saw
>> the message before, hence this mail. This happened on a machine
>> doing SNAT for another pc, so conntrack may be involved.
>>
>> The three errors happen within 1.5 seconds of each other:
>>
>> [17834.377955] ipv4_get_l4proto: Frag of proto 17
>> [17835.358985] ipv4_get_l4proto: Frag of proto 17
>> [17835.872457] ipv4_get_l4proto: Frag of proto 17
>>
>> As this seems to be an incident, I've no idea how to debug it, nor
>> whether it's worth debugging. If this can be caused by a peer
>> sending a bad packet I'd ignore this report.
>>
>> If this is serious enough to be reported, perhaps it should be a BUG?
>
> It should not be serious, the packets are simply dropped.

I meant serious in the sense of there being a bug in the code or not.
If it indicates a bug then it might be better to make it a BUG or
something else which gives more feedback to make it easier to track
down. Right now it's not clear where the packet came from and goes to.

> Did it perhaps happen directly after nf_conntrack_ipv4 module load?

No, it was loaded for at least a few hours.

> Otherwise I think it might happen on loopback if you manually send
> too-large packets or possibly with NOTRACK. Any chance you're doing
> anything of that?

No idea what NOTRACK is, I've quite simple iptables rules and didn't do
anything fancy at the time, so I don't think so.

As far as I can remember I was just browsing at the time, at least doing
nothing that causes much UDP activity, DNS only. That said, I do run
dnsmasq locally, so there is some loopback UDP activity, and it's also the
DNS server for the SNATed host. Fair chance that there was more UDP
activity from that host though (Quake3).

If it happens again I'll add extra debugging stuff and hope that it'll
happen again. Currently it seems it's too vague to debug.

Greetings,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Indan Zupancic wrote:
> On Wed, July 25, 2007 17:19, Patrick McHardy wrote:
>
>>Indan Zupancic wrote:
>>
>>>Hello,
>>>
>>>After reading the "Never happen" comment in the code, I thought it
>>>wasn't too silly to mention that it apparently does happen. Never saw
>>>the message before, hence this mail. This happened on a machine
>>>doing SNAT for another pc, so conntrack may be involved.
>>>
>>>The three errors happen within 1.5 seconds of each other:
>>>
>>>[17834.377955] ipv4_get_l4proto: Frag of proto 17
>>>[17835.358985] ipv4_get_l4proto: Frag of proto 17
>>>[17835.872457] ipv4_get_l4proto: Frag of proto 17
>>>
>>>As this seems to be an incident, I've no idea how to debug it, nor
>>>whether it's worth debugging. If this can be caused by a peer
>>>sending a bad packet I'd ignore this report.
>>>
>>>If this is serious enough to be reported, perhaps it should be a BUG?
>>
>>It should not be serious, the packets are simply dropped.
>
>
> I meant serious in the sense of there being a bug in the code or not.
> If it indicates a bug then it might be better to make it a BUG or
> something else which gives more feedback to make it easier to track
> down. Right now it's not clear where the packet came from and goes to.


There is only one possible path, crashing people's machines won't help :)
It does indicate a bug, but not a serious one.

>>Did it perhaps happen directly after nf_conntrack_ipv4 module load?
>
>
> No, it was loaded for at least a few hours.
>
>
>>Otherwise I think it might happen on loopback if you manually send
>>too-large packets or possibly with NOTRACK. Any chance you're doing
>>anything of that?
>
>
> No idea what NOTRACK is, I've quite simple iptables rules and didn't do
> anything fancy at the time, so I don't think so.
>
> As far as I can remember I was just browsing at the time, at least doing
> nothing that causes much UDP activity, DNS only. That said, I do run
> dnsmasq locally, so there is some loopback UDP activity, and it's also the
> DNS server for the SNATed host. Fair chance that there was more UDP
> activity from that host though (Quake3).
>
> If it happens again I'll add extra debugging stuff and hope that it'll
> happen again. Currently it seems it's too vague to debug.


Thanks. One more question: are you running nfs?
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Wed, July 25, 2007 18:17, Patrick McHardy wrote:
> Indan Zupancic wrote:
>> On Wed, July 25, 2007 17:19, Patrick McHardy wrote:
>>
>>>Indan Zupancic wrote:
>>>
>>>>Hello,
>>>>
>>>>After reading the "Never happen" comment in the code, I thought it
>>>>wasn't too silly to mention that it apparently does happen. Never saw
>>>>the message before, hence this mail. This happened on a machine
>>>>doing SNAT for another pc, so conntrack may be involved.
>>>>
>>>>The three errors happen within 1.5 seconds of each other:
>>>>
>>>>[17834.377955] ipv4_get_l4proto: Frag of proto 17
>>>>[17835.358985] ipv4_get_l4proto: Frag of proto 17
>>>>[17835.872457] ipv4_get_l4proto: Frag of proto 17
>>>>
>>>>As this seems to be an incident, I've no idea how to debug it, nor
>>>>whether it's worth debugging. If this can be caused by a peer
>>>>sending a bad packet I'd ignore this report.
>>>>
>>>>If this is serious enough to be reported, perhaps it should be a BUG?
>>>
>>>It should not be serious, the packets are simply dropped.
>>
>>
>> I meant serious in the sense of there being a bug in the code or not.
>> If it indicates a bug then it might be better to make it a BUG or
>> something else which gives more feedback to make it easier to track
>> down. Right now it's not clear where the packet came from and goes to.
>
>
> There is only one possible path, crashing people's machines won't help :)
> It does indicate a bug, but not a serious one.

Ah, yes, I just wanted the stacktrace, not the panic too. ;-)

Perhaps there's only one codepath, but there are multiple ones for the
packet, and knowing which packets caused it seems the first step to
finding a test case to reproduce the problem.


>>>Did it perhaps happen directly after nf_conntrack_ipv4 module load?
>>
>>
>> No, it was loaded for at least a few hours.
>>
>>
>>>Otherwise I think it might happen on loopback if you manually send
>>>too-large packets or possibly with NOTRACK. Any chance you're doing
>>>anything of that?
>>
>>
>> No idea what NOTRACK is, I've quite simple iptables rules and didn't do
>> anything fancy at the time, so I don't think so.
>>
>> As far as I can remember I was just browsing at the time, at least doing
>> nothing that causes much UDP activity, DNS only. That said, I do run
>> dnsmasq locally, so there is some loopback UDP activity, and it's also the
>> DNS server for the SNATed host. Fair chance that there was more UDP
>> activity from that host though (Quake3).
>>
>> If it happens again I'll add extra debugging stuff and hope that it'll
>> happen again. Currently it seems it's too vague to debug.
>
>
> Thanks. One more question: are you running nfs?

Yes, but only for the host, I'm not using it locally. And it's v3, so should be
using TCP. (rpc.mountd --no-nfs-version 1 --no-nfs-version 2.)

Regards,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Wed, 25 Jul 2007, Indan Zupancic wrote:

>> Thanks. One more question: are you running nfs?
>
> Yes, but only for the host, I'm not using it locally. And it's v3, so should be
> using TCP. (rpc.mountd --no-nfs-version 1 --no-nfs-version 2.)

I think it still uses UDP as default unless you specify tcp as a mount
option.

/Martin
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Wed, July 25, 2007 19:13, Martin Josefsson wrote:
> On Wed, 25 Jul 2007, Indan Zupancic wrote:
>
>>> Thanks. One more question: are you running nfs?
>>
>> Yes, but only for the host, I'm not using it locally. And it's v3, so should be
>> using TCP. (rpc.mountd --no-nfs-version 1 --no-nfs-version 2.)
>
> I think it still uses UDP as default unless you specify tcp as a mount
> option.

You're probably right, and I think the host isn't using the tcp option.

But I asked, and at the time the errors popped up NFS wasn't mounted.

Greetings,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
From: "Indan Zupancic" <indan@nul.nu>
Date: Wed, 25 Jul 2007 19:41:45 +0200 (CEST)

> On Wed, July 25, 2007 19:13, Martin Josefsson wrote:
> > On Wed, 25 Jul 2007, Indan Zupancic wrote:
> >
> >>> Thanks. One more question: are you running nfs?
> >>
> >> Yes, but only for the host, I'm not using it locally. And it's v3, so should be
> >> using TCP. (rpc.mountd --no-nfs-version 1 --no-nfs-version 2.)
> >
> > I think it still uses UDP as default unless you specify tcp as a mount
> > option.
>
> You're probably right, and I think the host isn't using the tcp option.
>
> But I asked, and at the time the errors popped up NFS wasn't mounted.

I have one more question. Does that node receive ICMP errors that include a
fragmented packet in their body? I made ICMP error handling call
ipv4_get_l4proto in the 2.6.23 tree.

-- Yasuyuki Kozakai
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Thu, July 26, 2007 03:27, Yasuyuki KOZAKAI wrote:
> From: "Indan Zupancic" <indan@nul.nu>
> Date: Wed, 25 Jul 2007 19:41:45 +0200 (CEST)
>
>> On Wed, July 25, 2007 19:13, Martin Josefsson wrote:
>> > On Wed, 25 Jul 2007, Indan Zupancic wrote:
>> >
>> >>> Thanks. One more question: are you running nfs?
>> >>
>> >> Yes, but only for the host, I'm not using it locally. And it's v3, so should be
>> >> using TCP. (rpc.mountd --no-nfs-version 1 --no-nfs-version 2.)
>> >
>> > I think it still uses UDP as default unless you specify tcp as a mount
>> > option.
>>
>> You're probably right, and I think the host isn't using the tcp option.
>>
>> But I asked, and at the time the errors popped up NFS wasn't mounted.
>
> I have one more question. Does that node receive ICMP errors that include a
> fragmented packet in their body? I made ICMP error handling call
> ipv4_get_l4proto in the 2.6.23 tree.

I don't know, how can I check that?

Greetings,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Indan Zupancic wrote:
> On Thu, July 26, 2007 03:27, Yasuyuki KOZAKAI wrote:
>
>>I have one more question. Does that node receive ICMP errors that include a
>>fragmented packet in their body? I made ICMP error handling call
>>ipv4_get_l4proto in the 2.6.23 tree.


Good point, I initially missed that this affects the current -rc.

> I don't know, how can I check that?


iptables -t raw -I PREROUTING -p icmp \
-m icmp --icmp-type destination-unreachable -j LOG

should log the packets.
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Thu, July 26, 2007 11:50, Patrick McHardy wrote:
> Indan Zupancic wrote:
>> On Thu, July 26, 2007 03:27, Yasuyuki KOZAKAI wrote:
>>
>>>I have one more question. Does that node receive ICMP errors that include a
>>>fragmented packet in their body? I made ICMP error handling call
>>>ipv4_get_l4proto in the 2.6.23 tree.
>
>
> Good point, I initially missed that this affects the current -rc.
>
>> I don't know, how can I check that?
>
>
> iptables -t raw -I PREROUTING \
> -m icmp --icmp-type destination-unreachable -j LOG
>
> should log the packets.

So with this when I get a Frag of proto it should also log an ICMP error?

Considering that the errors happened with a near exact 1 second interval
and a 0.5s interval I think it's highly likely that these were retry packets to
an unreachable host. But why is the proto UDP and not ICMP?

Greetings,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Indan Zupancic wrote:
> On Thu, July 26, 2007 11:50, Patrick McHardy wrote:
>
>>iptables -t raw -I PREROUTING \
>> -m icmp --icmp-type destination-unreachable -j LOG
>>
>>should log the packets.
>
>
> So with this when I get a Frag of proto it should also log an ICMP error?


Exactly.

> Considering that the errors happened with a near exact 1 second interval
> and a 0.5s interval I think it's highly likely that these were retry packets to
> an unreachable host. But why is the proto UDP and not ICMP?


It's the inner packet that is parsed by nf_ct_get_tuplepr.
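
To illustrate why the message names proto 17 although the offending
packet is ICMP: an ICMP error carries the IP header of the packet that
triggered it, and that embedded header is what gets parsed. What follows
is a minimal user-space sketch, not the kernel's nf_ct_get_tuplepr; the
function name icmp_error_inner_proto and the raw-byte parsing are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u  /* IPv4 header without options */
#define ICMP_HDR_LEN      8u  /* type, code, checksum, 4 type-specific bytes */
#define PROTO_ICMP        1u

/*
 * Hypothetical helper: given a raw IPv4 packet, return the protocol
 * number of the packet embedded in an ICMP error, or -1 if the packet
 * is not an ICMP error or is too short to contain an inner IPv4 header.
 */
static int icmp_error_inner_proto(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);

    if (len < IPV4_MIN_HDR_LEN || (pkt[0] >> 4) != 4)
        return -1;

    /* IHL is the low nibble of byte 0, counted in 32-bit words. */
    size_t outer_hlen = (size_t)(pkt[0] & 0x0F) * 4;
    if (outer_hlen < IPV4_MIN_HDR_LEN || pkt[9] != PROTO_ICMP)
        return -1;
    if (len < outer_hlen + ICMP_HDR_LEN + IPV4_MIN_HDR_LEN)
        return -1;

    /* Only ICMP error types embed the offending packet's header:
     * 3 dest unreachable, 4 source quench, 5 redirect,
     * 11 time exceeded, 12 parameter problem. */
    uint8_t type = pkt[outer_hlen];
    if (type != 3 && type != 4 && type != 5 && type != 11 && type != 12)
        return -1;

    const uint8_t *inner = pkt + outer_hlen + ICMP_HDR_LEN;
    if ((inner[0] >> 4) != 4)
        return -1;

    return inner[9];  /* protocol field of the embedded header */
}
```

For a destination-unreachable error generated in response to a UDP
packet, the outer protocol is 1 (ICMP) while this helper returns 17.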
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
On Thu, July 26, 2007 12:22, Patrick McHardy wrote:
> Indan Zupancic wrote:
>> On Thu, July 26, 2007 11:50, Patrick McHardy wrote:
>>
>>>iptables -t raw -I PREROUTING \
>>> -m icmp --icmp-type destination-unreachable -j LOG
>>>
>>>should log the packets.
>>
>>
>> So with this when I get a Frag of proto it should also log an ICMP error?
>
>
> Exactly.
>
>> Considering that the errors happened with a near exact 1 second interval
>> and a 0.5s interval I think it's highly likely that these were retry packets to
>> an unreachable host. But why is the proto UDP and not ICMP?
>
>
> It's the inner packet that is parsed by nf_ct_get_tuplepr.
>

Reading the comment in icmp.c, iph->frag_off & htons(IP_OFFSET)
being true means that it's a fragment, but not the first one.

So what's happening is that the host sends a big UDP packet, it gets
fragmented, but never reaches its destination. ICMP error packets
are generated. Conntrack drops the latter ones thanks to the check in
ipv4_get_l4proto.

So the question is whether those latter ICMP packets should be forwarded
or not. If not, the code is fine and the warning message could be removed.
If they should, then it might be hard for the current conntrack code to know
where to send the packet, as the UDP header is missing.
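
To make the "UDP header is missing" point concrete, here is a rough
user-space sketch of how one large UDP datagram is split into IPv4
fragments. It is not kernel code: the names plan_fragments and
struct frag are invented for illustration, and it assumes a 20-byte
IPv4 header without options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20u  /* assumed: no IP options */
#define UDP_HDR_LEN   8u

struct frag {
    size_t offset_units;    /* fragment offset field, in 8-byte units */
    size_t len;             /* IP payload bytes carried by this fragment */
    bool   more_fragments;  /* MF flag */
    bool   has_udp_header;  /* true only for the fragment at offset 0 */
};

/*
 * Hypothetical helper: describe the fragments needed to send
 * udp_data_len bytes of UDP data over a link with the given MTU.
 * Returns the number of entries written to out (at most max_out).
 */
static size_t plan_fragments(size_t udp_data_len, size_t mtu,
                             struct frag *out, size_t max_out)
{
    assert(out != NULL);

    if (mtu < IPV4_HDR_LEN + 8u)
        return 0;  /* no room for even one 8-byte unit of payload */

    size_t total = UDP_HDR_LEN + udp_data_len;
    /* Every fragment except the last carries a multiple of 8 bytes. */
    size_t per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;
    size_t n = 0;

    for (size_t off = 0; off < total && n < max_out; n++) {
        size_t chunk = total - off < per_frag ? total - off : per_frag;

        out[n].offset_units   = off / 8u;
        out[n].len            = chunk;
        out[n].more_fragments = off + chunk < total;
        out[n].has_udp_header = (off == 0);
        off += chunk;
    }
    return n;
}
```

With 3000 bytes of UDP data and an MTU of 1500, this yields three
fragments at offsets 0, 185 and 370 (in 8-byte units). Only the first
one contains the UDP ports, so an ICMP error quoting either of the other
two gives conntrack nothing to match against.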

Greetings,

Indan
Re: 2.6.23-rc1: ipv4_get_l4proto: Frag of proto 17
Indan Zupancic wrote:
>
> Reading the comment in icmp.c, iph->frag_off & htons(IP_OFFSET)
> being true means that it's a fragment, but not the first one.
>
> So what's happening is that the host sends a big UDP packet, it gets
> fragmented, but never reaches its destination. ICMP error packets
> are generated. Conntrack drops the latter ones thanks to the check in
> ipv4_get_l4proto.
>
> So the question is whether those latter ICMP packets should be forwarded
> or not. If not, the code is fine and the warning message could be removed.
> If they should, then it might be hard for the current conntrack code to know
> where to send the packet, as the UDP header is missing.

Yes, we can't associate them with the original connection.
We should catch this case in ICMP tracking though I think
instead of removing the message.
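
One possible shape for such a check, sketched in user space: look at the
fragment offset of the header embedded in the ICMP error before trying
to extract a tuple from it. This is only an illustration of the idea and
not the actual kernel change; classify_inner, enum inner_verdict and
FRAG_OFFSET_MASK are hypothetical names (the mask has the same value as
the kernel's IP_OFFSET).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u
#define FRAG_OFFSET_MASK 0x1FFFu  /* low 13 bits of frag_off, like IP_OFFSET */

enum inner_verdict {
    INNER_TRUNCATED,      /* not enough bytes for an IPv4 header */
    INNER_LATER_FRAGMENT, /* offset != 0: no L4 header, cannot be matched */
    INNER_HAS_L4_HEADER   /* unfragmented or first fragment */
};

/*
 * Hypothetical helper: classify the IPv4 header embedded in an ICMP
 * error. "inner" points at the first byte of that embedded header.
 */
static enum inner_verdict classify_inner(const uint8_t *inner, size_t len)
{
    assert(inner != NULL);

    if (len < IPV4_MIN_HDR_LEN)
        return INNER_TRUNCATED;

    /* frag_off occupies bytes 6-7 of the header in network byte order:
     * 3 flag bits followed by a 13-bit offset in 8-byte units. */
    uint16_t frag_off = (uint16_t)((inner[6] << 8) | inner[7]);

    if (frag_off & FRAG_OFFSET_MASK)
        return INNER_LATER_FRAGMENT;

    return INNER_HAS_L4_HEADER;
}
```

A caller could treat INNER_LATER_FRAGMENT as "cannot be associated with
a connection" and handle it quietly, so that the message in
ipv4_get_l4proto stays reserved for genuinely unexpected cases.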