Mailing List Archive

IPv6-related (?) Bind issue
Hello all,

we've encountered a weird problem on our dual-stack (anycast) resolvers
and I'm wondering if anyone else has experienced anything similar.
Basically, we're getting many SERVFAIL responses for domains that are not in
the cache. The weird part: when a domain is not in the cache, a SERVFAIL
response is sometimes produced without the resolver querying the
authoritative NS at all (no outgoing query is attempted).
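For reference, this is roughly how we observe it (a sketch; 127.0.0.1 is the
resolver's listen address here, eth0 stands in for the outside interface, and
www.example.org for an uncached name):

  # watch for queries leaving towards the authoritative servers
  tcpdump -ni eth0 port 53 &

  # ask for an uncached name; on the bad runs this returns SERVFAIL
  # while tcpdump shows no outgoing query at all
  dig @127.0.0.1 www.example.org A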

More DNS-specific info about the issue (and the setup):
https://lists.isc.org/pipermail/bind-users/2014-March/092706.html
https://lists.dns-oarc.net/pipermail/dns-operations/2014-March/011385.html

I'm posting here because I'm quite convinced that the issue is IPv6-related.

cheers,
Yannis
Re: IPv6-related (?) Bind issue
On Thu, Mar 06, 2014 at 11:00:28AM +0200, Yannis Nikolopoulos wrote:
> we've encountered a weird problem on our dual-stack (anycast) resolvers
> and I'm wondering if anyone else has experienced anything similar.
> Basically, we're getting many SERVFAIL responses for domains that are not in
> the cache. The weird part: when a domain is not in the cache, a SERVFAIL
> response is sometimes produced without the resolver querying the
> authoritative NS at all (no outgoing query is attempted).

If you really think this might be a kernel issue, please record the number of
failed syscalls while the problem exists: perf script
failed-syscalls-by-pid -p <pid> or strace -c.
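Roughly like this (assuming BIND runs as named, and a perf build that ships
the bundled python scripts):

  # syscall failure summary for named; Ctrl-C prints the report
  strace -c -f -p $(pidof named)

  # or with perf's bundled script: record for ~30s, then report per pid
  perf script record failed-syscalls-by-pid -p $(pidof named) -- sleep 30
  perf script report failed-syscalls-by-pid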

Further, please record packet drops in the network stack via perf script
net_dropmonitor.
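For example:

  # record dropped-packet events system-wide for ~30s, then report
  perf script record net_dropmonitor -a -- sleep 30
  perf script report net_dropmonitor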

This might give a hint.

Thanks,

Hannes
Re: IPv6-related (?) Bind issue
On Thu, Mar 06, 2014 at 10:28:22AM +0100, Hannes Frederic Sowa wrote:
> On Thu, Mar 06, 2014 at 11:00:28AM +0200, Yannis Nikolopoulos wrote:
> > we've encountered a weird problem on our dual-stack (anycast) resolvers
> > and I'm wondering if anyone else has experienced anything similar.
> > Basically, we're getting many SERVFAIL responses for domains that are not in
> > the cache. The weird part: when a domain is not in the cache, a SERVFAIL
> > response is sometimes produced without the resolver querying the
> > authoritative NS at all (no outgoing query is attempted).
>
> If you really think this might be a kernel issue, please record the number of
> failed syscalls while the problem exists: perf script
> failed-syscalls-by-pid -p <pid> or strace -c.
>
> Further, please record packet drops in the network stack via perf script
> net_dropmonitor.
>
> This might give a hint.

Regarding anycast addresses: you can check /proc/net/anycast6 to see whether
they actually get instantiated. This only happens if you have forwarding
enabled (for the subnet-router anycast addresses) or if you have a program
which does an IPV6_JOIN_ANYCAST setsockopt on a socket.
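A quick check would be:

  # instantiated anycast addresses, one line per interface/address
  cat /proc/net/anycast6

  # forwarding must be on for the subnet-router anycast to appear
  sysctl net.ipv6.conf.all.forwarding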

Old kernels don't allow using anycast addresses as the source address; this
was changed only recently in the Linux kernel.

But I don't suspect this to be the problem.

Bye,

Hannes
Re: IPv6-related (?) Bind issue
On 03/06/2014 11:07 AM, Jeroen Massar wrote:
> On 2014-03-06 10:00 , Yannis Nikolopoulos wrote:
> [..]
>> https://lists.isc.org/pipermail/bind-users/2014-March/092706.html
>> https://lists.dns-oarc.net/pipermail/dns-operations/2014-March/011385.html
> And asking on yet-another list, did you do the /64 change:

yes, we did (although we didn't expect much) and no, it didn't solve the
problem. Actually, the problem seemed to get worse (more SERVFAILs) after we
switched from a /127 to a /64 (using ::a and ::b), which, frankly, made no
sense to me.
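For the record, the renumbering was essentially the following (with
2001:db8::/64 standing in for our real prefix):

  # before: point-to-point /127 on the link
  ip -6 addr del 2001:db8::a/127 dev eth0
  # after: the same host addressed out of the whole /64
  ip -6 addr add 2001:db8::a/64 dev eth0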

cheers,
Yannis
> https://lists.dns-oarc.net/pipermail/dns-operations/2014-March/011387.html
>
> ?
>
> Greets,
> Jeroen
>
Re: IPv6-related (?) Bind issue
On 2014-03-06 10:53 , Yannis Nikolopoulos wrote:
> On 03/06/2014 11:07 AM, Jeroen Massar wrote:
>> On 2014-03-06 10:00 , Yannis Nikolopoulos wrote:
>> [..]
>>> https://lists.isc.org/pipermail/bind-users/2014-March/092706.html
>>> https://lists.dns-oarc.net/pipermail/dns-operations/2014-March/011385.html
>>>
>> And asking on yet-another list, did you do the /64 change:
>
> yes, we did (although we didn't expect much) and no, it didn't solve the
> problem. Actually, the problem seemed to get worse (more SERVFAILs) after we
> switched from a /127 to a /64 (using ::a and ::b), which, frankly, made no
> sense to me.

Then the next step is to monitor whether your addresses are still on the
interface (ip monitor ...)
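Something like (eth0 standing in for your actual interface):

  # timestamped stream of IPv6 address changes, filtered on the interface
  ip -ts -6 monitor address | grep --line-buffered eth0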

Greets,
Jeroen