Mailing List Archive

[Bug 488] Failed name server leads to unroutable address error
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.

http://www.exim.org/bugzilla/show_bug.cgi?id=488





------- Comment #1 from ph10@hermes.cam.ac.uk 2007-03-22 09:36 -------
On Wed, 21 Mar 2007, marc@perkel.com wrote:

> I think this is a bug. A valid domain where the name servers are unreachable
> results in an error "unroutable address" rather than retrying.

Please post some evidence - for example, the output from

exim -d -bt something@a.bad.domain

Otherwise there's no way I or anybody else can test this or figure out
why it is behaving like that. This is likely to involve your DNS
resolver as well, so please also post your Exim version and the
operating system and release you saw this on.

Philip

--
Configure bugmail: http://www.exim.org/bugzilla/userprefs.cgi?tab=email

--
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 488] Failed name server leads to unroutable address error

------- Comment #2 from graeme@graemef.net 2007-03-22 10:57 -------
On 21/03/2007 21:43, marc@perkel.com wrote:
> I think this is a bug. A valid domain where the name servers are unreachable
> results in an error "unroutable address" rather than retrying. I ran into this
> situation where due to a network routing problem my name servers couldn't reach
> the domains name servers and rather than retrying it failed as if the domain
> didn't exist.

I don't believe this is a bug within Exim itself. It's an
interoperability problem, for sure, but not a bug - and it applies to
all applications which use DNS.

Let me explain:

If a domain - any domain - has nameservers A and B, and your machine
running Exim (machine C) cannot reach those servers, how does the
resolver on machine C (or any application running on it) know that the
domain is "valid"?

A common way of forcibly expiring domains (say, for repeated non-payment
or abuse reasons) within the ISP or hosting world is to simply delete
the zone from the authoritative nameservers. It sure gets customers'
attention when their domains disappear, I can tell you :)

Anyway: at that point, the root name servers or TLD nameservers for the
domain will be referring queries to the authoritative servers, which no
longer act authoritatively for it and return NXDOMAIN. At this point,
whatever application is making the query will bomb out with (in Exim's
case) an error of the form "unrouteable address".

Extending the problem outwards to your case (which doesn't get NXDOMAIN
back from the authoritative servers), there is _no way_ for the machine
C to know whether the lack of response is because:

a) the authoritative nameservers have been permanently switched off
b) the authoritative nameservers have firewalled you out
c) the authoritative nameservers have a problem meaning that their
nameserver software isn't running
d) some odd network problems are causing your queries to either not
reach the authoritative nameservers, or the responses to not get back
e) some other problem (there are many possibilities)

So, at this point, how does the application on machine C determine
whether the domain is "valid" or not? The only way it can respond is
(anthropomorphically speaking) with a shrug of the shoulders and an "I
can't be bothered carrying on with this, I cannot find the thing you're
looking for" style error.

Network problems can be hell, especially if they segment things the way
you saw.

Graeme

[Bug 488] Failed name server leads to unroutable address error

------- Comment #3 from exim@lists.colondot.net 2007-03-22 11:33 -------
Graeme,

Surely that's what a SERVFAIL response is for (either generated by the
referring server, or your local resolver). NXDOMAIN is a permanent failure, and
SERVFAIL should be a temporary failure.

Marc: at this point, you really need to post some debugging output showing what
happened during the delivery, so we can suggest some things to try from
commands like dig.
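For illustration, a check of the kind suggested here could look like the following minimal sketch. The function classify_dig_status is hypothetical (it is not part of Exim or of dig); it reads dig output on standard input and reports whether the answer was a permanent failure (NXDOMAIN) or a temporary one (SERVFAIL, or no answer at all). It assumes BIND dig's usual header line and its "no servers could be reached" timeout message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Exim or dig): read dig output on stdin and
# say whether the lookup failed permanently or only temporarily.
classify_dig_status() {
  out=$(cat)

  # When no server answers at all, dig prints a timeout message instead of
  # a header, e.g. ";; connection timed out; no servers could be reached".
  case "$out" in
    *"no servers could be reached"*)
      echo "temporary (timeout)"
      return 0
      ;;
  esac

  # Otherwise pull the status out of the header line, which looks like:
  # ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4711
  status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)

  case "$status" in
    NOERROR)  echo "ok" ;;
    NXDOMAIN) echo "permanent (domain does not exist)" ;;
    SERVFAIL) echo "temporary (server failure)" ;;
    "")       echo "unknown (no status line found)" ;;
    *)        echo "other ($status)" ;;
  esac
}
```

Typical use would be "dig example.com MX | classify_dig_status", run on the Exim host so that dig goes through the same resolver configuration (/etc/resolv.conf) that Exim uses.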

Cheers

MBM

[Bug 488] Failed name server leads to unroutable address error

------- Comment #4 from peter@bowyer.org 2007-03-22 11:40 -------
(In reply to comment #3)

The title of the bug is 'failed name server...', which might mean one of
several different things, but the text refers to authoritative name servers
being unreachable, which is yet another condition. Exim only knows
what the local resolver tells it, so Marc needs to create failure conditions
upstream of his resolver in order to simulate the bug. The result might also
depend on the configuration of any local nameserver upstream of the resolver.

Ah, that's what GF has already said, pretty much. At least we agree...

-Peter

[Bug 488] Failed name server leads to unroutable address error

------- Comment #5 from marc@perkel.com 2007-03-22 13:19 -------
OK - this might be a way to simulate the problem. Set /etc/resolv.conf on
the Exim server to point at a local caching name server that you control. Pick a test
domain and look up its name server (NS) records. Then on the name server machine
run this:

iptables -v -I INPUT -s ns1.domain.com -j DROP
iptables -v -I INPUT -s ns2.domain.com -j DROP

I hope this code is right. The idea is that you block your resolving name
server from reaching the name servers of the domain that you are emailing.
That's what happened to me. Due to the routing problem my nameservers couldn't
route to the name servers of the destination, and because they were cut off,
mail bounced instantly as unroutable.

[Bug 488] Failed name server leads to unroutable address error

------- Comment #6 from graeme@graemef.net 2007-03-22 17:57 -------
On Thu, 2007-03-22 at 13:19 +0000, marc@perkel.com wrote:
> OK - this might be a way to simulate the problem. Set /etc/resolv.conf on
> the Exim server to point at a local caching name server that you control. Pick a test
> domain and look up its name server (NS) records. Then on the name server machine
> run this:
>
> iptables -v -I INPUT -s ns1.domain.com -j DROP
> iptables -v -I INPUT -s ns2.domain.com -j DROP
>
> I hope this code is right. The idea is that you block your resolving name
> server from reaching the name servers of the domain that you are emailing.
> That's what happened to me. Due to the routing problem my nameservers couldn't
> route to the name servers of the destination, and because they were cut off,
> mail bounced instantly as unroutable.

And quite right too. If the lookups for the domain time out completely,
how do your resolvers (and by extension Exim, or any other application
relying on DNS) know that it's a network problem?

If you can't look up the records for a domain *regardless of the reason*,
the domain becomes unrouteable. Hence the error condition.

DNS is, after all, the application-layer glue that holds stuff like SMTP
together. Without it, we're back in the dark ages!

I still don't think this is a bug.

Graeme

[Bug 488] Failed name server leads to unroutable address error

------- Comment #7 from marc@perkel.com 2007-03-22 18:16 -------


graeme@graemef.net wrote:
> I still don't think this is a bug.

I'm not sure that I would say that it's not a bug. There might not be an
easy way to resolve it. It might be something we have to live with if we
can't distinguish between a valid domain with DNS problems and an
invalid domain.

But - I thought I'd bring it to your attention in case it could be fixed.

[Bug 488] Failed name server leads to unroutable address error

------- Comment #8 from exim@lists.colondot.net 2007-03-23 08:38 -------
Marc,

To be honest, without seeing exactly the condition you're describing, and some
sensible debugging output (including the resolver debugging), you're going to
have some task trying to get people here to accept that a bug exists.

SERVFAIL should be generating a defer, and IMLE it does, but...

Cheers

MBM

[Bug 488] Failed name server leads to unroutable address error

------- Comment #9 from graeme@graemef.net 2007-03-23 11:46 -------
On 23/03/2007 08:38, exim@lists.colondot.net wrote:
> SERVFAIL should be generating a defer, and IMLE it does, but...

Following Marc's suggestion about setting restrictions (-j DROP) on the
configured resolver (the local one) to simulate unreachable hosts
which time out, I get this:

11:04:48 32081 changed uid/gid: forcing real = effective
11:04:48 32081 uid=0 gid=0 pid=32081
11:04:48 32081 auxiliary group list: <none>
11:04:48 32081 configuration file is /etc/exim.conf
11:04:48 32081 log selector = 040d99d8
11:04:48 32081 trusted user
11:04:48 32081 admin user
11:04:48 32081 originator: uid=0 gid=0 login=root name=root
11:04:48 32081 sender address = root@server.graemef.net
11:04:48 32081 Address testing: uid=0 gid=12 euid=0 egid=12
11:04:48 32081 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11:04:48 32081 Testing graeme@graemef.net
11:04:48 32081 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11:04:48 32081 Considering graeme@graemef.net
11:04:48 32081 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11:04:48 32081 routing graeme@graemef.net
11:04:48 32081 --------> dnslookup router <--------
11:04:48 32081 local_part=graeme domain=graemef.net
11:04:48 32081 checking domains
11:04:48 32081 graemef.net in "@"? no (end of list)
11:04:48 32081 graemef.net in "! +local_domains"? yes (end of list)
11:04:48 32081 calling dnslookup router
11:04:48 32081 dnslookup router called for graeme@graemef.net
11:04:48 32081 domain = graemef.net
11:05:08 32081 DNS lookup of graemef.net (MX) gave TRY_AGAIN
11:05:08 32081 graemef.net in dns_again_means_nonexist? no (option unset)
11:05:08 32081 returning DNS_AGAIN
11:05:08 32081 dnslookup router: defer for graeme@graemef.net
11:05:08 32081 message: host lookup did not complete
graeme@graemef.net cannot be resolved at this time:
host lookup did not complete
11:05:08 32081 search_tidyup called

So, after the resolver gives up (20 seconds in the log above), it defers.

Changing the -j DROP to -j REJECT and trying the various --reject-with
types, I get the following (condensed for brevity):

icmp-net-unreachable -> defer after delay, as with DROP
icmp-host-unreachable -> defer after delay, as with DROP
icmp-port-unreachable -> immediate deferral
icmp-proto-unreachable -> immediate deferral
icmp-net-prohibited -> immediate deferral
icmp-host-prohibited -> immediate deferral

Changing this so that the local resolver is reachable but the remote
nameservers for the same domain are not, I get the exact same behaviour.

Of course, this could be entirely down to the fact that the box I used
for testing is running Exim 4.22 on RedHat 8.0. I really must change my
gateway :)

Graeme

[Bug 488] Failed name server leads to unroutable address error

------- Comment #10 from marc@perkel.com 2007-03-23 14:27 -------
I see the problem. I have dns_again_means_nonexist = *

Not sure why I set that but that has to be my problem. What should that
be set to?

[Bug 488] Failed name server leads to unroutable address error

holmgren@lysator.liu.se changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |INVALID




------- Comment #11 from holmgren@lysator.liu.se 2007-03-23 19:29 -------
Certainly not *. If you don't know, it probably shouldn't be set to anything at
all.

The documentation describes what it's good for: domains whose broken
nameservers return SERVFAIL instead of NXDOMAIN.
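To make the effect of the setting concrete, here is a simplified model, written as a shell function, of the decision visible in Graeme's debug output above (TRY_AGAIN, then the dns_again_means_nonexist check, then a defer). It is not Exim's code: router_outcome and its arguments are hypothetical, and the option's value is treated as a single shell glob rather than a full Exim domain list.

```shell
#!/bin/sh
# Simplified model (not Exim source) of how the dnslookup router reacts to a
# DNS lookup result. Arguments (hypothetical, for illustration only):
#   $1  lookup result: found, NXDOMAIN or TRY_AGAIN
#       (SERVFAIL and timeouts both reach Exim as TRY_AGAIN)
#   $2  domain being routed
#   $3  dns_again_means_nonexist, as one shell glob ("" means unset)
router_outcome() {
  result=$1
  domain=$2
  again_means_nonexist=$3

  case "$result" in
    found)
      echo "route" ;;
    NXDOMAIN)
      echo "fail" ;;    # permanent: typically ends as "Unrouteable address"
    TRY_AGAIN)
      if [ -z "$again_means_nonexist" ]; then
        echo "defer"    # "host lookup did not complete"; retried later
      else
        # Unquoted on purpose so the option value acts as a glob pattern.
        case "$domain" in
          $again_means_nonexist) echo "fail" ;;   # treated as nonexistent
          *)                     echo "defer" ;;
        esac
      fi ;;
    *)
      echo "unknown" ;;
  esac
}
```

With Marc's setting (*), every TRY_AGAIN becomes a permanent failure, which matches the instant bounces he saw; with the option unset, the same lookup defers, as in Graeme's test. Restricting the option to a named broken domain changes the outcome only for that domain.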
