Mailing List Archive

libspf2 crash
I'm seeing a number of segfaults. I believe the issue is that libreplace is
calling __dn_skipname with identical pointers, but I can't seem to narrow it
down. When I pull up a core dump, it looks like:

(gdb) bt
#0 0x00be37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00c24815 in raise () from /lib/tls/libc.so.6
#2 0x00c26279 in abort () from /lib/tls/libc.so.6
#3 0x08065610 in segfault_handler (received_signal=11) at common/signals.c:36
#4 <signal handler called>
#5 0x00db2e88 in __ns_name_skip () from /lib/libresolv.so.2
#6 0x00dac2dc in __dn_skipname () from /lib/libresolv.so.2
#7 0x00662205 in __ns_skiprr (ptr=0xa0cdd2 "", eom=0xa0cdd2 "",
section=ns_s_an, count=10526448) at __ns_initparse.c:84
#8 0x0066237e in __ns_initparse (msg=0xa0a866 "ý\f", msglen=9618,
handle=0xa0a810) at __ns_initparse.c:124
#9 0x0065ba19 in SPF_dns_resolv_lookup (spf_dns_server=0xaba4ae8,
domain=0xb78f28e0 "yourshirevillage.com", rr_type=ns_t_mx, should_cache=1) at
spf_dns_resolv.c:188
#10 0x0065adbc in SPF_dns_lookup (spf_dns_server=0xaba4ae8, domain=0xb78f28e0
"yourshirevillage.com", rr_type=ns_t_mx, should_cache=1) at spf_dns.c:114
#11 0x0065b37c in SPF_dns_cache_lookup (spf_dns_server=0xaba4b10,
domain=0xb78f28e0 "yourshirevillage.com", rr_type=ns_t_mx, should_cache=1) at
spf_dns_cache.c:387
#12 0x0065adbc in SPF_dns_lookup (spf_dns_server=0xaba4b10, domain=0xb78f28e0
"yourshirevillage.com", rr_type=ns_t_mx, should_cache=1) at spf_dns.c:114
#13 0x0065f60f in SPF_record_interpret (spf_record=0xae91c230,
spf_request=0xae91bab8, spf_response=0xae91b968, depth=0) at spf_interpret.c:778
#14 0x00660fc3 in SPF_request_query_record (spf_request=0xae91bab8,
spf_response=0xae91b968, spf_record=0xae91c230, err=SPF_E_SUCCESS) at
spf_request.c:224
#15 0x0066102c in SPF_request_query_mailfrom (spf_request=0xae91bab8,
spf_responsep=0xa0b24c) at spf_request.c:255
#16 0x0806945c in smtp_check_spf (session=0xa0b320) at smtp/smtp_spf.c:136
#17 0x0805dd2d in smtp_process_rcptto (session=0xa0b320) at
smtp/smtp_process_connection.c:407
#18 0x0805f375 in smtp_process_connection (session=0xa0b320) at
smtp/smtp_process_connection.c:1260
#19 0x080651e4 in worker_thread () at common/worker.c:84
#20 0x00ded3cc in start_thread () from /lib/tls/libpthread.so.0
#21 0x00cc61ae in clone () from /lib/tls/libc.so.6


-------------------------------------------
Sender Policy Framework: http://www.openspf.org
Modify Your Subscription: http://www.listbox.com/member/
Archives: http://www.listbox.com/member/archive/1007/=now
RSS Feed: http://www.listbox.com/member/archive/rss/1007/
Powered by Listbox: http://www.listbox.com
Re: libspf2 crash
On Wed, 07 May 2008 10:44:46 -0500 Ladar Levison <ladar@lavabit.com> wrote:
>
>I'm seeing a number of segfaults. I believe the issue is that libreplace is
>calling __dn_skipname with identical pointers, but I can't seem to narrow it
>down.

...

How are you triggering the fault?

What OS/Distro/version?

Does this happen when you call libspf2 via spfquery or can you provide some
reduced test case that demonstrates the problem?

Scott K

Re: libspf2 crash
Scott Kitterman wrote:
> On Wed, 07 May 2008 10:44:46 -0500 Ladar Levison <ladar@lavabit.com> wrote:
>
>> I'm seeing a number of segfaults. I believe the issue is that libreplace is
>> calling __dn_skipname with identical pointers, but I can't seem to narrow it
>> down.
>
> ...
>
> How are you triggering the fault?
>
> What OS/Distro/version?
>
> Does this happen when you call libspf2 via spfquery or can you provide some
> reduced test case that demonstrates the problem?
>
> Scott K
>


The issue appears to occur sporadically with certain domains. One such
example is yourshirevillage.com, and it, like the others, lists "mx" in
the SPF record, and then has a couple hundred MX records listed. I
believe the issue is that a corrupted DNS result is being returned, and
that's what's triggering the segfault. I haven't been able to figure it
out completely yet.


Re: libspf2 crash
Ladar Levison wrote:

>>> I'm seeing a number of segfaults. I believe the issue is that
>>> libreplace
[...]
> The issue appears to occur sporadically with certain domains. One
> such example is yourshirevillage.com, and it, like the others, lists
> "mx" in the SPF record, and then has a couple hundred MX records
> listed. I believe the issue is that a corrupted DNS result is being
> returned, and that's what's triggering the segfault. I haven't been
> able to figure it out completely yet.

Do you have this patch applied to libspf2? Without it, libspf2 segfaults
consistently on a number of domains that have "unusual" but
syntactically valid TXT RRs.

--- src/libspf2/spf_dns_resolv.c.orig 2005-02-19 05:38:12.000000000 +0300
+++ src/libspf2/spf_dns_resolv.c 2007-11-03 11:54:03.000000000 +0300
@@ -399,6 +399,10 @@
while ( rdlen > 0 )
{
len = *src;
+ /* zero length element? must be terminator */
+ if (len == 0) break;
+ /* element longer than buffer? corrupt data */
+ if (len >= rdlen) break;
src++;
memcpy( dst, src, len );
dst += len;

Eugene
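
For illustration, the two checks in the patch above can be sketched as a
self-contained function. This is a minimal sketch, not the libspf2 code: the
name txt_rdata_concat, its signature, and the output-size check are
assumptions made for the example. Only the two break conditions mirror the
patch. The loop bookkeeping after the copy is not shown in the patch and is
filled in here on the assumption that rdlen counts the length octets too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenate the length-prefixed elements of a TXT
 * RDATA blob into dst as a NUL-terminated string.
 * Returns the number of bytes written (excluding the NUL), or -1 if dst
 * is too small. Parsing stops early at a zero-length element or at an
 * element that claims more bytes than remain in the record. */
long txt_rdata_concat(const unsigned char *src, size_t rdlen,
                      char *dst, size_t dstsize)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;

    while (rdlen > 0) {
        size_t len = *src;              /* length octet of this element */

        if (len == 0)                   /* treated as terminator, as in the patch */
            break;
        if (len >= rdlen)               /* needs len + 1 bytes, fewer remain */
            break;
        if (out + len >= dstsize)       /* keep room for the trailing NUL */
            return -1;

        src++;                          /* skip the length octet */
        memcpy(dst + out, src, len);
        out += len;
        src += len;
        rdlen -= len + 1;               /* element plus its length octet */
    }

    dst[out] = '\0';
    return (long)out;
}
```

With the input bytes 3,'a','b','c',9,'x' the second element claims 9 bytes
while only 2 remain, so the loop stops after copying "abc" instead of reading
past the end of the record.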

Re: libspf2 crash
On Wed, May 07, 2008 at 10:44:46AM -0500, Ladar Levison wrote:
>
> I'm seeing a number of segfaults. I believe the issue is that libreplace is
> calling __dn_skipname with identical pointers, but I can't seem to narrow it
> down. When I pull up a core dump, it looks like:
>
> (gdb) bt
> #0 0x00be37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1 0x00c24815 in raise () from /lib/tls/libc.so.6
> #2 0x00c26279 in abort () from /lib/tls/libc.so.6
> #3 0x08065610 in segfault_handler (received_signal=11) at common/signals.c:36
> #4 <signal handler called>
> #5 0x00db2e88 in __ns_name_skip () from /lib/libresolv.so.2
> #6 0x00dac2dc in __dn_skipname () from /lib/libresolv.so.2
> #7 0x00662205 in __ns_skiprr (ptr=0xa0cdd2 "", eom=0xa0cdd2 "",
> section=ns_s_an, count=10526448) at __ns_initparse.c:84

If "count" is not passed by reference, its value (10526448 = 0xA09EF0)
probably represents an address rather than a count.
But this could just be a limitation of gdb's knowledge of the calling
interface ...

> #8 0x0066237e in __ns_initparse (msg=0xa0a866 "ý\f", msglen=9618,
> handle=0xa0a810) at __ns_initparse.c:124
> #9 0x0065ba19 in SPF_dns_resolv_lookup (spf_dns_server=0xaba4ae8,
> domain=0xb78f28e0 "yourshirevillage.com", rr_type=ns_t_mx, should_cache=1) at
> spf_dns_resolv.c:188

I'm not sure whether my experience so far matches this problem:

I encountered a problem in libspf2 1.2.5: it does not cleanly
initialize the data structure it passes to res_ninit().

Maybe this leads to a delayed problem like the one above (I think not very
likely, because __ns_initparse is called after res_ninit() ...). Anyway,
just to mention it: in my environment res_ninit() takes the workspace
it is given and tries to free the memory it references, assuming a reused
workspace. If the malloced workspace is not initialized to zero,
res_ninit() tries to free garbage ... (seen on Solaris 8/Sparc and
Solaris 9/x86, not on Linux so far).
(http://jk.kom.tuwien.ac.at/software/milter-greylist/libspf2-1.2.5-res_ninit.patch)
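
As a rough illustration of the point above (this is not the linked patch,
whose contents are not reproduced here), a resolver workspace can be
allocated with calloc() so that res_ninit() never sees stale pointers. The
helper names alloc_res_state, new_resolver and free_resolver are made up for
this sketch. It also assumes a glibc-style <resolv.h> that provides
struct __res_state, res_ninit() and res_nclose(); other platforms may name
the cleanup call differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical helper: allocate a resolver state with every byte zeroed.
 * calloc() is used instead of malloc() so no field holds garbage. */
res_state alloc_res_state(void)
{
    return calloc(1, sizeof(struct __res_state));
}

/* Hypothetical helper: allocate and initialise a resolver state.
 * Returns NULL if the allocation or res_ninit() fails. */
res_state new_resolver(void)
{
    res_state st = alloc_res_state();

    if (st == NULL)
        return NULL;
    if (res_ninit(st) != 0) {
        free(st);
        return NULL;
    }
    return st;
}

/* Hypothetical helper: release a state obtained from new_resolver(). */
void free_resolver(res_state st)
{
    assert(st != NULL);
    res_nclose(st);     /* assumed available, as in glibc */
    free(st);
}
```

A memset() to zero on an existing buffer before the first res_ninit() call
would have the same effect; calloc() simply makes the zeroing hard to forget.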

A second point is that the malloc result is sometimes not checked in libspf2
1.2.5. In several places the code does not expect that malloc may return a
NULL pointer ...
(http://jk.kom.tuwien.ac.at/software/milter-greylist/libspf2-1.2.5-malloc.patch).
However, this should only be a problem if virtual memory gets really
short ;)

Johann

Re: libspf2 crash
Eugene Crosser wrote:
> Ladar Levison wrote:
>
>>>> I'm seeing a number of segfaults. I believe the issue is that
>>>> libreplace
> [...]
>> The issue appears to occur sporadically with certain domains. One
>> such example is yourshirevillage.com, and it, like the others, lists
>> "mx" in the SPF record, and then has a couple hundred MX records
>> listed. I believe the issue is that a corrupted DNS result is being
>> returned, and that's what's triggering the segfault. I haven't been
>> able to figure it out completely yet.
>
> Do you have this patch applied to libspf2? Without it, it segfaults
> consistently on a number of domains, that have "unusual" but
> syntactically valid TXT RR.
>
> --- src/libspf2/spf_dns_resolv.c.orig 2005-02-19 05:38:12.000000000 +0300
> +++ src/libspf2/spf_dns_resolv.c 2007-11-03 11:54:03.000000000 +0300
> @@ -399,6 +399,10 @@
> while ( rdlen > 0 )
> {
> len = *src;
> + /* zero length element? must be terminator */
> + if (len == 0) break;
> + /* element longer than buffer? corrupt data */
> + if (len >= rdlen) break;
> src++;
> memcpy( dst, src, len );
> dst += len;
>
> Eugene

Yes, I do. This appears to be a different bug.

Re: libspf2 crash
If you execute the attached code right now (Tuesday, 12:30am) you'll be able to
reproduce the crash. Note that these spammer domains disappear often, so I'm not
sure how long this will last. Also note I'm loading the library dynamically, so
update the code accordingly. The Valgrind output is:

==22263== Conditional jump or move depends on uninitialised value(s)
==22263== at 0x555E8E: __ns_name_skip (in /lib/libresolv-2.3.4.so)
==22263== by 0x54F2DB: __dn_skipname (in /lib/libresolv-2.3.4.so)
==22263== by 0x40166BF: __ns_skiprr (__ns_initparse.c:83)
==22263== by 0x4016839: __ns_initparse (__ns_initparse.c:123)
==22263== by 0x400F3D4: SPF_dns_resolv_lookup (spf_dns_resolv.c:188)
==22263== by 0x400E5E6: SPF_dns_lookup (spf_dns.c:114)
==22263== by 0x4013513: SPF_record_interpret (spf_interpret.c:778)
==22263== by 0x4015289: SPF_request_query_record (spf_request.c:224)
==22263== by 0x401530B: SPF_request_query_mailfrom (spf_request.c:255)
==22263== by 0x8048623: main (main.c:134)

I'll continue to investigate, and if I come up with a patch, I'll post it.
Re: libspf2 crash
I believe I've tracked down the issue. It has to do with
SPF_dns_resolv_lookup() and where it calls res_nquery(). Basically, res_nquery()
is returning 5600, a length greater than sizeof(response), which is a hard-coded
buffer size of 2048. Once the __ns_skiprr function reads past the 2048 bytes,
you're reading random data, which is what's causing the segfaults (and the
Valgrind errors).

The solution is twofold. Increase the buffer size to 8196, and then check
whether the return value of res_nquery() is less than 8196. If dns_len is
greater than 8196, set it to 8196.

I believe what's happening, and I'd like confirmation on this, is that
res_nquery() is returning the amount of data it could read if the buffer length
were sufficient. It's the client's responsibility to check whether this is
greater than the buffer size (something libspf2 isn't doing). Anyone else have
thoughts on this?
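
The length check described above might look roughly like the following. This
is a sketch under the poster's hypothesis that res_nquery() can report more
bytes than the buffer holds. The helper name usable_response_len is invented
for the example, and the call site in the comment is hypothetical rather than
copied from libspf2.

```c
#include <assert.h>

/* Hypothetical helper: given the value returned by res_nquery() and the
 * size of the answer buffer, return how many bytes may safely be parsed.
 * A negative dns_len (query failure) is passed through as -1. */
int usable_response_len(int dns_len, int bufsize)
{
    assert(bufsize > 0);

    if (dns_len < 0)
        return -1;              /* the lookup itself failed */
    if (dns_len > bufsize)
        return bufsize;         /* never parse past the end of the buffer */
    return dns_len;
}

/* Hypothetical call site (not the actual libspf2 source):
 *
 *   unsigned char response[RESPONSE_SIZE];   // enlarged buffer
 *   int dns_len = res_nquery(&state, domain, ns_c_in, rr_type,
 *                            response, sizeof(response));
 *   dns_len = usable_response_len(dns_len, (int)sizeof(response));
 *   if (dns_len < 0 || ns_initparse(response, dns_len, &handle) < 0)
 *       ... treat as a failed lookup ...
 */
```

Clamping only prevents the parser from walking off the end of the buffer. A
clamped response is still incomplete, so parsing may fail on a record that was
cut off, and the caller has to handle that failure as well.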

I'm running on CentOS 4.6.

Ladar Levison wrote:
> If you execute the attached code right now (Tuesday, 12:30am) you'll be
> able to reproduce the crash. Note that these spammer domains disappear
> often, so I'm not sure how long this will last. Also note I'm loading
> the library dynamically, so update the code accordingly. The Valgrind
> output is:
>
> ==22263== Conditional jump or move depends on uninitialised value(s)
> ==22263== at 0x555E8E: __ns_name_skip (in /lib/libresolv-2.3.4.so)
> ==22263== by 0x54F2DB: __dn_skipname (in /lib/libresolv-2.3.4.so)
> ==22263== by 0x40166BF: __ns_skiprr (__ns_initparse.c:83)
> ==22263== by 0x4016839: __ns_initparse (__ns_initparse.c:123)
> ==22263== by 0x400F3D4: SPF_dns_resolv_lookup (spf_dns_resolv.c:188)
> ==22263== by 0x400E5E6: SPF_dns_lookup (spf_dns.c:114)
> ==22263== by 0x4013513: SPF_record_interpret (spf_interpret.c:778)
> ==22263== by 0x4015289: SPF_request_query_record (spf_request.c:224)
> ==22263== by 0x401530B: SPF_request_query_mailfrom (spf_request.c:255)
> ==22263== by 0x8048623: main (main.c:134)
>
> I'll continue to investigate, and if I come up with a patch, I'll post it.
>
>
>



Re: libspf2 crash
On a final note (and just for the record), here is the GDB output required to
prove this was the issue:

(gdb) bt
#0 0x00bab7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00bec815 in raise () from /lib/tls/libc.so.6
#2 0x00bee279 in abort () from /lib/tls/libc.so.6
#3 0x08065638 in segfault_handler (received_signal=11) at common/signals.c:36
#4 <signal handler called>
#5 0x00d7ae88 in __ns_name_skip () from /lib/libresolv.so.2
#6 0x00d742dc in __dn_skipname () from /lib/libresolv.so.2
#7 0x0065e205 in __ns_skiprr (ptr=0xb3caae21 "", eom=0xb3caae21 "",
section=ns_s_an, count=-1278574864) at __ns_initparse.c:84
#8 0x0065e37e in __ns_initparse (msg=0xb3ca8862 "�\f", msglen=9697,
handle=0xb3ca8810) at __ns_initparse.c:124
#9 0x00657a19 in SPF_dns_resolv_lookup (spf_dns_server=0xb3e3330,
domain=0xb7a46c18 "citylinenews.com", rr_type=ns_t_mx, should_cache=1)
at spf_dns_resolv.c:188
#10 0x00656dbc in SPF_dns_lookup (spf_dns_server=0xb3e3330, domain=0xb7a46c18
"citylinenews.com", rr_type=ns_t_mx, should_cache=1)
at spf_dns.c:114
#11 0x0065737c in SPF_dns_cache_lookup (spf_dns_server=0xb3e3358,
domain=0xb7a46c18 "citylinenews.com", rr_type=ns_t_mx, should_cache=1)
at spf_dns_cache.c:387
#12 0x00656dbc in SPF_dns_lookup (spf_dns_server=0xb3e3358, domain=0xb7a46c18
"citylinenews.com", rr_type=ns_t_mx, should_cache=1)
at spf_dns.c:114
#13 0x0065b60f in SPF_record_interpret (spf_record=0xb7a92358,
spf_request=0xb7a5bd28, spf_response=0xb7a928b0, depth=0)
at spf_interpret.c:778
#14 0x0065cfc3 in SPF_request_query_record (spf_request=0xb7a5bd28,
spf_response=0xb7a928b0, spf_record=0xb7a92358, err=SPF_E_SUCCESS)
at spf_request.c:224
#15 0x0065d02c in SPF_request_query_mailfrom (spf_request=0xb7a5bd28,
spf_responsep=0xb3ca924c) at spf_request.c:255
#16 0x08069484 in smtp_check_spf (session=0xb3ca9320) at smtp/smtp_spf.c:136
#17 0x0805dd55 in smtp_process_rcptto (session=0xb3ca9320) at
smtp/smtp_process_connection.c:407
#18 0x0805f39d in smtp_process_connection (session=0xb3ca9320) at
smtp/smtp_process_connection.c:1260
#19 0x0806520c in worker_thread () at common/worker.c:84
#20 0x00da33cc in start_thread () from /lib/tls/libpthread.so.0
#21 0x00c8e1ae in clone () from /lib/tls/libc.so.6
(gdb) frame 9
#9 0x00657a19 in SPF_dns_resolv_lookup (spf_dns_server=0xb3e3330,
domain=0xb7a46c18 "citylinenews.com", rr_type=ns_t_mx, should_cache=1)
at spf_dns_resolv.c:188
(gdb) print dns_len
$1 = 9697
(gdb) print sizeof(response)
$2 = 2048
(gdb) quit

Note that 9697 is greater than 2048.

Ladar Levison wrote:
> I believe I've tracked down the issue. It has to do with
> SPF_dns_resolv_lookup() and where it calls res_nquery(). Basically,
> res_nquery() is returning 5600, a length greater than sizeof(response),
> which is a hard-coded buffer size of 2048. Once the __ns_skiprr function
> reads past the 2048 bytes, you're reading random data, which is what's
> causing the segfaults (and the Valgrind errors).
>
> The solution is twofold. Increase the buffer size to 8196, and then
> check whether the return value of res_nquery() is less than 8196. If
> dns_len is greater than 8196, set it to 8196.
>
> I believe what's happening, and I'd like confirmation on this, is that
> res_nquery() is returning the amount of data it could read if the buffer
> length were sufficient. It's the client's responsibility to check whether
> this is greater than the buffer size (something libspf2 isn't doing).
> Anyone else have thoughts on this?
>
> I'm running on CentOS 4.6.
>
> Ladar Levison wrote:
>> If you execute the attached code right now (Tuesday, 12:30am) you'll
>> be able to reproduce the crash. Note that these spammer domains
>> disappear often, so I'm not sure how long this will last. Also note
>> I'm loading the library dynamically, so update the code accordingly.
>> The Valgrind output is:
>>
>> ==22263== Conditional jump or move depends on uninitialised value(s)
>> ==22263== at 0x555E8E: __ns_name_skip (in /lib/libresolv-2.3.4.so)
>> ==22263== by 0x54F2DB: __dn_skipname (in /lib/libresolv-2.3.4.so)
>> ==22263== by 0x40166BF: __ns_skiprr (__ns_initparse.c:83)
>> ==22263== by 0x4016839: __ns_initparse (__ns_initparse.c:123)
>> ==22263== by 0x400F3D4: SPF_dns_resolv_lookup (spf_dns_resolv.c:188)
>> ==22263== by 0x400E5E6: SPF_dns_lookup (spf_dns.c:114)
>> ==22263== by 0x4013513: SPF_record_interpret (spf_interpret.c:778)
>> ==22263== by 0x4015289: SPF_request_query_record (spf_request.c:224)
>> ==22263== by 0x401530B: SPF_request_query_mailfrom (spf_request.c:255)
>> ==22263== by 0x8048623: main (main.c:134)
>>
>> I'll continue to investigate, and if I come up with a patch, I'll post
>> it.
>>
>>
>>
>



