Mailing List Archive

Java Handling Signals?
I took my first swing at integrating mod_mbox and Lucene4c this
afternoon. Code wise, it all worked. However, running the code, I
quickly ran into a problem:

#0 0x40268e38 in _Unwind_Find_FDE () from /lib/tls/libc.so.6
#1 0x402678a5 in dl_iterate_phdr () from /lib/tls/libc.so.6
#2 0x40268925 in _Unwind_Find_FDE () from /lib/tls/libc.so.6
#3 0x411987ef in uw_frame_state_for (context=0xbfbc41e0, fs=0xbfbc4120)
at ../../gcc-4.0.0/gcc/unwind-dw2.c:971
#4 0x41199cbc in _Unwind_RaiseException (exc=0x5429e0) at unwind.inc:104
#5 0x408611e2 in _Jv_Throw (value=0x2967e0) at
../../../gcc-4.0.0/libjava/exception.cc:103
#6 0x40854afa in catch_segv (_dummy=11) at
../../../gcc-4.0.0/libjava/prims.cc:152
#7 <signal handler called>
#8 0x41198b1d in uw_frame_state_for (context=0xbfbc4718, fs=0xbfbc4658)
at linux-unwind.h:125
#9 0x41199cbc in _Unwind_RaiseException (exc=0x542a20) at unwind.inc:104
#10 0x408611e2 in _Jv_Throw (value=0x2967f8) at
../../../gcc-4.0.0/libjava/exception.cc:103
#11 0x40854afa in catch_segv (_dummy=11) at
../../../gcc-4.0.0/libjava/prims.cc:152
#12 <signal handler called>
#13 0x41198b1d in uw_frame_state_for (context=0xbfbc4c50, fs=0xbfbc4b90)
at linux-unwind.h:125
#14 0x41199cbc in _Unwind_RaiseException (exc=0x542a60) at unwind.inc:104
#15 0x408611e2 in _Jv_Throw (value=0x296810) at
../../../gcc-4.0.0/libjava/exception.cc:103
#16 0x40854afa in catch_segv (_dummy=11) at
../../../gcc-4.0.0/libjava/prims.cc:152
#17 <signal handler called>
...continues forever...

More or less, java is catching a signal, and then causing another signal
to be sent from inside the handler (ouch!). This seems to happen
randomly while adding thousands of documents to a writer.

Source is at:
http://svn.apache.org/repos/asf/httpd/mod_mbox/trunk/module-2.0/mbox_search.c

It seems this is likely a bug in gcj?

-Paul
Re: Java Handling Signals? [ In reply to ]
Paul Querna wrote:
> I took my first swing at integrating mod_mbox and Lucene4c this
> afternoon. Code wise, it all worked. However, running the code, I
> quickly ran into a problem:

[ snip backtrace ]

> More or less, java is catching a signal, and then causing another signal
> to be sent from inside the handler (ouch!). This seems to happen
> randomly while adding thousands of documents to a writer.
>
> Source is at:
> http://svn.apache.org/repos/asf/httpd/mod_mbox/trunk/module-2.0/mbox_search.c
>
> It seems this is likely a bug in gcj?

Interesting...

Can you reproduce this while running inside a debugger? I'm kind of
curious what exactly is being thrown, that could give us some sort of
clue to what the cause of the whole thing is. Perhaps the argument to
_Jv_Throw would be worth printing out...

Actually, now that I look at catch_segv, it seems like it's a null
pointer exception of some sort, but throwing the NPE is causing another
SEGV.

Is this reasonably simple to reproduce? Did you just throw a big chunk
of mbox files at mod_mbox?

While it seems likely that this is a GCJ problem, I'm also curious if
it's something we could work around, perhaps more frequent optimizations
(which we still need to expose via the C API of course) or something
might help keep resource utilization down to the point where this
problem doesn't happen.

-garrett
Re: Java Handling Signals? [ In reply to ]
Garrett Rooney wrote:
> Paul Querna wrote:
>
>> I took my first swing at integrating mod_mbox and Lucene4c this
>> afternoon. Code wise, it all worked. However, running the code, I
>> quickly ran into a problem:
>
>
> [ snip backtrace ]
>
>> More or less, java is catching a signal, and then causing another signal
>> to be sent from inside the handler (ouch!). This seems to happen
>> randomly while adding thousands of documents to a writer.
>>
>> Source is at:
>> http://svn.apache.org/repos/asf/httpd/mod_mbox/trunk/module-2.0/mbox_search.c
>>
>>
>> It seems this is likely a bug in gcj?
>
>
> Interesting...
>
> Can you reproduce this while running inside a debugger? I'm kind of
> curious what exactly is being thrown, that could give us some sort of
> clue to what the cause of the whole thing is. Perhaps the argument to
> _Jv_Throw would be worth printing out...

Running inside the debugger, it never finishes processing the first
.mbox file. End of this email has a trace. Running normally, It does
usually complete the first .mbox file.

>
> Actually, now that I look at catch_segv, it seems like it's a null
> pointer exception of some sort, but throwing the NPE is causing another
> SEGV.
>
> Is this reasonably simple to reproduce? Did you just throw a big chunk
> of mbox files at mod_mbox?

Yes, the signal crash reproduces easily.

> While it seems likely that this is a GCJ problem, I'm also curious if
> it's something we could work around, perhaps more frequent optimizations
> (which we still need to expose via the C API of course) or something
> might help keep resource utilization down to the point where this
> problem doesn't happen.
>
> -garrett
>

(gdb) set args -v -c /x1/work/mbox/data/apr.apache.org/dev -s /tmp/test
(gdb) run
Starting program: /x1/work/mbox/mod_mbox/module-2.0/mod-mbox-util -v -c
/x1/work/mbox/data/apr.apache.org/dev -s /tmp/test
[Thread debugging using libthread_db enabled]
[New Thread 1094048032 (LWP 15561)]
Opening Lucene Index at: /tmp/test
[New Thread 1102941104 (LWP 15568)]
Base Path: /x1/work/mbox/data/apr.apache.org/dev
Found 53 mbox files to process
Scaning 200503.mbox for Mailing List info
Building Cache for dev@apr.apache.org
Last Update: Tue, 31 May 2005 23:12:24 GMT
Current Time: Wed, 01 Jun 2005 00:05:37 GMT
Processing '200503.mbox'

Program received signal SIGPWR, Power fail/restart.
[Switching to Thread 1102941104 (LWP 15568)]
0x41188295 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/tls/libpthread.so.0

(gdb) where
#0 0x41188295 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/tls/libpthread.so.0
#1 0x40241f66 in pthread_cond_wait () from /lib/tls/libc.so.6
#2 0x40bb40ac in _Jv_CondWait (cv=0x4111b960, mu=0x4111b980, millis=The
value of variable 'millis' is distributed across several
locations, and GDB cannot access its value.

) at ../../../gcc-4.0.0/libjava/posix-threads.cc:147
#3 0x40881896 in gnu::gcj::runtime::FinalizerThread::run (this=0x5fb40)
at ../../../gcc-4.0.0/libjava/gnu/gcj/runtime/natFinalizerThread.cc:57
#4 0x40892b8b in _Jv_ThreadRun (thread=0x5fb40) at
../../../gcc-4.0.0/libjava/java/lang/natThread.cc:289
#5 0x40bb45c0 in really_start (x=0x70ff8) at
../../../gcc-4.0.0/libjava/posix-threads.cc:398
#6 0x40c8f298 in GC_start_routine (arg=0x53fa0) at
../../../gcc-4.0.0/boehm-gc/pthread_support.c:1185
#7 0x41185b63 in start_thread () from /lib/tls/libpthread.so.0
#8 0x4023618a in clone () from /lib/tls/libc.so.6

(gdb) info threads
* 2 Thread 1102941104 (LWP 15568) 0x41188295 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
1 Thread 1094048032 (LWP 15561) 0x41189d0b in sem_wait@GLIBC_2.0 ()
from /lib/tls/libpthread.so.0

(gdb) thread 1
[Switching to thread 1 (Thread 1094048032 (LWP 15561))]#0 0x41189d0b in
sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0

(gdb) where
#0 0x41189d0b in sem_wait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#1 0x4110e514 in __JCR_END__ () from /home/chip/gcc4/lib/libgcj.so.6
#2 0x40c904be in GC_stop_world () at
../../../gcc-4.0.0/boehm-gc/pthread_stop_world.c:392
#3 0x40c7f541 in GC_stopped_mark (stop_func=0x40c7f200
<GC_never_stop_func>) at ../../../gcc-4.0.0/boehm-gc/alloc.c:482
#4 0x40c7ff2a in GC_try_to_collect_inner (stop_func=0x40c7f200
<GC_never_stop_func>) at ../../../gcc-4.0.0/boehm-gc/alloc.c:365
#5 0x40c808b6 in GC_collect_or_expand (needed_blocks=1,
ignore_off_page=0) at ../../../gcc-4.0.0/boehm-gc/alloc.c:1020
#6 0x40c80a2b in GC_allocobj (sz=340, kind=0) at
../../../gcc-4.0.0/boehm-gc/alloc.c:1071
#7 0x40c852fa in GC_generic_malloc_inner (lb=1032, k=0) at
../../../gcc-4.0.0/boehm-gc/malloc.c:136
#8 0x40c85415 in GC_generic_malloc (lb=1032, k=0) at
../../../gcc-4.0.0/boehm-gc/malloc.c:192
#9 0x40c856a3 in GC_malloc_atomic (lb=1032) at
../../../gcc-4.0.0/boehm-gc/malloc.c:262
#10 0x40c8ff61 in GC_local_malloc_atomic (bytes=1032) at
../../../gcc-4.0.0/boehm-gc/pthread_support.c:332
#11 0x40854b50 in _Jv_NewPrimArray (eltype=0x4111b6e0, count=1024) at
java-gc.h:57
#12 0x400dc836 in
org.apache.lucene.store.RAMOutputStream.flushBuffer(byte[], int)
(this=0x2da5e0, src=0x325000, len=1)
at RAMOutputStream.java:75
#13 0x400dbad0 in org.apache.lucene.store.OutputStream.flush()
(this=0x2da5e0) at OutputStream.java:131
#14 0x400dc895 in org.apache.lucene.store.RAMOutputStream.close()
(this=0x2da5e0) at RAMOutputStream.java:98
#15 0x400ab3d1 in
org.apache.lucene.index.DocumentWriter.writeNorms(org.apache.lucene.document.Document,
java.lang.String) (
this=0x29c510, doc=0x2ffab0, segment=0x305a50) at
TermVectorsWriter.java:328
#16 0x400ade93 in
org.apache.lucene.index.IndexWriter.addDocument(org.apache.lucene.document.Document,
org.apache.lucene.analysis.Analyzer) (this=0x8bf18, doc=0xfffffffc,
analyzer=0xfffffffc) at IndexWriter.java:312
#17 0x400ade1f in
org.apache.lucene.index.IndexWriter.addDocument(org.apache.lucene.document.Document)
(this=0x0, doc=0xfffffffc)
at IndexWriter.java:294
#18 0x400df52f in index_writer_add_document_internal (b=0x0) at
src/index/writer.cxx:123
#19 0x400df13b in lcn__swallow_exception (func=0x400df510
<index_writer_add_document_internal>, baton=0xfffffffc)
at src/util/exception.cxx:59
#20 0x400df573 in lcn_index_writer_add_document (writer=0xfffffffc,
document=0xfffffffc, pool=0x8057908) at src/index/writer.cxx:136
#21 0x0804ac95 in mbox_indexer_add (ctx=0x8053a30, doci=0xbffff5bc) at
mbox_search.c:131
#22 0x0804c03b in store_msgc (pool=0x808dae0, database=0x80853e8,
key=0x80a3aa4 "<20050308202015.GB14734@redhat.com>",
msgc=0xbffff698, indexer=0x8053a30, list=0xfffffffc <Address
0xfffffffc out of bounds>,
domain=0xfffffffc <Address 0xfffffffc out of bounds>) at
mbox_parse.c:602
---Type <return> to continue, or q <return> to quit---
#23 0x0804c5ef in mbox_generate_index (r=0xbffff768, f=0x80852f0,
indexer=0x8053a30, list=0x807e088 "dev",
domain=0x807e08c "apr.apache.org") at mbox_parse.c:756
#24 0x0804a7c9 in main (argc=6, argv=0xbffffb04) at mod-mbox-util.c:106
(gdb)
Re: Java Handling Signals? [ In reply to ]
Paul Querna wrote:
[..snip..]
>
> Program received signal SIGPWR, Power fail/restart.

Okay, I did find out that SIGPWR is used by the GCJ garbage collector:
http://www.gnu.org/software/gcc/java/gdb.html

After ignoring the SIGPWR and SIGXCPU, it works much better, and gives a
useful backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1094048032 (LWP 15585)]
org.apache.lucene.store.RAMInputStream.RAMInputStream(org.apache.lucene.store.RAMFile)
(this=@84fdb0, f=null)
at RAMInputStream.java:31
31 RAMInputStream.java: No such file or directory.
in RAMInputStream.java
Current language: auto; currently java
(gdb) where
#0
org.apache.lucene.store.RAMInputStream.RAMInputStream(org.apache.lucene.store.RAMFile)
(this=@84fdb0, f=null)
at RAMInputStream.java:31
#1 0x400dc2ad in
org.apache.lucene.store.RAMDirectory.openFile(java.lang.String)
(this=@84fdb0, name=@84fdb0)
at RAMDirectory.java:165
#2 0x400b47eb in
org.apache.lucene.index.SegmentReader.openNorms(org.apache.lucene.store.Directory)
(this=@802ca8,
cfsDir=@5cb80) at SegmentReader.java:369
#3 0x400b35ce in
org.apache.lucene.index.SegmentReader.initialize(org.apache.lucene.index.SegmentInfo)
(this=@802ca8,
si=@78bc80) at SegmentReader.java:127
#4 0x400ae882 in org.apache.lucene.index.IndexWriter.mergeSegments(int)
(this=@8bf18, minSegment=7) at IndexWriter.java:484
#5 0x400ae62d in
org.apache.lucene.index.IndexWriter.maybeMergeSegments() (this=@8bf18)
at IndexWriter.java:462
#6 0x400adee3 in
org.apache.lucene.index.IndexWriter.addDocument(org.apache.lucene.document.Document,
org.apache.lucene.analysis.Analyzer) (this=@8bf18, doc=@84fdb0,
analyzer=@84fdb0) at IndexWriter.java:311
#7 0x400ade1f in
org.apache.lucene.index.IndexWriter.addDocument(org.apache.lucene.document.Document)
(this=@84fdb0,
doc=@84fdb0) at IndexWriter.java:294
#8 0x400df52f in index_writer_add_document_internal (b=@84fdb0) at
src/index/writer.cxx:123
#9 0x400df13b in lcn__swallow_exception (func=0x400df510
<index_writer_add_document_internal>, baton=@84fdb0)
at src/util/exception.cxx:59
#10 0x400df573 in lcn_index_writer_add_document (writer=@84fdb0,
document=@84fdb0, pool=@8057908) at src/index/writer.cxx:136
#11 0x0804ac95 in mbox_indexer_add (ctx=@8053a30, doci=@bffff5ac) at
mbox_search.c:131
#12 0x0804c03b in store_msgc (pool=@8130d60, database=@80853e8,
key=@81d98c4, msgc=@bffff688, indexer=@8053a30,
list=@84fdb0, domain=@84fdb0) at mbox_parse.c:602
#13 0x0804c5ef in mbox_generate_index (r=@bffff758, f=@80852f0,
indexer=@8053a30, list=@807e088, domain=@807e08c)
at mbox_parse.c:756
#14 0x0804a7c9 in main (argc=6, argv=@bffffaf4) at mod-mbox-util.c:106
(gdb) info threads
2 Thread 1102941104 (LWP 15592) 0x41188295 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
* 1 Thread 1094048032 (LWP 15585)
org.apache.lucene.store.RAMInputStream.RAMInputStream(org.apache.lucene.store.RAMFile)
(
this=@84fdb0, f=null) at RAMInputStream.java:31

(gdb) thread 2
[Switching to thread 2 (Thread 1102941104 (LWP 15592))]#0 0x41188295 in
pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/tls/libpthread.so.0

(gdb) where
#0 0x41188295 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/tls/libpthread.so.0
#1 0x40241f66 in pthread_cond_wait () from /lib/tls/libc.so.6
#2 0x40bb40ac in _Jv_CondWait (cv=@4111b960, mu=@4111b980, millis=The
value of variable 'millis' is distributed across several
locations, and GDB cannot access its value.

) at ../../../gcc-4.0.0/libjava/posix-threads.cc:147
#3 0x40881896 in gnu::gcj::runtime::FinalizerThread::run (this=@5fb40)
at ../../../gcc-4.0.0/libjava/gnu/gcj/runtime/natFinalizerThread.cc:57
#4 0x40892b8b in _Jv_ThreadRun (thread=@5fb40) at
../../../gcc-4.0.0/libjava/java/lang/natThread.cc:289
#5 0x40bb45c0 in really_start (x=@70ff8) at
../../../gcc-4.0.0/libjava/posix-threads.cc:398
#6 0x40c8f298 in GC_start_routine (arg=@53fa0) at
../../../gcc-4.0.0/boehm-gc/pthread_support.c:1185
#7 0x41185b63 in start_thread () from /lib/tls/libpthread.so.0
#8 0x4023618a in clone () from /lib/tls/libc.so.6
(gdb)


<rant>Why the hell are they using Signals for the Garbage
collection...</rant>

-Paul