Mailing List Archive

Re: Ruby & Lucene & ApacheCon
Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
wizardry? What kinds of issues did you encounter with gcj? Perhaps
Andi Vajda from PyLucene could offer some advice?

I'd rather see the gcj/SWIG approach moving forward so that SWIG
Lucene doesn't lag behind Java Lucene where all the innovation happens.

As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
let the c-dev@lucene list discuss it. I'm happy to have CLucene at
Apache too, though it seems simpler for us to only house a single
implementation in C. The gcj version would be ideal in my mind, but
I'm also not skilled in gcj (and haven't touched C in decades,
practically) - so it certainly is up to the actual coders where to go
with it.

Erik


On Aug 8, 2005, at 8:36 AM, Brian McCallister wrote:

> At ApacheCon EU I roped one of the most productive developers
> (Thomas Dudziak) I know (who also has SWIG experience =) into the
> Ruby/Lucene thing. Anyway, he's had little success with gcj and
> lucene4c thus far (lucene4c isn't quite complete enough and, as
> Garrett knows (and said he's working on), is kind of tough to build).
>
> Anyway, Thomas went and in an afternoon put SWIG bindings around
> CLucene =)
>
> Now, the more fun part, Ben (whose email I don't have) of CLucene
> would like to move the project to Apache =)
>
> Thoughts?
>
> -Brian
>
> On Aug 8, 2005, at 6:34 AM, Thomas Dudziak wrote:
>
>
>> Hi,
>>
>> after much tinkering and installing/reinstalling gcc/gcj (3.4.3 and
>> 4.0.1) I finally got a combo of ruby+swig+gcj to compile, only to be
>> stopped dead by an internal compiler error in gcj. I honestly don't
>> know why this works for PyLucene (which, btw, I didn't get to compile
>> because the Mac version of Python is 2.3 whereas PyLucene seems to
>> require 2.4).
>>
>> And then yesterday by chance I spotted a mail on the SWIG mailing list
>> by Ben van Klinken, who is the lead developer of the CLucene project
>> (http://clucene.sourceforge.net/), a full C++ port of Lucene. So I
>> fired off an email to him and he told me that they've rewritten
>> CLucene to be easily usable with SWIG (currently he's doing a C# and
>> COM wrapper for CLucene) and they already have more or less the same
>> functionality as Lucene 1.4.3.
>>
>> So I decided to give it a try, and after about half an hour I not
>> only had CLucene compiled and linked, but also a basic SWIG ruby
>> wrapper around one of the helper classes of CLucene (compared to
>> about a week for the same using gcj).
>>
>> The interesting thing now is that they'd like to move to Apache; they
>> even proposed incubation
>> (http://clucene.sourceforge.net/incubatorproposal.htm), though they
>> seem to be missing a sponsor (Erik didn't answer as far as I could
>> see on the Lucene dev mailing list).
>> I'd very much like to use CLucene as the basis for the ruby binding
>> (and Ben is quite willing to help with any SWIG wrappers and C++
>> issues), so my question is: could you talk to Erik as to whether it
>> would be possible to accept the incubation proposal (via sponsoring
>> by the Lucene PMC)? From what I saw so far of CLucene, I might be
>> able to create a ruby binding of the querying in August, which would
>> be a good start for the RubyLucene repository.
>
Re: Ruby & Lucene & ApacheCon
On 8/8/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
> wizardry? What kinds of issues did you encounter with gcj? Perhaps
> Andi Vajda from PyLucene could offer some advice?
>
> I'd rather see the gcj/SWIG approach moving forward so that SWIG
> Lucene doesn't lag behind Java Lucene where all the innovation happens.

Yep, I tried to compile PyLucene on my Mac, but it failed because of
the Python version that comes with Mac OS 10.4 (which is 2.3). To be
fair to PyLucene, I only tried for a couple of hours as I don't really
have an interest in Python; I only wanted to see how they use gcj.
But aside from that, I tried the PyLucene way first for a whole week.
First there was the issue of getting gcj to run on Mac OS X, which
ain't easy at all - I had to install darwinports with a fresh gcc.
Running gcj over Lucene is easy and works out of the box. But linking
ruby with swig-wrapped, gcj-compiled Lucene is not; all I got was a gcj
internal compiler error (with both gcc/gcj 3.4.3 and 4.0.1). This bug
is marked as a regression in the gcc bug list.
On Windows I had a similar amount of trouble using both MinGW and
Cygwin; I wasn't able to compile & link the stuff against ruby.

So to summarize, while there is definitely a strong argument for using
gcj to create other-language bindings from the Java-version, there are
a few issues that IMO make a strong case for CLucene:

* gcj is difficult to use at the best of times, and on Windows & Mac OS
it is especially involved. For me it was nearly impossible as I'm no
gcc/gcj expert

* it prevents, or at least makes extremely difficult, the creation of
certain bindings such as COM and C# (except perhaps Mono), as MinGW is
not easily combined with Visual C++ AFAIK. And I don't think that there
is any chance of debugging such a combination when a problem arises.

* the amount of work necessary to swig-wrap the gcj-compiled Lucene for
a given target language is immense - just have a look at the swig file
of PyLucene and the Makefile that makes the magic happen; I think this
must be a nightmare to maintain. I cannot really tell what amount of
work would be necessary for CLucene, but since it is a straight C++
library and built with swig in mind, I would be surprised if it is not
a lot less

So from a technical point of view, it is my opinion that a pure C++
version is easier to maintain and evolve right now. I also think that
most of the innovation in Lucene is not Java-specific so while it
would be duplicated implementation work, the algorithms are the same
(or near enough). Also, a pure C++ version of Lucene gives it more
momentum IMO in both the Linux world (mbox_lucene or something similar
comes to mind) and the Microsoft world (.Net etc.)

> As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
> let the c-dev@lucene list discuss it. I'm happy to have CLucene at
> Apache too, though it seems simpler for us to only house a single
> implementation in C. The gcj version would be ideal in my mind, but
> I'm also not skilled in gcj (and haven't touched C in decades,
> practically) - so it certainly is up to the actual coders where to go
> with it.

I don't know whether it is a "Lucene4C vs. CLucene" anyway. From what
I understand Lucene4C tries to create a simpler API for Lucene, and
while they are building on top of a gcj-compiled version of Java
Lucene, that is likely not a requirement (I don't think that they want
to expose any of the gcj-generated classes).
Besides, CLucene is quite far along, so from a practical point of view
it would make sense to use/maintain it. Being the practical guy that I
am, I think that any issues between Lucene4C, PyLucene, and CLucene can
be worked out if the developers work together. After all, for all I
know it might even be possible to use a mixture of the Lucene4C API
(for plain C) and the CLucene API (for C++) in front of a gcj-compiled
Java Lucene, and all SWIG wrappers could then be built on top of this
API. At least technically this is possible and perhaps even feasible.

regards,
Tom