Mailing List Archive

Re: Ruby & Lucene & ApacheCon
At ApacheCon EU I roped one of the most productive developers I know
(Thomas Dudziak, who also has SWIG experience =) into the Ruby/Lucene
thing. Anyway, he's had little success with gcj and lucene4c thus far
(lucene4c isn't quite complete enough and, as Garrett knows (and said
he's working on), is kind of tough to build).

Anyway, Thomas went and in an afternoon put SWIG bindings around
CLucene =)

Now, the more fun part, Ben (whose email I don't have) of CLucene
would like to move the project to Apache =)

Thoughts?

-Brian

On Aug 8, 2005, at 6:34 AM, Thomas Dudziak wrote:

> Hi,
>
> after much tinkering and installing/reinstalling gcc/gcj (3.4.3 and
> 4.0.1) I finally got a combo of ruby+swig+gcj to compile, only to be
> stopped dead by an internal compiler error in GCJ. I honestly don't
> know why this works for PyLucene (which, btw, I didn't get to compile
> because the mac version of Python is 2.3 whereas PyLucene seems to
> require 2.4).
>
> And then yesterday by chance I spotted a mail on the SWIG mailing
> list by Ben van Klinken, the lead developer of the CLucene project
> (http://clucene.sourceforge.net/), a full C++ port of Lucene. So I
> fired up an email to him and he told me that they've rewritten CLucene
> to be easily usable with SWIG (currently he's doing a C# and COM
> wrapper for CLucene) and they already have more or less the same
> functionality as Lucene 1.4.3.
>
> So I decided to give it a try, and after about half an hour I not only
> had CLucene compiled and linked, but also a basic SWIG ruby wrapper
> around one of the helper classes of CLucene (compared to about a week
> for the same using gcj).
>
> The interesting thing now is that they'd like to move to Apache, they
> even proposed incubation
> (http://clucene.sourceforge.net/incubatorproposal.htm) though they
> seem to be missing a sponsor (Erik didn't answer as far as I could see
> on the Lucene dev mailing list).
> I'd very much like to use CLucene as the basis for the ruby binding
> (and Ben is quite willing to help with any SWIG wrappers and C++
> issues), so my question is: could you talk to Erik as to whether it
> would be possible to accept the incubation proposal (via sponsoring by
> the Lucene PMC)? From what I saw so far of CLucene, I might be able
> to create a ruby binding of the querying in August, which would
> be a good start for the RubyLucene repository.
Re: Ruby & Lucene & ApacheCon
Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
wizardry? What kinds of issues did you encounter with gcj? Perhaps
Andi Vajda from PyLucene could offer some advice?

I'd rather see the gcj/SWIG approach moving forward so that SWIG
Lucene doesn't lag behind Java Lucene where all the innovation happens.

As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
let the c-dev@lucene list discuss it. I'm happy to have CLucene at
Apache too, though it seems simpler for us to only house a single
implementation in C. The gcj version would be ideal in my mind, but
I'm also not skilled in gcj (and haven't touched C in decades,
practically) - so it certainly is up to the actual coders where to go
with it.

Erik


On Aug 8, 2005, at 8:36 AM, Brian McCallister wrote:

> At ApacheCon EU I roped one of the most productive developers I
> know (Thomas Dudziak, who also has SWIG experience =) into the
> Ruby/Lucene thing. Anyway, he's had little success with gcj and
> lucene4c thus far (lucene4c isn't quite complete enough and, as
> Garrett knows (and said he's working on), is kind of tough to build).
>
> Anyway, Thomas went and in an afternoon put SWIG bindings around
> CLucene =)
>
> Now, the more fun part, Ben (whose email I don't have) of CLucene
> would like to move the project to Apache =)
>
> Thoughts?
>
> -Brian
Re: Ruby & Lucene & ApacheCon
On 8/8/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
> wizardry? What kinds of issues did you encounter with gcj? Perhaps
> Andi Vajda from PyLucene could offer some advice?
>
> I'd rather see the gcj/SWIG approach moving forward so that SWIG
> Lucene doesn't lag behind Java Lucene where all the innovation happens.

Yep, I tried to compile PyLucene on my Mac, but it failed because of
the Python version that comes with Mac OS X 10.4 (which is 2.3). To be
fair to PyLucene, I only tried for a couple of hours as I don't really
have an interest in Python; I only wanted to see how they use gcj.
But aside from that, I tried the PyLucene way first for a whole week.
First there was the issue of getting gcj to run on Mac OS X, which
ain't easy at all: I had to install DarwinPorts with a fresh gcc.
Running gcj over Lucene is easy and works out of the box. But linking
ruby with swig-wrapped, gcj-compiled Lucene is not: all I got was a
gcj internal compiler error (with both gcc/gcj 3.4.3 and 4.0.1). This
bug is marked as a regression in the gcc bug list.
On Windows I had a similar amount of trouble using both MinGW and
Cygwin; I wasn't able to compile & link the stuff against ruby.

So to summarize, while there is definitely a strong argument for using
gcj to create other-language bindings from the Java version, there are
a few issues that IMO make a strong case for CLucene:

* gcj is difficult to use at the best of times, and on Windows & Mac
OS X it is even more involved. For me it was nearly impossible as I'm
no gcc/gcj expert.

* it prevents, or at least makes it extremely difficult, to create
certain bindings such as COM and C# (perhaps except Mono), as MinGW is
not easily combined with Visual C++ AFAIK. And I don't think that
there is any chance of debugging such a combination when a problem
arises.

* the amount of work necessary to swig-wrap the gcj-compiled Lucene to
a given target language is immense - just have a look at the swig file
of PyLucene and the Makefile to make the magic happen; I think this
must be a nightmare to maintain. I cannot really tell what amount of
work would be necessary for CLucene, but since it is a straight C++
library and built with swig in mind, I would be surprised if it is not
a lot less.

So from a technical point of view, it is my opinion that a pure C++
version is easier to maintain and evolve right now. I also think that
most of the innovation in Lucene is not Java-specific so while it
would be duplicated implementation work, the algorithms are the same
(or near enough). Also, a pure C++ version of Lucene gives it more
momentum IMO in both the Linux world (mbox_lucene or something similar
comes to mind) and the Microsoft world (.NET etc.).

> As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
> let the c-dev@lucene list discuss it. I'm happy to have CLucene at
> Apache too, though it seems simpler for us to only house a single
> implementation in C. The gcj version would be ideal in my mind, but
> I'm also not skilled in gcj (and haven't touched C in decades,
> practically) - so it certainly is up to the actual coders where to go
> with it.

I don't know whether it is a "Lucene4C vs. CLucene" anyway. From what
I understand Lucene4C tries to create a simpler API for Lucene, and
while they are building on top of a gcj-compiled version of Java
Lucene, that is likely not a requirement (I don't think that they want
to expose any of the gcj-generated classes).
Besides, CLucene is quite far along, so from a practical point of view
it would make sense to use / maintain it. Being the practical guy that
I am, I think that any issues between Lucene4C, PyLucene, CLucene can
be worked out if the developers work together. After all, for all I
know it might even be possible to use a mixture of the Lucene4C API
(for plain C) and the CLucene API (for C++) in front of a gcj-compiled
Java Lucene, and all SWIG wrappers could then be built on top of this
API. At least technically this is possible and perhaps even feasible.
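
The layering described above (a plain C API and a C++ API in front of
one engine, with SWIG wrappers built on top) could look roughly like
the following C++ sketch. Everything in it is an assumption made for
illustration: the names Searcher, InMemorySearcher and the demo_*
functions are hypothetical and belong to neither Lucene4C nor CLucene,
and the in-memory backend merely stands in for a real engine such as a
gcj-compiled Java Lucene.

```cpp
// Hypothetical sketch only: none of these names come from Lucene4C,
// CLucene, or SWIG. It shows one engine behind a C++ API and a C API.
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// C++-facing API (the role a CLucene-style API would play).
class Searcher {
public:
    virtual ~Searcher() = default;
    // Returns the indices of all documents that contain `term`.
    virtual std::vector<int> search(const std::string& term) const = 0;
};

// Toy backend standing in for the real search engine.
class InMemorySearcher : public Searcher {
public:
    explicit InMemorySearcher(std::vector<std::string> docs)
        : docs_(std::move(docs)) {}

    std::vector<int> search(const std::string& term) const override {
        std::vector<int> hits;
        for (std::size_t i = 0; i < docs_.size(); ++i) {
            // Plain substring match; a real engine would use an index.
            if (docs_[i].find(term) != std::string::npos) {
                hits.push_back(static_cast<int>(i));
            }
        }
        return hits;
    }

private:
    std::vector<std::string> docs_;
};

// Opaque handle used by the plain C API below.
struct demo_searcher {
    std::unique_ptr<Searcher> impl;
};

// Plain C API (the role a Lucene4C-style API would play). It only
// forwards to the C++ API, so both layers share one backend.
// A production version would also catch C++ exceptions at this boundary.
extern "C" {

demo_searcher* demo_open(const char* const* docs, int n) {
    std::vector<std::string> copy(docs, docs + n);
    return new demo_searcher{
        std::make_unique<InMemorySearcher>(std::move(copy))};
}

int demo_count_hits(const demo_searcher* s, const char* term) {
    return static_cast<int>(s->impl->search(term).size());
}

void demo_close(demo_searcher* s) {
    delete s;
}

}  // extern "C"
```

A SWIG interface file could then expose either layer to a target
language; which of the two is the better fit would probably depend on
the language being wrapped.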

regards,
Tom