Mailing List Archive

A proper port of Lucene to C#
Hi all,

I have nearly finished a port (a *proper* one) of the Apache Jakarta
project Lucene.

I optimized the I/O handling and refactored most of the code to follow
C# conventions. In small benchmarks, performance is better and the
line count is lower.

Furthermore, the parsers and lexers are generated by MinosseCC (a
port of JavaCC), which makes it possible to extend the grammars
provided with the original Lucene.

I want to release the port under the APL and would like it to be
hosted and maintained: I have no time to develop the code further. I
did the port for commercial purposes (I needed to index millions of
data entries), because the previous port of Lucene (dotLucene) was not
very clean and was a little slow for my needs.

In the future, I would like to embed part of this project's code (as a
reduced subset) in my project Minosse RDBMS (http://www.minosse.com).
For this reason, I would like it to be maintained and developed over
time.



Thanks for your attention.
Antonello
Re: A proper port of Lucene to C#
Antonello,

Welcome!

On Jun 3, 2005, at 4:53 AM, Antonello Provenzano wrote:
> I have nearly finished a port (a *proper* one) of the Apache
> Jakarta project Lucene.

What version of Lucene did you port? How can you ever be "finished"
since Lucene is continually evolving?

> I want to release the port under the APL

Do you mean the ASL (Apache Software License)?

> and would like it to be hosted and maintained: I have no time to
> develop the code further.

Well, now that's a problem for bringing your code into Apache. The ASF
is about community over code. We are not a dumping ground for
unmaintained code without a surrounding community. Are there other
developers on your codebase? Have you communicated with George
Aroush about combining efforts and producing a single top-quality C#
port? George has begun the process of bringing his C# port to the
Apache Incubator, though it is currently stalled. Only if you can
nurture the code through this process and build a community around
the project will it be possible to bring it into Apache.

> I did the port for commercial purposes (I needed to index millions
> of data entries), because the previous port of Lucene (dotLucene)
> was not very clean and was a little slow for my needs.

Again, I'm curious about the communication you had with the
developers of the other C# codebase(s). Did you make improvements
known to them?

Erik
Re: A proper port of Lucene to C#
Erik Hatcher wrote:

> Antonello,
>
> Welcome!
>
> On Jun 3, 2005, at 4:53 AM, Antonello Provenzano wrote:
>
>> I have nearly finished a port (a *proper* one) of the Apache
>> Jakarta project Lucene.
>
>
> What version of Lucene did you port? How can you ever be "finished"
> since Lucene is continually evolving?

I'm porting version 1.4.3. I'm a developer, as you are: I know
perfectly well that software is always in motion and is never finished
until it's discontinued!

>> I want to release the port under the APL
>
>
> Do you mean the ASL (Apache Software License)?
>
Right.

>> and would like it to be hosted and maintained: I have no time to
>> develop the code further.
>
>
> Well, now that's a problem for bringing your code into Apache. The ASF
> is about community over code. We are not a dumping ground for
> unmaintained code without a surrounding community. Are there other
> developers on your codebase? Have you communicated with George
> Aroush about combining efforts and producing a single top-quality C#
> port? George has begun the process of bringing his C# port to the
> Apache Incubator, though it is currently stalled. Only if you can
> nurture the code through this process and build a community around
> the project will it be possible to bring it into Apache.

OK... As a Mono developer, I proposed the port to the #Dashboard
community: they suggested that I propose it to the Lucene community
too, saying you would be interested in hosting it.
Anyway, the #Dashboard developers would be interested in the development.
The implementation I downloaded and compared against is dotLucene
(also used by #dashboard and beagle): I was unable to find the source
code for George Aroush's Lucene.NET. I then contacted the author of
the dotLucene project about replacing his code with mine. I'm still
waiting for an answer...
Anyway, I'm interested in contacting George to propose the code to
him: I really have no time to develop this project either, and handing
it over to him for development would be a good solution.

>
>> I did the port for commercial purposes (I needed to index millions
>> of data entries), because the previous port of Lucene (dotLucene)
>> was not very clean and was a little slow for my needs.
>
>
> Again, I'm curious about the communication you had with the
> developers of the other C# codebase(s). Did you make improvements
> known to them?
>
> Erik
>
>
>
>
Re: A proper port of Lucene to C#
Antonello Provenzano wrote:

> Erik Hatcher wrote:
>
>> On Jun 3, 2005, at 4:53 AM, Antonello Provenzano wrote:
>>
>>> I have nearly finished a port (a *proper* one) of the Apache
>>> Jakarta project Lucene.
>>
>> What version of Lucene did you port? How can you ever be "finished"
>> since Lucene is continually evolving?
>
> I'm porting version 1.4.3. I'm a developer, as you are: I know
> perfectly well that software is always in motion and is never finished
> until it's discontinued!
>
>>> and would like it to be hosted and maintained: I have no time to
>>> develop the code further.
>>
I have recently tested yet another technique for using Lucene in .NET
that resolves these two problems and others. That is, with no effort I
can use the latest Java source code pulled from Subversion, including
all patches in Bugzilla. It is also possible to submit patches back to
the main Java codeline. The approach is to use IKVM for byte-code
translation, leaving the source in Java. I tried this in both Microsoft
.NET and Mono. On the Microsoft platform, all unit tests passed except
for TestDateTools (and DateTools is only used by apps, not directly by
Lucene itself). The bug there turned out to be an issue in the Calendar
class in GNU Classpath. I reported this and the Classpath guys have
fixed it, so as soon as the fix propagates to IKVM, all unit tests will
pass.

In Mono, however, many tests fail. I think many bugs remain in Mono.

I've not done stress testing yet, but the unit test suite seems to run
faster than in Java. Running the unit tests requires JUnit, which IKVM
translates into .NET byte code without issue.

There are disadvantages to this approach as well, but the advantages of
access to the latest code, zero effort, and participation in the Java
Lucene community are strong positives. Integration with Lucene from C#
is not quite as good as with the source code ports, e.g. names remain in
Java conventions, but otherwise integration is fairly seamless. The main
disadvantages are with debugging (e.g., you can't step into Lucene from
a C# app in Visual Studio) and with reverse integration (it is not easy
to code a call from Lucene, e.g. in a patch, into C#, although it is
possible to subclass Lucene classes in C#). If one does Lucene patches
and extensions in Java, all works well. Plus, that allows for
contributing them back to the main codeline.

> OK... As a Mono developer, I proposed the port to the #Dashboard
> community: they suggested that I propose it to the Lucene community
> too, saying you would be interested in hosting it.

Antonello, have you run the unit tests on your port in Mono? If that
works, it would be a most exciting development. Is your code posted
somewhere?

> Anyway, the #Dashboard developers would be interested in the development.

Can you provide a pointer to these guys?

> The implementation I downloaded and compared against is dotLucene
> (also used by #dashboard and beagle): I was unable to find the source
> code for George Aroush's Lucene.NET. I then contacted the author of
> the dotLucene project about replacing his code with mine. I'm still
> waiting for an answer...

George has many tools that help to automate his port. Also, he has just
completed an initial port of the latest 1.9 code base, which some of us
need. What porting approach do you use -- how much manual effort is
involved?

Chuck
RE: A proper port of Lucene to C#
> (also used by #dashboard and beagle): I was unable to find the source
> code for George Aroush's Lucene.NET.

dotLucene is George's Lucene.NET project. He just recently posted the
latest port of 1.9RC1; you can download it here:

http://sourceforge.net/projects/dotlucene/

I know he's actively soliciting help with the port; he can probably tell
you more.

Monsur
Re: A proper port of Lucene to C#
Monsur Hossain wrote:

>>(also used by #dashboard and beagle): I was unable to find the source
>>code for George Aroush's Lucene.NET.
>>
>>
>
>dotLucene is George's Lucene.NET project. He just recently posted the
>latest port of 1.9RC1; you can download it here:
>
>http://sourceforge.net/projects/dotlucene/
>
>I know he's actively soliciting help with the port; he can probably tell
>you more.
>
>Monsur
>
>
>
Thank you for the hint, Monsur. Anyway, I already wrote to the author of
dotLucene (George?), offering my help and the code I developed. As I
said in my previous post, I'm still waiting for an answer.


Antonello