On Tue, Nov 18, 2014 at 1:16 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
> On Sat, Nov 15, 2014 at 3:22 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>
>> The analysis chain (attributes) is overly complex.
>
> If you were to start from scratch, what would the analysis chain look like?
Hi Marvin, long time no talk! I like the new Go bindings for Lucy!
Here are some things that bug me about Lucene's analysis APIs:
Lucene's attributes have a separate interface from impl, with default
impls, and this causes complex code in oal.util.Attribute*. It seems
like overkill: we should just have concrete core impls for the
attributes Lucene knows how to index.
There are 5 Java source files in that package related to attributes
(Attribute.java, AttributeFactory.java, AttributeImpl.java,
AttributeReflector.java, AttributeSource.java): too much.
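To make the complaint concrete, here's a toy sketch (invented names, not the real oal.util classes) of the interface/impl split versus the plain concrete class being argued for:

```java
// Toy sketch of the pattern, NOT real Lucene code: the interface and the
// separate impl class that Lucene's attribute machinery wires together
// via factories and reflection.
interface TermAttribute {
    String term();
    void setTerm(String term);
}

class TermAttributeImpl implements TermAttribute {
    private String term = "";
    public String term() { return term; }
    public void setTerm(String term) { this.term = term; }
}

// The simpler alternative: one concrete class per attribute the indexer
// understands, no factory or reflection layer in between.
final class TermAttr {
    String term = "";
}

public class AttributeSketch {
    public static void main(String[] args) {
        TermAttribute viaInterface = new TermAttributeImpl();
        viaInterface.setTerm("lucene");
        TermAttr concrete = new TermAttr();
        concrete.term = "lucene";
        // Both hold the same data; only the first needed two types plus a
        // factory to exist.
        System.out.println(viaInterface.term().equals(concrete.term));
    }
}
```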
There should not be a global AttributeFactory that owns all attributes
throughout the pipeline: that's too global. Rather, each stage should
be free to control what the next stage sees (LUCENE-2450) ... the
namespace should be private to that stage, and each stage can
delete/add/replace the incoming bindings it saw. This may seem more
complex but I think it'd be simpler in the end? And the first stage
should not have to be responsible for clearing things that later
stages had inserted: a common source of bugs is the first Tokenizer
failing to call clearAttributes.
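A toy sketch (a plain map, not Lucene's AttributeSource) of why the shared namespace is fragile: the first stage has to clear state that a *later* stage added on the previous token, or it leaks into the next token:

```java
import java.util.HashMap;
import java.util.Map;

public class SharedNamespaceSketch {
    // One namespace shared by every stage in the pipeline.
    static Map<String, String> atts = new HashMap<>();

    // A "tokenizer" that may or may not reset the shared namespace first.
    static void emitToken(String term, boolean clearFirst) {
        if (clearFirst) atts.clear();   // the clearAttributes-style step
        atts.put("term", term);
    }

    public static void main(String[] args) {
        emitToken("first", true);
        atts.put("synonym", "1st");     // a later filter adds its own attribute
        emitToken("second", false);     // bug: stale "synonym" survives
        System.out.println(atts.containsKey("synonym")); // leaked state
        emitToken("third", true);       // correct: cleared before emitting
        System.out.println(atts.containsKey("synonym"));
    }
}
```

With per-stage private namespaces, the tokenizer could never see, let alone leak, a binding a downstream filter created.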
Reuse of token streams was an "afterthought" that took a long time to
work its way down to simpler APIs, but now we have ReuseStrategy,
AnalyzerWrapper, DelegatingAnalyzerWrapper.
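The core idea ReuseStrategy papers over is just per-thread caching of heavy analysis components; a minimal sketch (toy types, not Lucene's API):

```java
public class ReuseSketch {
    // Stand-in for cached TokenStreamComponents: one instance per thread,
    // reset and handed back instead of being rebuilt for every document.
    static final ThreadLocal<StringBuilder> REUSED =
        ThreadLocal.withInitial(StringBuilder::new);

    static StringBuilder components() {
        StringBuilder sb = REUSED.get();
        sb.setLength(0);  // reset state instead of reallocating
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder a = components();
        StringBuilder b = components();
        // Same instance both times on this thread: reuse, not reallocation.
        System.out.println(a == b);
    }
}
```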
Custom analyzers can't be (easily?) serialized, so ES and Solr have
their own layers to parse a custom chain from JSON/XML. Those layers
could do better error checking...
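The kind of layer ES and Solr each re-implement looks roughly like this hypothetical sketch (invented names, no JSON parsing shown): map declarative stage names to factories, compose them in order, and fail loudly on an unknown name, which is exactly the error checking that tends to be missing:

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ChainFromConfig {
    // Registry of known stages, as a config layer would hold factories.
    static final Map<String, UnaryOperator<String>> STAGES = Map.of(
        "lowercase", s -> s.toLowerCase(),
        "trim", String::trim);

    // Build a composed chain from a declarative list of stage names.
    static UnaryOperator<String> build(List<String> names) {
        UnaryOperator<String> chain = s -> s;
        for (String name : names) {
            UnaryOperator<String> stage = STAGES.get(name);
            if (stage == null)  // the error checking the config layer owes us
                throw new IllegalArgumentException("unknown stage: " + name);
            UnaryOperator<String> prev = chain;
            chain = s -> stage.apply(prev.apply(s));
        }
        return chain;
    }

    public static void main(String[] args) {
        UnaryOperator<String> analyzer = build(List.of("trim", "lowercase"));
        System.out.println(analyzer.apply("  HeLLo "));
    }
}
```

If custom analyzers were serializable in core, this registry-plus-composition layer would live in one place instead of two.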
Can we do something better with offsets, such that TokenFilters (not
just Tokenizers/CharReaders) would also be able to set correct
offsets?
The stuffing of things into "analysis" that really should have been a
"gentle schema" is annoying: KeywordAnalyzer, Numeric*.
Token filters that want to create graphs are nearly impossible to
write. E.g. you cannot put a WordDelimiterFilter in front of
SynonymFilter today because SynonymFilter can't handle an incoming
graph (LUCENE-5012).
Deleted tokens should still be present, just "marked" as deleted (so
IW doesn't index them). This would make it possible (to Rob's horror)
for tokenizers to preserve every single character they saw, with
things that are not tokens (punctuation, whitespace) marked as
deleted. Maybe this would make it possible for all stages to work with
offsets properly?
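A sketch of the "mark, don't drop" idea (toy types, not Lucene's API): every character span the tokenizer saw becomes a token, non-token spans are flagged deleted, and offsets therefore tile the input with no gaps for every downstream stage to reason about:

```java
import java.util.List;

public class MarkedTokens {
    record Token(String text, int startOffset, int endOffset, boolean deleted) {}

    public static void main(String[] args) {
        String input = "Hi, world";
        List<Token> tokens = List.of(
            new Token("Hi", 0, 2, false),
            new Token(",", 2, 3, true),    // punctuation: kept, marked deleted
            new Token(" ", 3, 4, true),    // whitespace: kept, marked deleted
            new Token("world", 4, 9, false));
        // Offsets are contiguous over the whole input, so a filter could
        // split or merge tokens and still compute correct offsets; the
        // indexer only consumes the non-deleted tokens:
        tokens.stream()
              .filter(t -> !t.deleted())
              .forEach(t -> System.out.println(t.text()));
    }
}
```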
There is probably more, and probably lots of people disagree that
these are even "problems" :)
Mike McCandless
http://blog.mikemccandless.com