Mailing List Archive

Searching sub string
Hi,

I'd like to perform a substring search (a query like *foo*).
As you probably know, it is not possible to use * as the first character.
Is there any other solution?

Thanks,
Moran
--
View this message in context: http://www.nabble.com/Searching-sub-string-tp15089424p15089424.html
Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Searching sub string
On Friday, 25 January 2008, mortic44 wrote:

> I'd like to perform a substring search (a query like *foo*).
> As you probably know, it is not possible to use * as the first character.

It is, at least in recent versions of Lucene Java; see
QueryParser.setAllowLeadingWildcard(). An alternative might be to produce
an n-gram index.
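
To illustrate the n-gram idea, here is a minimal sketch in plain Python. It does not use Lucene; the class name NGramIndex, the helper ngrams and the trigram size are assumptions made for this example only. Each term is indexed under all of its character trigrams, a query like *foo* is answered by intersecting the posting sets of the query's trigrams, and the surviving candidates are verified with a real substring test.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all length-n character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Hypothetical in-memory index, not a Lucene class."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of terms containing it
        self.terms = []

    def add(self, term):
        term_id = len(self.terms)
        self.terms.append(term)
        for gram in ngrams(term, self.n):
            self.postings[gram].add(term_id)

    def search(self, needle):
        """Return all indexed terms that contain needle as a substring."""
        grams = ngrams(needle, self.n)
        if not grams:
            # Needle is shorter than n: fall back to scanning every term.
            candidates = range(len(self.terms))
        else:
            # A term containing the needle must contain every one of its n-grams.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Verify: sharing all n-grams does not guarantee they are contiguous.
        return sorted(self.terms[i] for i in candidates if needle in self.terms[i])


index = NGramIndex()
for term in ["foobar", "seafood", "bar", "food"]:
    index.add(term)
print(index.search("foo"))
```

The trade-off is a larger index in exchange for avoiding a scan over every term at query time.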

Regards
Daniel

--
http://www.danielnaber.de
Re: Searching sub string
Thanks Daniel.

Unfortunately, I'm using Lucene .Net, and
"QueryParser.setAllowLeadingWildcard()" is only available in version 2.1,
which has not been released yet.

Moran



RE: Searching sub string
In answer to Moran Schemer's problem of not having access to the latest
Lucene features because he is using .NET:

I know this is heresy and anathema, but one way out of the Lucene .Net
ghetto problem is to use the mainline Lucene release Java source code
and compile it with a Java compiler that targets the .NET CLR.

That's why I did some testing and benchmarking of the idea, using the J#
compiler, a couple of years ago.

http://alum.mit.edu/www/gjc/lucene-java-vjc.html is the result. I did
the work using Lucene 1.9.1 and also what was at the time the 2.0 trunk.
The point is that I kept careful track of how much time it took and
how few lines of code needed to be modified, and the result was that you
can get a big payoff with very little effort, assuming that you are
comfortable programming in Java.

So give it a try. It should take you at most a couple of hours to compile
and test the Java source code for Lucene (that's all it took me).

The heresy and anathema comes from the fact that Microsoft has announced
the retirement of their J# compiler and the end of all support in the
year 2015.
So you'll have 7 years to worry about whether anyone is going to complete
the GCC back-end that targets the CLR, or whether any alternative Java
compiler will become available. But I'm not suggesting this as a long-range
plan, more as a stop-gap measure that exploits the technology that
currently exists.




Re: Searching sub string
Hi Moran,

If
- you don't need other Lucene functions or ranking
- performance is critical
- your database does not fit into RAM
you may want to look at (compressed) suffix arrays, which handle both
left and right truncation well.
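
As a rough illustration of the suffix array idea, here is a sketch in plain Python. It builds an ordinary, uncompressed, in-memory suffix array with a naive construction, so it shows only the lookup principle and not the compressed variant mentioned above; the function names build_suffix_array and find_occurrences are assumptions made for this example. Every occurrence of a pattern is a prefix of some suffix, so all matches form one contiguous range in the sorted suffix list, which two binary searches can locate.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text in lexicographic order.

    Naive construction for illustration only; real implementations use
    much faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted offsets at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "seafood foobar"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "foo"))
```

Because the pattern may start anywhere in the text, this answers *foo* directly, with no wildcard handling at all.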

Best regards,

Wolfgang




Re: Searching sub string
Thanks everyone.

Re: Searching sub string
Wolfgang is right, but you can also enable the leading wildcard in the QueryParser, I believe. Alternatively, you can index reversed tokens and stick to trailing wildcards.
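
The reversed-token trick can be sketched in plain Python as follows. This is not Lucene code; the class name ReversedTermIndex and the method ends_with are assumptions made for this example. Each term is stored reversed in a sorted list, so the leading-wildcard query *bar becomes the trailing-wildcard (prefix) query rab* over the reversed terms.

```python
import bisect


class ReversedTermIndex:
    """Hypothetical in-memory index of reversed terms, not a Lucene class."""

    def __init__(self, terms):
        # Store each term reversed and sorted, so a prefix lookup is a range scan.
        self.reversed_terms = sorted(t[::-1] for t in terms)

    def ends_with(self, suffix):
        """Answer the leading-wildcard query *suffix via a prefix scan."""
        prefix = suffix[::-1]
        start = bisect.bisect_left(self.reversed_terms, prefix)
        matches = []
        for reversed_term in self.reversed_terms[start:]:
            if not reversed_term.startswith(prefix):
                break  # sorted order: no later entry can match
            matches.append(reversed_term[::-1])
        return sorted(matches)


index = ReversedTermIndex(["foobar", "crowbar", "barfoo", "food"])
print(index.ends_with("bar"))
```

Note that this covers only a leading wildcard (*foo). A query with wildcards on both sides (*foo*) still needs one of the other approaches from this thread.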

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
