Mailing List Archive

What does edit distance 2 mean for fuzzy queries?
Hi,

search Query: [+streetDFLT:ridg~2, +cityDFLT:"nashua",
+regionDFLT:"new-hampshire", +countryDFLT:"united" +countryDFLT:"states"]

Name: Ridge Rd
Score: 35.297863
ID: 10242301
Country Code: US
Coordinates: 42.70569, -71.49599
Search Key: street="RIDGE" city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES"

With this query I can find RIDGE, but with the search string RID I can't find RIDGE at all:

search Query: [+streetDFLT:rid~2, +cityDFLT:"nashua",
+regionDFLT:"new-hampshire", +countryDFLT:"united" +countryDFLT:"states"]

Name: NASHUA
Score: 28.291311
ID: 21014865
Country Code: US
Coordinates: 42.75873, -71.46438
Search Key: city="NASHUA" municipality="HILLSBOROUGH" region="NEW
HAMPSHIRE" country="UNITED STATES"

Name: NASHUA
Score: 28.291311
ID: 21014865
Country Code: US
Coordinates: 42.75873, -71.46438
Search Key: city="NASHUA" municipality="HILLSBOROUGH" region="NEW
HAMPSHIRE" country="UNITED STATES"

Name: NASHUA
Score: 28.291311
ID: 21014865
Country Code: US
Coordinates: 42.75873, -71.46438
Search Key: city="NASHUA" municipality="HILLSBOROUGH" region="NEW
HAMPSHIRE" country="UNITED STATES"

Name: NASHUA
Score: 28.291311
ID: 21014865
Country Code: US
Coordinates: 42.75873, -71.46438
Search Key: city="NASHUA" municipality="HILLSBOROUGH" region="NEW
HAMPSHIRE" country="UNITED STATES"

Name: Pennichuck St
Score: 28.291311
ID: 8022314
Country Code: US
Coordinates: 42.79266, -71.46672
Search Key: street="PENNICHUCK" city="NASHUA"
municipality="HILLSBOROUGH" region="NEW HAMPSHIRE" country="UNITED STATES"

Name: Hartford Ln
Score: 28.291311
ID: 9817672
Country Code: US
Coordinates: 42.78252, -71.49689
Search Key: street="HARTFORD" city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES"

Name: Marblehead Dr
Score: 28.291311
ID: 12762505
Country Code: US
Coordinates: 42.79743, -71.50919
Search Key: street="MARBLEHEAD" city="NASHUA"
municipality="HILLSBOROUGH" region="NEW HAMPSHIRE" country="UNITED STATES"

RID is two edits away from RIDGE, right? Do I need to enable something at
indexing time for fuzzy queries?
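To check the arithmetic in the question, here is a minimal sketch of plain Levenshtein distance in Java (the class name EditDistance is only for illustration). It counts insertions, deletions and substitutions as one edit each; Lucene's FuzzyQuery can also count a swap of two adjacent characters as a single edit, which this sketch does not model. For "rid" versus "ridge" it returns 2 (insert "g" and "e").

```java
// Minimal sketch: classic Levenshtein distance, where an insertion, a deletion
// or a substitution each costs 1. Adjacent transpositions are NOT treated as a
// single edit here (Lucene's FuzzyQuery optionally does that).
class EditDistance {

    static int levenshtein(String a, String b) {
        // prev = distances for the previous row (prefix of a one char shorter),
        // curr = row currently being filled
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("rid", "ridge"));  // 2: insert 'g' and 'e'
        System.out.println(levenshtein("ridg", "ridge")); // 1: insert 'e'
    }
}
```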
Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: What does edit distance 2 mean for fuzzy queries?
I think I have an answer for this one:

for words of length 5 or shorter, only edit distance 1 works, but as the
word gets longer, I see that edit distance 2 works OK.
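The rule of thumb above can be written down as a small hypothetical helper that picks the ~N suffix from the word length. The threshold of 5 characters is the empirical observation from this thread, not a documented Lucene limit; the class and method names are made up for illustration, and the helper does not escape query-parser special characters.

```java
// Hypothetical helper encoding this thread's rule of thumb:
// words of length 5 or shorter get ~1, longer words get ~2.
// The field name "streetDFLT" below is taken from the queries in this thread.
class FuzzyEdits {

    static int maxEditsFor(String word) {
        // count Unicode code points rather than UTF-16 chars
        int len = word.codePointCount(0, word.length());
        return len <= 5 ? 1 : 2;
    }

    // Builds a required (+) fuzzy clause in classic query-parser syntax.
    // Special characters in the word are NOT escaped.
    static String fuzzyClause(String field, String word) {
        return "+" + field + ":" + word + "~" + maxEditsFor(word);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyClause("streetDFLT", "rid"));        // +streetDFLT:rid~1
        System.out.println(fuzzyClause("streetDFLT", "marblehead")); // +streetDFLT:marblehead~2
    }
}
```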

Best regards


On 6/28/19 2:25 PM, baris.kazar@oracle.com wrote:


Re: What does edit distance 2 mean for fuzzy queries?
Hi Baris,

Terms of length 1 or 2 will sometimes not match because of how the scaled
distance between two terms is computed. For a term to match, the edit
distance between the terms must be less than the length of the shorter term
(either the input term or the candidate term). For example, FuzzyQuery on
term "abcd" with maxEdits=2 will not match an indexed term "ab", and
FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".

You can find this in the FuzzyQuery documentation:
https://lucene.apache.org/core/8_2_0/core/org/apache/lucene/search/FuzzyQuery.html
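The length condition described above can be sketched as a simple check on precomputed edit distances. This is a reading of the javadoc wording, not Lucene's actual automaton-based term enumeration, and the class and method names are hypothetical. By this rule as worded, the rid/ridge pair from the original question passes (distance 2, shorter length 3).

```java
// Sketch of the matching condition as worded in the FuzzyQuery javadoc note:
// a candidate matches only if its edit distance is within maxEdits AND
// strictly less than the length of the shorter of the two terms.
// Edit distances are passed in precomputed.
class FuzzyLengthRule {

    static boolean passes(String query, String candidate, int editDistance, int maxEdits) {
        int shorter = Math.min(query.codePointCount(0, query.length()),
                               candidate.codePointCount(0, candidate.length()));
        return editDistance <= maxEdits && editDistance < shorter;
    }

    public static void main(String[] args) {
        // "abcd" -> "ab": 2 deletions; shorter length is 2, and 2 < 2 is false
        System.out.println(passes("abcd", "ab", 2, 2));   // false
        // "a" -> "abc": 2 insertions; shorter length is 1
        System.out.println(passes("a", "abc", 2, 2));     // false
        // "rid" -> "ridge": 2 insertions; shorter length is 3, and 2 < 3
        System.out.println(passes("rid", "ridge", 2, 2)); // true
    }
}
```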

Kind Regards,
Furkan KAMACI

On Fri, Jun 28, 2019 at 10:10 PM <baris.kazar@oracle.com> wrote:

Re: What does edit distance 2 mean for fuzzy queries?
Thanks, yes, I was looking into the scaled distance computation in more detail.
As I mentioned, I now have a rough rule: for words of length 5 or shorter, edit distance 1 works,
but as the word gets longer, edit distance 2 works too.
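The scaled distance mentioned above can be illustrated with a length-normalized similarity, 1 - ed / min(query length, candidate length). Treat this exact formula as an assumption about how Lucene's FuzzyTermsEnum scored fuzzy matches around version 8, and check the source of the version in use. A similarity above zero is the same condition as "edit distance less than the shorter length" from the javadoc note, and shorter terms lose similarity faster per edit. The class name is hypothetical.

```java
// Sketch of a length-scaled similarity: 1 - ed / min(queryLength, candidateLength).
// ASSUMPTION: this mirrors the style of scaling used in Lucene's FuzzyTermsEnum
// around 8.x; it is an illustration, not Lucene code.
class ScaledSimilarity {

    static float similarity(int editDistance, int queryLength, int candidateLength) {
        if (editDistance == 0) {
            return 1.0f; // exact match
        }
        return 1.0f - (float) editDistance / Math.min(queryLength, candidateLength);
    }

    public static void main(String[] args) {
        System.out.println(similarity(1, 4, 5)); // ridg vs ridge: 0.75
        System.out.println(similarity(2, 3, 5)); // rid vs ridge: about one third
        System.out.println(similarity(2, 2, 4)); // ab vs abcd: 0.0, i.e. no match
    }
}
```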
Best

----- Original Message -----
From: furkankamaci@gmail.com
To: java-user@lucene.apache.org
Sent: Sunday, August 4, 2019 8:41:27 PM GMT -05:00 US/Canada Eastern
Subject: Re: What does edit distance 2 mean for fuzzy queries?
