Mailing List Archive

Attribute Search
I am trying to index a set of data, storing only a "primary key". This primary key I left un-indexed. There is one "text" field, which I indexed and tokenized.

The others I wanted neither to store nor to tokenize. My reasoning was that "not tokenizing" would produce the smallest index. The remaining fields were lastname, firstname, etc.

However, my queries did not work correctly; they never returned any hits.

I finally gave up and re-indexed with Tokenize set to true on all the fields.

Now my queries work. And to my surprise, the index was smaller than when I did not tokenize.

I found this a little counter-intuitive.

Can someone explain this?
RE: Attribute Search [ In reply to ]
I am new here too, but here's my 2 cents.
If you don't tokenize your DB text values, what do you expect the resulting
indexed terms to be? I think they are not what you expect.
Your non-tokenized fields are probably not run through any filters, so a
lastname like 'Smith' will not be a hit if the query is 'smith', the search
being case sensitive.
If the last name is "smith B" (with a middle initial), a search on 'smith'
won't return it either, because 'smith' on its own is not a token.
I suggest you double-check the values in your DB, especially if the DB is
case sensitive.
Does your analyser take accents into account, if you use a Latin-type locale?
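
The point above can be sketched in plain Java. This is only a model of the behaviour described in this reply, not Lucene code: the class `TermSketch`, its methods, and the "analyzer" (split on non-alphanumerics, then lowercase) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the two indexing modes; this is NOT Lucene code.
class TermSketch {

    // Untokenized: the whole field value becomes a single term, unchanged.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    // Tokenized (assumed analyzer): split on non-alphanumerics, then lowercase.
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^\\p{Alnum}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A term lookup hits only if the query term equals an indexed term exactly.
    static boolean hits(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(hits(untokenized("Smith"), "smith"));   // false: case differs
        System.out.println(hits(untokenized("smith B"), "smith")); // false: the term is "smith B"
        System.out.println(hits(tokenized("Smith B"), "smith"));   // true
        System.out.println(hits(untokenized("Smith"), "Smith"));   // true: exact match
    }
}
```

In this model an untokenized value is only found by a query term that reproduces it exactly, including case and any embedded spaces.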


-----Original Message-----
From: Cecil, Paula New [mailto:cnew@fuse.net]
Sent: Monday, November 19, 2001 9:47 PM
To: LUCENE Text Search
Subject: Attribute Search


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Attribute Search [ In reply to ]
Actually, "last name" is not a good example. Social security numbers, phone
numbers, PO numbers, organization codes, etc. are better examples.

These fields are not even text. So I did not think it made sense to
tokenize them. But I did want them indexed and searchable.
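
To illustrate the goal described here (identifier-like values indexed whole and searchable by exact match), here is a toy inverted index in plain Java. It is only a model: `KeywordIndexSketch`, `addKeyword`, and `search` are hypothetical names, the sample values are made up, and nothing in it is Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for identifier fields; a model only, not Lucene's implementation.
class KeywordIndexSketch {

    // "field:value" term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the value as one whole term, without splitting or lowercasing it.
    void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
    }

    // Look up the exact term; any difference in the value means no hit.
    Set<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, new TreeSet<>());
    }

    public static void main(String[] args) {
        KeywordIndexSketch index = new KeywordIndexSketch();
        index.addKeyword(1, "ssn", "123-45-6789");
        index.addKeyword(2, "phone", "513-555-0100");

        System.out.println(index.search("ssn", "123-45-6789")); // [1]
        System.out.println(index.search("ssn", "123"));         // []
    }
}
```

Only the complete value is stored as a term in this sketch, so a partial value such as "123" finds nothing.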

-----Original Message-----
From: Emmanuel Bridonneau [mailto:EBridonneau@epicentric.com]
Sent: Monday, November 19, 2001 10:02 PM
To: 'Lucene Users List'
Subject: RE: Attribute Search


RE: Attribute Search [ In reply to ]
Did you check w/FAQ 26 on searching?
http://www.lucene.com/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q26


-----Original Message-----
From: New, Cecil (GEAE) [mailto:cecil.new@ae.ge.com]
Sent: Tuesday, November 20, 2001 2:19 PM
To: 'Lucene Users List'
Subject: RE: Attribute Search


RE: Attribute Search [ In reply to ]
What about trying to export the content of your database (the fields you're
interested in) into flat files? You could then feed your indexer from a
FileInputStream, using the pattern you describe below, rerun your search,
and see if the results differ.

-----Original Message-----
From: Cecil, Paula New [mailto:cnew@fuse.net]
Sent: Wednesday, November 21, 2001 1:08 PM
To: Lucene Users List
Subject: Re: Attribute Search


I came across a tutorial which had some details on the static factory Field
methods. But none of the factory methods return a Field object with the
following settings:
Store => false
Index => true
Tokenize => false

I'm beginning to think this is a bug - that this combination is not handled
correctly.


----- Original Message -----
From: Emmanuel Bridonneau <EBridonneau@epicentric.com>
To: 'Lucene Users List' <lucene-user@jakarta.apache.org>
Sent: Tuesday, November 20, 2001 4:12 PM
Subject: RE: Attribute Search
Re: Attribute Search [ In reply to ]
Paula,

>I came across a tutorial which had some details on the static factory Field
>methods. But none of the factory methods return a Field object with the
>following settings:
>Store => false
>Index => true
>Tokenize => false
>
>I'm beginning to think this is a bug - that this combination is not handled
>correctly.

The Field() constructor is public; can't you use that instead of one
of the factory methods?

public Field(String name,
             String string,
             boolean store,
             boolean index,
             boolean token)

Regards,
Ype

Re: Attribute Search [ In reply to ]
I came across a tutorial which had some details on the static factory Field
methods. But none of the factory methods return a Field object with the
following settings:
Store => false
Index => true
Tokenize => false

I'm beginning to think this is a bug - that this combination is not handled
correctly.


RE: Attribute Search [ In reply to ]
This is exactly what I was doing: Store=false, index=true, and token=false.

This combination is *not* represented by one of the factory methods. It
appeared to work ok, but searches *never* returned any hits.

That's why I suspect it is a bug.

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl]
Sent: Wednesday, November 21, 2001 2:51 PM
To: Lucene Users List
Subject: Re: Attribute Search


RE: Attribute Search [ In reply to ]
> From: New, Cecil (GEAE) [mailto:cecil.new@ae.ge.com]
>
> this is exactly what I was doing. Store=false, index=true,
> and token=false.
>
> It appeared to work ok, but searches *never* returned any hits.
>
> That's why I suspect it is a bug.

If you think this is a bug, please submit a test case, as a simple class
whose 'main()' method illustrates the problem.

Doug
