Mailing List Archive

Lucene TypeAttribute not used during querying
Hello,

I wonder why the TypeAttribute is not used for queries.
It seems that it is used only during analysis.
Why is it not used in org.apache.lucene.index.Term?

Paul Bédaride
RE: Lucene TypeAttribute not used during querying
Hi,

The type attribute is not stored in the index. Its main purpose is to be used inside the analysis chain: for example, a tokenizer, stemmer, or other component sets the attribute, and the last TokenFilter before indexing may then change the term accordingly (e.g. by adding the type as a payload, or by appending it to the term itself) to get the information into the index, but that part is mainly your task. The same applies to other language-specific attributes (such as the Japanese ones). The keyword attribute is another example: it is not indexed either, and is used solely to control the behavior of later TokenFilters (e.g. to prevent stemming).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Paul Bedaride [mailto:paul.bedaride@xilopix.com]
> Sent: Wednesday, September 23, 2015 11:16 AM
> To: general@lucene.apache.org
> Subject: Lucene TypeAttribute not used during querying
>
> Hello,
>
> I wonder why the TypeAttribute is not used for queries.
> It seems that it is used only during analysis.
> Why is it not used in org.apache.lucene.index.Term?
>
> Paul Bédaride
Re: Lucene TypeAttribute not used during querying
OK, so it is not possible to store other pieces of information in the index,
such as part of speech?

Thanks for the fast answer

Paul

On 23/09/2015 11:21, Uwe Schindler wrote:
> Hi,
>
> The type attribute is not stored in the index. Its main purpose is to be used inside the analysis chain: for example, a tokenizer, stemmer, or other component sets the attribute, and the last TokenFilter before indexing may then change the term accordingly (e.g. by adding the type as a payload, or by appending it to the term itself) to get the information into the index, but that part is mainly your task. The same applies to other language-specific attributes (such as the Japanese ones). The keyword attribute is another example: it is not indexed either, and is used solely to control the behavior of later TokenFilters (e.g. to prevent stemming).
>
> Uwe
RE: Lucene TypeAttribute not used during querying
Not as attributes.

As said before, you have to write a separate TokenFilter at the end of your indexing chain that collects the attributes you want to index and adds them to the term:
- Append the type to the term: the TokenFilter's incrementToken does something like termAtt.append('#').append(typeAtt.type());
- Create a payload out of it: see the payload package in analyzers-common for examples.

After that you can query using the "extended term" or using payload queries.

You may wonder how to query then. Appending the type to the term itself (as above, giving "term#type") is no problem during search, because the analyzer is used on the search side as well. The search query gets analyzed, and the last TokenFilter of the analyzer adds the type to the term as described before. The query then matches the indexed terms that carry that type.
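
To make the "term#type" idea concrete, here is a small dependency-free sketch of why it works on both the index and the query side. It does not use the Lucene API: the Token class, the toy inverted index, and all method names are hypothetical stand-ins, and the part-of-speech tags are made up. In a real analysis chain the extendedTerm step would live in the incrementToken method of the last TokenFilter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TypedTermSketch {

    /** Hypothetical stand-in for an analyzed token (term text plus type); not a Lucene class. */
    public static final class Token {
        final String term;
        final String type;

        public Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /** What the last filter of the chain would do: fold the type into the term text. */
    public static String extendedTerm(Token token) {
        return token.term + "#" + token.type;
    }

    /** Toy inverted index: extended term -> sorted ids of the documents containing it. */
    public static Map<String, Set<Integer>> buildIndex(List<List<Token>> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (Token token : docs.get(docId)) {
                index.computeIfAbsent(extendedTerm(token), k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    /** The query token goes through the same step, so it only matches the same term AND type. */
    public static Set<Integer> search(Map<String, Set<Integer>> index, Token queryToken) {
        return index.getOrDefault(extendedTerm(queryToken), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<List<Token>> docs = new ArrayList<>();
        docs.add(List.of(new Token("book", "NOUN"), new Token("shelf", "NOUN")));  // doc 0
        docs.add(List.of(new Token("book", "VERB"), new Token("flight", "NOUN"))); // doc 1
        Map<String, Set<Integer>> index = buildIndex(docs);

        System.out.println(search(index, new Token("book", "NOUN"))); // prints [0]
        System.out.println(search(index, new Token("book", "VERB"))); // prints [1]
    }
}
```

One consequence of this design is that the query-side analysis must assign the same type as the index-side analysis did; if it cannot (for example, because a short query gives a tagger too little context), the extended terms will not match.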

Uwe



> -----Original Message-----
> From: Paul Bedaride [mailto:paul.bedaride@xilopix.com]
> Sent: Wednesday, September 23, 2015 11:38 AM
> To: general@lucene.apache.org
> Subject: Re: Lucene TypeAttribute not used during querying
>
> OK, so it is not possible to store other pieces of information in the index,
> such as part of speech?
>
> Thanks for the fast answer
>
> Paul
Re: Lucene TypeAttribute not used during querying
Paul -

There are a couple of TokenFilters of note in Lucene that leverage the “type” attribute and may help:

- TypeAsPayloadTokenFilter: sets the payload of the terms to the type attribute value
- TypeTokenFilter: includes or excludes tokens whose type is in a designated set of types

And there’s also TokenTypeSinkFilter, which can be used with a TeeSinkTokenFilter to tee tokens with a particular type.
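
The include/exclude behaviour of a type-based filter can be modelled roughly as follows. This is a plain-Java sketch of the idea only, not the Lucene TypeTokenFilter class: the Token class, the filterByType method, and its keepListed flag are hypothetical names, the type strings are only illustrative, and a real filter in an analysis chain would also have to deal with position increments, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TypeFilterSketch {

    /** Hypothetical stand-in for an analyzed token (term text plus type); not a Lucene class. */
    public static final class Token {
        final String term;
        final String type;

        public Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /**
     * Returns the terms of the tokens that survive the filter.
     * keepListed == true:  keep only tokens whose type is in the set (include mode).
     * keepListed == false: drop tokens whose type is in the set (exclude mode).
     */
    public static List<String> filterByType(List<Token> tokens, Set<String> types, boolean keepListed) {
        List<String> kept = new ArrayList<>();
        for (Token token : tokens) {
            if (types.contains(token.type) == keepListed) {
                kept.add(token.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Lucene", "<ALPHANUM>"),
                new Token("2015", "<NUM>"),
                new Token("index", "<ALPHANUM>"));

        System.out.println(filterByType(tokens, Set.of("<NUM>"), false)); // prints [Lucene, index]
        System.out.println(filterByType(tokens, Set.of("<NUM>"), true));  // prints [2015]
    }
}
```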

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com



