Mailing List Archive

Why Lucene's Suggest API can ONLY load field terms which is Store.YES?
I have a document field, `category`, whose value is a string with parts
separated by any of the characters "|,;". At indexing time I manually split
the value into atomic terms and index each one as a StringField. I also add
a StoredField with the same name that holds the original, unsplit value:

    List<String> terms = analyzer.analysis((String) fieldValue);
    for (String term : terms) {
        doc.add(new StringField(fieldName, term, Store.NO));
    }
    doc.add(new StoredField(fieldName, (String) fieldValue));

Then I use the Suggest API to load all of this field's terms:

    Set<String> terms = new HashSet<String>();
    DocumentDictionary dict =
        new DocumentDictionary(this.indexReader, fieldName, null);
    InputIterator it;
    try {
        it = dict.getEntryIterator();
        BytesRef byteRef = null;
        while ((byteRef = it.next()) != null) {
            String term = byteRef.utf8ToString();
            terms.add(term);
        }
    } catch (IOException e) {
        e.printStackTrace();
        log.error(e.getMessage(), e);
    }

To my surprise, the iterator seems to return only the STORED value, i.e. the
original unsplit string, whereas I expected the terms I put into each
StringField!

Is this a design oversight or an implementation limitation?
Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?
Hello,
It's by design: StringFields are what you search on, and they hold the
output of analysis; StoredFields hold the input values that are returned.
That's it.
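
To make the distinction concrete, here is a toy sketch in plain Java. It is
not Lucene code and does not reflect Lucene's internals: ToyIndex and all of
its methods are hypothetical names invented for illustration. It only models
the idea that one field name can point at two independent structures, a set
of indexed terms and a list of stored values, and that an API reading one of
them never sees the other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model only (hypothetical, NOT Lucene): two structures per field name. */
class ToyIndex {
    // field name -> sorted indexed terms (stands in for the searchable side)
    private final Map<String, SortedSet<String>> termsByField = new HashMap<>();
    // field name -> stored values, one per add (stands in for the stored side)
    private final Map<String, List<String>> storedByField = new HashMap<>();

    /** Records a searchable term; nothing is added to the stored side. */
    void addIndexedTerm(String field, String term) {
        termsByField.computeIfAbsent(field, f -> new TreeSet<>()).add(term);
    }

    /** Records a value to be returned as-is; nothing becomes searchable. */
    void addStoredValue(String field, String value) {
        storedByField.computeIfAbsent(field, f -> new ArrayList<>()).add(value);
    }

    /** What a term-based reader would see for this field. */
    SortedSet<String> indexedTerms(String field) {
        return termsByField.getOrDefault(field, Collections.emptySortedSet());
    }

    /** What a stored-value-based reader would see for this field. */
    List<String> storedValues(String field) {
        return storedByField.getOrDefault(field, Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        String original = "books|fiction;scifi"; // made-up sample value
        // Split on any of the separator characters and index each part.
        for (String term : original.split("[|,;]")) {
            index.addIndexedTerm("category", term);
        }
        // Store the unsplit value under the same field name.
        index.addStoredValue("category", original);

        System.out.println(index.indexedTerms("category"));
        System.out.println(index.storedValues("category"));
    }
}
```

In this model the first call yields the three split terms and the second
yields only the single original string, which mirrors the behaviour
described in the question: a reader that works from stored values cannot
return the split terms, even though both live under the same field name.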

On Fri, Dec 27, 2019 at 11:32 AM ??? <ctengctsh@gmail.com> wrote:

> Is this a design miss or impl. limit?
>


--
Sincerely yours
Mikhail Khludnev
Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?
But I find this design very confusing: if I can search on an indexed field,
its terms must be stored somewhere, so shouldn't I be able to get those
terms as a Dictionary?

The Lucene docs say that with Store.YES the same field name is used for both
kinds of index data, so the two seem to be treated as one. Here I have to
use two field names to work around this confusing, self-contradictory
design...

On Fri, Dec 27, 2019 at 5:05 PM Mikhail Khludnev <mkhl@apache.org> wrote:

> Hello,
> It's by design: StringFields are searchable and filled by analysis output,
> StoredFields are returned input values.
> That's it.
Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?
On Fri, Dec 27, 2019 at 12:12 PM ??? <ctengctsh@gmail.com> wrote:

> But i feel very confused about this design: if i can search by some
> indexable field, means there should be some terms stored somewhere, so i
> should be able to get these terms as a Dictionary?
>
Right. For that there is MultiTerms.getTerms().
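
A minimal sketch of that approach, assuming Lucene 8.x or later, where
MultiTerms lives in org.apache.lucene.index (older releases may expose the
same lookup elsewhere). The class name IndexedTermLister and the method name
indexedTerms are illustrative, not part of Lucene. The null check is a
precaution for a field that has no indexed terms.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

/** Illustrative helper (hypothetical name): lists a field's indexed terms. */
final class IndexedTermLister {

    private IndexedTermLister() {}

    /** Collects every distinct indexed term of fieldName across all segments. */
    static Set<String> indexedTerms(IndexReader reader, String fieldName)
            throws IOException {
        Set<String> result = new TreeSet<>();

        // Merged view of the field's terms over all index segments.
        Terms terms = MultiTerms.getTerms(reader, fieldName);
        if (terms == null) {
            return result; // nothing indexed under this field name
        }

        TermsEnum termsEnum = terms.iterator();
        BytesRef ref;
        // next() returns null once the enumeration is exhausted.
        while ((ref = termsEnum.next()) != null) {
            // Convert right away: the BytesRef may be reused by the enum.
            result.add(ref.utf8ToString());
        }
        return result;
    }
}
```

Called as IndexedTermLister.indexedTerms(indexReader, fieldName), this should
return the split terms added through StringField rather than the stored
original. If the goal is to feed a suggester rather than to collect strings,
org.apache.lucene.search.spell.LuceneDictionary may be a closer fit, since it
exposes a field's indexed terms as a Dictionary; check the Javadoc for the
Lucene version in use.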


>
> Lucene docs says it uses the same field name for 2 kinds of index data
> store when set Store.YES, it seems treating them the same, here i have to
> make 2 field names to compat the confusing and inner-conflicting design...
>
It might seem so. Almost everyone goes through these doubts. I like to point
to this talk: https://www.youtube.com/watch?v=T5RmMNDR5XI





--
Sincerely yours
Mikhail Khludnev