Mailing List Archive

Autosuggest/Autocomplete: What are the best practices to build Suggester?
Hi

I recently started to use the Autosuggest/Autocomplete package as
suggested by Robert

https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html

which works very well, thanks again for your help :-)

But it is not clear to me what the best practices are for building a
suggester using an InputIterator

https://lucene.apache.org/core/8_10_1/suggest/org/apache/lucene/search/suggest/Lookup.html#build-org.apache.lucene.search.suggest.InputIterator-

regarding

- scalability
- thousands of terms
- thousands of contexts (including personalized contexts)
- updating during runtime (singleton / thread safe)

So far I do something like the following:

List<Item> entities = new ArrayList<Item>();

entities.add(new Item("traffic accident", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 3));
entities.add(new Item("event", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 2));
entities.add(new Item("person", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 4));
entities.add(new Item("coverage check", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("coverage", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("contract search", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
entities.add(new Item("claims management system", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));

suggester.build(new ItemIterator(entities.iterator()));

where the terms associated with the context "public" are intended for
all contexts, and the terms associated with the context
"a84581a3-302f-4b73-80d9-0e60da5238f9" are only for a private domain
context, in this example an insurance company.
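
To make the tagging scheme above concrete, here is a minimal plain-Java
sketch. It is only an illustration: the real Item and ItemIterator classes
are not shown in this thread, so the field names (text, payload, contexts,
weight), the ContextDemo class and the visibleIn helper are all assumptions,
and in the real setup the filtering is done by the suggester's lookup, not
by a helper like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical value class; field names are guessed from the constructor calls above. */
final class Item {
    final String text;            // the suggestion text
    final String payload;         // extra data, empty in the example (assumed meaning)
    final List<String> contexts;  // contexts the suggestion is tagged with
    final int weight;             // assumed: higher weight ranks first

    Item(String text, String payload, List<String> contexts, int weight) {
        this.text = text;
        this.payload = payload;
        this.contexts = contexts;
        this.weight = weight;
    }
}

/** Hypothetical helper that mimics context filtering in memory. */
final class ContextDemo {
    static final String TENANT = "a84581a3-302f-4b73-80d9-0e60da5238f9";

    /** Returns the texts of all items tagged with the given context, heaviest first. */
    static List<String> visibleIn(List<Item> items, String context) {
        List<Item> hits = new ArrayList<>();
        for (Item item : items) {
            if (item.contexts.contains(context)) {
                hits.add(item);
            }
        }
        // Sort by descending weight; List.sort is stable, so ties keep insertion order.
        hits.sort((a, b) -> Integer.compare(b.weight, a.weight));
        List<String> texts = new ArrayList<>();
        for (Item item : hits) {
            texts.add(item.text);
        }
        return texts;
    }

    /** The same seven sample entries as in the snippet above. */
    static List<Item> sample() {
        List<Item> entities = new ArrayList<>();
        entities.add(new Item("traffic accident", "", Arrays.asList("public", TENANT), 3));
        entities.add(new Item("event", "", Arrays.asList("public", TENANT), 2));
        entities.add(new Item("person", "", Arrays.asList("public", TENANT), 4));
        entities.add(new Item("coverage check", "", Arrays.asList(TENANT), 1));
        entities.add(new Item("coverage", "", Arrays.asList(TENANT), 1));
        entities.add(new Item("contract search", "", Arrays.asList(TENANT), 1));
        entities.add(new Item("claims management system", "", Arrays.asList(TENANT), 1));
        return entities;
    }
}
```

With this sample data, filtering by "public" returns only the three shared
terms, while filtering by the private context id returns all seven entries.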

Let's assume we have thousands of private domain contexts and the terms
keep changing continuously, because people upload new documents with new
terms into these contexts.

Will the current implementation of building the suggester using
InputIterator scale for such a situation?

I actually assumed that the suggester would be implemented like an
IndexReader/DirectoryReader for searching, meaning that for each context
I could have a separate "SuggesterDirectory" that can be updated at
runtime and scales easily.

Or do I misunderstand the current concept of how to build a suggester?

Thanks

Michael
Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?
I just realized that I can set an index directory when constructing the
suggester, for example

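Directory indexDir = FSDirectory.open(indexDirPath);

// Arguments: directory, index analyzer, query analyzer, minPrefixChars = 3, commitOnBuild = true
AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(indexDir, analyzer, analyzer, 3, true);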

and that I only need to build the index using an ItemIterator when it
does not exist yet, for example

// Build the suggester index only if the index directory is missing or empty
if (!indexDirPath.toFile().isDirectory() ||
        indexDirPath.toFile().list().length == 0) {
    List<Item> entities = new ArrayList<Item>();

    entities.add(new Item("traffic accident", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 3));
    entities.add(new Item("event", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 2));
    entities.add(new Item("person", "", asList("public", "a84581a3-302f-4b73-80d9-0e60da5238f9"), 4));
    entities.add(new Item("coverage check", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("coverage", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("contract search", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));
    entities.add(new Item("claims management system", "", asList("a84581a3-302f-4b73-80d9-0e60da5238f9"), 1));

    suggester.build(new ItemIterator(entities.iterator()));
}
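
The "missing or empty directory" check above could also be written with
java.nio.file. This is only a sketch of one possible variant: the class
name IndexDirCheck and the method name needsBuild are made up for
illustration. One reason to consider it is that File.list() returns null
on an I/O error, whereas Files.list reports the problem as an IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Lucene. */
final class IndexDirCheck {

    /** Returns true when the path is not a directory or the directory has no entries. */
    static boolean needsBuild(Path indexDirPath) throws IOException {
        if (!Files.isDirectory(indexDirPath)) {
            return true;
        }
        // The stream returned by Files.list holds an open directory handle,
        // so it is closed via try-with-resources.
        try (Stream<Path> entries = Files.list(indexDirPath)) {
            return !entries.findAny().isPresent();
        }
    }
}
```

The condition in the snippet above would then read
if (IndexDirCheck.needsBuild(indexDirPath)) { ... }.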

I was a little confused, because all the implementation examples I found
were using an in-memory directory.

My bad, everything good now, thank you :-)

Michael



On 18.11.21 at 09:47, Michael Wechner wrote:
> [...]