Mailing List Archive

suggester for multivalued (tags) field
Hi Everyone

I have a use case where I need to provide suggestions (autocomplete) on
multivalued tags. For example, suppose I have the documents below:

docId | users                 | tags (multivalued/array field)
------+-----------------------+-------------------------------
doc1  | [user1, user2, user3] | one, two, thirty
doc2  | [user1, user3, user5] | two, twenty five, four
doc3  | [user2, user4]        | thirty, forty nine, twenty

Query:
--------
{
  "prefix": "tw",
  "users": ["user1"]
}

Expected Output: (filter + no duplicates)
----------------------
"two",
"twenty five"
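
To make the expected behaviour concrete, here is a naive reference sketch in Python. It is only a brute-force scan over the three example documents, not an FST and not how any real suggester works; the names DOCS and suggest_tags are made up for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory copy of the three example documents above.
DOCS: List[Dict] = [
    {"id": "doc1", "users": ["user1", "user2", "user3"],
     "tags": ["one", "two", "thirty"]},
    {"id": "doc2", "users": ["user1", "user3", "user5"],
     "tags": ["two", "twenty five", "four"]},
    {"id": "doc3", "users": ["user2", "user4"],
     "tags": ["thirty", "forty nine", "twenty"]},
]


def suggest_tags(docs: List[Dict], prefix: str, users: List[str]) -> List[str]:
    """Return distinct tags starting with `prefix`, taken only from
    documents that list at least one of `users`."""
    wanted = set(users)
    seen = set()
    suggestions = []
    for doc in docs:
        # Filter step: skip documents that match none of the requested users.
        if wanted.isdisjoint(doc["users"]):
            continue
        for tag in doc["tags"]:
            # Prefix match plus de-duplication, keeping first-seen order.
            if tag.startswith(prefix) and tag not in seen:
                seen.add(tag)
                suggestions.append(tag)
    return suggestions


print(suggest_tags(DOCS, "tw", ["user1"]))  # ['two', 'twenty five']
```

Note that "twenty" from doc3 is excluded because user1 is not among its users, and "two" appears only once even though both doc1 and doc2 carry it.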

I am curious to know whether I can use the FST-based suggester
(FSTCompletionLookup). I would like to filter by a few fields (if not
many). I looked into Elasticsearch, and its context suggester (
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#context-suggester)
seems to be a good fit.
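
For reference, this is roughly what I understand the context suggester setup to look like, written as Python dicts that serialise to the JSON request bodies. It is a sketch based on my reading of the linked documentation, not something I have run against a cluster. The field name tag_suggest, the context name users and the suggestion name tag_suggestions are my own placeholders, and the exact mapping shape may differ between Elasticsearch versions.

```python
import json
from typing import Dict, List

FIELD = "tag_suggest"   # hypothetical completion field name
CONTEXT = "users"       # hypothetical category context name


def completion_mapping() -> Dict:
    """Index mapping: a completion field with one category context."""
    return {
        "mappings": {
            "properties": {
                FIELD: {
                    "type": "completion",
                    "contexts": [{"name": CONTEXT, "type": "category"}],
                }
            }
        }
    }


def suggestion_doc(tags: List[str], users: List[str]) -> Dict:
    """Document body: every tag is an input, tagged with the doc's users."""
    return {FIELD: {"input": tags, "contexts": {CONTEXT: users}}}


def suggest_request(prefix: str, users: List[str], size: int = 10) -> Dict:
    """Search body: prefix lookup restricted to the given user contexts."""
    return {
        "suggest": {
            "tag_suggestions": {
                "prefix": prefix,
                "completion": {
                    "field": FIELD,
                    "size": size,
                    # Asks for de-duplicated suggestion text where supported.
                    "skip_duplicates": True,
                    "contexts": {CONTEXT: users},
                },
            }
        }
    }


print(json.dumps(suggest_request("tw", ["user1"]), indent=2))
```

With this layout, doc1 would be indexed as suggestion_doc(["one", "two", "thirty"], ["user1", "user2", "user3"]), and the query from my example becomes suggest_request("tw", ["user1"]).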

As the FST is held on the heap, I am wondering whether this will scale well
or whether it will cause GC or other scaling issues for me in the future.
Here are some things I would like to understand:

1. How does adding a context affect performance and memory footprint?
Does it create one FST for each unique combination of contexts?
2. What is the recommended number of shards (if I decide to put
this in a separate index)? Should I keep the number of shards to a minimum?
3. Does it scale well horizontally? As the index grows, can I add more
machines and expect the system to scale well?

Any explanation of the internal implementation details would also help me
understand it better. I read
http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
to get the overall idea. I would appreciate some practical advice on whether
this is a good approach.

I am also curious to hear whether there are alternatives to Elasticsearch that
people have used to provide suggestions on multivalued fields.

Thanks
Srini