Hello devs,
I'd like to have an API design discussion here on the dev list concerning
adding asynchronous request semantics to SolrClient. I think the
SolrClient APIs require great peer-review consideration and so I'm having
this discussion here instead of a JIRA issue.
Background: Sometimes, a SolrJ user (including Solr itself) may wish to
issue a request but not block/wait for the response because it can do other
useful work before eventually needing the results. For example, maybe the
user has multiple requests to send to Solr that can be done in parallel and
_then_ process the results. This is in fact what Solr needs internally for
during distributed-search (see HttpShardHandler) and during
distributed-indexing (see ConcurrentUpdateHttp2SolrClient).
Why?: Greater efficiency. This can be accomplished easily in a generic
fashion with threads (e.g.
CompletableFuture.supplyAsync(Supplier,Executor)) with use of the
SolrClient concurrently. However, this can result in lots of threads and
context switching. I really like Cao Dat's explanation in the description
here: https://issues.apache.org/jira/browse/SOLR-14354 Thanks to HTTP/2,
there is a new approach in which the underlying HTTP client can handle the
asynchronous nature more efficiently with minimal threads. Complete
plumbing of this implies that SolrClient (and/or its derivatives) need a
similar async mechanism. Disclaimer: I haven't measured any of this but
am piggybacking off of Cao's's insights on SOLR-14354
Where?: An async API _could_ go only on some SolrClient classes that
natively support it, avoiding changing SolrClient itself. Maybe this is
what should occur first before "graduating" the method to SolrClient where
we devise a default approach, although I would prefer just putting it on
SolrClient. The default approach might be configured to throw
UnsupportedOperationException, or perhaps might simply use an Executor to
get it done in an obvious way (assuming we can get ahold of an Executor
somewhere?). If you're writing a delegating SolrClient (which I've done in
the past), you might want to take-care to delegate this method.
Another aspect of "Where" is whether SolrRequest should additionally have
an API alternative to "process" which is currently synchronous and calls
out to SolrClient.request(this).
What?: What should this API look like? This is my primary interest right
now and the meat of the discussion.
SolrClient's primary signature that actually does the work (synchronously)
is this:
public abstract NamedList<Object>
request(@SuppressWarnings({"rawtypes"})final SolrRequest request,
String collection)
throws SolrServerException, IOException;
I don't like the "collection" there as it's redundant with either the
configured SolrClient default or a setting in SolrRequest, but I digress
from the important matter at hand.
In SOLR-14354, recently committed by Cao Dat destined for Solr 8.7, there
is a new (undocumented) method on Http2SolrClient (*not* SolrClient):
public Cancellable asyncRequest(@SuppressWarnings({"rawtypes"})
SolrRequest solrRequest, String collection,
AsyncListener<NamedList<Object>> asyncListener) {
So firstly, it's only on Http2SolrClient, which means if you're using
CloudHttp2SolrClient (which does not subclass it, counterintuitively to
users) then too bad -- SOLR-14675 filed by my colleague.
Secondly, I really dislike this API signature (sorry Cao). It introduces
two custom and obscure Solr interfaces, Cancellable and AsyncListener. I
claim that we can and should use one abstraction provided by the JDK --
CompletableFuture<NamedList<Object>>. The user can cancel() it if they
like, they can use it like a Future which is maybe what they prefer, or
they can attach a listener / handler via whenComplete(lambda) that even has
access to an exception if there was one. Based on some toying around with
CompletableFuture lately, I suspect what would be most straight-forward is
to have an asyncRequest method that *returns* a CompletableFuture, and
which merely takes the SolrRequest parameter and nothing else.
Alternatively the client could supply a CompletableFuture parameter that
Solr will call complete() on etc. but that seems a bit less natural to the
notion of a method that returns it's results, albeit with a wrapper.
Proposal:
public CompletableFuture<NamedList<Object>>
requestAsync(SolrRequest<?> request);
BTW I'm trying to avoid implementation details here. My objective is to
devise a user-friendly API, and my hope is that the eventual implementation
is reasonable.
I hope I haven't bored you all to tears and that you find this public
discussion useful.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
I'd like to have an API design discussion here on the dev list concerning
adding asynchronous request semantics to SolrClient. I think the
SolrClient APIs require great peer-review consideration and so I'm having
this discussion here instead of a JIRA issue.
Background: Sometimes, a SolrJ user (including Solr itself) may wish to
issue a request but not block/wait for the response because it can do other
useful work before eventually needing the results. For example, maybe the
user has multiple requests to send to Solr that can be done in parallel and
_then_ process the results. This is in fact what Solr needs internally for
during distributed-search (see HttpShardHandler) and during
distributed-indexing (see ConcurrentUpdateHttp2SolrClient).
Why?: Greater efficiency. This can be accomplished easily in a generic
fashion with threads (e.g.
CompletableFuture.supplyAsync(Supplier,Executor)) with use of the
SolrClient concurrently. However, this can result in lots of threads and
context switching. I really like Cao Dat's explanation in the description
here: https://issues.apache.org/jira/browse/SOLR-14354 Thanks to HTTP/2,
there is a new approach in which the underlying HTTP client can handle the
asynchronous nature more efficiently with minimal threads. Complete
plumbing of this implies that SolrClient (and/or its derivatives) need a
similar async mechanism. Disclaimer: I haven't measured any of this but
am piggybacking off of Cao's's insights on SOLR-14354
Where?: An async API _could_ go only on some SolrClient classes that
natively support it, avoiding changing SolrClient itself. Maybe this is
what should occur first before "graduating" the method to SolrClient where
we devise a default approach, although I would prefer just putting it on
SolrClient. The default approach might be configured to throw
UnsupportedOperationException, or perhaps might simply use an Executor to
get it done in an obvious way (assuming we can get ahold of an Executor
somewhere?). If you're writing a delegating SolrClient (which I've done in
the past), you might want to take-care to delegate this method.
Another aspect of "Where" is whether SolrRequest should additionally have
an API alternative to "process" which is currently synchronous and calls
out to SolrClient.request(this).
What?: What should this API look like? This is my primary interest right
now and the meat of the discussion.
SolrClient's primary signature that actually does the work (synchronously)
is this:
public abstract NamedList<Object>
request(@SuppressWarnings({"rawtypes"})final SolrRequest request,
String collection)
throws SolrServerException, IOException;
I don't like the "collection" there as it's redundant with either the
configured SolrClient default or a setting in SolrRequest, but I digress
from the important matter at hand.
In SOLR-14354, recently committed by Cao Dat destined for Solr 8.7, there
is a new (undocumented) method on Http2SolrClient (*not* SolrClient):
public Cancellable asyncRequest(@SuppressWarnings({"rawtypes"})
SolrRequest solrRequest, String collection,
AsyncListener<NamedList<Object>> asyncListener) {
So firstly, it's only on Http2SolrClient, which means if you're using
CloudHttp2SolrClient (which does not subclass it, counterintuitively to
users) then too bad -- SOLR-14675 filed by my colleague.
Secondly, I really dislike this API signature (sorry Cao). It introduces
two custom and obscure Solr interfaces, Cancellable and AsyncListener. I
claim that we can and should use one abstraction provided by the JDK --
CompletableFuture<NamedList<Object>>. The user can cancel() it if they
like, they can use it like a Future which is maybe what they prefer, or
they can attach a listener / handler via whenComplete(lambda) that even has
access to an exception if there was one. Based on some toying around with
CompletableFuture lately, I suspect what would be most straight-forward is
to have an asyncRequest method that *returns* a CompletableFuture, and
which merely takes the SolrRequest parameter and nothing else.
Alternatively the client could supply a CompletableFuture parameter that
Solr will call complete() on etc. but that seems a bit less natural to the
notion of a method that returns it's results, albeit with a wrapper.
Proposal:
public CompletableFuture<NamedList<Object>>
requestAsync(SolrRequest<?> request);
BTW I'm trying to avoid implementation details here. My objective is to
devise a user-friendly API, and my hope is that the eventual implementation
is reasonable.
I hope I haven't bored you all to tears and that you find this public
discussion useful.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley