Mailing List Archive

Simultaneous Indexing and searching
Hi there,

I am a beginner with Lucene, especially when it comes to indexing and searching simultaneously.

Our environment has several web servers acting as the search front end, which submit search requests, and a backend server that does the full-text indexing. The index files are stored on an NFS volume, so the indexer and the searchers all point to the same NFS volume. Indexing may happen whenever new documents come in or existing ones get updated.

Our project requires that indexing and searching can happen at the same time (or that any blocking is as short as possible, e.g. under a second).

We have searched the Internet and found references such as these:
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html
http://blog.mikemccandless.com/2011/11/near-real-time-readers-with-lucenes.html

but those seem to apply only to indexing and searching on the same server (correct me if I am wrong).

Could somebody tell me how to implement such a system, e.g. which Lucene classes to use, what the caveats are, and how to set it up?

Regards
Richard
Re: Simultaneous Indexing and searching
So ... how to architect a distributed search engine service is a
fairly complex topic, and I can't really cover it in depth here. Most
people opt to use Solr or Elasticsearch since they solve that problem
for you. Those systems work best when the indexes are local to the
service that is accessing them, and they build their own systems to
distribute data internally; distributing via NFS is generally not a
*good idea* (tm), although it may work most of the time. In your case,
have you considered building a search service that runs on the same
box as your indexer and responds to queries from the web server(s)?
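
A minimal sketch of that same-box idea, under stated assumptions: the
pattern underneath near-real-time search is that the indexer keeps
writing while searches read the last published point-in-time view, and
a refresh swaps in a newer view. The toy class below imitates this with
plain JDK collections. SnapshotIndex, add, refresh and search are
hypothetical names chosen for illustration, and this is not Lucene
code. In Lucene itself the corresponding pieces would be an IndexWriter
plus a SearcherManager (acquire/release around each search,
maybeRefresh to pick up new documents), as described in the two blog
posts linked in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Toy stand-in for an index with point-in-time readers; NOT Lucene code. */
final class SnapshotIndex {
    // Postings being built by the indexer; only touched while holding the lock.
    private final Map<String, List<String>> pending = new HashMap<>();

    // Last published, immutable view; searches read it without locking.
    private final AtomicReference<Map<String, List<String>>> published =
            new AtomicReference<>(Collections.<String, List<String>>emptyMap());

    /** Index a document: map each lower-cased token to the document id. */
    synchronized void add(String docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<String> ids = pending.get(token);
            if (ids == null) {
                ids = new ArrayList<>();
                pending.put(token, ids);
            }
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    /** Publish an immutable copy of the pending postings for searchers. */
    synchronized void refresh() {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> e : pending.entrySet()) {
            copy.put(e.getKey(),
                    Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        published.set(Collections.unmodifiableMap(copy));
    }

    /** Search the last published view; does not take the indexing lock. */
    List<String> search(String term) {
        List<String> ids = published.get().get(term.toLowerCase());
        return ids == null ? Collections.<String>emptyList() : ids;
    }

    public static void main(String[] args) {
        SnapshotIndex index = new SnapshotIndex();
        index.add("doc1", "Simultaneous indexing and searching");
        System.out.println(index.search("indexing")); // prints [] (not refreshed yet)
        index.refresh();
        System.out.println(index.search("indexing")); // prints [doc1]
    }
}
```

In this sketch a document becomes searchable only after refresh(), so
the refresh interval sets how stale results can be; calling it roughly
once a second would be in line with the "under a second" requirement
in the question. The web servers would then send their queries to this
one process over the network instead of opening the index files
themselves.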

On Tue, Sep 1, 2020 at 11:13 AM Richard So
<brothersevenlonglegs@hotmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Simultaneous Indexing and searching
FWIW, I agree with Michael: this is not a simple problem and there's been a
lot of effort in Elasticsearch and Solr to solve it in a robust way. If you
can't use ES/Solr, I believe there are some posts on the ES blog about how
they write/delete/merge shards (Lucene indices).

On Tue, Sep 1, 2020 at 11:40 AM Michael Sokolov <msokolov@gmail.com> wrote:

Re: Simultaneous Indexing and searching
You can also check out https://github.com/zuliaio/zuliasearch for a thinner
approach closer to native Lucene, or just to see examples of using Lucene.

On Wed, Sep 2, 2020, 11:24 AM Alex K <aklibisz@gmail.com> wrote:

Re: Simultaneous Indexing and searching
Hi Richard,

It seems like Lucene index replication could help you here: you could
create the index on the backend server and replicate it to the frontend
servers.

http://shaierera.blogspot.com/2013/05/the-replicator.html
http://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html

This way, the frontend servers can issue queries against their copy of
the index, and the backend server can perform updates that are then
replicated to the frontend server indexes.
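
The core of that idea can be sketched without the replicator module.
Lucene writes its index files once and does not modify them afterwards,
so a frontend server only has to fetch the files it does not have yet
and can keep searching its local copy in the meantime. The class below
is a toy file-level version of that step using only the JDK;
ReplicaSync and copyNewFiles are hypothetical names, and this is not
the Lucene replicator API. It also ignores things the real module has
to deal with, such as keeping the files of a published revision from
being deleted on the backend while they are copied, and reopening the
searcher on the frontend once a copy is complete.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy file-level replication step; NOT the Lucene replicator API. */
final class ReplicaSync {
    /**
     * Copy every regular file that exists in source but not yet in replica.
     * Returns the sorted names of the files that were copied.
     */
    static List<String> copyNewFiles(Path source, Path replica) throws IOException {
        Files.createDirectories(replica);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                Path target = replica.resolve(name);
                if (!Files.isRegularFile(file) || Files.exists(target)) {
                    continue; // skip subdirectories and files we already have
                }
                // Copy under a temporary name, then rename, so a half-copied
                // file never appears under its final name.
                Path tmp = replica.resolve(name + ".tmp");
                Files.copy(file, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                copied.add(name);
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempDirectory("backend-index");
        Path replica = Files.createTempDirectory("frontend-index");
        Files.write(source.resolve("_0.cfs"), new byte[] {1, 2, 3});
        System.out.println(copyNewFiles(source, replica)); // prints [_0.cfs]
        System.out.println(copyNewFiles(source, replica)); // prints []
    }
}
```

After a sync like this, the frontend would reopen its searcher on the
local directory (with Lucene, for example, through
SearcherManager.maybeRefresh()) so that queries see the newly copied
segments.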

Best regards
Christoph

On 01.09.2020 08:28, Richard So wrote:

