Mailing List Archive

Lucene -- Working with dynamic data
Hi,

We have been using Hibernate Search (which uses Lucene internally) and Apache
Lucene in our project. Our application is a network management
application. It receives alarms, traps, clients (such as mobile
devices and laptops), etc. from the network.

This data is very dynamic in nature. The search in our
application covers any of these alarms, traps, clients, etc.

Over time a lot of records accumulate, and the older
ones are not worth searching or listing.

Hence data older than a day is pruned through SQL queries rather than
through Hibernate, to make things faster. Rather
than updating the index, we currently delete the old index and rebuild the
entire index, so that both searching and indexing stay fast.

However, reindexing the whole data set takes around 10-15 minutes.

Is there a standard solution for this kind
of problem?

How do we deal with dynamic data in Lucene when we need to prune
the records in the database?

Any suggestions?

Regards
Prathib Kumar.

Re: Lucene -- Working with dynamic data [ In reply to ]
Hi Prathib,

I'm not sure this is a standard solution, but why don't you use Solr's distributed search
(or SolrCloud)? You can have multiple indexes, each corresponding to a
time period, e.g. a month.

For example, you have 12 indexes for the past 12 months and one current active index
for this month. You can search across all of them using distributed search.

0. current active index for this month (Sep/2015)
1. an index for the last month (Aug/2015)
2. an index for (Jul/2015)
:
12. an index for (Sep/2014)

In this system, you only need to maintain the index for the current month. At the beginning
of the next month, you start maintaining the new active index for Oct/2015 and, at the same
time, throw away the oldest (Sep/2014) index from your list.
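
As a rough illustration of the rolling window described above, here is a minimal plain-Java sketch that works out which monthly indexes to search and which one to drop at rollover. The class name, the "alarms_YYYY_MM" naming scheme and the method names are assumptions made for the example; they are not part of Solr or Lucene, and creating or dropping the actual indexes is left out.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

final class MonthlyIndexWindow {
    // Hypothetical naming scheme: one index per calendar month, e.g. "alarms_2015_09".
    static String indexName(YearMonth month) {
        return String.format("alarms_%04d_%02d", month.getYear(), month.getMonthValue());
    }

    // Indexes to search: the active month first, then the previous `pastMonths` months.
    static List<String> searchable(YearMonth current, int pastMonths) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i <= pastMonths; i++) {
            names.add(indexName(current.minusMonths(i)));
        }
        return names;
    }

    // Index that falls out of the window once `current` becomes the active month.
    static String expired(YearMonth current, int pastMonths) {
        return indexName(current.minusMonths(pastMonths + 1L));
    }

    public static void main(String[] args) {
        YearMonth october = YearMonth.of(2015, 10);
        System.out.println("search: " + searchable(october, 12));
        System.out.println("drop:   " + expired(october, 12));
    }
}
```

The point of the design is that pruning becomes dropping a whole index, so nothing has to be reindexed. For data that expires after a day, as in the original question, the same idea would apply with daily rather than monthly indexes.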

regards,

Koji

Re: Lucene -- Working with dynamic data [ In reply to ]
Hi Prathib,

As this is a monitoring application, have you considered a stored search
solution? We created Luwak (based on Lucene) for exactly this purpose:
https://github.com/flaxsearch/luwak - note that Luwak will (very shortly)
build with Lucene 5.3 rather than the fork we created, as the requisite
features are now in a release build of Lucene.

Cheers

Charlie

RE: Lucene -- Working with dynamic data [ In reply to ]
Then again, you could revisit the decision to use Lucene and look at OSS network management solutions. Of course, if your events aren't under governance of some kind, I can understand why you want a search engine. FWIW: I can't imagine a professional system engineer at AOL/Microsoft/Comcast or any other place I've worked not looking at me sideways if I built alerting that needed a search engine instead of industry-standard (proprietary and open source) system engineering methods.

G'luck
Will

Re: Lucene -- Working with dynamic data [ In reply to ]
Sorry to revive this old thread, just catching up and couldn't resist
commenting:

I can't name customers, but there are actually many companies with proper
"professional engineers" like the ones you mention, and even some of
those you mention, which use Lucene for such purposes.
Many of them use Hibernate Search, like Prathib in the original
question.

To answer the original question, if it's still useful:
Hibernate Search doesn't allow a "delete by query" as there are
various consistency risks when it comes to distribution and
transaction handling, but it does allow a "delete by term". You could
just label all your indexed documents with a non-tokenized "date"
field, and periodically run a delete query by Term on the past days;
that should be very efficient.
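
A minimal sketch of the periodic job suggested above, in plain Java: it computes the label of each expired day and hands it to a delete hook. The class name, the "yyyyMMdd" label format and the `Consumer<String>` hook are assumptions made for the example. With plain Lucene the hook could wrap `IndexWriter.deleteDocuments(new Term("date", label))`; the equivalent Hibernate Search call is not shown here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DailyPruner {
    // Value stored in the non-tokenized "date" field, e.g. "20150901" (assumed format).
    static String dayLabel(LocalDate day) {
        return day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Labels of the days to purge: from `oldest` up to, but not including, `keepFrom`.
    static List<String> expiredLabels(LocalDate oldest, LocalDate keepFrom) {
        List<String> labels = new ArrayList<>();
        for (LocalDate d = oldest; d.isBefore(keepFrom); d = d.plusDays(1)) {
            labels.add(dayLabel(d));
        }
        return labels;
    }

    // Issues one exact-match delete per expired day through the supplied hook
    // and returns how many deletes were issued.
    static int prune(LocalDate oldest, LocalDate keepFrom, Consumer<String> deleteByTerm) {
        List<String> labels = expiredLabels(oldest, keepFrom);
        labels.forEach(deleteByTerm);
        return labels.size();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2015, 9, 3);
        // Stand-in hook that only prints; a real one would call the index writer
        // (and handle its checked IOException).
        prune(today.minusDays(3), today, label -> System.out.println("delete date:" + label));
    }
}
```

Because each delete matches a single exact term, the job never needs a range query, and the rest of the index is left untouched instead of being rebuilt.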

I'd also be happy to implement a "delete by range query"
feature; like the TermQuery, that's a safe option,
although it is not implemented yet.

Sanne
