Mailing List Archive

Should I use one or many indexes?
Hello, I'm just starting out with Lucene and need to decide how to organize
my indexes.

I have a project where many users (thousands) will each need to search only
their own documents, maybe up to 1000 documents each at a time.

I'm trying to decide whether to go with one large index or one smaller index
per user. I would guess that a smaller index per user would give faster
search results, but is it a good idea, performance- and memory-wise, to keep
many index readers/writers open at once? With the one-big-index approach I
can just keep one of each open globally. Also, is there a big performance
loss from opening/closing readers as a user needs them?

I guess my options are really just:
1. One large index / one global reader/writer.
2. Many indexes / many global readers/writers.
3. Many indexes / open and close indexes as needed.
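
To make option 3 concrete, here is a minimal sketch of a bounded cache that
keeps at most a fixed number of per-user indexes open and closes the least
recently used one when the limit is exceeded. It uses only the Java standard
library; ReaderCacheSketch and Handle are hypothetical names, and Handle is
just a stand-in for an open per-user reader, not a Lucene class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache of open per-user index handles; NOT the Lucene API.
final class ReaderCacheSketch {
    // Hypothetical stand-in for an open per-user index reader.
    static final class Handle {
        final String userId;
        boolean open = true;
        Handle(String userId) { this.userId = userId; }
        void close() { open = false; }
    }

    private final LinkedHashMap<String, Handle> cache;
    private int opens = 0; // how many times an index had to be (re)opened

    ReaderCacheSketch(final int maxOpen) {
        // accessOrder = true keeps entries ordered least-recently-used first.
        this.cache = new LinkedHashMap<String, Handle>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Handle> eldest) {
                if (size() > maxOpen) {
                    eldest.getValue().close(); // release the evicted handle
                    return true;
                }
                return false;
            }
        };
    }

    // Return the user's open handle, opening it first if it is not cached.
    synchronized Handle acquire(String userId) {
        Handle handle = cache.get(userId);
        if (handle == null) {
            handle = new Handle(userId);
            opens++;
            cache.put(userId, handle);
        }
        return handle;
    }

    synchronized int openCount() { return cache.size(); }

    synchronized int opens() { return opens; }

    public static void main(String[] args) {
        ReaderCacheSketch cache = new ReaderCacheSketch(2);
        cache.acquire("alice");
        Handle bob = cache.acquire("bob");
        cache.acquire("alice");  // alice becomes most recently used
        cache.acquire("carol");  // limit exceeded: bob is evicted and closed
        System.out.println(bob.open);          // prints false
        System.out.println(cache.openCount()); // prints 2
        cache.acquire("bob");                  // bob has to be reopened
        System.out.println(cache.opens());     // prints 4
    }
}
```

The sketch ignores one real concern: a handle may be evicted while another
thread is still searching with it, so a real version would need some form of
reference counting before closing.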

any advice appreciated,
thanks
Anthony





--
View this message in context: http://www.nabble.com/Should+I+use+one+or+many+index%27s--t1684400.html#a4568925
Sent from the Lucene - General forum at Nabble.com.
Re: Should I use one or many indexes?
If users should only be able to search their own documents, it would
probably make sense to keep each user's index locally. Besides greater
query speed, it would also simplify updating/appending to the index. So
that would mean one index, one IndexModifier and one IndexSearcher living
on each person's computer. Or am I missing something?

Fredrik


On 5/26/06, acorcutt <acorcutt@gmail.com> wrote:
Re: Should I use one or many indexes?
Sorry, I did not mention that this will be on a web server, so one index
per user would mean many IndexModifiers running at once on the server.
I guess I'm mainly trying to determine whether running many IndexModifiers
is a bad thing. I can see that smaller indexes will be quicker, but would
that speed be lost by running maybe 1000 at once?
Re: Should I use one or many indexes?
Ok, I figured you had some setup like that.

Personally, I would prefer one large index. The overhead of
opening/closing/managing thousands of searchers/modifiers is much bigger
than that of building the per-user restriction into the query. Also, you
risk running out of file handles, depending on your OS.
If your documents are indexed like (userId, title, text, date), it's no
hassle to search the subset belonging to a certain userId, even if the index
holds several million records, by adding a required clause on the userId
field to every query (for example, a BooleanQuery that combines the user's
query with a required TermQuery on userId).
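
As an illustration of why the userId restriction is cheap, here is a minimal
sketch of a single shared index in which every search requires two terms: the
text term and the owner's userId. It is a toy inverted index written with the
Java standard library only, not Lucene code; SharedIndexSketch and its
methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy stand-in for one shared index; NOT the Lucene API.
final class SharedIndexSketch {
    // Posting lists: "field:term" -> sorted ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Index one document under its owner's userId and return its id.
    int add(String userId, String text) {
        int docId = nextDocId++;
        post("userId:" + userId, docId);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                post("text:" + token, docId);
            }
        }
        return docId;
    }

    private void post(String key, int docId) {
        postings.computeIfAbsent(key, k -> new TreeSet<>()).add(docId);
    }

    // Both clauses are required: the text term AND the userId term.
    List<Integer> search(String userId, String term) {
        TreeSet<Integer> owned = postings.get("userId:" + userId);
        TreeSet<Integer> matching =
                postings.get("text:" + term.toLowerCase(Locale.ROOT));
        List<Integer> hits = new ArrayList<>();
        if (owned == null || matching == null) {
            return hits; // unknown user or unknown term: nothing can match
        }
        for (Integer docId : matching) {
            if (owned.contains(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SharedIndexSketch index = new SharedIndexSketch();
        index.add("alice", "quarterly budget report");
        index.add("bob", "budget for the picnic");
        index.add("alice", "holiday photos");
        System.out.println(index.search("alice", "budget")); // prints [0]
        System.out.println(index.search("bob", "budget"));   // prints [1]
    }
}
```

The userId clause is just one more posting list to intersect, which is why
the restriction stays cheap as the shared index grows.
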
I have personally used this tactic in several projects where we've
handled user restrictions / subset searches, and it's very convenient.
Regarding the IndexModifier: you can only have one modifier open on an
index at a time, since it takes the write lock. So you would probably want
to have a "crawler" which updates the index continuously, while an
arbitrary number of IndexSearchers (read-only) are attached to the very
same index.
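
The single-writer arrangement can be sketched roughly as follows, assuming
all updates are funnelled through one writer thread while any number of
readers look at the latest published snapshot. This is a toy model built on
java.util.concurrent, not Lucene code; SingleWriterSketch and its methods are
hypothetical names, and a plain list stands in for the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of "one writer, many readers"; NOT the Lucene API.
final class SingleWriterSketch {
    // The single-threaded executor plays the role of the one open modifier:
    // updates from all users are serialized through it.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Readers only ever see a complete, immutable snapshot.
    private final AtomicReference<List<String>> snapshot =
            new AtomicReference<List<String>>(Collections.<String>emptyList());

    // Queue an update; the returned Future completes once it is visible.
    Future<?> submitAdd(String doc) {
        return writer.submit(() -> {
            List<String> next = new ArrayList<>(snapshot.get());
            next.add(doc);
            snapshot.set(Collections.unmodifiableList(next));
        });
    }

    // Any number of threads may call this without locking.
    List<String> read() {
        return snapshot.get();
    }

    void close() {
        writer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SingleWriterSketch index = new SingleWriterSketch();
        List<Future<?>> pending = new ArrayList<>();
        // Updates on behalf of four users, 250 documents each.
        for (int user = 0; user < 4; user++) {
            for (int n = 0; n < 250; n++) {
                pending.add(index.submitAdd("user" + user + "-doc" + n));
            }
        }
        for (Future<?> f : pending) {
            f.get(); // wait until every update has been applied
        }
        System.out.println(index.read().size()); // prints 1000
        index.close();
    }
}
```

Copying the whole list on every update is only acceptable in a toy; the
point is the shape: one serialized update path, lock-free reads.
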

Anyway, my $0.02!



On 5/26/06, acorcutt <acorcutt@gmail.com> wrote:
Re: Should I use one or many indexes?
OK, thanks. I'm going to have to look into how Lucene handles concurrent
inserts/updates/deletes if I go for one big index. I need them in near
real time, from multiple users at once. With smaller indexes I don't see a
problem with the updates, as most indexes would be less than 100 MB with
only one user updating each.
At the moment SQL Server is looking like a better option.