Mailing List Archive

Queries with large number of hits.
So I have been testing search times of KinoSearch -r3875 vs Solr.
I'm not trying to do a full test matrix, just testing it on a few
things I would like to use KinoSearch for.

I'm pretty limited in my Solr knowledge, and for that matter KinoSearch,
but from the few tests I have done,
KinoSearch seems to do pretty well when the number of hits (doc hits) is low...

But as the hits go up it starts falling behind quickly, even when searching on one
term with one value (foo:commonvalue).
Once I start asking for more than one value, both with large hits, it
gets real bad, e.g. (foo:common OR foo:morecommon).

So now I have made claims... :)
I'll try to give more details.

My test:
I built a small index (about 1M docs) in both engines, trying to set
the index up with the same settings in both.
I removed the Solr doc caching and query caching (my simple search app
did no caching).
They are both running on the same box with the indexes on the same RAID drive.
The indexes are optimized and read-only (no writes touching them).
The box has 8G RAM and is doing nothing else (both indexes are for sure
in disk cache).
Java was given 1G RAM (after all the testing it was using about Res:57M Virt:1194m).
My FastCGI worker, after all the tests, was using < 9M of RAM.
I wrote a simple Perl test script to query the indexes. The script
uses keep-alives to the searchers (one at a time), runs the same
search n times returning only 1 hit per search, and times the
whole process.
I ran this script on another box doing nothing but my test.

So as you can see this whole "test" is pretty simple with many
possible holes to try and get this Apples Vs Oranges test running.


But regardless, after adding some profiling in my FastCGI app it
seems pretty clear that 99% of the time for these large-result searches is
spent in $searcher->search(query => $query, num_wanted => 1), as one
would expect. Is there anything I can do to make these searches
perform better? What data could I provide to help?

Thanks,
-Dan

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Queries with large number of hits.
Quoting Dan <dmarkham@gmail.com>:
> Once I start asking for more than one value both with large hits it
> gets real bad. i.e. (foo:common OR foo:morecommon)

When you say 'real bad', what exactly do you mean? Can you provide
some comparative performance times?

Henry


Re: Queries with large number of hits.
Sure...
All times are in seconds for 1000 searches.

Small hit count (112) (bag:belly)
Solr: 2.12 sec
Kino: 2.41 sec
That looks great: 472/sec (Solr) vs 415/sec (Kino).

Large hit count (783284) (bag:usa)
Solr: 23.81 sec
Kino: 88.48 sec

Multi-term large hit count (789870) (bag:usa OR bag:us OR bag:ca)
Solr: 76.47 sec
Kino: 264.51 sec
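
For reference, the throughput and slowdown figures implied by these timings can be recomputed with a few lines of C. This is only a sketch of the arithmetic: the function names are made up for illustration, and the inputs are the timings quoted above (1000 searches per run).

```c
#include <stdio.h>

/* Searches per second for a batch of `searches` queries taking `seconds`. */
double searches_per_sec(double searches, double seconds) {
    return searches / seconds;
}

/* How many times longer the slow run took than the fast run. */
double slowdown(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}

/* Print the derived figures for the three timing pairs quoted above. */
void print_summary(void) {
    printf("small hit : Solr %.0f/sec, Kino %.0f/sec\n",
           searches_per_sec(1000.0, 2.12), searches_per_sec(1000.0, 2.41));
    printf("large hit : Kino %.2fx slower\n", slowdown(88.48, 23.81));
    printf("multi term: Kino %.2fx slower\n", slowdown(264.51, 76.47));
}
```

Calling print_summary() shows the gap is small for the low-hit query and roughly 3.5x to 3.7x for the two high-hit queries.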


thanks,

-Dan






On Sun, Sep 14, 2008 at 10:41 AM, Henka <henka@cityweb.co.za> wrote:
> Quoting Dan <dmarkham@gmail.com>:
>>
>> Once I start asking for more than one value both with large hits it
>> gets real bad. i.e. (foo:common OR foo:morecommon)
>
> When you say 'real bad', what exactly do you mean? Can you provide some
> comparative performance times?
>
> Henry

Re: Queries with large number of hits.
On Sep 13, 2008, at 1:56 PM, Dan wrote:

> So now I have made claims... :)
> I'll try to give more details.

In my book, benchmarking claims presented without code, corpus, stats,
raw data, and detailed methodological descriptions qualify as
"anecdotal evidence". If you have a scientific background, you know
what that means: not to be ignored, but requiring a high degree of
skepticism and not particularly useful.

> So as you can see this whole "test" is pretty simple with many
> possible holes to try and get this Apples Vs Oranges test running.

KinoSearch is a low-level engine analogous to Lucene; Solr is a higher-
level library built on top of Lucene that does a lot of extra stuff,
including copious caching.

A comparison of Lucene to KinoSearch would be more germane from a
development standpoint. By using Solr rather than Lucene, you've
polluted the experiment with an extra layer of variables. I actually
think that testing with all of Solr's default caching mechanisms *on*
would be more interesting in a sense than what we've gotten from you
so far. It wouldn't be helpful for development in terms of
identifying optimization opportunities within KS, but it might be more
interesting for decision makers.

> Is there anything I can do to make these searches perform better?

There are a couple of known issues on the todo list that affect
search speed. One is a bugfix (SegPList_Skip_To had to be temporarily
disabled due to corrupt .skip files), and the other is a design flaw,
described in <http://www.mail-archive.com/java-dev@lucene.apache.org/msg15825.html>.
Additionally, implementing the PForDelta compression algorithm
for postings should speed up searching, but I'd planned to put that off.

However, measuring progress on those issues using a closed source
benchmark with "many possible holes" would be foolish. If we're going
to do benchmarking at all, we're going to do it right:
<http://www.rectangular.com/kinosearch/benchmarks.html>.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Queries with large number of hits.
> In my book, benchmarking claims presented without code, corpus, stats, raw
> data, and detailed methodological descriptions qualify as "anecdotal
> evidence". If you have a scientific background, you know what that means:
> not to be ignored, but requiring a high degree of skepticism and not
> particularly useful.

Agreed.

>> Is there anything I can do to make these searches perform better?
>
> There are a couple of known issues that are on the todo list that affect search
> speed. One is a bugfix (SegPList_Skip_To had to be temporarily disabled due
> to corrupt .skip files), and the other is a design flaw, described in
> <http://www.mail-archive.com/java-dev@lucene.apache.org/msg15825.html>.
> Additionally, implementing the PForDelta compression algorithm for postings
> should speed up searching, but I'd planned to put that off.

These sound interesting. Seems I have more reading to do, thanks for
the pointers.

Thanks,

-Dan

Re: Queries with large number of hits.
On Sun, Sep 14, 2008 at 4:36 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
>> Is there anything I can do to make these searches perform better?
>
> There are a couple of known issues that are on the todo list that affect search
> speed. One is a bugfix (SegPList_Skip_To had to be temporarily disabled due
> to corrupt .skip files), and the other is a design flaw, described in
> <http://www.mail-archive.com/java-dev@lucene.apache.org/msg15825.html>.
> Additionally, implementing the PForDelta compression algorithm for postings
> should speed up searching, but I'd planned to put that off.

Hi Marvin ---

Taking Dan's tests at face value, for the moment, I'm not quite
understanding how the issues you are pointing at would affect speed
this much. It seems like his chosen terms can't be occurring so many
times per document that the extra position decoding could be this
significant. But maybe I'm not understanding the Lucene thread well
enough. Is the Lucene position data kept in a separate stream? Or is
it just not processed until requested?

Dan, my quick summary as a long-term observer is that there would be
no unsolvable reason that KinoSearch should be significantly slower
than Solr here, presuming you do indeed have caching turned off. If
it is this much slower, it's probably a bug that can be fixed, and
Marvin is remarkable about fixing well-reported bugs quickly. If
creating a real benchmark (a good idea) seems too difficult, finding
the hotspot with something like Oprofile might be a good way to focus
his attention.

Nathan Kurz
nate@verse.com

Re: Queries with large number of hits.
> Dan, my quick summary as a long-term observer is that there would be
> no unsolvable reason that KinoSearch should be significantly slower
> than Solr here, presuming you do indeed have caching turned off. If
> it is this much slower, it's probably a bug that can be fixed

I'm not 100% sure about my setup but do *think* I have them off ... I
still have a lot to learn.
There's no doubt my numbers and test methods are highly questionable!

> Marvin is remarkable about fixing well-reported bugs quickly.

Agreed!


> If creating a real benchmark (a good idea) seems too difficult, finding
> the hotspot with something like Oprofile might be a good way to focus
> his attention

Both may be over my head at the moment but that's not going to stop me
from trying. :)

Thanks,

-Dan

Re: Queries with large number of hits.
On Sep 14, 2008, at 10:02 PM, Nathan Kurz wrote:

> Taking Dan's tests at face value, for the moment, I'm not quite
> understanding how the issues you are pointing at would affect speed
> this much.

I don't think addressing those items would have that level of impact,
either. It's really, really easy to screw up these kinds of
comparative benchmarks, though. Before I published the indexing
benchmarks, I submitted the Lucene app to the lucene dev list for
critiquing and even after all the grilling it got there we STILL
missed a crucial bug in it.

That said, it wouldn't surprise me if current Lucene search-time
performance exceeded that of KS trunk at least until the issues I
listed are addressed -- I just don't know by how much.

> It seems like his chosen terms can't be occurring so many
> times per document that the extra position decoding could be this
> significant.

The extra positional decoding is probably big enough to think about.
No way it could account for a fourfold discrepancy though. More like
5% - 20%.

> Is the Lucene position data kept in a separate stream?

Exactly.

The dev branch of KinoSearch implements the "flexible indexing" model
described at <http://wiki.apache.org/lucene-java/FlexibleIndexing>,
where doc number, frequency, positions, and boost all reside in one
unified file (per field). In contrast, each Lucene segment has...

* One .frq file which holds document number and term frequency info.
* One .prx file which holds positions data.
* One file per field holding boost data. These files are lazily
slurped into RAM as soon as they are needed and cached for the life
of the IndexReader.

We knew about the extra-positions-overhead problem from the start, but
we figured it would be enough if we gave people the option of
disabling positions on a per-field basis. My take now, having since
put flexible indexing into practice, is that ad-hoc disabling is not a
practical approach. You need multiple streams.

> If creating a real benchmark (a good idea) seems too difficult,

We can get started with a benchmark for simple term queries against
the Reuters corpus.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Queries with large number of hits.
> the hotspot with something like Oprofile might be a good way to focus his attention

I warmed up the index then reset the opreport and ran the query once..

Here is the report for that one query.


[root@spare04 ~]# opreport -alf /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/auto/KinoSearch/KinoSearch.so
CPU: Core 2, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000

samples cum. samples % cum. % symbol name
1737 1737 15.9915 15.9915 kino_ScorerDocQ_top_next
1393 3130 12.8245 28.8161 kino_ScorePost_read_record
1156 4286 10.6426 39.4587 kino_InStream_read_u8
808 5094 7.4388 46.8974 advance_after_current
796 5890 7.3283 54.2257 kino_SegPList_next
780 6670 7.1810 61.4067 kino_InStream_read_c32
715 7385 6.5826 67.9893 kino_ScorePostScorer_tally
693 8078 6.3800 74.3694 __i686.get_pc_thunk.bx
602 8680 5.5423 79.9116 kino_TermScorer_next
583 9263 5.3673 85.2790 .plt
466 9729 4.2902 89.5691 kino_ORScorer_tally
314 10043 2.8908 92.4600 kino_Scorer_collect
237 10280 2.1819 94.6419 kino_TDColl_collect
236 10516 2.1727 96.8146 kino_MemMan_wrapped_realloc
213 10729 1.9610 98.7755 kino_SegPList_get_posting
84 10813 0.7733 99.5489 kino_ORScorer_next
11 10824 0.1013 99.6502 read_internal
8 10832 0.0737 99.7238 kino_FSFileDes_seek
7 10839 0.0644 99.7883 refill
4 10843 0.0368 99.8251 kino_InStream_tell
3 10846 0.0276 99.8527 kino_FSFileDes_read
2 10848 0.0184 99.8711 kino_InStream_read_c64
1 10849 0.0092 99.8803 kino_CB_destroy
1 10850 0.0092 99.8895 kino_CB_equals_str
1 10851 0.0092 99.8987 kino_CB_hash_code
1 10852 0.0092 99.9079 kino_CB_iter_init
1 10853 0.0092 99.9171 kino_CB_vcatf
1 10854 0.0092 99.9263 kino_FSFolder_real_file_exists
1 10855 0.0092 99.9356 kino_Hash_fetch
1 10856 0.0092 99.9448 kino_LexStepper_read_record
1 10857 0.0092 99.9540 kino_Obj_inc_refcount
1 10858 0.0092 99.9632 kino_SegReader_max_docs
1 10859 0.0092 99.9724 kino_Sim_idf
1 10860 0.0092 99.9816 kino_TermQuery_destroy
1 10861 0.0092 99.9908 kino_VA_fetch
1 10862 0.0092 100.000 kino_ViewCB_nip_one

Re: Queries with large number of hits.
On Wed, Sep 17, 2008 at 12:23 PM, Dan <dmarkham@gmail.com> wrote:
> I warmed up the index then reset the opreport and ran the query once..
>
> Here is the report for that one query.

Thanks for posting that Dan. Looks great! This is presumably for one
of the expensive queries?

My quick impression is that while there is probably room for improvement
here, there is nothing terribly amiss. Streaming the data from the
index is taking about 2/3 of the time, and the actual searching is
taking about 1/3. This is expensive, but nothing short of a
massively impractical mmap'd uncompressed data format :) is going to
get rid of that whole 2/3. But since the processing time is
probably close to proportional to the file size, maybe this is where
Lucene has the advantage.

An interesting quick test might be to try some phrase queries. As
Marvin pointed out, Lucene keeps the position data in a separate file
thus doesn't have to deal with it in the queries you are testing. If
the KinoSearch time stays about the same, but the Lucene time jumps
significantly, this would implicate the single file architecture.
Re-indexing KinoSearch without positions and re-running your previous
searches would also be an inverse way to test this hypothesis.

Nathan Kurz
nate@verse.com

Re: Queries with large number of hits.
On Sep 17, 2008, at 1:16 PM, Nathan Kurz wrote:

> But since the processing time is
> probably close to proportional to the file size, maybe this is where
> Lucene has the advantage.

I've now performed the experiment pitting Lucene's TermQuery against
KinoSearch's. The results were worse than expected, and extra
positions decoding overhead seems to be a bigger factor than I'd
thought it would be. For a somewhat common term ("Reuters" in the
Reuters corpus), KS is 3x slower. For a very common term ("the"),
it's around 6x slower, and that's worse than it sounds because the 6x
factor is multiplying a number that's big to begin with.

Proliferating positions account for the non-constant slowdown: "the"
occurs at many positions, and thus has greater per-document scanning
cost in comparison to "Reuters", which most likely appears once per
document.
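
The effect can be illustrated with a rough cost model in C. This is a sketch under stated assumptions, not KinoSearch code: the DocFreq struct and both function names are hypothetical, boost data is ignored, and the only cost counted is the number of variable-width integers a scorer has to decode. In a single unified stream the positions cannot be skipped without decoding them, so the cost grows with term frequency; with positions in a separate stream, a plain term query never touches them.

```c
#include <stddef.h>
#include <stdint.h>

/* One posting: a document number plus how often the term occurs in it. */
typedef struct {
    uint32_t doc_num;
    uint32_t freq;
} DocFreq;

/* Integers decoded when doc number, freq and positions share one
 * variable-width stream: every position must be stepped over. */
size_t ints_decoded_unified(const DocFreq *postings, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += 2 + postings[i].freq;   /* doc, freq, then freq positions */
    }
    return total;
}

/* Integers decoded when positions live in a separate stream that a
 * plain term query never opens. */
size_t ints_decoded_split(const DocFreq *postings, size_t count) {
    (void)postings;                      /* frequencies do not matter here */
    return 2 * count;                    /* doc and freq only */
}
```

For a term that appears once per document the unified layout decodes 3 integers per hit instead of 2; for a term that appears 20 times per document it decodes 22 instead of 2, which is the kind of non-constant slowdown described above.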

> Re-indexing KinoSearch without positions and re-running your previous
> searches would also be an inverse way to test this hypothesis.


The way to do this in KS is to override FieldSpec->posting() in a
subclass so that it returns a MatchPosting instead of a ScorePosting.
MatchPosting was a placeholder until today, but now a provisional
implementation has been completed for testing purposes.

Happily, with MatchPosting, we get much closer to Lucene performance:
around 1.3x for "Reuters" and around 1.7x for "the".
MatchPostingScorer doesn't presently take document/field boost into
account, so it's doing a little less work than Lucene's TermScorer,
but nevertheless, the results are instructive.

Since we're still slower than Lucene even with MatchPosting, we may
want to work on optimizing TermScorer. Probably we need some sort of
bulk read -- a la what Lucene does, see below. We may also want to
restructure things so that we're reading at the level of PostingList
rather than Posting to cut down on method call overhead from nesting.
Right now, Lucene's TermScorer.next() calls TermDocs.read(int[],int[])
once every 20 calls. In contrast, KinoSearch's TermScorer_Next calls
PList_Next() and PList_Get_Posting() every time.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

  /** Advances to the next document matching the query.
   * <br>The iterator over the matching documents is buffered using
   * {@link TermDocs#read(int[],int[])}.
   * @return true iff there is another document matching the query.
   */
  public boolean next() throws IOException {
    pointer++;
    if (pointer >= pointerMax) {
      pointerMax = termDocs.read(docs, freqs);    // refill buffer
      if (pointerMax != 0) {
        pointer = 0;
      } else {
        termDocs.close();                         // close stream
        doc = Integer.MAX_VALUE;                  // set to sentinel value
        return false;
      }
    }
    doc = docs[pointer];
    return true;
  }
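
A C analogue of the same buffering idea might look roughly like the following. It is a minimal sketch, not the KinoSearch TermScorer: PostingSource, BufferedScorer, source_bulk_read, scorer_init, scorer_next and the BULK_SIZE value are all hypothetical names and choices made for illustration, and the posting list is modeled as two in-memory arrays. The point is only that the per-document call touches a local buffer, and the posting source is visited once per refill.

```c
#include <stddef.h>
#include <stdint.h>

#define BULK_SIZE 32   /* arbitrary buffer size chosen for this sketch */

/* Hypothetical in-memory stand-in for a posting list. */
typedef struct {
    const uint32_t *docs;
    const uint32_t *freqs;
    size_t count;        /* total postings available */
    size_t pos;          /* next posting to hand out */
    size_t read_calls;   /* how often the bulk read was invoked */
} PostingSource;

/* Copy up to `max` postings into the caller's buffers; returns how many. */
size_t source_bulk_read(PostingSource *src, uint32_t *docs, uint32_t *freqs,
                        size_t max) {
    size_t n = src->count - src->pos;
    if (n > max) n = max;
    for (size_t i = 0; i < n; i++) {
        docs[i]  = src->docs[src->pos + i];
        freqs[i] = src->freqs[src->pos + i];
    }
    src->pos += n;
    src->read_calls++;
    return n;
}

typedef struct {
    PostingSource *src;
    uint32_t docs[BULK_SIZE];
    uint32_t freqs[BULK_SIZE];
    size_t pointer;
    size_t pointer_max;
    uint32_t doc;
    uint32_t freq;
} BufferedScorer;

void scorer_init(BufferedScorer *s, PostingSource *src) {
    s->src = src;
    s->pointer = 0;
    s->pointer_max = 0;   /* forces a refill on the first scorer_next() */
    s->doc = 0;
    s->freq = 0;
}

/* Advance to the next document; returns 1 if there is one, 0 at the end. */
int scorer_next(BufferedScorer *s) {
    s->pointer++;
    if (s->pointer >= s->pointer_max) {
        s->pointer_max = source_bulk_read(s->src, s->docs, s->freqs, BULK_SIZE);
        if (s->pointer_max == 0) {
            s->doc = UINT32_MAX;          /* sentinel, as in the Java code */
            return 0;
        }
        s->pointer = 0;
    }
    s->doc  = s->docs[s->pointer];
    s->freq = s->freqs[s->pointer];
    return 1;
}
```

With 100 postings and a buffer of 32, iterating to the end costs 100 cheap scorer_next() calls but only 5 bulk reads (four that return data and one that reports the end).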


Re: Queries with large number of hits.
On Thu, Sep 18, 2008 at 10:25 PM, Marvin Humphrey
<marvin@rectangular.com> wrote:
> I've now performed the experiment pitting Lucene's TermQuery against
> KinoSearch's. The results were worse than expected, and extra positions
> decoding overhead seems to be a bigger factor than I'd thought it would be.

Thanks for your quick research and improvements, Marvin. You prompted
me to run a few quick tests as well:

kinosearch_instream/perl$ search "the OR and OR of OR for" (ten times)
kinosearch_instream/perl$ opreport -alt2 */KinoSearch.so
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples cum. samples % cum. % symbol name
190 190 23.8394 23.8394 kino_InStream_read_c32
112 302 14.0527 37.8921 kino_ScorePost_read_record
68 370 8.5320 46.4241 kino_ScorerDocQ_top_next
55 425 6.9009 53.3250 kino_TermScorer_next
51 476 6.3990 59.7240 kino_InStream_read_u8
45 521 5.6462 65.3701 kino_SegPList_next
31 552 3.8896 69.2597 __i686.get_pc_thunk.bx
30 582 3.7641 73.0238 advance_after_current
29 611 3.6386 76.6625 anonymous symbol from section .plt
25 636 3.1368 79.7992 kino_MemMan_wrapped_realloc
22 658 2.7604 82.5596 kino_ScorePostScorer_tally
17 675 2.1330 84.6926 kino_ORScorer_tally

"opannotate --source */KinoSearch.so" is also useful to glance at, as
is adding the '-c' flag to opreport to see where the calls are coming
from and where the functions are spending their time internally.

The main thing that jumped out is that the function call to
InStream_read_c32 is killing us. I don't see any way to have this
remain a per-int function call and still get good performance. You
need to figure out some way to decode the entire posting with fewer
function calls, and this is where the bulk of them are coming from.
I'd suggest having the Store level return an undecoded raw posting,
and let the Posting class decode the parts it wants. That way the
VByte code can be macros that work on a local buffer in a tight loop.
I'm sure there are other ways to do it, though.
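
As a rough sketch of what decoding a raw posting from a local buffer could look like in C: the byte format below is the Lucene-style VInt (7 data bits per byte, low-order group first, high bit set when more bytes follow), which is an assumption and may not match KinoSearch's C32 encoding exactly. The posting layout (doc delta, freq, then position deltas) and the names vbyte_encode, vbyte_decode and decode_posting are likewise hypothetical. An inline function stands in for the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as VByte; returns bytes written (at most 5). */
size_t vbyte_encode(uint32_t value, uint8_t *out) {
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);   /* low 7 bits + continuation */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;                /* final byte, high bit clear */
    return n;
}

/* Decode one VByte integer at *cursor and advance the cursor past it.
 * Assumes well-formed input of at most 5 bytes. */
static inline uint32_t vbyte_decode(const uint8_t **cursor) {
    const uint8_t *p = *cursor;
    uint32_t value = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        value |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    *cursor = p;
    return value;
}

/* Decode a whole raw posting in one call, working on a local pointer.
 * Assumed layout: doc delta, freq, then `freq` position deltas.
 * Positions beyond `max_positions` are decoded but not stored.
 * Returns the number of bytes consumed. */
size_t decode_posting(const uint8_t *raw, uint32_t *doc_delta, uint32_t *freq,
                      uint32_t *positions, size_t max_positions) {
    const uint8_t *p = raw;
    uint32_t position = 0;
    *doc_delta = vbyte_decode(&p);
    *freq = vbyte_decode(&p);
    for (uint32_t i = 0; i < *freq; i++) {
        position += vbyte_decode(&p);         /* delta to absolute position */
        if (i < max_positions) positions[i] = position;
    }
    return (size_t)(p - raw);
}
```

The caller makes one function call per posting rather than one per integer, and the compiler is free to inline vbyte_decode into the loop.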

The second thing that jumps out is that decompressing VBytes is
expensive. P4Delta might be a significant advantage, or perhaps there
are ways to optimize the decompression with the existing scheme. I
toyed a bit with trying to come up with a branchless way of doing it,
but gave up without much to show for it.

The third thing (tiny, but perhaps easy to fix) is that
ScorePost_read_record is spending 40% of its time in REALLOC. Is the
enlarged position buffer not getting reused for some reason?

The last thing (sort of a non-thing) is that to my surprise the
double/triple buffering of the Posting data doesn't seem to have much
of a negative effect. I still think it's worth trying to avoid this,
but it's barely on the current radar.

> The way to do this in KS is to override FieldSpec->posting() in a subclass
> so that it returns a MatchPosting instead of a ScorePosting. MatchPosting
> was a placeholder until today, but now a provisional implementation has been
> completed for testing purposes.
> Happily, with MatchPosting, we get much closer to Lucene performance: around
> 1.3x for "Reuters" and around 1.7x for "the".

Did you also recreate an index without the position information, or
are these times based on just skipping over the position info in the
index? Times would probably be closer if the information was
out-of-line.

Nathan Kurz
nate@verse.com

ps. The directions for building the Reuters benchmark index seem out
of date. '-Mblib' no longer finds the uninstalled KinoSearch.so in
the parent hierarchy.

Re: Queries with large number of hits.
On Sep 19, 2008, at 11:25 AM, Nathan Kurz wrote:

> The third thing (tiny, but perhaps easy to fix) is that
> ScorePost_read_record is spending 40% of its time in REALLOC. Is the
> enlarged position buffer not getting reused for some reason?

Oi, good catch! With one line of code, we see a 10-20% search-time
speed improvement:

Index: ../c_src/KinoSearch/Posting/ScorePosting.c
===================================================================
--- ../c_src/KinoSearch/Posting/ScorePosting.c (revision 3882)
+++ ../c_src/KinoSearch/Posting/ScorePosting.c (working copy)
@@ -145,6 +145,7 @@
     num_prox = self->freq;
     if (num_prox > self->prox_cap) {
         self->prox = REALLOCATE(self->prox, num_prox, u32_t);
+        self->prox_cap = num_prox;
     }
     positions = self->prox;
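
The pattern behind the patch can be shown in isolation. The following is a minimal sketch, assuming a hypothetical PosBuffer struct and posbuf_reserve function in place of the real ScorePosting fields and the REALLOCATE macro; the realloc_calls counter exists only to make the effect visible. Without the assignment to prox_cap, the capacity stays at its initial value and every posting with a nonzero frequency triggers another realloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the position buffer inside a posting object. */
typedef struct {
    uint32_t *prox;          /* reusable positions buffer */
    uint32_t prox_cap;       /* number of u32 slots currently allocated */
    unsigned realloc_calls;  /* instrumentation for this sketch only */
} PosBuffer;

/* Make sure the buffer can hold `num_prox` positions, growing only when
 * needed. Returns the buffer, or NULL if allocation fails. */
uint32_t *posbuf_reserve(PosBuffer *buf, uint32_t num_prox) {
    if (num_prox > buf->prox_cap) {
        uint32_t *grown = realloc(buf->prox, (size_t)num_prox * sizeof(uint32_t));
        if (grown == NULL) {
            return NULL;             /* old buffer is still valid */
        }
        buf->prox = grown;
        buf->prox_cap = num_prox;    /* remember the new capacity */
        buf->realloc_calls++;
    }
    return buf->prox;
}
```

Reserving 4, 2, 4 and then 8 slots in sequence reallocates twice (for 4 and for 8); the two middle requests reuse the existing buffer.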

> ps. The directions for building the Reuters benchmark index seem out
> of date. '-Mblib' no longer finds the uninstalled KinoSearch.so in
> the parent hierarchy.

I'll try to get updates committed later this evening.

Incidentally, although there are c. 19,000 unique documents in the
Reuters corpus, the indexing benchmarker will loop if you specify a
larger number, e.g. --docs=1000000.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Queries with large number of hits.
On Sep 19, 2008, at 11:25 AM, Nathan Kurz wrote:

> You prompted me to run a few quick tests as well:

Great stuff, Nathan.

I'd like to move the design discussion to the Lucy developers list, so
I'm going to direct my main reply there. The ultimate home for the
work that's gone into KinoSearch is probably going to be Apache. I
can put whatever I want into the rectangular.com KinoSearch subversion
repository regardless, but stuff that goes into Lucy has to have been
discussed in Apache channels. So -- there's no drawback for KS if we
have our conversation at Apache, but there's a benefit for Lucy.

What these benchmarks have shown is that the present implementation of
"Flexible Indexing" in KinoSearch just isn't fast enough at search-
time. Change of some kind is required. PForDelta merits serious
consideration, but it's a substantial break from the current design.

Interested parties not already subscribed to lucy-dev can sign up via
<http://lucene.apache.org/lucy/mailing_lists.html>.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Queries with large number of hits.
This change made a big difference for me also, thanks.


-Dan


> Oi, good catch! With one line of code, we see a 10-20% search-time speed
> improvement:
>
> Index: ../c_src/KinoSearch/Posting/ScorePosting.c
> ===================================================================
> --- ../c_src/KinoSearch/Posting/ScorePosting.c (revision 3882)
> +++ ../c_src/KinoSearch/Posting/ScorePosting.c (working copy)
> @@ -145,6 +145,7 @@
>      num_prox = self->freq;
>      if (num_prox > self->prox_cap) {
>          self->prox = REALLOCATE(self->prox, num_prox, u32_t);
> +        self->prox_cap = num_prox;
>      }
>      positions = self->prox;