> From: Brian Goetz [mailto:brian@quiotix.com]
>
> I'd like to see the existing test programs converted into
> JUnit test cases
> -- I'm willing to do this if someone will tell me how they
> work and what
> they're supposed to output and how to invoke them.
These are mostly things that I wrote years ago when first developing Lucene,
e.g., to test the index code before the search code was written, etc. They
were mostly never really standalone test programs, but rather things whose
output I would eyeball for correctness (e.g. DocTest), things that I would
use for benchmarking and profiling when exploring different implementations
of things (e.g. IndexTest, TermEnumTest). Some serve both purposes
(AnalysisTest, PriorityQueueTest). Still others are stress tests
(ThreadSafetyTest). If you have questions about a particular one I'd be
glad to tell you what I can remember about it.
I agree that we should build a test suite. This code could provide a
starting point, but some work will be required. Useful tests might be:
- Storage Tests: check that FSDirectory and RAMDirectory can write, read
and seek files of various sizes.
- Analysis Tests: check that various analyzers generate the expected
tokens.
- Index Tests: build an index based on generated or static data, and check
that it contains the information that it should.
- Search Tests: make sure that searches return all the matching documents.
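For illustration (this is not code from the thread), here is roughly the shape the storage round-trip test could take. I've used plain java.io.RandomAccessFile as a stand-in for the Directory API so the sketch runs without Lucene on the classpath; the class and method names are made up:

```java
import java.io.File;
import java.io.RandomAccessFile;

/** Illustrative sketch of a storage test: write a file of a given size,
 *  then seek to several positions and verify the bytes read back.
 *  RandomAccessFile stands in for FSDirectory/RAMDirectory here. */
public class StorageRoundTrip {

    /** Writes `size` bytes of a deterministic pattern, then checks a few
     *  seek-and-read positions against the expected values. */
    public static boolean roundTrip(File file, int size) throws Exception {
        byte[] expected = new byte[size];
        for (int i = 0; i < size; i++)
            expected[i] = (byte) (i * 31 + 7);   // deterministic pattern

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.write(expected);
            int[] positions = { 0, size / 2, size - 1 };
            for (int pos : positions) {
                raf.seek(pos);
                if ((byte) raf.read() != expected[pos]) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("storage", ".bin");
        tmp.deleteOnExit();
        System.out.println(roundTrip(tmp, 4096));
    }
}
```

A real version would loop over several sizes (including ones spanning buffer boundaries) and run the same assertions against both FSDirectory and RAMDirectory.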
Perhaps different folks can volunteer to write tests for different areas of
the code? I would be happy to do any one of these, but not all four!
In addition it would be good to have some performance tests. We could
either download a reference collection, or use a synthetic document
generator that could efficiently generate streams of terms distributed
according to Zipf's law. A generator-based approach would let you specify,
e.g. average document length. Then we could time analysis, index and search
tasks over reasonably large indexes. This could be done periodically to
make sure that changes don't make things slower, and would be useful for
benchmarking Lucene performance on various platforms. Does anyone have a
good document generator? Or can anyone suggest a good, preferably
plain-text, static test collection?
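As a rough sketch of what such a generator might look like (the class and method names below are hypothetical, not from any existing tool): term frequencies proportional to 1/rank can be sampled from a precomputed cumulative-probability table:

```java
import java.util.Random;

/** Hypothetical sketch of a synthetic term generator whose term
 *  frequencies follow Zipf's law: freq(rank) is proportional to 1/rank. */
public class ZipfTermGenerator {
    private final double[] cumulative;   // cumulative probability per rank
    private final Random random;

    public ZipfTermGenerator(int vocabularySize, long seed) {
        cumulative = new double[vocabularySize];
        double norm = 0.0;
        for (int rank = 1; rank <= vocabularySize; rank++)
            norm += 1.0 / rank;          // harmonic normalizer H_n
        double running = 0.0;
        for (int rank = 1; rank <= vocabularySize; rank++) {
            running += (1.0 / rank) / norm;
            cumulative[rank - 1] = running;
        }
        random = new Random(seed);       // seeded for reproducible streams
    }

    /** Returns a synthetic term; low-rank terms are sampled most often. */
    public String nextTerm() {
        double p = random.nextDouble();
        // binary-search the cumulative table for the sampled rank
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (cumulative[mid] < p) lo = mid + 1; else hi = mid;
        }
        return "term" + lo;
    }
}
```

Document length could then be drawn from a distribution around a configurable average, with each document filled by repeated calls to nextTerm().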
Doug
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>