Mailing List Archive

index file of lucene8.7 is larger than the 7.7
Hi, everyone.
I found that the index built with Lucene 8.7 is larger than the one built with 7.7:
My data source: lucene/demo/src/test/org/apache/lucene/demo/test-files/docs
The index code is as follows:
InputStream stream = Files.newInputStream(file);
Document doc = new Document();
Field pathField = new StringField("path", file.toString(), Field.Store.YES);
doc.add(pathField);
doc.add(new LongPoint("modified", lastModified));
doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));


Index size
8.7: 136K
7.7: 116K
I guess it is caused by LUCENE-9027?
Can anyone tell me why?
Re: index file of lucene8.7 is larger than the 7.7 [ In reply to ]
As a disclaimer, it can be misleading to draw conclusions on space
efficiency based on such a small index.

Can you compare file sizes by extension across 7.7 and 8.7? You might
need to call IndexWriterConfig#setUseCompoundFile(false) to prevent
the flush from wrapping your segment files in a compound file.
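
A minimal sketch of that comparison, assuming each index lives in its own directory on disk and was written with setUseCompoundFile(false) so the per-extension files are visible. The class name IndexSizeByExtension and its helper are hypothetical, not part of Lucene; the code uses only the JDK. Run it once against the 7.7 index directory and once against the 8.7 one, then compare the output line by line.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: totals file sizes per extension.
public class IndexSizeByExtension {

    // Sums the sizes (in bytes) of the regular files directly under indexDir,
    // grouped by the text after the last '.' in the file name.
    static Map<String, Long> sizeByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot (such as segments_N) go into one bucket.
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: IndexSizeByExtension <index-directory>");
            return;
        }
        // Print one line per extension, sorted alphabetically by extension.
        sizeByExtension(Paths.get(args[0]))
            .forEach((ext, bytes) -> System.out.printf("%-8s %d bytes%n", ext, bytes));
    }
}
```

An extension that grows noticeably between the two versions points to the part of the index format worth a closer look.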

On Wed, Nov 17, 2021 at 6:28 AM xiaoshi <xiaoshi_2014@163.com> wrote:
>
> Hi, everyone.
> I found that the index built with Lucene 8.7 is larger than the one built with 7.7:
> My data source: lucene/demo/src/test/org/apache/lucene/demo/test-files/docs
> The index code is as follows:
> InputStream stream = Files.newInputStream(file);
> Document doc = new Document();
> Field pathField = new StringField("path", file.toString(), Field.Store.YES);
> doc.add(pathField);
> doc.add(new LongPoint("modified", lastModified));
> doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));
>
>
> Index size
> 8.7: 136K
> 7.7: 116K
> I guess it is caused by LUCENE-9027?
> Can anyone tell me why?



--
Adrien
