Nothing changed between the two index generations except that the data
changed a bit, as I described.
When Lucene is done generating the index, what I report is the size of
the directory where all the index files are stored.
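For the record, here is a minimal sketch of how such a directory size can be measured. It assumes the index lives in a plain filesystem directory and is measured after the IndexWriter has been closed. The class name IndexDirSize and the default path "index" are hypothetical. It sums the sizes of all regular files under that directory, using only the JDK:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: total on-disk size of an index directory.
final class IndexDirSize {

    // Sums the sizes of all regular files under indexDir (recursively).
    static long directorySizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.walk(indexDir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(IndexDirSize::sizeOf)
                        .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "index" is a hypothetical default; pass the real index directory as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "index");
        long bytes = directorySizeBytes(dir);
        System.out.printf("%s: %d bytes (%.2f GiB)%n", dir, bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```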
I don't know about deleted docs. How do you trace that? Yes, the queries
run exactly the same way (same number of results). Most of the time the
order is just changed, which is fine; sometimes a few different entries
show up, and I don't know why, since the lowercase filter should
normalize the terms even if the casing of the original data changes.
Yes, I am absolutely sure nothing else changed. I kept all those things
the same across the two runs.
Actually, does the Lucene repository have these kinds of experiments
across versions (major or minor)?
If I were Lucene, I would run these experiments to see the impact on the
final index. This would help find potential unidentified bugs.
Methodology:
have a large dataset, e.g. 15 million docs
run indexing each time a new version comes out, with very common settings.
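The methodology above can be sketched roughly as follows. It assumes each version's index size has already been measured on the same dataset with the same settings. The class name, the version labels, the sizes and the 5% threshold are all hypothetical. The sketch flags any version whose size moved by more than the threshold relative to the previous one:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for a cross-version index size regression check.
final class VersionSizeCheck {

    // Returns the versions whose index size changed by more than `threshold`
    // (0.05 = 5%) relative to the previous entry, in insertion order.
    static List<String> flagChanges(LinkedHashMap<String, Long> sizesByVersion, double threshold) {
        List<String> flagged = new ArrayList<>();
        long previous = -1; // no previous measurement yet
        for (Map.Entry<String, Long> e : sizesByVersion.entrySet()) {
            long current = e.getValue();
            if (previous > 0) {
                double change = (double) (current - previous) / previous;
                if (Math.abs(change) > threshold) {
                    flagged.add(e.getKey());
                }
            }
            previous = current;
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical sizes in bytes for one dataset indexed with identical settings.
        LinkedHashMap<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("version-A", 10_000_000_000L);
        sizes.put("version-B", 10_200_000_000L);
        sizes.put("version-C", 11_500_000_000L);
        System.out.println(flagChanges(sizes, 0.05)); // prints [version-C]
    }
}
```

Comparing each version against the previous one (rather than a fixed baseline) keeps one old jump from flagging every later version.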
I am not using Solr, just pure Lucene 7.7.2. This info was in the other
email; let me copy and paste it here:
===== previous email ====
On a related issue:
With version 7.7.2 I experienced this:
data is all lowercase (same number of docs as the next case, though)
vs
data is camel case, except the last word, which is always in capital letters
but I used the lowercase filter in the indexer in both cases, so indexing is
done with all lowercase, and I saw that the index size for the first case
is about 9.5GB,
but for the same data in the second case it was 11GB.
What causes such a difference and increase in index size? The number of
docs is the same in both cases.
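To make the normalization argument concrete, here is a JDK-only sketch that imitates a whitespace tokenizer followed by lowercasing. This is an assumption, since the actual analyzer chain is not shown. Lucene's LowerCaseFilter lowercases each token code point by code point, so Locale.ROOT lowercasing is only a close stand-in. Both casings produce the same terms. One thing the filter does not touch, and which may be worth checking: if a field is also stored, the stored value keeps its original casing. Stored-field data can therefore differ between the two runs even when the indexed terms are identical. The class name and sample strings are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in for WhitespaceTokenizer + LowerCaseFilter.
final class LowercaseTerms {

    // Split on whitespace, drop empty tokens, lowercase each token locale-independently.
    static List<String> analyze(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                     .filter(t -> !t.isEmpty())
                     .map(t -> t.toLowerCase(Locale.ROOT))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical values in the two shapes described above.
        String lower = "main street springfield";
        String mixed = "Main Street SPRINGFIELD";
        // Indexed terms are identical after lowercasing:
        System.out.println(analyze(lower).equals(analyze(mixed))); // prints true
        // A stored field would keep the original strings, which differ:
        System.out.println(lower.equals(mixed)); // prints false
    }
}
```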
Best regards
On 11/13/20 7:39 AM, Erick Erickson wrote:
> What does “final finished sizes” mean? After optimize or just after finishing all indexing?
> The former is what counts here.
>
> And you provided no information on the number of deleted docs in the two cases. Is
> the number of deletedDocs the same (or close)? And does the q=*:* query
> return the same numFound?
>
> Finally, are you absolutely and totally sure that no other options changed. For instance,
> you specified docValues=true for some field in one but not the other. Or stored=true
> etc. If you’re using the same schema.
>
> And you also haven’t provided information on what versions of Solr you’re talking about.
> You mention 7.7.2, but not the _other_ version of Solr. If you’re going from one major
> version to another, sometimes defaults change for docValues on primitive fields
> especially. I’d consider firing up Luke and examining the field definitions in
> detail.
>
> Best,
> Erick
>
>> On Nov 13, 2020, at 12:16 AM, baris.kazar@oracle.com wrote:
>>
>> Hi,-
>> Thanks.
>> These are final finished sizes in both cases.
>> Best regards
>>
>>
>>> On Nov 12, 2020, at 11:12 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>
>>> Yes, that issue is fixed. The “Resolution” tag is the key, it’s marked “fixed” and the version is 8.0
>>>
>>> As for your other question, index size is a very imprecise number. How many deleted documents are there
>>> in each case? Deleted documents take up disk space until the segments containing them are merged away.
>>>
>>> Best,
>>> Erick
>>>
>>>> On Nov 12, 2020, at 5:35 PM, baris.kazar@oracle.com wrote:
>>>>
>>>> https://issues.apache.org/jira/browse/LUCENE-8448
>>>>
>>>>
>>>> Hi,-
>>>>
>>>> Is this issue fixed, please? Could you please help me figure it out?
>>>>
>>>> Best regards
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>
>
>