Mailing List Archive

Index File structure, in particular TermInfo
Hi,

I'm using Lucene 1.4.3 Java version.

To solve some particular problems, I'm trying to read the .cfs file
directly from outside the Java framework.
However, reading the .tis file turns out to be difficult:

I tried to follow
http://lucene.apache.org/java/docs/fileformats.html

and successfully read the first entries, but then ran into a problem. I
then found in the source code (TermInfosWriter) that SkipDelta
is sometimes omitted. After I accounted for that, there still seems to be
another problem, which occurs after several hundred entries.
It looks like ProxDelta is missing too in these cases.

However, I didn't find anything about this in the source.

Therefore, my question is whether there are exceptions to the scheme
given on the fileformats page:





TermInfos --> <TermInfo>^TermCount
TermInfo --> <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta>
Term --> <PrefixLength, Suffix, FieldNum>
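
For reference, the record layout above can be sketched as a small reader in Python. This is only a sketch under stated assumptions, not the Lucene implementation. The function names are made up for illustration. VInt is assumed to use 7 data bits per byte, low-order group first, with the high bit as a continuation flag. String is assumed to be a VInt character count (not a byte count) followed by Java-style modified UTF-8. SkipDelta is assumed to be present only when DocFreq >= SkipInterval, where SkipInterval is assumed to come from the .tis header.

```python
import io


def read_vint(stream):
    """Decode one variable-length integer (assumed VInt/VLong encoding)."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a VInt")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift   # low 7 bits carry data
        if byte & 0x80 == 0:               # high bit clear: last byte
            return result
        shift += 7


def read_string(stream):
    """Read a String: VInt character count, then modified UTF-8 (assumed)."""
    length = read_vint(stream)             # counts characters, not bytes
    chars = []
    for _ in range(length):
        b0 = stream.read(1)[0]
        if b0 < 0x80:                      # one-byte character
            code = b0
        elif (b0 & 0xE0) == 0xC0:          # two-byte character
            b1 = stream.read(1)[0]
            code = ((b0 & 0x1F) << 6) | (b1 & 0x3F)
        else:                              # three-byte character
            b1, b2 = stream.read(2)
            code = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
        chars.append(chr(code))
    return "".join(chars)


def read_term_info_entry(stream, skip_interval):
    """Read one <Term, DocFreq, FreqDelta, ProxDelta, SkipDelta> record."""
    prefix_length = read_vint(stream)
    suffix = read_string(stream)
    field_num = read_vint(stream)
    doc_freq = read_vint(stream)
    freq_delta = read_vint(stream)
    prox_delta = read_vint(stream)
    # Assumption: SkipDelta is only written for sufficiently frequent terms.
    skip_delta = read_vint(stream) if doc_freq >= skip_interval else None
    return {
        "prefix_length": prefix_length,
        "suffix": suffix,
        "field_num": field_num,
        "doc_freq": doc_freq,
        "freq_delta": freq_delta,
        "prox_delta": prox_delta,
        "skip_delta": skip_delta,
    }
```

If the string length is treated as a byte count, a reader like this would stay in sync only until the first non-ASCII character in a term, so that may be worth checking when entries go wrong after a few hundred terms.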



Note: I'm reading the .tis file, not the .tii file, at the moment, but maybe this is related.

Thanks,

Wolfgang