While upgrading I ran afoul of some inconsistencies in our schema
usage, and to fix them I've ended up having to add data to our index
that I'd rather not. Let me give a little context: We have a
parent/child document structure. Some fields are shared across parent
and child docs, others are not. Our index has a sort key, and in order
for all the parent/child docs to sort together correctly, we add the
same (docvalues) fields that are part of the sortkey to both parent
and child docs. Some of these fields are *also* indexed as postings
(StringField) of the same name, but we only index the postings field
on the parent document, since child documents are never searched for
on their own - always in conjunction with a parent.

The schema-checking code we added in Lucene 9 does not allow this: it
enforces that all documents having a field should have the same "index
options", and failing to index the postings gets interpreted as having
index options = NONE (because of the presence of the doc values field
of the same name, I think?)
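
To make the failure mode concrete, here is a toy model of the check as described above. It is not Lucene's code: the class FieldSchemaModel, its enums, and the field name "sortkey" are all invented for illustration, and it assumes the rule is simply "the first schema seen for a field name wins, and later documents must match it".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a per-field schema consistency check. NOT Lucene's implementation. */
public class FieldSchemaModel {
    public enum IndexOptions { NONE, DOCS }
    public enum DocValuesType { NONE, SORTED }

    /** What one document declares for one field name. */
    public static final class FieldSchema {
        final IndexOptions indexOptions;
        final DocValuesType docValuesType;

        public FieldSchema(IndexOptions indexOptions, DocValuesType docValuesType) {
            this.indexOptions = indexOptions;
            this.docValuesType = docValuesType;
        }

        boolean sameAs(FieldSchema other) {
            return indexOptions == other.indexOptions && docValuesType == other.docValuesType;
        }
    }

    // First schema seen for each field name; later documents must match it.
    private final Map<String, FieldSchema> schemaByField = new HashMap<>();

    /** Accepts the document only if every field matches the schema already recorded for its name. */
    public void addDocument(Map<String, FieldSchema> doc) {
        // Validate everything first so a rejected document leaves the model unchanged.
        for (Map.Entry<String, FieldSchema> e : doc.entrySet()) {
            FieldSchema known = schemaByField.get(e.getKey());
            if (known != null && !known.sameAs(e.getValue())) {
                throw new IllegalArgumentException(
                    "inconsistent schema for field \"" + e.getKey() + "\": index options "
                        + known.indexOptions + " vs " + e.getValue().indexOptions);
            }
        }
        schemaByField.putAll(doc);
    }

    public static void main(String[] args) {
        FieldSchemaModel index = new FieldSchemaModel();
        // Parent: postings (DOCS) plus doc values under the same name.
        index.addDocument(
            Map.of("sortkey", new FieldSchema(IndexOptions.DOCS, DocValuesType.SORTED)));
        try {
            // Child: doc values only, so its index options come out as NONE.
            index.addDocument(
                Map.of("sortkey", new FieldSchema(IndexOptions.NONE, DocValuesType.SORTED)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the child document is rejected only because "sortkey" was already registered with postings. A field that only ever carries doc values is accepted.
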

Our current solution is to also index the postings for the child
document (but just with an empty string value). This seems gross, and
creates postings in the index that we will never use.

Another possibility would be to rename the fields so that the postings
and docvalues fields have different names. But in this case our
application-level schema diverges from our Lucene schema, adding a
layer of complexity we'd rather not introduce.
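
As a rough sketch of what that extra layer might look like, the renaming amounts to a translation between application field names and index field names. Everything here is hypothetical: the class FieldNameMapping, the "_dv" suffix, and the method names are invented for illustration, not taken from our code or from Lucene.

```java
/** Hypothetical mapping between application-level field names and index field names. */
public class FieldNameMapping {
    // Hypothetical suffix; it must never appear at the end of a real application field name.
    public static final String DOC_VALUES_SUFFIX = "_dv";

    /** Postings keep the application-level name. */
    public static String postingsField(String appField) {
        return appField;
    }

    /** Doc values move to a suffixed name so the two schemas never meet. */
    public static String docValuesField(String appField) {
        return appField + DOC_VALUES_SUFFIX;
    }

    /** Maps an index field name back to the application-level name. */
    public static String appField(String indexField) {
        return indexField.endsWith(DOC_VALUES_SUFFIX)
            ? indexField.substring(0, indexField.length() - DOC_VALUES_SUFFIX.length())
            : indexField;
    }

    public static void main(String[] args) {
        System.out.println(postingsField("sortkey"));
        System.out.println(docValuesField("sortkey"));
        System.out.println(appField(docValuesField("sortkey")));
    }
}
```

Every sort, query, and schema lookup would then have to go through a mapping like this, which is the divergence between the two schemas described above.
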

Finally, could we relax this constraint, always allowing index
options=NONE regardless of how other docs are indexed? Would it cause
problems?
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org