Hi everyone,
I am getting an
org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException while
trying to insert a list of 9 elements, one of which is 242905 bytes long,
into Solr. I am aware that StrField has a hard limit of slightly less than
32k. I am using a TextField, which to my understanding does not have such a
limit, as discussed here
<https://stackoverflow.com/questions/32936361/in-solr-what-is-the-maximum-size-of-a-text-field>
(taking into consideration that the field there wasn't multivalued). So I'm
wondering where this limit comes from in my case, and how it could be
solved. Below are the error and the relevant part of the Solr
managed-schema. I am still new to Solr, so keep in mind that I could be
missing something obvious.
ERROR:
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException",
"error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
"root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
"msg":"Async exception during distributed update: Error from
server at http://solr-host:8983/solr/search_collection_xx: Bad Request
\n\n request: http://solr-host:8983/solr/search_collection_xx \n\n
Remote error message: Exception writing document id <document_id> to
the index; possible analysis error: Document contains at least one
immense term in field=\"text_field_name\" (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped. Please
correct the analyzer to not produce such terms. The prefix of the
first immense term is: '[115, 97, 115, 109, 101, 45, 100, 97, 109,
101, 46, 99, 111, 109, 47, 108, 121, 99, 107, 97, 47, 37, 50, 50, 37,
50, 48, 109, 101, 116]...', original message: bytes can be at most
32766 in length; got 242905. Perhaps the document has an indexed
string field (solr.StrField) which is too large",
"code":400}
}
relevant managed_schema:
<dynamicField name="text_field_*" indexed="true" stored="true"
    multiValued="true" type="case_insensitive_text"/>
<fieldType name="case_insensitive_text" class="solr.TextField"
    multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
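In case it helps the discussion: my guess is that KeywordTokenizerFactory emits the entire field value as one token, so the 242905-byte value becomes a single term that exceeds Lucene's 32766-byte limit. One workaround I was considering (untested, and I believe LengthFilterFactory counts characters rather than UTF-8 bytes, so the max might need to be lower for multi-byte input) would be to drop over-long tokens at index time:

```xml
<fieldType name="case_insensitive_text" class="solr.TextField"
    multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop tokens longer than the Lucene term limit; max is measured
         in characters, so multi-byte UTF-8 values may need a smaller max -->
    <filter class="solr.LengthFilterFactory" min="1" max="32766"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Would that be a reasonable approach, or is there a better way (e.g. truncating instead of dropping)?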
Best regards,
Marko