Mailing List Archive

Getting a MaxBytesLengthExceededException for a TextField
Hi everyone,

I am getting an
org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException while
trying to insert a list of 9 elements, one of which is 242905 bytes long,
into Solr. I am aware that StrField has a hard limit of slightly less than
32k, but I am using a TextField, which to my understanding has no such
limit, as discussed here
<https://stackoverflow.com/questions/32936361/in-solr-what-is-the-maximum-size-of-a-text-field>
(taking into account that the field there wasn't multivalued). So I'm
wondering what the connection is here, and how it could be solved. Below are
the error and the relevant part of the Solr managed_schema. I am still new
to Solr, so keep in mind that there could be something obvious I am
missing.

ERROR:

"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException",
"error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException",
"root-error-class","org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException"],
"msg":"Async exception during distributed update: Error from
server at http://solr-host:8983/solr/search_collection_xx: Bad Request
\n\n request: http://solr-host:8983/solr/search_collection_xx \n\n
Remote error message: Exception writing document id <document_id> to
the index; possible analysis error: Document contains at least one
immense term in field=\"text_field_name\" (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped. Please
correct the analyzer to not produce such terms. The prefix of the
first immense term is: '[.115, 97, 115, 109, 101, 45, 100, 97, 109,
101, 46, 99, 111, 109, 47, 108, 121, 99, 107, 97, 47, 37, 50, 50, 37,
50, 48, 109, 101, 116]...', original message: bytes can be at most
32766 in length; got 242905. Perhaps the document has an indexed
string field (solr.StrField) which is too large",
"code":400}
}

relevant managed_schema:

<dynamicField name="text_field_*" indexed="true" stored="true"
              multiValued="true" type="case_insensitive_text" />

<fieldType name="case_insensitive_text" class="solr.TextField" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Best regards,
Marko
Re: Getting a MaxBytesLengthExceededException for a TextField
Text-based fields indeed do not have that limit for the _entire_ field. They _do_ have that limit for any single token produced. So if your field contains, say, a base-64 encoded image that is not broken up into smaller tokens, you’ll still get this error.
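
If you need to keep the whole value as a single (lower-cased) token, one way around it is to cap the size of the indexed term. A minimal sketch, just illustrative and not tested against your schema, using solr.TruncateTokenFilterFactory (solr.LengthFilterFactory would instead drop over-long tokens entirely):

<fieldType name="case_insensitive_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- illustrative: truncate each indexed token to a prefix well under the 32766-byte limit -->
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="8000"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TruncateTokenFilterFactory" prefixLength="8000"/>
  </analyzer>
</fieldType>

Note that prefixLength counts characters while the 32766 limit is on UTF-8 bytes, so pick something comfortably below the limit. The stored value is untouched; only the indexed term is shortened. If you actually need to search inside those long values, switching to a tokenizer that breaks them into smaller tokens (e.g. solr.StandardTokenizerFactory) is usually the better fix.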

Best,
Erick

> On Oct 25, 2019, at 4:28 AM, Marko Curlin <marko.curlin@reversinglabs.com> wrote:


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org