Mailing List Archive

Max Field Length
Hi,

Is there a maximum length for a Field value?
I have problems indexing files with large bodies.
The end of the content isn't indexed.

Bye,
Ernesto.

--
Ernesto De Santis - Colaborativa.net
Córdoba 1147 Piso 6 Oficinas 3 y 4
(S2000AWO) Rosario, SF, Argentina.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Max Field Length
On May 6, 2005, at 4:42 PM, Ernesto De Santis wrote:

> Hi,
>
> Is there a maximum length for a Field value?
> I have problems indexing files with large bodies.
> The end of the content isn't indexed.

After you create your IndexWriter, do the following:

writer.maxFieldLength = Integer.MAX_VALUE;

Substitute your own limit if you don't want it to be (effectively)
unlimited. The default value is 10,000 terms.
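
For illustration, here is a minimal plain-Java sketch (not Lucene code) of what a per-field term cap does to a long document. The class FieldCapDemo, the method indexableTerms, and the whitespace split standing in for an analyzer are all assumptions made for this example. The point is only that terms past the cap are silently dropped, which matches the symptom of the end of a large body not being indexed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FieldCapDemo {
    /**
     * Returns the terms that survive a per-field cap.
     * Hypothetical helper: whitespace splitting stands in for a real analyzer.
     */
    public static List<String> indexableTerms(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        String[] terms = trimmed.split("\\s+");
        // Everything after the first maxFieldLength terms is discarded.
        int kept = Math.min(terms.length, maxFieldLength);
        return Arrays.asList(terms).subList(0, kept);
    }

    public static void main(String[] args) {
        String body = "alpha beta gamma delta epsilon";
        System.out.println(indexableTerms(body, 3));                 // [alpha, beta, gamma]
        System.out.println(indexableTerms(body, Integer.MAX_VALUE)); // all five terms
    }
}
```

With a cap of 3 the last two terms never reach the index; with Integer.MAX_VALUE nothing is cut off.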


--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com



Re: Max Field Length
Hi;

I think by default only 10,000 terms will be indexed for a field.

You can change this through IndexWriter's maxFieldLength setting.

Luke




Re: Max Field Length
Hi Scott,

There is no way to lift this limit. The assumption is that a user would
never type a 32kB keyword in a search bar, so indexing such long keywords
is wasteful. Some tokenizers, such as StandardTokenizer, can be configured
to limit the length of the tokens that they produce. There is also a
LengthFilter that can be appended to the analysis chain to filter out
tokens that exceed the maximum term length.
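
To make the "filter out long tokens" idea concrete, here is a plain-Java sketch that does not use Lucene itself. It assumes the hard cap is 32766 bytes of UTF-8 per term, which I believe is the value of IndexWriter.MAX_TERM_LENGTH. The names LongTokenDropDemo, utf8Length and dropLongTokens are made up for this example. As far as I know, the real LengthFilter measures length in characters rather than bytes, so treat this as an illustration of the behaviour, not of the API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongTokenDropDemo {
    // Assumed per-term cap in UTF-8 bytes (believed to match IndexWriter.MAX_TERM_LENGTH).
    public static final int ASSUMED_MAX_TERM_BYTES = 32766;

    /** Length of the token once encoded as UTF-8. */
    public static int utf8Length(String token) {
        return token.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Keeps only the tokens whose UTF-8 encoding fits within maxBytes. */
    public static List<String> dropLongTokens(List<String> tokens, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (utf8Length(token) <= maxBytes) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StringBuilder dna = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            dna.append("ACGT"); // 40,000 ASCII characters, so 40,000 UTF-8 bytes
        }
        List<String> tokens = Arrays.asList("patent", dna.toString(), "claim");
        System.out.println(dropLongTokens(tokens, ASSUMED_MAX_TERM_BYTES)); // [patent, claim]
    }
}
```

The oversized token simply disappears from the stream, so the rest of the field is still indexed.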

I would note that modifying the source code is going to require more than
bumping the hardcoded limit as we rely on this limit in a few places, e.g.
ByteBlockPool.

On Fri, Sep 23, 2022 at 12:59 AM Scott Guthery <sguthery@gmail.com> wrote:

> Lucene 9.3 seems to have a (post-Analyzer) maximum field length of 32767.
> Is there a way of increasing this without resorting to the source code?
>
> Thanks for any guidance.
>
> Cheers, Scott
>


--
Adrien
Re: Max Field Length
Thanks much, Adrien. I hadn't realized that the size limit was on one
token in the text as opposed to being a limit on the length of the entire
text field. I'm loading patents, so I suspect that the very long word is a
DNA sequence.

Thanks also for your guidance with regard to setting maximums.

Cheers, Scott

Re: Max Field Length
I wonder if it would make sense to provide a TruncationFilter in
addition to the LengthFilter. That way long tokens in source text
could be better supported, albeit with some confusion if they share
the same very long prefix...

On Fri, Sep 23, 2022 at 9:56 AM Scott Guthery <sguthery@gmail.com> wrote:
>
> Thanks much, Adrien. I hadn't realized that the size limit was on one
> token in the text as opposed to being a limit on the length of the entire
> text field. I'm loading patents, so I suspect that the very long word is a
> DNA sequence.
Re: Max Field Length
We have a TruncateTokenFilter in lucene/analysis/common. :)
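
A rough plain-Java sketch of the truncation idea, and of the shared-prefix confusion mentioned earlier in the thread, might look like the following. It does not call Lucene. TruncateDemo, truncate and distinctTerms are hypothetical names, and the prefix length of 8 is chosen only to keep the example readable.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Arrays;
import java.util.Set;

public class TruncateDemo {
    /**
     * Cuts a token down to at most prefixLength chars.
     * Note: this counts UTF-16 code units, so it could split a surrogate pair.
     */
    public static String truncate(String token, int prefixLength) {
        return token.length() <= prefixLength ? token : token.substring(0, prefixLength);
    }

    /** Truncates every token and returns the distinct resulting terms in order. */
    public static Set<String> distinctTerms(List<String> tokens, int prefixLength) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokens) {
            terms.add(truncate(token, prefixLength));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Two different sequences that share their first 8 characters.
        List<String> tokens = Arrays.asList("ACGTACGTAA", "ACGTACGTCC", "gene");
        System.out.println(distinctTerms(tokens, 8)); // [ACGTACGT, gene]
    }
}
```

Unlike dropping, truncation keeps long tokens searchable by their prefix, at the cost that the two distinct sequences above collapse into one term.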

On Fri, Sep 23, 2022 at 4:39 PM Michael Sokolov <msokolov@gmail.com> wrote:

> I wonder if it would make sense to provide a TruncationFilter in
> addition to the LengthFilter. That way long tokens in source text
> could be better supported, albeit with some confusion if they share
> the same very long prefix...

--
Adrien
Re: Max Field Length
ooh

On Fri, Sep 23, 2022 at 11:02 AM Adrien Grand <jpountz@gmail.com> wrote:
>
> We have a TruncateTokenFilter in lucene/analysis/common. :)