Mailing List Archive

Add maxFields Option to IndexWriter
Hi All,

I work on Lucene at MongoDB.

I would like to limit the number of fields in an index to prevent tenants
from causing a mapping explosion.

Since IndexWriter.getFieldNames has been deprecated
<https://issues.apache.org/jira/browse/LUCENE-8909>, there is no way to do
this without using a reader (which comes with a set of problems regarding
flush/commit rates).

I would love to add to Lucene the ability for an IndexWriter to limit the
number of fields. Curious to hear your thoughts.

Thanks,
Oren
Re: Add maxFields Option to IndexWriter
I don't like the idea of IndexWriter limiting field names, but I do like
the idea of un-deprecating that method, which appeared to have a trivial
implementation. Try commenting on the issue that deprecated it, which has
various watchers, to get their attention.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Jan 13, 2021 at 5:02 PM Oren Ovadia <oren.ovadia@mongodb.com.invalid>
wrote:

> Hi All,
>
> I work on Lucene at MongoDB.
>
> I would like to limit the number of fields in an index to prevent tenants
> from causing a mapping explosion.
>
> Since IndexWriter.getFieldNames has been deprecated
> <https://issues.apache.org/jira/browse/LUCENE-8909>, there is no way to
> do this without using a reader (which comes with a set of problems
> regarding flush/commit rates).
>
> I would love to add to Lucene the ability for an IndexWriter to limit the
> number of fields. Curious to hear your thoughts.
>
> Thanks,
> Oren
>
>
Re: Add maxFields Option to IndexWriter
I personally have had pretty positive experiences with what I call soft limits. At Elastic we use them all over the place to catch issues when a user has likely misconfigured something or when there is likely an issue on the user's end.
I think having an option on the IW that allows limiting the number of fields makes sense. We can even extract a general limits object with total number of docs etc. if we want. We can still set everything to unlimited by default.
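
For illustration only, a general limits object that is unlimited by default could look roughly like the sketch below. The class name IndexLimits and all of its methods are hypothetical; nothing like this exists in Lucene, and the sketch only shows the "unlimited unless configured" shape of the idea.

```java
// Hypothetical value object for illustration; Lucene has no class by this name.
final class IndexLimits {
    // Sentinel meaning "no limit configured".
    static final long UNLIMITED = Long.MAX_VALUE;

    private final long maxFields;
    private final long maxDocs;

    private IndexLimits(long maxFields, long maxDocs) {
        this.maxFields = maxFields;
        this.maxDocs = maxDocs;
    }

    // Default: every limit is switched off.
    static IndexLimits unlimited() {
        return new IndexLimits(UNLIMITED, UNLIMITED);
    }

    // Each "with" method returns a new immutable copy with one limit changed.
    IndexLimits withMaxFields(long limit) {
        return new IndexLimits(requirePositive(limit), maxDocs);
    }

    IndexLimits withMaxDocs(long limit) {
        return new IndexLimits(maxFields, requirePositive(limit));
    }

    boolean allowsFieldCount(long count) {
        return count <= maxFields;
    }

    boolean allowsDocCount(long count) {
        return count <= maxDocs;
    }

    private static long requirePositive(long limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        return limit;
    }
}
```

With this shape, a caller that never configures anything gets IndexLimits.unlimited(), and a multi-tenant deployment could opt in with something like IndexLimits.unlimited().withMaxFields(1000).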

WDYT

Sent from a mobile device

> On 14. Jan 2021, at 06:36, David Smiley <dsmiley@apache.org> wrote:
>
> I don't like the idea of IndexWriter limiting field names, but I do like the idea of un-deprecating that method, which appeared to have a trivial implementation. Try commenting on the issue that deprecated it, which has various watchers, to get their attention.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
Re: Add maxFields Option to IndexWriter
I like Oren's idea and Simon's proposal of unlimited by default but
configurable.
Marcus

On Thu, Jan 14, 2021 at 12:16 AM Simon Willnauer <simon.willnauer@gmail.com>
wrote:

> I personally have had pretty positive experiences with what I call soft
> limits. At Elastic we use them all over the place to catch issues when a
> user has likely misconfigured something or when there is likely an issue
> on the user's end.
> I think having an option on the IW that allows limiting the number of
> fields makes sense. We can even extract a general limits object with total
> number of docs etc. if we want. We can still set everything to unlimited by
> default.
>
> WDYT
>
> Sent from a mobile device

--
Marcus Eagan
Re: Add maxFields Option to IndexWriter
I think it makes sense to un-deprecate that API (why did we deprecate it?),
but I'm not sure IW should be in the business of soft/hard limits on field
count?

I agree such limits make sense if the integrity of the index is at risk,
e.g. IW does enforce a max number of unique documents in one index.

But for number of fields, as long as we expose the API, then the layer
above Lucene can handle soft/hard limits, notifying the user correctly,
rejecting updates, etc.?
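
As a rough sketch of what "the layer above Lucene" might do, assuming it can obtain the current set of field names (for example from IndexWriter.getFieldNames(), if that method stays available), a small guard could classify each incoming document before it is indexed. FieldLimitGuard, Verdict, and the limit values are hypothetical names used only for illustration; the field-name source is passed in as a Supplier so the sketch does not depend on Lucene itself.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical application-level guard; this class is not part of Lucene.
final class FieldLimitGuard {

    enum Verdict { OK, SOFT_LIMIT_EXCEEDED, HARD_LIMIT_EXCEEDED }

    // Source of the field names already present in the index.
    // Assumption: in real use this could be writer::getFieldNames.
    private final Supplier<Set<String>> existingFields;
    private final int softLimit;
    private final int hardLimit;

    FieldLimitGuard(Supplier<Set<String>> existingFields, int softLimit, int hardLimit) {
        if (softLimit < 0 || hardLimit < softLimit) {
            throw new IllegalArgumentException("need 0 <= softLimit <= hardLimit");
        }
        this.existingFields = existingFields;
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    // Computes how many distinct fields the index would have after adding a
    // document with the given field names, and compares that to the limits.
    Verdict check(Collection<String> documentFields) {
        Set<String> union = new HashSet<>(existingFields.get());
        union.addAll(documentFields);
        if (union.size() > hardLimit) {
            return Verdict.HARD_LIMIT_EXCEEDED; // caller would reject the update
        }
        if (union.size() > softLimit) {
            return Verdict.SOFT_LIMIT_EXCEEDED; // caller would warn the user
        }
        return Verdict.OK;
    }
}
```

Note that a check followed by a separate add is not atomic, so with concurrent indexing threads the application would need its own synchronization if the hard limit must never be exceeded.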

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 14, 2021 at 5:36 PM Marcus Eagan <marcuseagan@gmail.com> wrote:

> I like Oren's idea and Simon's proposal of unlimited by default but
> configurable.
> Marcus
Re: Add maxFields Option to IndexWriter
Thanks for the responses and advice.

Un-deprecating sounds great: it solves our issue and gives us the
flexibility to choose different strategies to deal with it (soft/hard
limits etc.).
I created LUCENE-9680 <https://issues.apache.org/jira/browse/LUCENE-9680> to
track this, and I'll have a patch ready by the beginning of next week.

Best,
Oren

P.S: getFieldNames was deprecated after SOLR-12368
<https://issues.apache.org/jira/browse/SOLR-12368> made in-place DV updates
easier for fields that didn't exist.

On Tue, Jan 19, 2021 at 7:42 AM Michael McCandless <
lucene@mikemccandless.com> wrote:

> I think it makes sense to un-deprecate that API (why did we deprecate
> it?), but I'm not sure IW should be in the business of soft/hard limits on
> field count?
>
> I agree such limits make sense if the integrity of the index is at risk,
> e.g. IW does enforce a max number of unique documents in one index.
>
> But for number of fields, as long as we expose the API, then the layer
> above Lucene can handle soft/hard limits, notifying the user correctly,
> rejecting updates, etc.?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: Add maxFields Option to IndexWriter
I have a PR ready here: https://github.com/apache/lucene-solr/pull/2231
Thanks in advance for taking a look.

Is anyone game to help me backport this to the upcoming minor version in
8.7?

Thank you,
Oren


On Tue, Jan 19, 2021 at 5:56 PM Oren Ovadia <oren.ovadia@mongodb.com> wrote:

> Thanks for the responses and advice.
>
> Un-deprecating sounds great: it solves our issue and gives us the
> flexibility to choose different strategies to deal with it (soft/hard
> limits etc.).
> I created LUCENE-9680 <https://issues.apache.org/jira/browse/LUCENE-9680> to
> track this, and I'll have a patch ready by the beginning of next week.
>
> Best,
> Oren
>
> P.S: getFieldNames was deprecated after SOLR-12368
> <https://issues.apache.org/jira/browse/SOLR-12368> made in-place DV
> updates easier for fields that didn't exist.
Re: Add maxFields Option to IndexWriter
Bump for a review on this small PR:
https://github.com/apache/lucene-solr/pull/2231

Best,
Oren

On Thu, Jan 21, 2021 at 5:57 PM Oren Ovadia <oren.ovadia@mongodb.com> wrote:

> I have a PR ready here: https://github.com/apache/lucene-solr/pull/2231
> Thanks in advance for taking a look.
>
> Is anyone game to help me backport this to the upcoming minor version in
> 8.7?
>
> Thank you,
> Oren