[SERVER-12250] support arbitrary numbers of indexes Created: 03/Jan/14  Updated: 01/Aug/23

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Storage
Affects Version/s: 2.4.8, 2.5.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Chad Kreimendahl Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 20
Labels: indexing, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Any


Issue Links:
Duplicate
is duplicated by SERVER-21591 Support more than 64 indexes per coll... Closed
Related
is related to SERVER-1140 Support 64 (or more) indexes Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Minor Change
Participants:
Case:

 Description   

Currently NIndexesMax is set to 64 in namespace_details.h. We'd like this to be configurable, or at least raised to a larger number such as 128 or 256.

It may sound a tad excessive, but we have a very good use case for this, especially with the new index intersections in version 2.5.5.
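
For context, here is a minimal shell sketch (collection and field names are hypothetical) of how the limit behaves today: once a collection already holds 64 indexes (the default _id index counts toward the total), the next createIndex is rejected with error code 67 (CannotCreateIndex), as in the error example quoted further down this thread.

// Illustrative only: create single-field indexes one at a time.
// The index that would push the collection past 64 fails with
// "add index fails, too many indexes ..." (code 67, CannotCreateIndex).
for (var i = 0; i < 70; i++) {
  var spec = {};
  spec["indexedField" + i] = 1;
  db.coll.createIndex(spec);
}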



 Comments   
Comment by Chad Kreimendahl [ 16/Jun/20 ]

I think updates to extend the functionality of wildcards would get us exactly what we need, so I created a new ticket for it last night.

If wildcards with specifically included fields could be non-sparse, allow for multiple concurrent use within a query, and support querying for not-null... that would give us 100% of what we need. You could also allow patterns after "$**". We could easily set up something like this:

db.collection.createIndex(
  { "$**" : 1 },
  { "wildcardProjection" :
    { "BucketOFields.Field1.SearchSort" : 1, "BucketOFields.Field2.SearchSort" : 1 }
  }
)

The obvious drawback here is that we have to rebuild the entire index every time we add fields. The alternative option was the pattern after "$**", which might look like this:


db.collection.createIndex({"BucketOFields.$**.SearchSort": 1});

Comment by Michael Gargiulo [ 16/Jun/20 ]

sallgeud ralf.strobel jussi@vainu.io 

Thank you for your comments. My understanding from the older comments was that wildcard indexes were sufficient for the majority of the use cases presented, but that is clearly not the case. While this wasn't addressed as part of 4.4, that does not mean it will not be addressed in the future.

I am going to continue to investigate and bring these new comments to the SERVER engineering teams for discussion. The ticket will be re-opened for tracking purposes, and updates will be made after I discuss further with the teams.

Comment by Jussi Kauppila [ 16/Jun/20 ]

Agree with Chad and Ralf. The problematic points are sorting combined with a wildcard index, and compound indexes.

Sometimes a small compound index with a partialFilterExpression can work miracles, and it might only be 100 KB in size.

This is just a weird limitation in Mongo that other databases don't have. I guess the official stance is now that we're stuck with 64 indexes for the foreseeable future.

Comment by Ralf Strobel [ 16/Jun/20 ]

I have also addressed the lack of compounding in SERVER-48570.

Comment by Chad Kreimendahl [ 16/Jun/20 ]

It doesn't appear that the wildcard index type will allow something like "parentThing.$*.IndexableValue". Tell me if I'm wrong in this assumption?


In our specific case, it appears that if we choose to go this route and just use an include filter, we could get close, but there is one glaring issue: a wildcard index can only satisfy one filtered field per query. Right now I believe a typical query of ours uses 2 separate indexes plus 1 for sorting. So while this potentially solves the number-of-indexes problem, it kills performance, as over 90% of our queries filter on more than one field, plus an average of one sort field.
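
To illustrate the query shape in question (field names are hypothetical, following the example structures later in this thread), a single wildcard index could serve at most one of these predicates, so the multi-filter-plus-sort pattern still wants several regular indexes:

db.collection.find(
  { "BucketOFields.Field1.SearchSort" : "some value",
    "BucketOFields.Field2.SearchSort" : { $ne: null } }
).sort({ "BucketOFields.Field3.SearchSort" : 1 })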

Comment by Michael Gargiulo [ 15/Jun/20 ]

Per the comments and conversation above, I am closing this ticket, as wildcard index support alleviates the problem for most of the use cases reported. We will continue to monitor for similar reports and encourage any of the reporters/watchers on this ticket to submit another request if there is an issue in the future.

Comment by Chad Kreimendahl [ 14/May/20 ]

I think we can overcome that issue. Since we already had a problem with startup times, which we believe are somewhat improved in 4.0+, we had started creating multiple clusters populating the same machines. So our primaries are about 1/3 the size they formerly were, but we run 1 primary on each of 3 machines, plus two secondaries linked to each primary. Since we rarely read from secondaries, it worked out perfectly: we could lessen the number of open files and massively reduce boot times by essentially parallelizing everything, running multiple mongods per hardware server. It lets us scale well in that we can cram a box full of NVMe drives and memory and split up the mongods until they're happy. It's less fun to manage, but is fine, given how we update clusters.

Comment by Ralf Strobel [ 12/May/20 ]

I'm not against lifting this restriction in general, but system designers should keep in mind that Mongo still has a problem when too many file handles are open at the same time - WT-5479. For our project we have also adopted wildcard indexes and are very satisfied with them. We hardly find a need for more than 20 indexes on any collection any more, let alone 64.

Comment by Chad Kreimendahl [ 31/Jul/19 ]

So, not really. We don't want to index all the sub-properties. There are 8x as many metadata properties as there are data points we need indexed.

Ideally, being able to use either a wildcard qualifier or just simple text exclude would be great.

Even just being able to define "do not index any sub-properties with this name" via an array like ["More","Meta"] would be perfect.

Comment by Asya Kamsky [ 31/Jul/19 ]

You absolutely can use wildcard indexes the way you are describing - you just can't exclude paths via wildcards.

So if your structure was (for instance):

{
    BucketOFieldsIndexed: {
        Field1: {
            Value: "Some Value",
            SearchSort: "some value"
        },
        Field2: {
            Value: 1,
            SearchSort: "1",
        },
        Field3....
    },
    BucketOFieldsUnindexed: {
        Field1: {
            Meta: { Some: "Metadata we dont want to index" },
            More: { Metadata: "we don't want to index"}
        },
        Field2: {
            Meta: { Some: "Metadata we dont want to index" },
            More: { Metadata: "we don't want to index"}
        },
        Field3....
    }
}

You would be able to index "BucketOFieldsIndexed.$**" and it would pick up and index everything under there, including FieldNotYetThere - it's the fact that you have intermixed fields you do and don't want to index that's not compatible with our initial implementation.
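
For completeness, the index described above would be created like so (standard wildcard-index syntax, using the field name from the example structure):

db.collection.createIndex({ "BucketOFieldsIndexed.$**" : 1 })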

I'll see if we already have a ticket for what you want or I'll file a new one.

Comment by Chad Kreimendahl [ 30/Jul/19 ]

That is indeed what we would need to do... create a new request, as Field1, Field2, Field3 are all dynamic and it's unknown whether they'll exist. Even their names are unknown at index creation time. If we indexed BucketOfFields.$** we would essentially be indexing 95% of our database, when in reality we need to index less than 10% of it (the .SearchSort and potentially the .Value fields).

If you're able to create that request to word it in a way that fits how it would be understood internally, I'll get a tiny army of people to vote for it.  

I presume, since our Fields list in BoF is dynamic, that you can't simply update the wildcardProjection dynamically? As in, could we re-define it in-line and add 1 or 2 new fields and have it rebuild just for those, or would it have to rebuild everything (or even delete and recreate)?

Comment by Asya Kamsky [ 29/Jul/19 ]

You can specify one of the following:

"$**"  index every single thing in the document

"BucketOfFields.$**"  index every single thing under top level field BucketOfFields

First one of those can take a wildcardProjection option which is either inclusion or exclusion and specifies either a subset of fields to include or a subset of fields to exclude.

In your case a possible index that satisfies the condition you mention and the structure you show would be:

db.coll.createIndex(
  { "$**" : 1 },
  {
    "wildcardProjection" : {
       "BucketOfFields.Field1.SearchSort" : 1,
       "BucketOfFields.Field1.Value" : 1,
       "BucketOfFields.Field2.SearchSort" : 1,
       "BucketOfFields.Field2.Value" : 1,
       "BucketOfFields.Field3.SearchSort" : 1,
       "BucketOfFields.Field3.Value" : 1
    }
  }
)

Or, alternatively, an exclusion on Meta and More if that's more appropriate.
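
That exclusion variant would look roughly like this (a sketch using the paths from the structure above; note that every excluded path still has to be listed explicitly):

db.coll.createIndex(
  { "$**" : 1 },
  {
    "wildcardProjection" : {
       "BucketOfFields.Field1.Meta" : 0,
       "BucketOfFields.Field1.More" : 0,
       "BucketOfFields.Field2.Meta" : 0,
       "BucketOfFields.Field2.More" : 0
    }
  }
)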

However there is no support for accepting a wildcard in the middle of an exclusion or inclusion path - am I correctly inferring that you would want to be able to exclude basically "BucketOfFields.*.Meta" and "BucketOfFields.*.More"? We would have to open a new ticket to track that new functionality.

As it is now you could only use wildcard indexes if your excluded fields lived in a separate subdocument from the fields you want to index.

Comment by Chad Kreimendahl [ 29/Jul/19 ]

Great... now let's say we have something like the following, and you wanted to index all of the BucketOFields.$**.SearchSort. Is that possible?

In the case below, we would not want to index the Meta or More fields, ever. We'd always index SearchSort (and possibly Value, if the wildcard sub-select index above is possible).

{
    BucketOFields: {
        Field1: {
            Meta: { Some: "Metadata we dont want to index" },
            More: { Metadata: "we don't want to index"},
            Value: "Some Value",
            SearchSort: "some value"
        },
        Field2: {
            Meta: { Some: "Metadata we dont want to index" },
            More: { Metadata: "we don't want to index"},
            Value: 1,
            SearchSort: "1",
        },
        Field3....
    }
}

Comment by Asya Kamsky [ 29/Jul/19 ]

sallgeud I hope documentation answers this already but just to have all the answers in a single place:

  1. Yes: the new index, being roughly the logical equivalent of a compound index on ("path to field", "value of field"), automatically adds new fields/paths as documents are updated (atomically, same as any other data/index change). So no new index is created; there is a single wildcard index that indexes all fields (new ones are added as they come in).
  2. A single index is created, and therefore there is only a single file on disk.
  3. This ticket is still open and is tracking increasing or removing the limit.
Comment by Jussi Kauppila [ 29/Jul/19 ]

The new wildcard indexes do help a bit. The Lucene text index feature in 4.2 also looked very nice (only with MongoDB Atlas).

But it would still be preferable to have the limit removed/increased, for more control over what data is indexed and how.

Comment by Chad Kreimendahl [ 28/Jul/19 ]

Are wildcard indexes created as a single index? That'd be fairly helpful. We do make use of several of the things listed as limitations of wildcards, but we could save quite a few indexes with them.


A few questions:

  1. Do wildcard indexes automatically index new fields that match when they first appear in a new document (first use of a field in a record)?
  2. Do they only use 1 file on the system?
    1. If yes, are they limited within the same count of 64?
    2. If not, how do you prevent newly created fields from going well beyond the 64 limit?
  3. Can we also just get a 128 or 256 limit?
Comment by Asya Kamsky [ 24/Jul/19 ]

4.2 added support for Wildcard Indexes which will hopefully alleviate this problem for some of the use cases mentioned in this ticket. However, the 64 index limit has not been changed in 4.2.

Comment by Minh Nguyen [ 24/Jul/19 ]

@Jussi Kauppila, I see in the 4.2 documentation (draft/upcoming version) it is still the same.

Hopefully they will add it to 4.2.

https://docs.mongodb.com/master/reference/limits/#indexes 

Comment by Jussi Kauppila [ 07/Aug/18 ]

Now that the MMAPv1 engine is deprecated in Mongo 4.0, is there any progress on increasing the number of indexes?

Especially now that partial indexes and partialFilterExpression have been around for some time, it would be very useful to support more than 64 indexes; sometimes it would be handy to have many small indexes.
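
For illustration, the kind of small, targeted index being described (collection and field names are hypothetical); each one is cheap on its own, but they add up quickly against the 64-index cap:

db.orders.createIndex(
  { customerId: 1, createdAt: -1 },
  { partialFilterExpression: { status: "open" } }
)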


Comment by Asya Kamsky [ 07/Jun/18 ]

We are working on a related project to allow a single index to answer queries on any field in the document.
cade@hyperblaster.org what are the indexes that you create? Are they mostly on single fields, so that as the number of fields in a document grows you find your number of indexes growing because of that? Or are they all different compound indexes for different types of queries on the same or similar subsets of fields?

Comment by Cade Embery [ 07/Jun/18 ]

This is something we desperately need for our application also. We are scaling fast and the index limit is crippling to our application.

We often hit OOM crashes due to unindexed queries, but we can't create any more indexes because of this 64-index limit.

It is at the point of us either switching to an entirely different stack or possibly trying to fork Mongo and hack it to work for our needs. Obviously we really don't want to entertain either of those ideas, but the index limit is forcing our hand.

Comment by Asya Kamsky [ 05/Mar/18 ]

We don't currently allow any operations that won't safely work on all supported storage engines.

As milkie said, when we remove support for MMAPV1 we will be able to increase this limit.

Comment by Crimson Glory [ 16/Feb/18 ]

Maybe a compromise would be possible: do not let the user create more than 64 indexes under normal conditions, but explain the situation in the error message and let the user use a force option.

Example:

> db.collection_name.createIndex({'asdf4': 1})
{
        "ok" : 0,
        "errmsg" : "add index fails, too many indexes for DB_something.collection_name key:{ asdf4: 1.0 }. You can force this operation, but you should only do this if you do not have MMAPv1 replica set nodes. WARNING: Doing this with a replica set with mixed storage engines will CRASH the set. Please see https://jira.mongodb.org/browse/SERVER-12250",
        "code" : 67,
        "codeName" : "CannotCreateIndex"
}

And then:

> db.collection_name.createIndex({'asdf4': 1}, {'force': true})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 64,
        "numIndexesAfter" : 65,
        "ok" : 1
}

Comment by Eric Milkie [ 16/Feb/18 ]

Sorry Chad, I'm afraid I cannot change the mmapv1 storage engine to support arbitrary numbers of indexes with very minor updates. I could only do it with a lot of work. However, I'm willing to entertain a pull request!

Comment by Chad Kreimendahl [ 16/Feb/18 ]

So? Make it a featureCompatibility thing. Allow all storage engines to support more indexes. The feature request isn't really specific to any storage engine. It seems like they could all support this with very minor updates: s/64/1024/g

Comment by Eric Milkie [ 16/Feb/18 ]

If that were allowed on WiredTiger, it would crash replica set nodes with mixed storage engines, with no easy way to detect and prevent this in the code. One of the most common ways of moving from MMAPv1 to WiredTiger is to use mixed-storage-engine replica sets.

Comment by Crimson Glory [ 16/Feb/18 ]

Would it be possible at least to get this done on WiredTiger?

Comment by Eric Milkie [ 16/Feb/18 ]

This is a limit of the on-disk structure for the MMAPv1 storage engine; once MMAP is removed, we will change this limit.

Comment by Nate Smith [ 24/Aug/16 ]

With partial index support, an unlimited number of indexes per collection could be an extremely powerful tool. Now that it is possible to create an index for just a small, specific use case, it is very frustrating that we are still limited in the total number of indexes.

For example, if we could have many, many indexes, we could create account-specific indexes on keys particular to certain accounts. Enterprise applications often want to support custom extensions per customer, and right now the only way to implement something like that is to have a different DB per customer, which introduces a whole different set of limitations, tradeoffs, and operational complexity.
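
A sketch of the per-account pattern described above, with hypothetical field names; with the cap lifted, each customer-specific extension could get its own small partial index like this:

db.records.createIndex(
  { "extensions.contractRenewalDate": 1 },
  { partialFilterExpression: { accountId: "customer-123" } }
)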

Comment by Eliot Horowitz (Inactive) [ 04/Jan/14 ]

This isn't a file system limit; it's a limit of our on-disk structures for storing index metadata.

Comment by Chad Kreimendahl [ 04/Jan/14 ]

For my education, I'd love to know which file systems limit the change; whether there's a scenario where we might want to compile our own build with an increased limit, and which file systems / OSes would best support that change (short term); and whether 128 vs. 256 matters on the ones that work.

Comment by Chad Kreimendahl [ 04/Jan/14 ]

We create various collections to store dynamic information. A document in each of those collections can actually be fairly large, containing a fair bit of metadata about the customer's specific fields. We have documents with thousands of JSON/BSON fields. However, we only need to index the sortable and filterable data points, which are always less than 10-20% of the total fields.

Our product allows the creation of dynamic reports and charts in which any field you choose could be used to filter data or may be sortable, with some minor exceptions. What we've seen on slower systems (virtual machines, test systems, etc.) is that, if you populate a very large set of data and try to sort and page to the end, Mongo throws an exception stating it won't even try to get to those records because there's no index. This is a perfectly acceptable exception, as we would love to index those fields for performance reasons anyway. However, in that process we discovered the 64-index limit.

Our concern is scalability of our reports.... an obvious reason why we chose Mongo. We need to potentially report on tens of millions of records in a very short time. When we have indexes, everything is fine... without, not so much.

We actually desperately need the index intersection, too... glad to see that was added in 2.5.5. Our indexes are 100% single-field, so our need for > 64 is not reduced by that feature. We were actually in need of it to allow those tens of millions of records to be quick to respond when doing many filters and sorts.

Comment by Eliot Horowitz (Inactive) [ 04/Jan/14 ]

Definitely something we want to do, but it isn't as simple as changing the constant, as it depends on the on-disk format.

We're going to be creating a new catalog in a relatively near version, which will allow arbitrary numbers of indexes.

For my own curiosity: for most people index intersection reduces the number of indexes needed, so I'm curious why in your case you need more.
