[SERVER-14772] Create a way to add an empty sparse index, skipping iteration of the collection
Created: 02/Aug/14  Updated: 06/Dec/22  Resolved: 22/Feb/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Vincent
Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-2366 Speed up creation of (don't populate)... Closed
Assigned Teams:
Query
Participants:

 Description   

I'm currently adding some sparse indexes to big collections for which I'm certain the indexes match no documents.
Creating each index takes forever (because it iterates over every single document in the collection, blocking all other operations in the meantime) when it could take just a few ms.

Maybe something like:

{ sparse: true, skipSparseIndexCreation: true }

Also, this is probably trivial to do.



 Comments   
Comment by Vincent [ 06/Aug/14 ]

Well, those features might never arrive (or at least not in the foreseeable future: they were reported 4 years and 2 years ago and are still unscheduled), so it would be nice to offer users this option at their own risk.
To give you my honest opinion of MongoDB, not really related to this topic: MongoDB is soooo slowwww as soon as it has to deal with anything on big collections, when, I believe, a very limited number of features could significantly improve its speed (heard about TokuMX? Divide disk usage by 10 and you can afford SSDs in your servers; tokenizing field names would cut disk/memory usage by a factor of 2 or so; etc.). You have a new CEO, and I hope he will focus efforts on this, because that's what people want and it's the main reason people use NoSQL databases: dealing with big data.

Comment by Thomas Rueckstiess [ 06/Aug/14 ]

Hi Vincent,

We can't add features to the product that would in many cases lead to inconsistent replica set members, even if properly documented. This would also make diagnosing indexing issues very hard, as one couldn't tell which documents should be in the index and which ones shouldn't.

It's unlikely that we're going to implement the feature as proposed at this stage. The situation would change once MongoDB supports some form of schema validation (see SERVER-3536), at which point we may be looking at shortcuts to create sparse indexes on non-existing fields.

Related to the issue you describe, we are going to focus on improving performance and reducing impact of regular index builds, see for example SERVER-676 or SERVER-6883.

If background indexes are not an option for you then you may want to build indexes in a rolling fashion on your replica set to maintain availability, as described in Build Indexes on Replica Sets.

Regards,
Thomas

Comment by Vincent [ 04/Aug/14 ]

Hi Thomas,

I'm aware of this potential risk, but users of that option should understand the risk and deal with it. In my case I was 100% sure that no document contained the (new) indexed field. Other users probably face this issue too.
The solution might be to make this option "private", i.e. "_skipSparseIndexCreation" or so, documented with proper warnings, so that only users with enough knowledge would use it.
I don't like background indexes because AFAIK they take (even) more (memory) space, take more time to build, and are less efficient.

Comment by Thomas Rueckstiess [ 04/Aug/14 ]

Hi Vincent,

This could lead to an incomplete index containing only a subset of the documents that it should index. Additionally, secondary nodes that resync from scratch may build the index at a different time and contain a different set of documents. Depending on which node you query or which node is primary, you would get different results. This is not a desirable property of replica sets.
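
The divergence described above can be illustrated with a small toy model. This is only a sketch: plain JavaScript objects stand in for replica set members, the `skipBuild` flag mirrors the hypothetical `skipSparseIndexCreation` option proposed in this ticket (it does not exist in MongoDB), and all function names are made up for illustration.

```javascript
// Toy model only: plain objects stand in for replica set members.
// skipBuild mirrors the hypothetical skipSparseIndexCreation option.

// Build a sparse index: only documents that contain the field get an entry.
function buildSparseIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (field in doc) index.set(doc._id, doc[field]);
  }
  return index;
}

// Create a member holding `docs`. With skipBuild the index starts empty
// and the existing documents are never scanned.
function createMember(docs, field, skipBuild) {
  return {
    docs: docs.slice(),
    field,
    index: skipBuild ? new Map() : buildSparseIndex(docs, field),
  };
}

// Inserts made after index creation always maintain the index.
function insert(member, doc) {
  member.docs.push(doc);
  if (member.field in doc) member.index.set(doc._id, doc[member.field]);
}

// A query answered from the index sees only the indexed _ids.
function idsViaIndex(member) {
  return [...member.index.keys()].sort();
}

// The belief "no document has the field" turns out to be wrong for _id 2.
const existing = [{ _id: 1 }, { _id: 2, tag: "x" }];

// The primary creates the index with the build skipped, then takes a write.
const primary = createMember(existing, "tag", true);
insert(primary, { _id: 3, tag: "y" });

// A secondary that resyncs from scratch copies every document and
// builds the index normally over all of them.
const secondary = createMember(primary.docs, "tag", false);

console.log(idsViaIndex(primary));   // [ 3 ]
console.log(idsViaIndex(secondary)); // [ 2, 3 ]
```

In this model the same index-backed query returns different documents depending on which member answers it, which is the inconsistency the comment warns about.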

Have you tried building the index in the background? That way the impact on the system should be less severe, and the build does not block other operations.
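
For reference, a background build of a sparse index might be requested roughly as follows. This is a minimal sketch: the helper name, `db.bigCollection`, and `newField` are placeholders rather than names from this ticket, and it assumes a `collection` object exposing `createIndex(keys, options)`, as collections in the mongo shell do. Note that from MongoDB 4.2 onward the `background` option is deprecated and ignored.

```javascript
// Sketch: request a background build of a sparse index.
// `collection` is any object with a createIndex(keys, options) method,
// for example db.bigCollection in the mongo shell (placeholder name).
function createSparseIndexInBackground(collection, field) {
  const keys = {};
  keys[field] = 1; // ascending index on the given field
  const options = { sparse: true, background: true };
  return collection.createIndex(keys, options);
}

// Usage in the mongo shell (collection and field names are placeholders):
// createSparseIndexInBackground(db.bigCollection, "newField");
```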

Regards,
Thomas
