[SERVER-10403] sparse compound index should really be sparse Created: 01/Aug/13  Updated: 26/Sep/17  Resolved: 25/Sep/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Antoine Girbal Assignee: Unassigned
Resolution: Duplicate Votes: 12
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-2193 Sparse indexes only support a single ... Closed
Related
is related to SERVER-13780 sparsePolicy for sparse compound indexes Closed
is related to SERVER-3981 Creating a compound sparse index shou... Closed
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   

I've always believed from documentation that only single field sparse index were supported.
Doing quick testing, it turns out:

  • compound sparse index creates fine
  • the behavior is that it is only sparse if all fields are non existent

I feel like the behavior is not useful at all: it's rare that all fields are absent and thus it mostly behaves like a non-sparse.
The more common use case is as follows:

  • a sparse field tells if doc should be included in index
  • compound on other fields to do further matching an sorting, like time etc

Say I'm working on reconciliation software: 99% of my records are matched and 1% non matched.
I need to do fast operations on non-matched items that may include complex matching and sorting.
Right now if I want to compound the index, all 100% records will be in index tree.
Ideally I just have a sparse flag "unmatched: true" and can compound freely

{ unmatched:1, customerId: 1, date: 1 }

and obtain an index that's 99% smaller.
Today the only solution is to somehow encode information into the "unmatched" field.



 Comments   
Comment by Asya Kamsky [ 25/Sep/15 ]

Functionality covered in SERVER-785

Comment by Johnny Shields [ 10/Jul/14 ]

I like the proposal in this issue the best. I don't think there needs to be a separate SparsePolicy / StrictSparse option. Sparse indices on compound keys are essentially useless at the moment.

Comment by Kevin Locke [ 16/May/14 ]

When searching the bug tracker I noticed SERVER-13780 also discusses this issue, as I understand it, with a proposed implementation. Anyone interested may want to weigh in there as well.

Perhaps the issues should be merged or marked as related?

Comment by Antoine Girbal [ 13/Aug/13 ]

renctan In your output as soon as either x or y is specified, it will make it into the index.
That shows the current behavior and it is really non-sparse as soon as at least 1 key is specified, which is usually the case.
Hence it makes the sparse option not really useful on a compound.

Ideally if any key is non-existent, record would not make it to sparse index.
Or as an alternative, if the 1st key is non-existent, record would not make it to sparse index.

I understand we are unlikely to change this because it would probably be backward breaking, but really it would make the sparse index much more powerful as laid out in my description.
Right now to obtain this behavior, one must set a whole sparse of optional fields and unset them all when done.
Maybe we could (sadly) use a new option "strictSparse": true

Comment by Randolph Tan [ 13/Aug/13 ]

antoine I just tried this in the mongo shell on the master branch. Does this behavior match the desired behavior?

> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.user", "name" : "_id_" }
{ "v" : 1, "key" : { "x" : 1, "y" : 1 }, "ns" : "test.user", "name" : "x_1_y_1", "sparse" : true }
{ "v" : 1, "key" : { "y" : 1, "x" : 1 }, "ns" : "test.user", "name" : "y_1_x_1", "sparse" : true }
 
> db.user.find().hint("_id_")
{ "_id" : ObjectId("520a464ab4a75b62fa81c1b8"), "h" : 3 }
{ "_id" : ObjectId("520a4650b4a75b62fa81c1b9"), "x" : 3 }
{ "_id" : ObjectId("520a4654b4a75b62fa81c1ba"), "y" : 4 }
 
> db.user.find().hint("x_1_y_1")
{ "_id" : ObjectId("520a4654b4a75b62fa81c1ba"), "y" : 4 }
{ "_id" : ObjectId("520a4650b4a75b62fa81c1b9"), "x" : 3 }
 
> db.user.find().hint("y_1_x_1")
{ "_id" : ObjectId("520a4650b4a75b62fa81c1b9"), "x" : 3 }
{ "_id" : ObjectId("520a4654b4a75b62fa81c1ba"), "y" : 4 }

Generated at Thu Feb 08 03:23:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.