[SERVER-42237] Allow shard key to be prefix of multikey index Created: 15/Jul/19 Updated: 06/Dec/22 Resolved: 28/Jun/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Renato Riccio | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Sharding EMEA
|
||||||||
| Sprint: | Sharding 2019-08-26 | ||||||||
| Participants: | |||||||||
| Description |
|
Currently is not possible to use as shard key a prefix of a multikey index. For this reason, even if we have as shard key {a:1} and we have already the index {a:1, b:1} that is considered multikey ONLY because the field b is an array, we must have an index on {a:1} or a non multikey index where a is a prefix. Allowing to use {a:1, b:1} for sharding will allow to save RAM, space and write overhead. |
| Comments |
| Comment by Pierlauro Sciarelli [ 28/Jun/22 ] |
|
Thanks to max.hirschhorn@mongodb.com for pointing out that the query subsystem is responsible for deduplicate by RecordId when the IndexScan is scanning over a multikey index. The above mentioned bug does not exist, closing the ticket |
| Comment by Pierlauro Sciarelli [ 07/Jun/22 ] |
|
Reopening the ticket because it is not allowed to shard a collection with only multikey indexes, however there is this faulty flow:
(check usages of requireSingleKey) A shard key index scan has to skip over these extra index entries, and de-duplicate them to ensure that each relevant document isn't handled repeatedly Provided that this observation is still true, that would mean that we have a bug since we are not handling this case in the code. |
| Comment by Ratika Gandhi [ 05/Sep/19 ] |
|
Good idea but other higher priority items on the backlog to tackle. |
| Comment by Kaloian Manassiev [ 02/Aug/19 ] |
|
To be discussed at the next eng/product sync meeting. |
| Comment by Kevin Pulo [ 30/Jul/19 ] |
|
Historically, this restriction is because in older versions the entire index was marked multikey, and it wasn't possible to tell if that was because of field a or b. I believe that is no longer the case in current versions, so this should be possible. To expand on this a little, there are two issues here.
Another aspect is that it's currently possible to get into this state by sharding with a non-multikey A + B index, which then later becomes multikey on B. The inverse situation of sharding with an A + (multikey)B index is not possible, despite being equivalent. Similarly, it's possible to shard with an A index and A + (multikey)B, and then drop the A index. The issue here is that these combinations of factors, mixed in other relevant factors like the rest of the query workload, index/data/storage sizes, etc, means that it may be hard to know the exact set of circumstances where having the A + (multikey)B index would be an overall win, compared to having both the A + (multikey)B and the solo A index (and vice-versa where it would be an overall loss). Which doesn't necessarily mean we shouldn't do it, I just want to make sure we're aware of the issues here. |