[SERVER-40483] Changing the shard key could lead to DuplicateKeyError on _id with orphan documents Created: 04/Apr/19 Updated: 29/Oct/23 Resolved: 18/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Saltz (Inactive) | Assignee: | Janna Golden |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2019-04-22 |
| Description |
|
If we change a document's shard key such that the document has to move to a different shard, we could end up with a duplicate key error on _id because an orphaned version of that document already exists on the destination shard. Other legitimate DuplicateKeyErrors could occur (for example, if there's a unique index on the shard key), in which case we'll throw an ordinary DuplicateKeyError. This ticket only addresses _id conflicts. Consider the following scenario: We should make sure this case leads to an error message that's more meaningful to the user than DuplicateKeyError (something indicating it's related to orphaned documents), perhaps with a link to documentation. |
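The failure mode described above can be reproduced with a toy model. The following is a minimal sketch in plain Python, not server code: the `Shard` class, the `owner_name` routing rule (x < 50 lives on shard A, everything else on shard B), the sample document, and the local `DuplicateKeyError` class are all assumptions made for illustration.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the server error; not a real driver or server class."""


class Shard:
    """Toy shard: a dict keyed by _id, mimicking a unique _id index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(f"duplicate _id {doc['_id']!r} on shard {self.name}")
        self.docs[doc["_id"]] = dict(doc)


def owner_name(x):
    # Hypothetical routing rule: shard key values below 50 belong to A, the rest to B.
    return "A" if x < 50 else "B"


def build_cluster():
    shards = {"A": Shard("A"), "B": Shard("B")}
    shards["A"].insert({"_id": 1, "x": 10})  # live copy, owned by A
    shards["B"].insert({"_id": 1, "x": 10})  # orphan: B does not own x=10
    return shards


def change_shard_key(shards, doc_id, new_x):
    # Find the shard that both holds the document and owns its current shard key.
    source = next(
        s for s in shards.values()
        if doc_id in s.docs and owner_name(s.docs[doc_id]["x"]) == s.name
    )
    target = shards[owner_name(new_x)]
    moved = dict(source.docs[doc_id], x=new_x)
    if target is source:
        source.docs[doc_id] = moved
        return
    target.insert(moved)  # collides with the orphan's _id on the target shard
    del source.docs[doc_id]
```

In this model, `change_shard_key(build_cluster(), 1, 100)` raises `DuplicateKeyError` even though the user never inserted a second document with `_id` 1: the conflict comes entirely from the orphan left on shard B.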
| Comments |
| Comment by Githook User [ 17/Apr/19 ] |
|
Author: {'email': 'golden.janna@gmail.com', 'name': 'jannaerin', 'username': 'jannaerin'} Message: |
| Comment by Matthew Saltz (Inactive) [ 09/Apr/19 ] |
|
Updated to specify that the ticket only focuses on _id index uniqueness conflicts |
| Comment by Andy Schwerin [ 09/Apr/19 ] |
|
Per offline discussion, I think this ticket is intended to focus only on _id index uniqueness conflicts. matthew.saltz has agreed to review the description and update it if appropriate. |
| Comment by Matthew Saltz (Inactive) [ 08/Apr/19 ] |
|
As a historical note for the ticket, since I think this is the situation you're already aware of and referring to: if the client for some reason does not enforce global uniqueness of _id across shards, this error could occur even for non-orphaned documents. I think we could tell when it's caused by an orphaned document by consulting the routing table whenever DuplicateKeyError is thrown to see whether the conflicting document is owned by this shard, but I don't think it's completely straightforward (mostly for code arrangement reasons and where different state is tracked). Based on our discussion the other day, I thought we concluded that it was okay to end up reporting an error in this situation, given that generally speaking we assume _id is globally unique even though it's not enforced. We can make the error message say "either related to orphaned documents or due to _id not being globally unique" - that may be clearer. What do you think? |
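The routing-table check mentioned in the comment above could look roughly like the following. This is a hedged sketch in plain Python, not the server's implementation: the chunk boundaries, shard names, the shard key field `x`, and the function names `owner_of` and `classify_id_conflict` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical routing table: one split point at 50, so x < 50 -> "A", x >= 50 -> "B".
CHUNK_BOUNDARIES = [50]
CHUNK_OWNERS = ["A", "B"]


def owner_of(x):
    """Return the shard that owns shard key value x under the toy routing table."""
    return CHUNK_OWNERS[bisect_right(CHUNK_BOUNDARIES, x)]


def classify_id_conflict(shard_name, existing_doc):
    """Classify an _id conflict hit on shard_name against existing_doc.

    If the routing table says this shard does not own the existing document,
    the document is an orphan; otherwise _id is not globally unique.
    """
    if owner_of(existing_doc["x"]) != shard_name:
        return "orphan"
    return "non_unique_id"


def conflict_message(shard_name, existing_doc):
    # Wording is illustrative; the ticket only asks for something more meaningful
    # than a bare DuplicateKeyError.
    if classify_id_conflict(shard_name, existing_doc) == "orphan":
        return "_id conflict related to an orphaned document"
    return "_id conflict due to _id not being globally unique"
```

When the check is not practical to wire in, the combined wording proposed in the comment ("either related to orphaned documents or due to _id not being globally unique") avoids having to distinguish the two cases.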
| Comment by Andy Schwerin [ 06/Apr/19 ] |
|
Can we definitively know when it’s caused by orphans? |