[SERVER-40483] Changing the shard key could lead to DuplicateKeyError on _id with orphan documents Created: 04/Apr/19  Updated: 29/Oct/23  Resolved: 18/Apr/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-40815 Updating the shard key can conflict w... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2019-04-22
Participants:

 Description   

If we change a document's shard key such that the document will have to change shards, we could end up with a duplicate key error on _id due to an orphaned version of that document existing on that shard. Other legitimate DuplicateKeyErrors could occur (for example, if there's a unique index on the shard key), in which case we'll throw an ordinary DuplicateKeyError. This ticket only addresses _id conflicts.

Consider the following scenario:
1) A document x is migrated from shard A to shard B. Suppose the RangeDeleter does not run yet, and the orphaned document x remains on shard A.
2) An update is issued to document x (residing on shard B) such that it requires moving that document back to shard A. The update operation is converted into a delete from shard B and an insert into shard A.
3) The insert operation into shard A fails with a duplicate key error on _id, because the orphaned version of x still exists on shard A.
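
The three steps above can be reproduced with a small in-memory model. This is only an illustrative sketch, not server code: the `Shard` class, the `routing` dictionary, the chunk width of 100, and the shard key field `x` are all assumptions made for the example, and transactions and rollback are not modeled.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the server error; a hypothetical model class, not a driver class."""


class Shard:
    """Toy shard: a dict keyed by _id models the unique _id index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(
                f"duplicate key on _id {doc['_id']} (shard {self.name})"
            )
        self.docs[doc["_id"]] = dict(doc)

    def delete(self, _id):
        self.docs.pop(_id, None)


def simulate():
    shards = {"A": Shard("A"), "B": Shard("B")}
    # Assumed routing table: chunk index (x // 100) -> owning shard.
    routing = {0: "A", 1: "A"}
    shards["A"].insert({"_id": 1, "x": 10})

    # Step 1: chunk 0 migrates from A to B. The range deleter has not run,
    # so the copy on A is left behind as an orphan.
    for doc in list(shards["A"].docs.values()):
        if doc["x"] // 100 == 0:
            shards["B"].insert(doc)
    routing[0] = "B"

    # Step 2: the shard key changes from 10 to 150, which falls in chunk 1,
    # still owned by A. The update becomes a delete on B plus an insert on A.
    updated = {**shards["B"].docs[1], "x": 150}
    target = shards[routing[updated["x"] // 100]]
    shards["B"].delete(1)

    # Step 3: the insert collides with the orphan's _id on A.
    try:
        target.insert(updated)
    except DuplicateKeyError as exc:
        return str(exc)
    return None
```

Calling `simulate()` returns the error text raised by shard A, showing that the collision comes from the leftover copy rather than from a second live document.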

We should make sure this case leads to an error message that's more meaningful to the user than DuplicateKeyError (something indicating it's related to orphaned documents), and perhaps with a link to documentation.



 Comments   
Comment by Githook User [ 17/Apr/19 ]

Author: jannaerin <golden.janna@gmail.com>

Message: SERVER-40483 Return more informative error when changing the doc shard key causes DuplicateKey error on _id
Branch: master
https://github.com/mongodb/mongo/commit/d7fb557f6fc6d486fa7107a8f64342caf552eeb4

Comment by Matthew Saltz (Inactive) [ 09/Apr/19 ]

Updated to specify that the ticket only focuses on _id index uniqueness conflicts

Comment by Andy Schwerin [ 09/Apr/19 ]

Per offline discussion, I think this ticket is intended to focus only on _id index uniqueness conflicts. matthew.saltz has agreed to review the description and update it if appropriate.

Comment by Matthew Saltz (Inactive) [ 08/Apr/19 ]

As a historical note for the ticket, since I think this is the situation you're already aware of and referring to: It's possible, if the client for some reason does not enforce global uniqueness of _id across shards, that we could end up with this error occurring even for non-orphaned documents.

I think we could know when it's caused by an orphaned document by consulting the routing table whenever DuplicateKeyError is thrown, to check whether the document was owned by this shard, but I don't think it's completely straightforward (mostly for code arrangement reasons and where different state is tracked). Based on our discussion the other day, I thought we concluded that it was okay to end up reporting an error in this situation given that generally speaking we assume _id is globally unique, even though it's not enforced.

We can make the error message say "either related to orphaned documents or due to _id not being globally unique" - that may be clearer. What do you think?
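
To illustrate the routing-table check discussed in this comment, here is a minimal sketch of how a conflict could be classified. Everything in it is hypothetical: the `CHUNKS` layout, the shard key field `x`, and the helper names `owning_shard` and `describe_id_conflict` are assumptions for the example and do not reflect how the server tracks ownership.

```python
from bisect import bisect_right

# Assumed routing table: (chunk lower bound, owning shard), sorted by bound.
# Here [0, 100) is owned by B and [100, infinity) by A.
CHUNKS = [(0, "B"), (100, "A")]


def owning_shard(shard_key_value, chunks=CHUNKS):
    """Return the shard that owns the chunk containing shard_key_value."""
    bounds = [lower for lower, _ in chunks]
    idx = bisect_right(bounds, shard_key_value) - 1
    if idx < 0:
        raise ValueError("shard key value is below the first chunk bound")
    return chunks[idx][1]


def describe_id_conflict(existing_doc, this_shard, chunks=CHUNKS):
    """Classify the document already holding the _id on this_shard."""
    if owning_shard(existing_doc["x"], chunks) != this_shard:
        # This shard no longer owns the document's range, so it is an orphan.
        return "orphaned document"
    # This shard legitimately owns the existing document.
    return "_id not globally unique"
```

With this layout, `describe_id_conflict({"_id": 1, "x": 10}, "A")` classifies the existing document as an orphan, because shard A no longer owns the range containing `x = 10`.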

Comment by Andy Schwerin [ 06/Apr/19 ]

Can we definitively know when it’s caused by orphans?
