[SERVER-6835] Sharding edge case upserts documents with multiple _id attributes Created: 23/Aug/12  Updated: 11/Jul/16  Resolved: 14/Nov/13

Status: Closed
Project: Core Server
Component/s: Sharding, Shell, Usability, Write Ops
Affects Version/s: 2.0.7
Fix Version/s: 2.5.4

Type: Bug
Priority: Major - P3
Reporter: Y. Wayne Huang
Assignee: Scott Hernandez (Inactive)
Resolution: Done
Votes: 1
Labels: logic, schema, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

ubuntu 10.04lts on x86_64; sharded configuration where each shard is a replicaset


Issue Links:
Depends
depends on SERVER-4830 Reject upsert if would create duplica... Closed
Related
related to SERVER-7379 Immutable shardkey becomes mutable Closed
is related to SERVER-5710 Update may result in doc without shar... Closed
is related to SERVER-9074 Upsert fails with "cannot modify shar... Closed
is related to SERVER-11363 Update call causes mongod to crash Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

We have a large collection with a shard key on _id.c, for example:

{_id : {"a":"foo","b":1234,"c":3456}}

When performing upserts, we found that passing a query on _id alone resulted in the query being broadcast to every shard. To avoid this wasted work (wasted for all but one shard), we changed the upsert query to:

{"_id":{....},"_id.c":3456}

This did route the upsert to the appropriate shard. With this query, if a document matching _id already exists, the update works fine. When the document needs to be created, however, MongoDB mistakenly injects two _id attributes into the new document, exactly as they were provided in the query:

{"_id":{"a":"foo","b":1234,"c":3456},"_id":{"c":3456},...}

This probably should not occur. A side effect is that any driver which parses this document will take either the first occurrence of the attribute or the last one. The PHP driver takes the last one, resulting in an incomplete _id value.
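The duplicated field can be pictured with a small model of how an upsert might seed a new document from the equality fields of its query. This is only an illustrative sketch in plain JavaScript, not the server's actual code: `naiveUpsertSeed` is a hypothetical helper, and the document is modelled as an ordered list of [name, value] pairs because a plain JavaScript object cannot hold the same key twice.

```javascript
// Hypothetical model (not server code): expand each query path on its own and
// append the result as a new top-level element, without checking whether an
// element with that name already exists.
function naiveUpsertSeed(queryPairs) {
  const seed = [];
  for (const [path, value] of queryPairs) {
    const parts = path.split(".");
    // Wrap the value in nested objects for every path segment after the first.
    let nested = value;
    for (let i = parts.length - 1; i >= 1; i--) {
      nested = { [parts[i]]: nested };
    }
    seed.push([parts[0], nested]);
  }
  return seed;
}

// The query from the report: the full _id plus the dotted shard key path.
const reportQueryPairs = [
  ["_id", { a: "foo", b: 1234, c: 3456 }],
  ["_id.c", 3456],
];

const naiveSeed = naiveUpsertSeed(reportQueryPairs);
console.log(JSON.stringify(naiveSeed));
```

Under this assumed model the seed contains two elements named _id, the second holding only {"c":3456}, which is the same shape as the stored document shown above.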

The mongo shell does not output these documents correctly because of SERVER-718 (for duplicate fields it just repeats the first value), so the document above is displayed as:

{"_id":{"a":"foo","b":1234,"c":3456},"_id":{"a":"foo","b":1234,"c":3456},...}
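The different readings of such a document (a reader keeping the first occurrence, or the last one as the PHP driver does) can be illustrated with two toy decoders. This is a sketch in plain JavaScript under the assumption that the stored document is an ordered list of [name, value] pairs; `decodeFirstWins`, `decodeLastWins` and `hasDuplicateNames` are hypothetical names and do not correspond to any driver API.

```javascript
// The stored document from the report, as an ordered list of [name, value] pairs.
const storedPairs = [
  ["_id", { a: "foo", b: 1234, c: 3456 }],
  ["_id", { c: 3456 }],
];

// Decoder that keeps the first occurrence of each field name.
function decodeFirstWins(pairs) {
  const doc = {};
  for (const [name, value] of pairs) {
    if (!Object.prototype.hasOwnProperty.call(doc, name)) doc[name] = value;
  }
  return doc;
}

// Decoder that lets later occurrences overwrite earlier ones.
function decodeLastWins(pairs) {
  const doc = {};
  for (const [name, value] of pairs) doc[name] = value;
  return doc;
}

// True if any field name occurs more than once at the top level.
function hasDuplicateNames(pairs) {
  const names = pairs.map(([name]) => name);
  return new Set(names).size !== names.length;
}

const firstWinsDoc = decodeFirstWins(storedPairs);
const lastWinsDoc = decodeLastWins(storedPairs);
console.log(JSON.stringify(firstWinsDoc)); // {"_id":{"a":"foo","b":1234,"c":3456}}
console.log(JSON.stringify(lastWinsDoc));  // {"_id":{"c":3456}}
console.log(hasDuplicateNames(storedPairs)); // true
```

The last-wins result is the incomplete _id value described for the PHP driver. A decoder that wants to fail loudly instead of choosing silently could run a check like `hasDuplicateNames` first.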



 Comments   
Comment by Githook User [ 14/Nov/13 ]

Author: Scott Hernandez (scotthernandez, scotthernandez@gmail.com)

Message: SERVER-6835: removed extra, and erroring line
Branch: master
https://github.com/mongodb/mongo/commit/3f60ef6360fb18ccf6d32df89639071e7076b1af

Comment by Githook User [ 14/Nov/13 ]

Author: Scott Hernandez (scotthernandez, scotthernandez@gmail.com)

Message: SERVER-11531, SERVER-10489, SERVER-6835, SERVER-4830: Refactor update system to support immutable fields, consolodate storage validation, and misc issues.
Branch: master
https://github.com/mongodb/mongo/commit/b98712c551e8ab27c33e1a5e7c694fa36c3334ce

Comment by Reuben Bond [ 10/Jul/13 ]

I'm also running into this issue, and resorted to this kind of query for the same reason (i.e., indexes/shard keys are not leveraged for inner keys of a composite key).
I have a repro. It doesn't require that the collection be sharded.

duplicateIdCompositeKey.js

use test
db.repro.update({"_id.shardKey": 1, "_id": {shardKey: 1, uniquifier: 1}}, { $setOnInsert: { payload: 1 }}, { upsert: true })
db.repro.find()

Produces this output - note the duplicate, identical _id fields. This crashes the C# driver.

{ "_id" : { "shardKey" : 1, "uniquifier" : 1 }, "_id" : { "shardKey" : 1, "uniquifier" : 1 }, "payload" : 1 }
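On servers affected by this issue, one defensive option on the client side is to check an upsert query for a field that is named both as a whole and through a dotted path before sending it. The helper below is a hypothetical sketch in plain JavaScript, not part of any driver or of the shell; it only inspects top-level query keys and does not look inside operators such as $and.

```javascript
// Hypothetical helper: return [shorter, longer] pairs of query keys where one
// key is a dotted-path prefix of the other, e.g. "_id" and "_id.shardKey".
function findOverlappingPaths(query) {
  // Skip top-level operators; this sketch does not recurse into them.
  const keys = Object.keys(query).filter((k) => !k.startsWith("$"));
  const overlaps = [];
  for (const a of keys) {
    for (const b of keys) {
      if (a !== b && b.startsWith(a + ".")) overlaps.push([a, b]);
    }
  }
  return overlaps;
}

// The query from the repro above.
const reproQuery = { "_id.shardKey": 1, "_id": { shardKey: 1, uniquifier: 1 } };
const reproOverlaps = findOverlappingPaths(reproQuery);
console.log(JSON.stringify(reproOverlaps)); // [["_id","_id.shardKey"]]

// A query on _id alone has no overlapping paths.
const plainQuery = { "_id": { shardKey: 1, uniquifier: 1 } };
const plainOverlaps = findOverlappingPaths(plainQuery);
console.log(plainOverlaps.length); // 0
```

A non-empty result flags a query of the shape that triggered the duplicate _id in this report. What to do with a flagged query (for example, rejecting it, or issuing the insert separately) is left to the caller.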
