[SERVER-39495] Shard key is omitted from update and remove oplog entries with multi:true Created: 11/Feb/19  Updated: 29/Oct/23  Resolved: 28/Feb/19

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 4.1.6
Fix Version/s: 4.1.9, 4.0.17

Type: Bug Priority: Major - P3
Reporter: Bernard Gorman Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Sharding 2019-02-25, Sharding 2019-03-11
Participants:

 Description   

For update and delete operations, a change stream relies upon the kObject2FieldName and kObjectFieldName fields of the oplog entry, respectively, to provide the documentKey for that event; that is, an object containing the values of each field of the shard key, plus the _id field if it is not already present as part of the shard key.
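As a rough illustration of the documentKey shape described above, the following sketch builds the expected key from a document and a shard key pattern. The helper name expectedDocumentKey and the sample document are hypothetical and are not taken from the server code; the sketch also assumes top-level shard key fields only (no dotted paths).

```javascript
// Hypothetical helper, not server code: builds the documentKey that a change
// stream event is expected to carry for a document in a sharded collection.
// Assumption: shard key fields are top-level (dotted paths are not handled).
function expectedDocumentKey(doc, shardKeyPattern) {
  const key = {};
  // Copy the value of every shard key field first.
  for (const field of Object.keys(shardKeyPattern)) {
    key[field] = doc[field];
  }
  // Append _id only if it is not already part of the shard key.
  if (!("_id" in key)) {
    key._id = doc._id;
  }
  return key;
}

// Illustrative document for a collection sharded on {shardKey: 1}.
const sampleDoc = { _id: "example", shardKey: 25, x: 1 };
console.log(expectedDocumentKey(sampleDoc, { shardKey: 1 }));
```

For a collection sharded on {shardKey: 1}, the result carries both shardKey and _id, which is what the events in this ticket should contain.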

On the 4.0 branch, this is working as expected. On current master, it appears that in at least some scenarios, the update and remove oplog entries incorrectly omit the shard key fields and provide only the _id.

// Collection 'test.shardkeymissingtest' is sharded on {shardKey: 1}
mongos> db.getSiblingDB("config").collections.find({_id: "test.shardkeymissingtest"})
{ "_id" : "test.shardkeymissingtest", "lastmodEpoch" : ObjectId("5c619cea471abef89beca4df"), "lastmod" : ISODate("1970-02-19T17:02:47.296Z"), "dropped" : false, "key" : { "shardKey" : 1 }, "unique" : false, "uuid" : UUID("7db17aa6-327f-49a8-b89a-ee90b47f20da") }
 
// Insert documents on each shard
mongos> db.shardkeymissingtest.insert({shardKey: 25, x:1})
WriteResult({ "nInserted" : 1 })
mongos> db.shardkeymissingtest.insert({shardKey: 55, x:1})
WriteResult({ "nInserted" : 1 })
...
 
// Start watching the collection
mongos> let csCursor = db.shardkeymissingtest.watch()
 
// Issue a multi-update
mongos> db.shardkeymissingtest.update({}, {$set: {updated: true}}, {multi: true})
WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 3 })
 
// Obtain the change stream results. Note that the 'documentKey' field does NOT include the 'shardKey' field, only _id
mongos> csCursor
{ "_id" : { "_data" : "825C61A95E000000012B022C0100296E5A10047DB17AA6327F49A8B89AEE90B47F20DA463C5F6964003C6162634431000004" }, "operationType" : "update", "clusterTime" : Timestamp(1549904222, 1), "ns" : { "db" : "test", "coll" : "shardkeymissingtest" }, "documentKey" : { "_id" : "abcD1" }, "updateDescription" : { "updatedFields" : { "updated" : true }, "removedFields" : [ ] } }
{ "_id" : { "_data" : "825C61A95E000000022B022C0100296E5A10047DB17AA6327F49A8B89AEE90B47F20DA46645F696400645C61A92CFC5E304063E4E4C20004" }, "operationType" : "update", "clusterTime" : Timestamp(1549904222, 2), "ns" : { "db" : "test", "coll" : "shardkeymissingtest" }, "documentKey" : { "_id" : ObjectId("5c61a92cfc5e304063e4e4c2") }, "updateDescription" : { "updatedFields" : { "updated" : true }, "removedFields" : [ ] } }
mongos> db.shardkeymissingtest.getShardDistribution()
 
// Remove all the documents.
mongos> db.shardkeymissingtest.remove({})
WriteResult({ "nRemoved" : 3 })
 
// The documentKey is again incomplete in each of the resulting change stream events.
mongos> it
{ "_id" : { "_data" : "825C61AC77000000012B022C0100296E5A10047DB17AA6327F49A8B89AEE90B47F20DA463C5F6964003C6162634431000004" }, "operationType" : "delete", "clusterTime" : Timestamp(1549905015, 1), "ns" : { "db" : "test", "coll" : "shardkeymissingtest" }, "documentKey" : { "_id" : "abcD1" } }
{ "_id" : { "_data" : "825C61AC77000000012B022C0100296E5A10047DB17AA6327F49A8B89AEE90B47F20DA46645F696400645C61A930FC5E304063E4E4C30004" }, "operationType" : "delete", "clusterTime" : Timestamp(1549905015, 1), "ns" : { "db" : "test", "coll" : "shardkeymissingtest" }, "documentKey" : { "_id" : ObjectId("5c61a930fc5e304063e4e4c3") } }

This behaviour persists even when the shards are force-refreshed via _flushRoutingTableCacheUpdates, and even in cases where internally examining the ScopedCollectionMetadata proves that the mongoD is aware of the shard key for this collection.
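The incomplete documentKey values in the transcript above can also be detected mechanically. The sketch below is one possible check, assuming events have already been read from the change stream into plain objects. The helper missingShardKeyFields is hypothetical; the first sample event is trimmed from the output above, and the second is an invented example of a complete key.

```javascript
// Hypothetical helper: returns the shard key fields that are absent from a
// change stream event's documentKey. An empty array means the key is complete.
function missingShardKeyFields(event, shardKeyPattern) {
  const documentKey = event.documentKey || {};
  return Object.keys(shardKeyPattern).filter(
    (field) => !(field in documentKey)
  );
}

const shardKeyPattern = { shardKey: 1 };

const sampleEvents = [
  // Trimmed from the update event shown in the transcript: only _id is present.
  { operationType: "update", documentKey: { _id: "abcD1" } },
  // Invented example of a complete documentKey, for comparison.
  { operationType: "delete", documentKey: { shardKey: 25, _id: "example" } },
];

for (const event of sampleEvents) {
  const missing = missingShardKeyFields(event, shardKeyPattern);
  console.log(event.operationType, "missing fields:", missing);
}
```

A regression test along these lines would fail on any update or delete event for which the returned array is non-empty.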



 Comments   
Comment by Githook User [ 26/Feb/20 ]

Author:

{'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'}

Message: SERVER-45599 Backport of SERVER-39495: Move ShardingState::needCollectionMetadata under OperationShardingState

ShardingState logically contains answers to questions about whether the
current instance is node in a sharded cluster, whereas
OperationShardingState is responsible for the 'shardedness' of the
commands.

This is a partial cherry-pick from b049257fbd1d215388cffaf7544f6741dbce5b45, adapted for the 4.0 branch.

Also backports the addition of more testing for multi:true/justOne:false updates and ChangeStreams, which was taken from commit 50f6bd4d6a9428a6f1df22db792d7b55d773762c.
Branch: v4.0
https://github.com/mongodb/mongo/commit/fcd2dd41189fffc6e67a8645b99974178f87ca04

Comment by Githook User [ 26/Feb/20 ]

Author:

{'username': 'kaloianm', 'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com'}

Message: SERVER-45599 Backport of SERVER-39495: Only return versioned filtering metadata for cases that actually need to do filtering

This is a partial cherry-pick from 851dad7902d6bb8c3ed25f99f565a2e2c8c8bc47, adapted for the 4.0 branch.
Branch: v4.0
https://github.com/mongodb/mongo/commit/62a6b963bc20ba74d2f3e0d62552dc3b7b1f1133

Comment by Githook User [ 29/Jan/20 ]

Author:

{'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'}

Message: Revert "SERVER-45599 Backport of SERVER-39495: Only return versioned filtering metadata for cases that actually need to do filtering"

This reverts commit fe4ced8f98d731883e5a4511d434716629e457a8.
Branch: v4.0
https://github.com/mongodb/mongo/commit/a7e9c2223a6aa92f6648d3800d251d8335cd1881

Comment by Githook User [ 29/Jan/20 ]

Author:

{'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'}

Message: Revert "SERVER-45599 Backport of SERVER-39495: Move ShardingState::needCollectionMetadata under OperationShardingState"

This reverts commit 1a01c53df8f7c1e016c0ccbc38b77f6b3508bf65.
Branch: v4.0
https://github.com/mongodb/mongo/commit/385511e9d05b254beb6767ed92a3cd95e83ea166

Comment by Githook User [ 26/Jan/20 ]

Author:

{'username': 'kaloianm', 'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com'}

Message: SERVER-45599 Backport of SERVER-39495: Move ShardingState::needCollectionMetadata under OperationShardingState

ShardingState logically contains answers to questions about whether the
current instance is node in a sharded cluster, whereas
OperationShardingState is responsible for the 'shardedness' of the
commands.

This is a partial cherry-pick from b049257fbd1d215388cffaf7544f6741dbce5b45, adapted for the 4.0 branch.

Also backports the addition of more testing for multi:true/justOne:false updates and ChangeStreams, which was taken from commit 50f6bd4d6a9428a6f1df22db792d7b55d773762c.
Branch: v4.0
https://github.com/mongodb/mongo/commit/1a01c53df8f7c1e016c0ccbc38b77f6b3508bf65

Comment by Githook User [ 26/Jan/20 ]

Author:

{'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm', 'name': 'Kaloian Manassiev'}

Message: SERVER-45599 Backport of SERVER-39495: Only return versioned filtering metadata for cases that actually need to do filtering

This is a partial cherry-pick from 851dad7902d6bb8c3ed25f99f565a2e2c8c8bc47, adapted for the 4.0 branch.
Branch: v4.0
https://github.com/mongodb/mongo/commit/fe4ced8f98d731883e5a4511d434716629e457a8

Comment by Githook User [ 06/Mar/19 ]

Author:

{'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'}

Message: SERVER-39495 Add more testing for multi:true/justOne:false updates and ChangeStreams
Branch: master
https://github.com/mongodb/mongo/commit/50f6bd4d6a9428a6f1df22db792d7b55d773762c

Comment by Githook User [ 28/Feb/19 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-39495 Only return versioned filtering metadata for cases that actually need to do filtering
Branch: master
https://github.com/mongodb/mongo/commit/851dad7902d6bb8c3ed25f99f565a2e2c8c8bc47

Comment by Githook User [ 27/Feb/19 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-39495 Move ShardingState::needCollectionMetadata under OperationShardingState

ShardingState logically contains answers to questions about whether the
current instance is node in a sharded cluster, whereas
OperationShardingState is responsible for the 'shardedness' of the
commands.
Branch: master
https://github.com/mongodb/mongo/commit/b049257fbd1d215388cffaf7544f6741dbce5b45

Comment by Bernard Gorman [ 12/Feb/19 ]

Many thanks kaloian.manassiev! I was mostly looking at this from the perspective of fixing change streams, so there may be some additional callsites beyond the three identified above where a similar change is appropriate.

Comment by Kaloian Manassiev [ 12/Feb/19 ]

Thank you bernard.gorman for the detailed analysis! Indeed, the intention of this change was that operations which were not sent with sharding information on them should be treated as unsharded (or as if the customer wrote directly to the shard). However, I now see how this can break multi-writes.

I will change it back to use getCurrentMetadata instead.
