As part of chunk migration, donor shards filter the oplog entries relevant for session migration by extracting the shard key value from each oplog entry. In particular, a donor shard attempts to extract the shard key value from the 'o' field for op='i' insert oplog entries, from the 'o2' field for op='u' update oplog entries, and from the 'o' field for op='d' delete oplog entries. However, for op='u' and op='d' oplog entries, the shard key value has already been extracted from the document when the oplog entry was generated. For example, with a shard key pattern {"x.y": 1}, the resulting oplog entry would contain {o2: {"x.y": 5, _id: 0}} for an update and {o: {"x.y": 5, _id: 0}} for a delete.

Attempting to extract the shard key value from those 'o2' and 'o' objects with ShardKeyPattern::extractShardKeyFromDoc() treats "x.y" as a path into a nested document rather than as a literal field name, so it yields the shard key value {"x.y": null} and leads the donor shard to incorrectly conclude that the oplog entry isn't relevant for the chunk being actively migrated. As a result, the recipient shard does not learn that the statement(s) from that oplog entry have already executed, and it allows them to execute a second time after the chunk migration commits.
if (nextOplog->isCrudOpType()) {
    auto shardKey =
        _keyPattern.extractShardKeyFromDoc(nextOplog->getObjectContainingDocumentKey());
    if (!_chunkRange.containsKey(shardKey)) {
        continue;
    }
}
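The mismatch described above can be reproduced with a toy model. The following is a minimal C++17 sketch, not MongoDB code: `Doc`, `extractByPath`, `extractLiteralField`, `makeFullDocument`, and `makeDocumentKey` are hypothetical names, and the real BSON types and ShardKeyPattern logic are considerably more involved. It only illustrates why a dotted-path lookup finds the value in a full document but comes back empty (modeling null) on an object where the key has already been flattened to the literal field "x.y".

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <variant>

// Toy stand-in for a BSON document (assumption: a field is either an int
// or a nested document; real BSON has many more types).
struct Doc;
using Value = std::variant<int, std::shared_ptr<Doc>>;
struct Doc {
    std::map<std::string, Value> fields;
};

// Hypothetical helper: resolves "x.y" by descending into nested documents.
// An empty optional models the null that a failed extraction produces.
std::optional<int> extractByPath(const Doc& doc, const std::string& path) {
    assert(!path.empty());
    const Doc* cur = &doc;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = path.find('.', start);
        const bool last = (dot == std::string::npos);
        const std::string part =
            path.substr(start, last ? std::string::npos : dot - start);
        const auto it = cur->fields.find(part);
        if (it == cur->fields.end()) return std::nullopt;
        if (last) {
            if (const int* leaf = std::get_if<int>(&it->second)) return *leaf;
            return std::nullopt;
        }
        const auto* sub = std::get_if<std::shared_ptr<Doc>>(&it->second);
        if (!sub || !*sub) return std::nullopt;
        cur = sub->get();
        start = dot + 1;
    }
}

// Hypothetical helper: treats the whole string as one top-level field name.
std::optional<int> extractLiteralField(const Doc& doc, const std::string& name) {
    const auto it = doc.fields.find(name);
    if (it == doc.fields.end()) return std::nullopt;
    if (const int* leaf = std::get_if<int>(&it->second)) return *leaf;
    return std::nullopt;
}

// Full document as stored in the collection: {_id: 0, x: {y: 5}}.
Doc makeFullDocument() {
    auto inner = std::make_shared<Doc>();
    inner->fields["y"] = 5;
    Doc d;
    d.fields["_id"] = 0;
    d.fields["x"] = inner;
    return d;
}

// Pre-extracted key as in the 'o2' / 'o' objects above: {"x.y": 5, _id: 0}.
Doc makeDocumentKey() {
    Doc d;
    d.fields["x.y"] = 5;
    d.fields["_id"] = 0;
    return d;
}
```

In this model, the path lookup succeeds on the full document and fails on the pre-extracted key, while the literal lookup succeeds on the pre-extracted key. That is the same asymmetry that makes the donor's range check reject update and delete entries under a dotted shard key.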
[js_test:repro_retryable_update_multiple_execution_dotted_sk] uncaught exception: Error: [{ "_id" : 0, "x" : { "y" : 5 }, "counter" : 2 }] != [{ "_id" : 0, "x" : { "y" : 5 }, "counter" : 1 }] are not equal :
- is caused by
  - SERVER-31031 Don't send oplog entries that are unrelated to the chunk being migrated (Closed)
- is related to
  - SERVER-41074 Don't migrate prePostImageDoc session entries for CRUD operations outside of current migration range (Closed)
  - SERVER-55111 When using a nested shard key, a delete in a txn to a chunk that has moved is not throwing MigrationConflict (Closed)