[SERVER-56127] Retryable update may execute more than once if chunk is migrated and shard key pattern uses nested fields Created: 15/Apr/21 Updated: 29/Oct/23 Resolved: 14/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.2.0, 4.4.0, 5.0.0, 5.1.0-rc0 |
| Fix Version/s: | 5.2.0, 5.1.2, 5.0.6, 4.2.23, 4.4.17 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Bobby Morck (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.1, v5.0, v4.4, v4.2 |
| Steps To Reproduce: | |
| Sprint: | QE 2021-10-04, QE 2021-10-18 |
| Participants: | |
| Linked BF Score: | 165 |
| Description |
|
As part of chunk migration, donor shards filter the oplog entries relevant for session migration by extracting the shard key value from each oplog entry. In particular, the donor attempts to extract the shard key value from the 'o' field for op='i' insert oplog entries, from the 'o2' field for op='u' update oplog entries, and from the 'o' field for op='d' delete oplog entries. However, for op='u' and op='d' oplog entries the shard key value has already been extracted from the document as part of generating the oplog entry, so those fields hold the extracted shard key value rather than the full document. For example, with a shard key pattern {"x.y": 1}, the resulting oplog entry would contain {o2: {"x.y": 5, _id: 0}} for an update and {o: {"x.y": 5, _id: 0}} for a delete. In those objects "x.y" is a single top-level field name, not a nested path. Attempting to extract the shard key value from those 'o2' and 'o' objects with ShardKeyPattern::extractShardKeyFromDoc(), which resolves "x.y" as field 'y' inside subdocument 'x', therefore results in a shard key value {"x.y": null} and leads the donor shard to incorrectly conclude that the oplog entry isn't relevant for the chunk actively being migrated. As a result, the recipient shard does not know that the statement(s) from that oplog entry have already executed, which allows them to execute a second time after the chunk migration commits.
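
The mismatch described above can be reproduced outside the server with a small Python sketch. It is not MongoDB's code: `extract_by_path`, `extract_shard_key_from_doc`, and `extract_shard_key_from_key_object` are hypothetical helpers that only mimic the two ways of reading a shard key value (walking a nested path versus reading a literal dotted field name), and plain dicts stand in for BSON objects.

```python
from typing import Any, Dict, Optional


def extract_by_path(doc: Any, dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if a step is missing."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def extract_shard_key_from_doc(doc: Dict[str, Any], key_pattern: Dict[str, int]) -> Dict[str, Any]:
    """Hypothetical stand-in for document-style extraction: "x.y" means doc["x"]["y"]."""
    return {path: extract_by_path(doc, path) for path in key_pattern}


def extract_shard_key_from_key_object(obj: Dict[str, Any], key_pattern: Dict[str, int]) -> Dict[str, Any]:
    """Hypothetical alternative: treat "x.y" as a literal top-level field name."""
    return {path: obj.get(path) for path in key_pattern}


key_pattern = {"x.y": 1}

# A full document, as found in the 'o' field of an insert oplog entry.
full_document = {"_id": 0, "x": {"y": 5}}

# The already-extracted key object, as found in 'o2' of an update oplog entry.
update_o2 = {"x.y": 5, "_id": 0}

from_full_doc = extract_shard_key_from_doc(full_document, key_pattern)
from_o2_as_doc = extract_shard_key_from_doc(update_o2, key_pattern)
from_o2_as_key = extract_shard_key_from_key_object(update_o2, key_pattern)

print(from_full_doc)   # nested lookup succeeds on the full document
print(from_o2_as_doc)  # nested lookup on the key object finds no 'x' subdocument
print(from_o2_as_key)  # literal field lookup recovers the value
```

In this sketch, document-style extraction applied to the 'o2' object yields a null (None) shard key value, which corresponds to the {"x.y": null} result in the description, while reading the dotted name literally recovers the value 5.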
|
| Comments |
| Comment by Githook User [ 22/Dec/21 ] |
|
Author: {'name': 'Bobby Morck', 'email': 'bobby.morck@mongodb.com', 'username': 'bmorck'}Message: (cherry picked from commit 6d8290297b563121037f8e9a9f2d37ec45ddb4bf) |
| Comment by Githook User [ 22/Dec/21 ] |
|
Author: {'name': 'Bobby Morck', 'email': 'bobby.morck@mongodb.com', 'username': 'bmorck'}Message: (cherry picked from commit 6d8290297b563121037f8e9a9f2d37ec45ddb4bf) |
| Comment by Githook User [ 14/Oct/21 ] |
|
Author: {'name': 'Bobby Morck', 'email': 'bobby.morck@mongodb.com', 'username': 'bmorck'}Message: |
| Comment by Githook User [ 13/Oct/21 ] |
|
Author: {'name': 'Ethan Zhang', 'email': 'ethan.zhang@mongodb.com', 'username': 'yzhang1991'}Message: Revert " This reverts commit fedd7fa7eaf29751d3573ff39f18c3f09abbf06c. |
| Comment by Ethan Zhang (Inactive) [ 13/Oct/21 ] |
|
Reopening this issue because it caused a BF and the original commit is being reverted. |
| Comment by Githook User [ 08/Oct/21 ] |
|
Author: {'name': 'Bobby Morck', 'email': 'bobby.morck@mongodb.com', 'username': 'bmorck'}Message: |
| Comment by Max Hirschhorn [ 06/Oct/21 ] |
Yes, that's correct. |
| Comment by Ethan Zhang (Inactive) [ 06/Oct/21 ] |
|
max.hirschhorn Am I correct that we need to backport this all the way back to 4.2? |