The ReshardingOplogFetcher currently specifies {afterClusterTime: <_id.ts of highest entry>, level: "majority"} when it should instead specify {afterClusterTime: <_id.clusterTime of highest entry>, level: "majority"}. Consider the following sequence:
- The donor shard prepares and commits a cross-shard transaction which spans multiple oplog entries. Let's say they have timestamps {ts: 10} and {ts: 12}.
- The ReshardingOplogFetcher inserts the first oplog entry from the cross-shard transaction. It uses {_id: {clusterTime: 12, ts: 10}} for the first oplog entry, since entries from a committed transaction take the commit timestamp as their clusterTime.
- The ReshardingOplogFetcher is restarted. It specifies {afterClusterTime: 10, level: "majority"}.
- Suppose that the ReshardingOplogFetcher targets a stale secondary within the donor replica set shard. The afterClusterTime will only wait for the stale secondary to have fetched the prepare oplog entry.
- The ReshardingOplogFetcher receives an empty batch. (This is because resharding's oplog fetching pipeline only returns oplog entries from multi-statement transactions when they commit or abort.) The postBatchResumeToken value will be {ts: 10} because this is the highest entry in the secondary's oplog.
- The changes from SERVER-49897 lead the recipient to insert a synthetic no-op oplog entry into the oplog buffer collection with {_id: {clusterTime: 10, ts: 10}}. This violates the assumption that the oplog buffer collection is always written to in increasing order of _id.
- If the ReshardingOplogFetcher is restarted again, it'll specify {_id: {$gt: {clusterTime: 10, ts: 10}}} and won't skip over the first oplog entry with {_id: {clusterTime: 12, ts: 10}}. It will attempt to insert that oplog entry again and trigger a DuplicateKey error.
The ReshardingOplogFetcher will be perpetually stuck hitting DuplicateKey errors and will prevent the recipient shard from ever reaching the "strict-consistency" state. One solution is to have the ReshardingOplogFetcher specify {afterClusterTime: 12, level: "majority"} so that it would only ever see a postBatchResumeToken value of {ts: 12}; a sketch of that change follows the current code below.
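To make the ordering violation concrete, here is a minimal standalone sketch (plain C++ with integers standing in for Timestamps; BufferId, txnEntry, and noopEntry are illustrative names, not server code). It shows that the synthetic no-op's _id sorts below the already-inserted transaction entry's _id, so the {_id: {$gt: ...}} resume filter matches that entry again:

    #include <iostream>
    #include <tuple>

    // Standalone model of a resharding oplog buffer _id, i.e. {clusterTime, ts}.
    // BSON compares these subdocuments field by field, so a lexicographic
    // (clusterTime, ts) comparison mirrors the _id index ordering.
    struct BufferId {
        int clusterTime;
        int ts;
    };

    bool operator<(const BufferId& a, const BufferId& b) {
        return std::tie(a.clusterTime, a.ts) < std::tie(b.clusterTime, b.ts);
    }

    int main() {
        // First entry of the committed transaction, inserted in step 2 above.
        const BufferId txnEntry{12, 10};
        // Synthetic no-op written from the stale postBatchResumeToken {ts: 10}.
        const BufferId noopEntry{10, 10};

        // The no-op sorts below the already-inserted transaction entry, which
        // breaks the increasing-_id invariant of the buffer collection.
        std::cout << std::boolalpha << (noopEntry < txnEntry) << "\n";  // true

        // On the next restart the fetcher resumes with {_id: {$gt: noopEntry}}.
        // That filter still matches txnEntry, so the fetcher re-inserts it and
        // hits a DuplicateKey error.
        std::cout << (noopEntry < txnEntry && !(txnEntry < noopEntry)) << "\n";  // true
        return 0;
    }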
    if (_useReadConcern) {
        auto readConcernArgs = repl::ReadConcernArgs(
            boost::optional<LogicalTime>(_startAt.getTs()),
            boost::optional<repl::ReadConcernLevel>(repl::ReadConcernLevel::kMajorityReadConcern));
        aggRequest.setReadConcern(readConcernArgs.toBSONInner());
    }
https://github.com/mongodb/mongo/blob/43479818bd01f27ee25b6e992045529d2ac0185a/src/mongo/db/s/resharding/resharding_oplog_fetcher.cpp#L259
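A minimal sketch of the proposed change, under the assumption that _startAt (a ReshardingDonorOplogId) exposes a getClusterTime() accessor alongside the getTs() used today; that accessor name is an assumption for illustration, not confirmed from the source:

    if (_useReadConcern) {
        // Wait on the _id.clusterTime of the highest buffered entry (the
        // transaction commit time) instead of its _id.ts. getClusterTime()
        // is an assumed accessor mirroring getTs().
        auto readConcernArgs = repl::ReadConcernArgs(
            boost::optional<LogicalTime>(_startAt.getClusterTime()),
            boost::optional<repl::ReadConcernLevel>(repl::ReadConcernLevel::kMajorityReadConcern));
        aggRequest.setReadConcern(readConcernArgs.toBSONInner());
    }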
[js_test:resharding_fuzzer-120e1-1630680862386-9] d20024| 2021-09-03T15:10:16.607+00:00 I COMMAND 21581 [conn1] "Received replSetStepUp request"
[js_test:resharding_fuzzer-120e1-1630680862386-9] d20024| 2021-09-03T15:10:17.888+00:00 W RESHARD 5127200 [OplogFetcher-985f9d35-1d7c-4735-9252-e1a426999574-shard0] "Error while fetching, retrying.","attr":{"error":{"code":11000,"codeName":"DuplicateKey","errmsg":"E11000 duplicate key error collection: config.localReshardingOplogBuffer.5163afeb-5991-43f6-a2e2-851827f35529.shard0 index: _id_ dup key: { _id: { clusterTime: Timestamp(1630681816, 10), ts: Timestamp(1630681816, 7) } }","keyPattern":{"_id":1},"keyValue":{"_id":{"clusterTime":{"$timestamp":{"t":1630681816,"i":10}},"ts":{"$timestamp":{"t":1630681816,"i":7}}}}}}
...
[js_test:resharding_fuzzer-120e1-1630680862386-9] d20024| 2021-09-03T15:20:39.030+00:00 W RESHARD 5127200 [OplogFetcher-985f9d35-1d7c-4735-9252-e1a426999574-shard0] "Error while fetching, retrying.","attr":{"error":{"code":11000,"codeName":"DuplicateKey","errmsg":"E11000 duplicate key error collection: config.localReshardingOplogBuffer.5163afeb-5991-43f6-a2e2-851827f35529.shard0 index: _id_ dup key: { _id: { clusterTime: Timestamp(1630681816, 10), ts: Timestamp(1630681816, 7) } }","keyPattern":{"_id":1},"keyValue":{"_id":{"clusterTime":{"$timestamp":{"t":1630681816,"i":10}},"ts":{"$timestamp":{"t":1630681816,"i":7}}}}}}
[js_test:resharding_fuzzer-120e1-1630680862386-9] assert.soon failed, msg : Timed out waiting for failpoint to be hit. Failpoint: reshardingPauseCoordinatorBeforeDecisionPersisted The hang analyzer is automatically called in assert.soon functions. If you are *expecting* assert.soon to possibly fail, call assert.soon with {runHangAnalyzer: false} as the fifth argument (you can fill unused arguments with `undefined`). Running hang analyzer from assert.soon.
https://evergreen.mongodb.com/lobster/build/525eaf3cc1abdad83ac549b6da098c9d/test/61323aa854f2483d51452425#bookmarks=0%2C36209%2C37732%2C94544%2C95519&f~=000~%5C%5BResharding.%2AService&f~=100~d20024%5C%7C&f~=110~%28REPL_HB%7CELECTION%29&f~=100~DuplicateKey