[SERVER-67457] Resharding operation aborted in the midst of contacting participants may stall on config server primary indefinitely
Created: 22/Jun/22  Updated: 29/Oct/23  Resolved: 06/Jul/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 6.0.0-rc10
Fix Version/s: 6.0.1, 5.0.10, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Abdul Qadeer
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-56936 Ensure ReshardingCoordinator's _flush... Closed
is related to SERVER-61444 Resharding uses of bumpCollectionVers... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.0, v5.0
Sprint: Sharding 2022-06-27, Sharding 2022-07-11
Linked BF Score: 5
Story Points: 3

 Description   

After the resharding coordinator has transitioned into the "preparing-to-donate" state, it must establish the DonorStateMachines and RecipientStateMachines on the participant shards before proceeding with the remainder of the resharding operation. This synchronization ensures that every participant shard is aware of the resharding operation and will react accordingly to a subsequent _flushReshardingStateChange, _shardsvrCommitReshardCollection, or _shardsvrAbortReshardCollection command. The logic prior to _tellAllParticipantsReshardingStarted() is flawed because it is possible for the resharding coordinator to:

  1. Atomically write the config.collections and config.chunks entries for the temporary resharding collection, advance the state in the config.collections entry for the collection being resharded to "preparing-to-donate", and advance the state in config.reshardingOperations to "preparing-to-donate".
  2. Receive an abortReshardCollection command either explicitly from the user or implicitly via a setFeatureCompatibilityVersion command.
  3. Skip waiting for the replica set transaction in step (1) to become majority-committed, because the abort cancels that wait.
  4. Broadcast the _flushRoutingTableCacheUpdatesWithWriteConcern command to all participant shards using a $configTime prior to the replica set transaction from step (1) because the Timestamp from step (1) hasn't become majority-committed yet.
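
The interleaving in steps 1 to 4 can be sketched as a toy Python model. Everything here is hypothetical: ToyConfigServer, commit_txn, and coordinator_with_abort are illustrative names, not MongoDB code, and the model assumes (as described above) that the $configTime sent to shards only advances to the latest majority-committed timestamp.

```python
from dataclasses import dataclass, field

TEMP_NS = "reshardingDb.system.resharding.face3185-7dbc-4460-adc9-c6c9a1603801"


@dataclass
class ToyConfigServer:
    """Hypothetical stand-in for the config server primary (not MongoDB code)."""

    cluster_time: int = 0
    majority_committed: int = 0
    # Version history of config.collections: (commit timestamp, namespace, document).
    history: list = field(default_factory=list)

    def commit_txn(self, docs):
        """Step 1: atomically write all docs at one timestamp that is not yet majority-committed."""
        self.cluster_time += 1
        for ns, doc in docs.items():
            self.history.append((self.cluster_time, ns, doc))
        return self.cluster_time

    def config_time(self):
        """In this model, $configTime only advances to the majority-committed point."""
        return self.majority_committed


def coordinator_with_abort(server, wait_cancelled_by_abort):
    """Returns (transaction timestamp, $configTime used for the broadcast)."""
    txn_ts = server.commit_txn({
        "reshardingDb.testColl": {"state": "preparing-to-donate"},
        TEMP_NS: {"state": "preparing-to-donate"},
    })
    # Steps 2-3: an abort arrives. If the abort cancels the majority wait,
    # the coordinator moves on before the majority point reaches txn_ts.
    if not wait_cancelled_by_abort:
        server.majority_committed = txn_ts  # models replication catching up during the wait
    # Step 4: the flush command carries whatever $configTime is current.
    return txn_ts, server.config_time()
```

With wait_cancelled_by_abort=True, the broadcast carries a $configTime of 0 while the transaction committed at timestamp 1, so any shard reading at that $configTime sees a snapshot that predates step (1).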

Shards receive the _flushRoutingTableCacheUpdatesWithWriteConcern command and observe the config.collections entries for the source collection and the temporary resharding collection as they were before the replica set transaction from step (1). In particular, the recipient shards do not observe the config.collections entry for the temporary resharding collection at all and treat the namespace as unsharded. The shards therefore skip constructing the DonorStateMachine and RecipientStateMachine objects but still respond ok:1 to the resharding coordinator as if they had constructed them.
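
The shard-side behavior can be modeled the same way. In this hypothetical sketch (visible_at and handle_flush are illustrative names, not MongoDB functions), a shard reads config.collections as of the $configTime it was sent. The donor side keys off the source collection's state and the recipient side keys off the existence of the temporary collection entry; when neither is visible, nothing is built, yet the reply is still ok:1.

```python
SOURCE_NS = "reshardingDb.testColl"
TEMP_NS = "reshardingDb.system.resharding.face3185-7dbc-4460-adc9-c6c9a1603801"


def visible_at(history, config_time):
    """Latest version of each config.collections entry committed at or before config_time."""
    snapshot = {}
    for ts, ns, doc in sorted(history, key=lambda entry: entry[0]):
        if ts <= config_time:
            snapshot[ns] = doc  # later commits overwrite earlier versions
    return snapshot


def handle_flush(history, config_time):
    """Toy shard handler. Returns (reply, names of state machines constructed)."""
    snapshot = visible_at(history, config_time)
    built = []
    if snapshot.get(SOURCE_NS, {}).get("state") == "preparing-to-donate":
        built.append("DonorStateMachine")
    if TEMP_NS in snapshot:
        built.append("RecipientStateMachine")
    # If the temporary collection is not visible it is treated as unsharded,
    # but the reply is ok:1 either way, so the coordinator cannot tell.
    return {"ok": 1}, built


# Source collection was sharded at ts=1; the step (1) transaction committed at ts=5.
HISTORY = [
    (1, SOURCE_NS, {"state": None}),
    (5, SOURCE_NS, {"state": "preparing-to-donate"}),
    (5, TEMP_NS, {"state": "preparing-to-donate"}),
]
```

Calling handle_flush(HISTORY, 4) models the stale $configTime: the shard replies ok:1 but constructs no state machines, matching the "Marking collection as unsharded" log lines below.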

The resharding coordinator continues to wait for the participant shards to update their state within the config.reshardingOperations document to "done", signaling that they have finished their cleanup for the resharding operation. However, because the participant shards never constructed the DonorStateMachine and RecipientStateMachine objects, they never perform that update on the config.reshardingOperations document. This leaves the resharding coordinator waiting on this future indefinitely.
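
The stall can be illustrated with a future that is only fulfilled once every participant reports "done". This is a hypothetical sketch (ToyCoordinator, on_participant_done, and wait_for_participants are illustrative names, not MongoDB code); the timeout exists only so the example terminates.

```python
import concurrent.futures


class ToyCoordinator:
    """Hypothetical model of the coordinator's wait on participants (not MongoDB code)."""

    def __init__(self, participants):
        self.pending = set(participants)
        self.all_done = concurrent.futures.Future()

    def on_participant_done(self, shard):
        """Models a participant setting its state to "done" in config.reshardingOperations."""
        self.pending.discard(shard)
        if not self.pending and not self.all_done.done():
            self.all_done.set_result(True)


def wait_for_participants(coordinator, timeout_s):
    """True if every participant reported done within timeout_s, else False.

    The wait described in this ticket has no such bound, which is why it
    stalls indefinitely when a participant never reports.
    """
    try:
        return coordinator.all_done.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False
```

A shard that never built its state machine never calls on_participant_done, so wait_for_participants returns False no matter how large the timeout is.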

Manual intervention on the config.reshardingOperations document would be required to unblock the resharding coordinator. In the meantime, no other sharding DDL commands can run against the source collection.

12118:[js_test:setfcv_reshard_collection] d20276| {"t":{"$date":"2022-06-14T01:50:03.173+00:00"},"s":"I",  "c":"SHARDING", "id":21985,   "ctx":"RecoverRefreshThread","msg":"Updating metadata for this namespace because the remote metadata has a newer collection version","attr":{"namespace":"reshardingDb.testColl","activeMetadata":"collection version: 1|5||62a7e944c481ca50994b1490||Timestamp(1655171396, 41), shard version: 1|0||62a7e944c481ca50994b1490||Timestamp(1655171396, 41)","remoteMetadata":"collection version: 1|7||62a7e944c481ca50994b1490||Timestamp(1655171396, 41), shard version: 1|0||62a7e944c481ca50994b1490||Timestamp(1655171396, 41)"}}
12121:[js_test:setfcv_reshard_collection] d20278| {"t":{"$date":"2022-06-14T01:50:03.175+00:00"},"s":"I",  "c":"SHARDING", "id":21985,   "ctx":"RecoverRefreshThread","msg":"Updating metadata for this namespace because the remote metadata has a newer collection version","attr":{"namespace":"reshardingDb.testColl","activeMetadata":"collection version: 1|5||62a7e944c481ca50994b1490||Timestamp(1655171396, 41), shard version: 1|5||62a7e944c481ca50994b1490||Timestamp(1655171396, 41)","remoteMetadata":"collection version: 1|7||62a7e944c481ca50994b1490||Timestamp(1655171396, 41), shard version: 1|7||62a7e944c481ca50994b1490||Timestamp(1655171396, 41)"}}
12138:[js_test:setfcv_reshard_collection] d20278| {"t":{"$date":"2022-06-14T01:50:03.188+00:00"},"s":"I",  "c":"SHARDING", "id":21917,   "ctx":"RecoverRefreshThread","msg":"Marking collection as unsharded","attr":{"namespace":"reshardingDb.system.resharding.face3185-7dbc-4460-adc9-c6c9a1603801"}}
12139:[js_test:setfcv_reshard_collection] d20276| {"t":{"$date":"2022-06-14T01:50:03.189+00:00"},"s":"I",  "c":"SHARDING", "id":21917,   "ctx":"RecoverRefreshThread","msg":"Marking collection as unsharded","attr":{"namespace":"reshardingDb.system.resharding.face3185-7dbc-4460-adc9-c6c9a1603801"}}



 Comments   
Comment by Githook User [ 07/Jul/22 ]

Author:

{'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'}

Message: SERVER-67457 Wait for majority commit on completion
Branch: v5.0
https://github.com/mongodb/mongo/commit/4f48766343c4d029d2b5cf373c3a6c46ddf6b576

Comment by Githook User [ 06/Jul/22 ]

Author:

{'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'}

Message: SERVER-67457 Wait for majority commit on completion
Branch: master
https://github.com/mongodb/mongo/commit/f62a3778d114a208a3d8c6ddc09e7a7702cde8c2

Comment by Max Hirschhorn [ 22/Jun/22 ]

A solution here would be to move the _waitForMajority() call into the _tellAllParticipantsReshardingStarted() logic so that it runs as part of onCompletion(), and also to use the stepdown token rather than the abort token for the wait.
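
The effect of switching tokens can be sketched in Python. This is a hypothetical model, not the committed change: ToyToken, ToyServer, and tell_participants_started are illustrative names, and the assumption that the abort token is derived from the stepdown token (so a stepdown also cancels it, but an abort does not cancel the stepdown token) is made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


class ToyToken:
    """Hypothetical cancellation token; cancelled if it or any ancestor is cancelled."""

    def __init__(self, parent: Optional["ToyToken"] = None):
        self._cancelled = False
        self._parent = parent

    def cancel(self):
        self._cancelled = True

    def is_cancelled(self) -> bool:
        return self._cancelled or (self._parent is not None and self._parent.is_cancelled())


@dataclass
class ToyServer:
    majority_committed: int = 0


def wait_for_majority(server: ToyServer, txn_ts: int, token: ToyToken) -> bool:
    """Models _waitForMajority(): gives up if the token is cancelled, else waits for txn_ts."""
    if token.is_cancelled():
        return False
    server.majority_committed = max(server.majority_committed, txn_ts)  # replication catches up
    return True


def tell_participants_started(server, txn_ts, stepdown_token, abort_token, use_stepdown_token):
    """Runs on completion whether or not the operation was aborted; returns the $configTime sent."""
    token = stepdown_token if use_stepdown_token else abort_token
    wait_for_majority(server, txn_ts, token)
    return server.majority_committed
```

With the abort token, an abort leaves the broadcast $configTime at 0, before the step (1) transaction; with the stepdown token, the wait still completes after an abort, and in this model only a stepdown interrupts it.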

Generated at Thu Feb 08 06:08:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.