[SERVER-79682] ShardsvrReshardCollection Can Hang If Stepdown Occurs Shortly After Stepping Up Created: 03/Aug/23  Updated: 29/Oct/23  Resolved: 18/Sep/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 6.0.0
Fix Version/s: 7.2.0-rc0, 7.0.2, 5.0.22, 6.0.11, 7.1.0-rc3

Type: Bug Priority: Major - P3
Reporter: Brett Nawrocki Assignee: Nandini Bhartiya
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0, v6.0, v5.0
Sprint: Sharding NYC 2023-09-18, Sharding NYC 2023-10-02
Linked BF Score: 12
Story Points: 3

 Description   

Unlike many other commands, ShardsvrReshardCollection does not flag its operation context to be interrupted on stepdown. As a result, when it calls getOrCreateInstance, it can hang in _waitForRecoveryCompletion, which waits for the state to reach kRecovered. After a stepdown the state is set to kPaused instead, so the operation context must be interrupted at stepdown to avoid the hang.

See the two linked comments on BF-29457 for more information and an example of this happening.
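The hang can be illustrated with a condition-variable wait. The sketch below is a simplified, hypothetical model, not MongoDB's code: `Instance`, `OpCtx`, `stepDown`, `waitForRecovery`, `run`, the `interruptAtStepdown`/`killed` flags, and the `kRecovering` state are all invented for illustration; only the `kRecovered` and `kPaused` state names come from the description. The waiter blocks until the state is `kRecovered`. A stepdown moves the state to `kPaused`, which never satisfies that predicate, so only an interrupt ends the wait. A deadline is added purely so the demo terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical, simplified model of the hang; these are NOT MongoDB's real classes.
enum class State { kRecovering, kRecovered, kPaused };  // kRecovering is invented

struct Instance {
    std::mutex mu;
    std::condition_variable cv;
    State state = State::kRecovering;
};

struct OpCtx {
    bool interruptAtStepdown = false;  // models the opt-in the fix adds
    bool killed = false;               // set by a stepdown when opted in
};

// Models a stepdown: the instance is paused, and the operation is
// interrupted only if it opted in to interruption at stepdown.
void stepDown(Instance& inst, OpCtx& op) {
    std::lock_guard<std::mutex> lk(inst.mu);
    inst.state = State::kPaused;
    if (op.interruptAtStepdown) op.killed = true;
    inst.cv.notify_all();
}

// Models _waitForRecoveryCompletion: wait for kRecovered or an interrupt.
// The deadline exists only so this demo terminates; it stands in for "forever".
std::string waitForRecovery(Instance& inst, OpCtx& op,
                            std::chrono::milliseconds deadline) {
    std::unique_lock<std::mutex> lk(inst.mu);
    bool done = inst.cv.wait_for(lk, deadline, [&] {
        return inst.state == State::kRecovered || op.killed;
    });
    if (!done) return "hang";  // kPaused never satisfies the predicate
    return op.killed ? "interrupted" : "recovered";
}

// Starts a wait, then steps down 20 ms later on another thread.
std::string run(bool interruptAtStepdown) {
    Instance inst;
    OpCtx op;
    op.interruptAtStepdown = interruptAtStepdown;
    std::thread stepdown([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        stepDown(inst, op);
    });
    std::string result =
        waitForRecovery(inst, op, std::chrono::milliseconds(500));
    stepdown.join();
    return result;
}
```

Under these assumptions, `assert(run(false) == "hang")` holds because the wait only ends at the demo deadline, while `assert(run(true) == "interrupted")` holds because the stepdown interrupts the waiter. This mirrors the reasoning behind the fix: rather than changing the recovery predicate, the operation opts in to being interrupted when the node steps down.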



 Comments   
Comment by Githook User [ 18/Sep/23 ]

Author: Nandini Bhartiya <nandini.bhartiya@mongodb.com> (username: nandinibhartiyaMDB)

Message: SERVER-79682: Ensure the opCtx is interrupted during a stepdown for the shardSvrReshardCollection cmd.

(cherry picked from commit 90680ac48281685551dad1253986be0c50de84bc)
Branch: v6.0
https://github.com/mongodb/mongo/commit/4116e1f9f8f255f00763dc78137bda54d4f7c796

Comment by Githook User [ 18/Sep/23 ]

Author: Nandini Bhartiya <nandini.bhartiya@mongodb.com> (username: nandinibhartiyaMDB)

Message: SERVER-79682: Ensure the opCtx is interrupted during a stepdown for the shardSvrReshardCollection cmd.

(cherry picked from commit 90680ac48281685551dad1253986be0c50de84bc)
Branch: v5.0
https://github.com/mongodb/mongo/commit/1edd6a7e56f6033b131de0d7be431cd67ec34ecb

Comment by Githook User [ 18/Sep/23 ]

Author: Nandini Bhartiya <nandini.bhartiya@mongodb.com> (username: nandinibhartiyaMDB)

Message: SERVER-79682: Ensure the opCtx is interrupted during a stepdown for the shardSvrReshardCollection cmd.

(cherry picked from commit 90680ac48281685551dad1253986be0c50de84bc)
Branch: v7.1
https://github.com/mongodb/mongo/commit/61c872f2e97cc5a40bd8da0e2c06485ad9f4c498

Comment by Githook User [ 18/Sep/23 ]

Author: Nandini Bhartiya <nandini.bhartiya@mongodb.com> (username: nandinibhartiyaMDB)

Message: SERVER-79682: Ensure the opCtx is interrupted during a stepdown for the shardSvrReshardCollection cmd.

(cherry picked from commit 90680ac48281685551dad1253986be0c50de84bc)
Branch: v7.0
https://github.com/mongodb/mongo/commit/07ac3361daab42b590f0eccb1e8086593d0697e5

Comment by Githook User [ 15/Sep/23 ]

Author: Nandini Bhartiya <nandini.bhartiya@mongodb.com> (username: nandinibhartiyaMDB)

Message: SERVER-79682: Ensure the opCtx is interrupted during a stepdown for shardSvrReshardCollection cmd.
Branch: master
https://github.com/mongodb/mongo/commit/909f17054c729cd05e3b4c40a0ca284222b833bc

Generated at Thu Feb 08 06:41:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.