[SERVER-61950] ReshardingOplogFetcher waits on network request completing without interruption, potentially preventing shard step-up from ever completing
Created: 07/Dec/21  Updated: 29/Oct/23  Resolved: 10/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 5.1.0
Fix Version/s: 5.3.0, 5.1.2, 5.0.6, 5.2.0-rc1

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-60859 ReshardingCoordinator waits on _canEn... Closed
is related to SERVER-61633 Resharding's RecipientStateMachine do... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.2, v5.1, v5.0
Sprint: Sharding 2021-12-13
Participants:
Linked BF Score: 50
Story Points: 3

 Description   

The ReshardingOplogFetcher uses ShardRemote::runAggregation() to run resharding's oplog fetching pipeline. ShardRemote::runAggregation() uses the Fetcher class to schedule the remote network requests. Fetcher::join() doesn't wait using an OperationContext, so it continues to block even after the node steps down. For as long as the network request continues to run on the remote node, the still-active Instance will prevent PrimaryOnlyService::onStepUp() and the overall step-up procedure from completing.

We should instead have Fetcher::join() wait using an OperationContext so the Fetcher::~Fetcher() destructor can abandon waiting for the remote network request.
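
To illustrate the difference between the two waiting styles, here is a minimal standalone sketch. It is not the server code: SketchFetcher, markComplete(), interrupt(), and JoinResult are hypothetical names, and the interrupt flag only stands in for the role an OperationContext would play. It assumes the remote response and the step-down interruption both signal the same condition variable.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a fetcher with one outstanding remote request.
class SketchFetcher {
public:
    enum class JoinResult { kCompleted, kInterrupted };

    // Simulates the network layer delivering the remote response.
    void markComplete() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _complete = true;
        }
        _cv.notify_all();
    }

    // Simulates the waiting operation being interrupted, e.g. on step-down.
    void interrupt() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _interrupted = true;
        }
        _cv.notify_all();
    }

    // Uninterruptible wait: returns only once the remote request finishes.
    // This is the shape of the problem described above.
    void joinUninterruptible() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _complete; });
    }

    // Interruptible wait: returns when the request finishes OR when the
    // waiter is interrupted, so the caller can abandon the request.
    JoinResult join() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _complete || _interrupted; });
        return _complete ? JoinResult::kCompleted : JoinResult::kInterrupted;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _complete = false;
    bool _interrupted = false;
};
```

In this sketch, a thread blocked in join() wakes up as soon as another thread calls interrupt(), even if markComplete() is never called. A thread blocked in joinUninterruptible() stays blocked until the remote side responds.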



 Comments   
Comment by Githook User [ 10/Dec/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61950 Make Fetcher::join() interruptible.

(cherry picked from commit 0241ba4289618d77b2f3b9a3a3d07a6d08d2c432)
Branch: v5.0
https://github.com/mongodb/mongo/commit/f0fba3a5505a21fe7e4044535aeb342c7e5e48f1

Comment by Githook User [ 10/Dec/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61950 Make Fetcher::join() interruptible.

(cherry picked from commit 0241ba4289618d77b2f3b9a3a3d07a6d08d2c432)
Branch: v5.1
https://github.com/mongodb/mongo/commit/3cb0d2090a9fe01e0943bae92affb7c6308c32b7

Comment by Githook User [ 10/Dec/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61950 Make Fetcher::join() interruptible.

(cherry picked from commit 0241ba4289618d77b2f3b9a3a3d07a6d08d2c432)
Branch: v5.2
https://github.com/mongodb/mongo/commit/1dfc97d44c590bf2395d1d68a443c223ea4e25db

Comment by Githook User [ 10/Dec/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61950 Make Fetcher::join() interruptible.
Branch: master
https://github.com/mongodb/mongo/commit/0241ba4289618d77b2f3b9a3a3d07a6d08d2c432
