[SERVER-79607] ShardRegistry shutdown should not wait indefinitely on outstanding network requests Created: 02/Aug/23  Updated: 29/Oct/23  Resolved: 14/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.1, 6.0.10

Type: Bug Priority: Major - P3
Reporter: Allison Easton Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0
Sprint: Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21
Participants:
Linked BF Score: 120

 Description   

There is a detailed description in the last comment of the linked BF.

Summary:

When we shut down the shard registry, we shut down and join the thread pool it uses. Shutting down the thread pool only changes its state to joinRequired. This stops any new tasks from being scheduled, but the join must still wait for all ongoing tasks to complete, without interrupting them. In the case shown in the BF, one of the outgoing requests never finished, so the join stalled.
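
The shutdown-then-join behaviour described above can be reproduced in miniature. The following is only a sketch in Python, not the server's C++ thread pool: `ThreadPoolExecutor` stands in for the shard registry's pool, and `outstanding_request`, `reply_arrived`, and `demo_shutdown_then_join` are hypothetical names invented for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_shutdown_then_join():
    """Return (new_task_rejected, join_blocked, join_finished_after_reply)."""
    pool = ThreadPoolExecutor(max_workers=1)
    started = threading.Event()
    reply_arrived = threading.Event()  # hypothetical stand-in for the network response

    def outstanding_request():
        started.set()
        reply_arrived.wait()  # no timeout and no way to be interrupted

    pool.submit(outstanding_request)
    started.wait()

    # "Shutdown": only flips the pool's state; the running task is not interrupted.
    pool.shutdown(wait=False)
    try:
        pool.submit(lambda: None)
        new_task_rejected = False
    except RuntimeError:
        new_task_rejected = True

    # "Join": run it on a helper thread so the blocking can be observed safely.
    joiner = threading.Thread(target=pool.shutdown, kwargs={"wait": True})
    joiner.start()
    joiner.join(timeout=0.2)
    join_blocked = joiner.is_alive()  # still waiting on the outstanding request

    # Only once the "reply" arrives can the join complete.
    reply_arrived.set()
    joiner.join(timeout=5)
    join_finished_after_reply = not joiner.is_alive()
    return new_task_rejected, join_blocked, join_finished_after_reply
```

If `reply_arrived` were never set, the join would wait forever, which is the stall seen in the BF.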

Some options to fix this:

  1. Shut down the shard registry after we interrupt ongoing operations, rather than before. This way we would know that there are no ongoing operations left to wait for when we call join. I am not sure what other implications moving this shutdown may have, though.
  2. Interrupt ongoing lookups during shard registry shutdown. This would likely require keeping cancellation tokens so that operations can be interrupted during shutdown.
  3. Add a timeout to the network calls made by the shard registry. This solution, though, could cause shard registry refresh failures at times other than shutdown.
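
Options 2 and 3 can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the server code and not necessarily the approach taken by the fix: `CancellationToken`, `lookup`, `shutdown_with_interrupt`, and `lookup_with_timeout` are all hypothetical names, and the polling loop is a simplification of waiting on a real network reply.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CancellationToken:
    """Hypothetical token: cancelled once at shutdown, observed by in-flight work."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    def is_cancelled(self):
        return self._event.is_set()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


def lookup(reply_arrived, token, timeout=None):
    """Hypothetical lookup: waits for a reply, a cancellation, or a deadline."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if reply_arrived.is_set():
            return "ok"
        if token.is_cancelled():
            return "interrupted"  # option 2: shutdown interrupted the lookup
        if deadline is not None and time.monotonic() >= deadline:
            return "timed out"  # option 3: the request gave up on its own
        token.wait(0.01)  # short poll slice; wakes early on cancellation


def shutdown_with_interrupt():
    """Option 2: cancel outstanding lookups first, then join the pool."""
    pool = ThreadPoolExecutor(max_workers=1)
    token = CancellationToken()
    never_replies = threading.Event()  # simulates a request that never finishes
    future = pool.submit(lookup, never_replies, token)
    token.cancel()
    pool.shutdown(wait=True)  # returns promptly because the lookup was interrupted
    return future.result()


def lookup_with_timeout():
    """Option 3: the same unanswered request, bounded by a timeout instead."""
    return lookup(threading.Event(), CancellationToken(), timeout=0.05)
```

In this sketch the timeout applies to every call to `lookup`, including ones made during normal operation, which is the drawback noted for option 3; the token in option 2 only takes effect once shutdown cancels it.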


 Comments   
Comment by Githook User [ 22/Aug/23 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-79607 ShardRegistry shutdown should not wait indefinitely on outstanding network requests

(cherry picked from commit cce7e48a789e973cb647639f4d482d64c1c87b23)
Branch: v6.0
https://github.com/mongodb/mongo/commit/ead4217e9f9e4e56cf30b022bc14633cafd25fcb

Comment by Githook User [ 22/Aug/23 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-79607 ShardRegistry shutdown should not wait indefinitely on outstanding network requests

(cherry picked from commit cce7e48a789e973cb647639f4d482d64c1c87b23)
Branch: v7.0
https://github.com/mongodb/mongo/commit/88ff3f397de2c151e4c5cc31f802cc7df1587b8d

Comment by Githook User [ 14/Aug/23 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-79607 ShardRegistry shutdown should not wait indefinitely on outstanding network requests
Branch: master
https://github.com/mongodb/mongo/commit/cce7e48a789e973cb647639f4d482d64c1c87b23
