[SERVER-34582] AsyncRequestsSender can block network threads during construction Created: 19/Apr/18  Updated: 29/Oct/23  Resolved: 24/Apr/18

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 3.6.4
Fix Version/s: 3.6.5, 3.7.7

Type: Bug Priority: Critical - P2
Reporter: Mira Carey Assignee: Mira Carey
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-35167 AsyncResultsMerger can block networki... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Platforms 2018-04-23, Platforms 2018-05-07
Participants:
Linked BF Score: 0

 Description   
  • The AsyncRequestsSender holds a lock during construction and work scheduling.
  • This lock prevents callbacks from running if their response comes back during scheduling.
  • Scheduling can take a long time (up to 20 seconds per shard) if a read preference cannot be satisfied, because targeting makes a blocking call into the ReplicaSetMonitor.

The bad sequence of events is:

  1. A scatter-gather request to two shards is dispatched.
  2. The first host succeeds in targeting and runs.
  3. The second host cannot satisfy its read preference, so scheduling blocks while holding the lock.
  4. The first request succeeds, and its callback blocks in _handleResponse waiting for that lock.

If you have enough of those, you can saturate all background networking workers, making your mongos completely unresponsive until targeting can succeed.
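
The sequence above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the actual AsyncRequestsSender code: LockedSender, scheduleAll, handleResponse and measureCallbackStall are hypothetical names, and a short sleep stands in for the blocking ReplicaSetMonitor call. It shows a simulated network worker stuck in its callback for roughly as long as targeting of the second shard blocks.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the pre-fix sender: a single object-level mutex
// guards both scheduling and response handling.
class LockedSender {
public:
    // Each targeter simulates targeting and dispatching one shard.
    // The lock is held across every call, including ones that block.
    void scheduleAll(const std::vector<std::function<void()>>& targeters) {
        std::lock_guard<std::mutex> lk(_mutex);
        for (const auto& target : targeters) {
            target();
        }
    }

    // Runs on a network worker when a response arrives.
    void handleResponse() {
        std::lock_guard<std::mutex> lk(_mutex);  // waits for scheduling to finish
        ++_handled;
    }

    int handled() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _handled;
    }

private:
    std::mutex _mutex;
    int _handled = 0;
};

// Returns how long the simulated network worker was stuck in handleResponse().
std::chrono::milliseconds measureCallbackStall(std::chrono::milliseconds targetingDelay) {
    using Clock = std::chrono::steady_clock;
    LockedSender sender;
    std::promise<void> firstDispatched;
    std::future<void> dispatched = firstDispatched.get_future();
    std::chrono::milliseconds stalled{0};

    // The first shard's response arrives as soon as its request is dispatched.
    std::thread networkWorker([&] {
        dispatched.wait();
        const auto start = Clock::now();
        sender.handleResponse();
        stalled = std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
    });

    std::vector<std::function<void()>> targeters;
    // Shard 1: targeting succeeds immediately.
    targeters.push_back([&] { firstDispatched.set_value(); });
    // Shard 2: read preference cannot be satisfied, so targeting blocks.
    targeters.push_back([&] { std::this_thread::sleep_for(targetingDelay); });
    sender.scheduleAll(targeters);

    networkWorker.join();
    assert(sender.handled() == 1);
    return stalled;
}
```

Calling measureCallbackStall(std::chrono::milliseconds(300)) should report a stall close to the full 300 ms. With a real 20-second targeting timeout and one such callback per worker, the worker pool would be exhausted in the same way.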



 Comments   
Comment by Githook User [ 08/May/18 ]

Author:

{'email': 'jcarey@argv.me', 'name': 'Jason Carey', 'username': 'hanumantmk'}

Message: SERVER-34582 Replace object level lock for ARS

ARS holds a lock during scheduling, to prevent notification during
scheduling. As an unfortunate side effect, this prevents callbacks from
resolving during scheduling. (which can cause background executors to
block in executing a callback).

This replaces the mutex with a producer consumer queue which handles
responses, and moves response handling into calls to next().

(cherry picked from commit ab112a029bca9d575379d42450ea2a7e9254c6de)
Branch: v3.6
https://github.com/mongodb/mongo/commit/b9be9ba0418cb94430cc5df3524580df6fdc7903

Comment by Githook User [ 24/Apr/18 ]

Author:

{'email': 'jcarey@argv.me', 'username': 'hanumantmk', 'name': 'Jason Carey'}

Message: SERVER-34582 Replace object level lock for ARS

ARS holds a lock during scheduling, to prevent notification during
scheduling. As an unfortunate side effect, this prevents callbacks from
resolving during scheduling. (which can cause background executors to
block in executing a callback).

This replaces the mutex with a producer consumer queue which handles
responses, and moves response handling into calls to next().
Branch: master
https://github.com/mongodb/mongo/commit/ab112a029bca9d575379d42450ea2a7e9254c6de
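
The approach described in the commit message (a producer consumer queue, with response handling moved into next()) can be illustrated roughly as follows. This is a simplified sketch and not the code from the commit: ShardResponse, ResponseQueue, QueueBackedSender and onResponse are hypothetical names chosen for illustration. The point is that the network-thread callback only enqueues, so it never waits on scheduling, while all bookkeeping runs on the thread that calls next().

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical response record; the real type carries far more state.
struct ShardResponse {
    std::string shardId;
    bool ok;
};

// Minimal blocking producer-consumer queue.
class ResponseQueue {
public:
    // Producer side: holds the queue lock only long enough to append.
    void push(ShardResponse response) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _queue.push_back(std::move(response));
        }
        _cv.notify_one();
    }

    // Consumer side: blocks until a response is available.
    ShardResponse pop() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return !_queue.empty(); });
        ShardResponse response = std::move(_queue.front());
        _queue.pop_front();
        return response;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<ShardResponse> _queue;
};

// Hypothetical sender with no object-level lock shared with scheduling.
class QueueBackedSender {
public:
    explicit QueueBackedSender(int numShards) : _remaining(numShards) {}

    // Called from a network worker: enqueue and return immediately.
    void onResponse(ShardResponse response) {
        _responses.push(std::move(response));
    }

    // Called from the owning thread: response handling happens here.
    ShardResponse next() {
        ShardResponse response = _responses.pop();
        --_remaining;  // touched only by the thread that calls next()
        return response;
    }

    bool done() const {
        return _remaining == 0;
    }

private:
    ResponseQueue _responses;
    int _remaining;
};
```

In this arrangement a slow targeting step delays only the caller of next(), since the only lock a network worker takes is the queue's own short-lived one.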
