[SERVER-35167] AsyncResultsMerger can block networking threads in callbacks Created: 22/May/18  Updated: 27/Oct/23  Resolved: 06/Feb/20

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.2.20, 3.4.15
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Mira Carey
Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-34582 AsyncRequestsSender can block network... Closed
Assigned Teams:
Sharding
Operating System: ALL
Participants:

 Description   
  • The AsyncResultsMerger holds a lock while it runs callbacks and while it calls nextEvent()
  • Running askForNextBatch_inlock performs replica set targeting
  • Replica set targeting can hang for up to 20 seconds

The bad scenario is:

  • A bunch of remotes are established
  • The first host succeeds in targeting and runs
  • The second host cannot satisfy its read preference, so it blocks while holding the lock in nextEvent()
  • The response to the first request comes back and blocks on the mutex, waiting to call handleBatchResponse

If you have enough of those, you saturate all the background networking threads and hang your mongos.
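The saturation described above can be reproduced in miniature. The following is only a toy sketch in Python, not the server's code: ToyMerger, next_event, handle_batch_response and unrelated_task_delay are hypothetical names, a ThreadPoolExecutor stands in for the background networking threads, and a short sleep stands in for the targeting hang of up to 20 seconds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyMerger:
    """Hypothetical stand-in for a merger that guards its state with one mutex."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.lock_held = threading.Event()
        self.batches_handled = 0

    def next_event(self, targeting_delay):
        # The mutex is held for the whole duration of the slow "targeting" step.
        with self._mutex:
            self.lock_held.set()
            time.sleep(targeting_delay)  # stand-in for a targeting hang

    def handle_batch_response(self):
        # Runs on a pool ("networking") thread and must take the same mutex.
        with self._mutex:
            self.batches_handled += 1


def unrelated_task_delay(num_stuck_mergers, pool_size, targeting_delay):
    """Return how many seconds an unrelated task waits for a free pool thread."""
    mergers = [ToyMerger() for _ in range(num_stuck_mergers)]
    clients = [
        threading.Thread(target=m.next_event, args=(targeting_delay,))
        for m in mergers
    ]
    for client in clients:
        client.start()
    for m in mergers:
        m.lock_held.wait()  # make sure every mutex is held before responses arrive

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each response callback blocks on its merger's mutex, pinning a pool thread.
        for m in mergers:
            pool.submit(m.handle_batch_response)
        submitted = time.monotonic()
        # The unrelated task just records the moment it finally gets to run.
        started = pool.submit(time.monotonic).result()

    for client in clients:
        client.join()
    return started - submitted
```

Calling unrelated_task_delay(2, 2, 0.5) pins both pool threads behind held mutexes, so the unrelated task waits roughly as long as the simulated targeting hang. With unrelated_task_delay(1, 2, 0.5) one pool thread stays free and the unrelated task runs almost immediately.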

This isn't a problem in 3.6 and later because targeting was moved into the AsyncRequestsSender (ARS), and SERVER-34582 fixes the problem there.



 Comments   
Comment by Sheeri Cabral (Inactive) [ 06/Feb/20 ]

The minimum supported version is 3.6, and this issue only affects versions earlier than 3.6.

Comment by Kaloian Manassiev [ 29/May/18 ]

Unfortunately, this is very hard to fix in the 3.4 code base since it would require making the replica set monitor event-driven. Given that 3.2 and 3.4 have been out for a while and since it is an extremely rare condition, I am putting it on the Backlog.
