[SERVER-75279] Exchange producer/consumer may get stuck if number of producers is close to number of read tickets Created: 24/Mar/23  Updated: 29/Oct/23  Resolved: 29/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Gregory Noma
Assignee: Svilen Mihaylov (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Related
is related to SERVER-75423 [CQF] Allow exchange to work independ... Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run jstests/cqf_parallel/index.js or jstests/cqf_parallel/groupby.js with the storageEngineConcurrencyAdjustmentAlgorithm server parameter set to "" and the storageEngineConcurrentReadTransactions server parameter set to either 5 or 6.

Participants:
Linked BF Score: 149

 Description   

When we spawn exchange producer threads, each one takes a global IS lock (and thus also takes a read ticket). If the number of read tickets is less than, equal to, or just slightly above the number of producers we're creating, we may get stuck; some of the producers will queue for a read ticket and block at the global lock acquisition forever.
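
To illustrate the failure mode described above, here is a toy model in Python; it is not the server's code. It rests on two assumptions that the ticket does not state: the thread that spawns the producers already holds one read ticket (`held_by_spawner`), and no producer can finish until every producer is running (modelled with a barrier). The names `run_exchange`, `num_tickets`, `num_producers` and `held_by_spawner` are hypothetical. A timeout stands in for "blocks forever" so the sketch terminates.

```python
import threading


def run_exchange(num_tickets, num_producers, held_by_spawner=1, timeout_s=0.5):
    """Return True if every producer completes, False if the group gets stuck.

    Toy model only: a counting semaphore plays the role of the read-ticket
    pool, and a barrier models the (assumed) requirement that all producers
    must be running before any of them can make progress.
    """
    tickets = threading.Semaphore(num_tickets)

    # Assumption: the spawning thread already holds ticket(s) for the query.
    for _ in range(held_by_spawner):
        tickets.acquire()

    all_running = threading.Barrier(num_producers)
    finished = []
    finished_lock = threading.Lock()

    def producer(idx):
        # Stands in for the global IS lock acquisition, which takes a ticket.
        tickets.acquire()
        try:
            # Wait for the other producers. If some of them are still queued
            # for a ticket, this wait times out and the barrier breaks.
            all_running.wait(timeout=timeout_s)
            with finished_lock:
                finished.append(idx)
        except threading.BrokenBarrierError:
            pass  # Treated as "stuck" in this model.
        finally:
            tickets.release()

    threads = [threading.Thread(target=producer, args=(i,))
               for i in range(num_producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for _ in range(held_by_spawner):
        tickets.release()
    return len(finished) == num_producers


if __name__ == "__main__":
    print(run_exchange(num_tickets=6, num_producers=5))  # enough tickets
    print(run_exchange(num_tickets=5, num_producers=5))  # one producer queues
```

In this model the group completes only when `num_tickets >= num_producers + held_by_spawner`. With fewer tickets, the producers that did get one wait on the producers that are still queued, and those cannot be admitted until a ticket is released.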



 Comments   
Comment by Githook User [ 29/Mar/23 ]

Author: Svilen Mihaylov <svilen.mihaylov@mongodb.com> (username: svilen-mihaylov)

Message: SERVER-75279 Exchange producer/consumer may get stuck if number of producers is close to number of read tickets
Branch: master
https://github.com/mongodb/mongo/commit/dd4f313efac46093e92d8a5fe0049df69b4d549d

Comment by Alya Berciu [ 28/Mar/23 ]

david.storch@mongodb.com I created a dedicated ticket (SERVER-75383) to disable the tests, and marked the BF as dependent on that, so that we can resolve the BF. Given that this is a real bug, could we move it back to the QE backlog and schedule it whenever we prioritize intra-query parallelism? At that point, we could also try re-enabling the failing tests.

CC svilen.mihaylov@mongodb.com 

Comment by Alya Berciu [ 27/Mar/23 ]

Reassigning to QE, since this appears to happen in SBE code.

Generated at Thu Feb 08 06:29:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.