[SERVER-75205] Deadlock between stepdown and restoring locks after yielding when all read tickets exhausted Created: 23/Mar/23  Updated: 20/Dec/23  Resolved: 29/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.0, 4.4.15, 5.0.10, 6.3.0-rc2
Fix Version/s: 7.0.0-rc0, 4.4.20, 5.0.16, 6.0.6, 6.3.0-rc3

Type: Bug Priority: Blocker - P1
Reporter: Samyukta Lanka Assignee: Matt Kneiser
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-65821 Deadlock during setFCV when there are... Closed
Related
related to SERVER-84353 The test for stepDown deadlock with r... Closed
related to SERVER-75262 Add a passthrough test that exercises... Open
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.3
Sprint: Execution Team 2023-04-03
Participants:
Case:

 Description   

After yielding, operations restore their lock state via restoreLockState. This function iterates over each lock that was previously held and tries to reacquire it in sorted order. However, we never try to reacquire the FCV lock, which should be reacquired after the PBWM. When we go to reacquire the RSTL, the check fails because the lock in question is actually the FCV lock (which we never checked for). We then acquire the global lock (including acquiring a read ticket) without holding the FCV lock or the RSTL.

Once that is done, we will reacquire all the other locks we held, which in this case includes the RSTL (but now out of order).
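To make the ordering problem concrete, here is a minimal Python sketch of the restore sequence described above. It is a toy model, not the server's restoreLockState: the function name restore_order, the account_for_fcv flag, and the string lock names are all hypothetical, and the only thing taken from this ticket is the intended order (PBWM, then FCV, then RSTL, then the global lock and its ticket).

```python
def restore_order(saved, account_for_fcv):
    """Return the order in which a toy restore would reacquire locks.

    `saved` is the sorted list of previously held locks, excluding the
    global lock. Names and structure are illustrative assumptions only.
    """
    acquired = []
    i = 0
    # Locks that are supposed to be taken before the global lock.
    # Without the fix, the FCV lock is missing from this list.
    pre_global = ["PBWM", "FCV", "RSTL"] if account_for_fcv else ["PBWM", "RSTL"]
    for name in pre_global:
        # Only consume the next saved lock if it is the one we expect.
        if i < len(saved) and saved[i] == name:
            acquired.append(name)
            i += 1
    # The global lock acquisition is where the read ticket is taken.
    acquired.append("Global")
    # Everything not consumed above is reacquired afterwards.
    acquired.extend(saved[i:])
    return acquired


saved = ["PBWM", "FCV", "RSTL", "Collection"]
print(restore_order(saved, account_for_fcv=False))
print(restore_order(saved, account_for_fcv=True))
```

In this model, skipping the FCV check means the RSTL comparison sees "FCV" instead of "RSTL", so the global lock (and ticket) is taken first and the RSTL is only picked up with the remaining locks.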

When the stepdown thread starts, it enqueues the RSTL in X mode, which jumps to the front of the queue. At the same time, there will be operations that hold the RSTL in IX mode but are waiting to acquire read tickets, which prevents the stepdown thread from proceeding. If all read tickets in the system are exhausted, these threads are stuck waiting while holding the RSTL, and the threads holding the read tickets cannot progress because they are stuck behind the stepdown thread waiting for the RSTL.
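The cycle described above can be sketched as a small wait-for graph. This is only an illustration under simplifying assumptions: each participant waits on exactly one other, and the names stepdown, op_holding_rstl, and op_holding_ticket are hypothetical labels for the three roles in this ticket, not identifiers from the server.

```python
def find_cycle(waits_for, start):
    """Follow single-successor wait-for edges from `start`.

    Returns the list of participants forming a cycle (first node repeated
    at the end), or None if the chain ends without revisiting a node.
    """
    path = [start]
    seen = {start}
    node = start
    while node in waits_for:
        node = waits_for[node]
        if node in seen:
            return path[path.index(node):] + [node]
        path.append(node)
        seen.add(node)
    return None


# Out-of-order restore: the ticket holder still needs the RSTL, which is
# queued behind the stepdown thread's X-mode request.
buggy = {
    "stepdown": "op_holding_rstl",           # needs RSTL X, blocked by IX holder
    "op_holding_rstl": "op_holding_ticket",  # needs a read ticket
    "op_holding_ticket": "stepdown",         # needs RSTL, queued behind X request
}

# In-order restore: ticket holders already hold the RSTL, so they wait on
# nobody in this model and can eventually release their tickets.
in_order = {
    "stepdown": "op_holding_rstl",
    "op_holding_rstl": "op_holding_ticket",
}

print(find_cycle(buggy, "stepdown"))
print(find_cycle(in_order, "stepdown"))
```

The first graph contains a cycle through all three participants, which is the deadlock; in the second, the chain ends at the ticket holder, so nothing in the model is permanently blocked.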

There is also a variation of this that can happen on step up when we are holding the RSTL and waiting on ticket acquisition. 

We should be accounting for the FCV lock when we restore locks.



 Comments   
Comment by Githook User [ 13/Apr/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Forward port of ticket exhaustion test tweaks
Branch: master
https://github.com/mongodb/mongo/commit/2ad1e9ac43f4427fbcd6690c64c47e6bc26cce94

Comment by Githook User [ 29/Mar/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Fix deadlock involving FCV lock

With minor jstest amendments.

(cherry picked from commit e74f9c2fdf73ad707419fa4a8ae57aec70423ca6)
Branch: v6.3
https://github.com/mongodb/mongo/commit/62488ce91951aaf0ac35df145779778219261c0a

Comment by Githook User [ 29/Mar/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Fix deadlock involving FCV lock

With minor jstest amendments.

(cherry picked from commit e74f9c2fdf73ad707419fa4a8ae57aec70423ca6)
Branch: v6.0
https://github.com/mongodb/mongo/commit/065114a92afecff15d89cf4de35132e0c68893ab

Comment by Githook User [ 29/Mar/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Fix deadlock involving FCV lock

With minor jstest amendments.

(cherry picked from commit e74f9c2fdf73ad707419fa4a8ae57aec70423ca6)
Branch: v5.0
https://github.com/mongodb/mongo/commit/b2e62a90e39c2ede741c5c92e78f85d192bb9d68

Comment by Githook User [ 29/Mar/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Fix deadlock involving FCV lock

With minor jstest amendments.

(cherry picked from commit e74f9c2fdf73ad707419fa4a8ae57aec70423ca6)
Branch: v4.4
https://github.com/mongodb/mongo/commit/6b9275d8629aac59912bcee8cb3cf4dbe86da2b9

Comment by Githook User [ 29/Mar/23 ]

Author:

{'name': 'Matt Kneiser', 'email': 'matt.kneiser@mongodb.com', 'username': 'themattman'}

Message: SERVER-75205 Fix deadlock involving FCV lock
Branch: master
https://github.com/mongodb/mongo/commit/e74f9c2fdf73ad707419fa4a8ae57aec70423ca6
