[SERVER-15750] Deadlock cycle in replication among oplog producer, oplog application and replication executor threads
Created: 20/Oct/14  Updated: 11/Jul/16  Resolved: 21/Oct/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.7.8

Type: Bug Priority: Major - P3
Reporter: Andy Schwerin Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

What follows is a description of a deadlock cycle observed while running maintenance_non-blocking.js. In summary, the error is that a thread cannot wait, even indirectly, on the replication executor while holding the bgsync mutex, because callbacks running on the executor may themselves need that mutex.

The oplog application thread periodically calls tryToGoLiveAsSecondary(), which acquires the global lock in shared (S) mode, and then calls getMaintenanceMode() on the replication coordinator, which schedules and waits for a callback on the replication executor.
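The "schedule a callback on the replication executor, then wait for it" pattern can be sketched with a minimal single-threaded executor. This is a hypothetical illustration, not the replication executor's real API: `SerialExecutor` and `scheduleAndWait` are invented names, and the only assumption carried over from the description is that callbacks run one at a time on a dedicated executor thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical serial executor: every callback runs on one dedicated thread,
// in the order it was scheduled.
class SerialExecutor {
public:
    SerialExecutor() : _worker([this] { run(); }) {}

    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_one();
        _worker.join();  // drains any queued callbacks, then exits
    }

    SerialExecutor(const SerialExecutor&) = delete;
    SerialExecutor& operator=(const SerialExecutor&) = delete;

    // Schedules fn on the executor thread and blocks the caller until fn has run.
    // The caller makes progress only when the executor thread is free to run fn.
    template <typename Fn>
    auto scheduleAndWait(Fn fn) -> decltype(fn()) {
        using Result = decltype(fn());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _queue.push([task] { (*task)(); });
        }
        _cv.notify_one();
        return result.get();  // blocks here while the executor is busy or blocked
    }

private:
    void run() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cv.wait(lk, [this] { return _shutdown || !_queue.empty(); });
                if (_queue.empty()) return;  // shutdown requested and nothing left to run
                work = std::move(_queue.front());
                _queue.pop();
            }
            work();  // callbacks run one at a time, outside _mutex
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _queue;
    bool _shutdown = false;
    std::thread _worker;  // declared last so the members above exist before it starts
};
```

A caller of `scheduleAndWait` cannot return until the executor thread has run its callback, so any lock the caller already holds (in this report, the global lock in S mode) stays held for as long as the executor is busy or blocked.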

The oplog producer thread locks the bgsync mutex (BackgroundSync::_mutex), and then tries to acquire the global intent exclusive (IX) lock, blocking behind the oplog application thread.

A third thread runs setMaintenanceMode, which schedules setMaintenanceModeHelper on the replication executor; that callback blocks while trying to clear the oplog producer's sync source, which requires the bgsync mutex.

So the wait-for cycle is:
- the replication executor is blocked in setMaintenanceModeHelper, waiting for the bgsync mutex;
- the bgsync mutex is held by the oplog producer, which is waiting to acquire the global lock in IX mode;
- that IX request is blocked by the oplog application thread, which holds the global lock in S mode;
- the oplog application thread is waiting for a callback to run through the replication executor, which closes the cycle.
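The cycle can be checked mechanically by treating it as a wait-for graph, where an edge from A to B means "A cannot proceed until B does", and searching that graph for a cycle with a depth-first search. This is a generic sketch: the node labels are informal names for the parties in this report, and `findCycle` and `maintenanceModeDeadlock` are hypothetical helpers, not MongoDB code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Wait-for graph: an edge A -> B means "A cannot make progress until B does".
using WaitForGraph = std::map<std::string, std::vector<std::string>>;

// Returns the nodes of one cycle found by depth-first search, or an empty vector.
std::vector<std::string> findCycle(const WaitForGraph& graph) {
    std::map<std::string, int> state;  // 0 = unvisited, 1 = on DFS path, 2 = done
    std::vector<std::string> path;
    std::vector<std::string> cycle;

    std::function<bool(const std::string&)> visit = [&](const std::string& node) {
        state[node] = 1;
        path.push_back(node);
        auto edges = graph.find(node);
        if (edges != graph.end()) {
            for (const std::string& next : edges->second) {
                if (state[next] == 1) {
                    // Back edge: the cycle is the part of the path starting at `next`.
                    cycle.assign(std::find(path.begin(), path.end(), next), path.end());
                    return true;
                }
                if (state[next] == 0 && visit(next)) return true;
            }
        }
        path.pop_back();
        state[node] = 2;
        return false;
    };

    for (const auto& entry : graph) {
        if (state[entry.first] == 0 && visit(entry.first)) break;
    }
    return cycle;
}

// Edges taken from the description of this deadlock; labels are informal.
WaitForGraph maintenanceModeDeadlock() {
    return {
        {"applier", {"executor"}},   // holds global S, waits for a getMaintenanceMode callback
        {"executor", {"producer"}},  // setMaintenanceModeHelper waits for the bgsync mutex
        {"producer", {"applier"}},   // holds bgsync mutex, waits for global IX behind S
        {"setMaintenanceMode caller", {"executor"}},  // waits on the cycle, not part of it
    };
}
```

The setMaintenanceMode caller is blocked as well, but it only waits on the cycle and does not belong to it. Removing any single edge of the cycle, for example by making one party stop waiting while it holds a lock, leaves the graph acyclic.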



 Comments   
Comment by Githook User [ 21/Oct/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-15750 Break deadlock cycle by not holding the global lock while acquiring the bgsync mutex.
Branch: master
https://github.com/mongodb/mongo/commit/da9927d08b550c4bec17ffc1b1d93ca3519285f6
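The commit message names the ordering rule behind the fix: do not hold the global lock while acquiring the bgsync mutex. In the reported cycle that acquisition is indirect (the applier holds the global lock while waiting on an executor callback that needs the bgsync mutex); the sketch below collapses it into a direct nesting to show the rule. It is not the code from the commit. `std::shared_mutex` stands in for the global lock, a plain `std::mutex` stands in for `BackgroundSync::_mutex`, IX is modeled as an exclusive lock because it conflicts with S, and `producerStep`, `applierStep`, `syncSourceVersion`, and `appliedOps` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

std::shared_mutex globalLock;  // stand-in for the global lock (hypothetical)
std::mutex bgsyncMutex;        // stand-in for BackgroundSync::_mutex (hypothetical)
int syncSourceVersion = 0;     // hypothetical state guarded by bgsyncMutex
long appliedOps = 0;           // hypothetical state guarded by globalLock

// Ordering described for the oplog producer: bgsync mutex first, then the
// global lock (IX modeled as exclusive because IX conflicts with S).
void producerStep() {
    std::lock_guard<std::mutex> bgsync(bgsyncMutex);
    std::unique_lock<std::shared_mutex> global(globalLock);
    ++syncSourceVersion;
    ++appliedOps;
}

// Ordering rule from the commit message: copy what is needed under the bgsync
// mutex, release it, and only then take the global lock in shared mode.
long applierStep() {
    int version = 0;
    {
        std::lock_guard<std::mutex> bgsync(bgsyncMutex);
        version = syncSourceVersion;
    }  // bgsync mutex released before the global lock is requested
    std::shared_lock<std::shared_mutex> global(globalLock);
    return appliedOps + version;  // placeholder work done under the S lock
}
```

Because `applierStep` never waits for the bgsync mutex while holding the global lock, the two threads always run to completion. Reversing its order (shared global lock first, then the bgsync mutex) would recreate a two-thread lock-order inversion with `producerStep`.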

Generated at Thu Feb 08 03:38:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.