Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.7.8
Affects Version/s: None
Component/s: Replication
Labels: None

Operating System: ALL
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

What follows is a description of a deadlock cycle observed while running maintenance_non-blocking.js. In summary, the error is that one cannot wait on the replication executor while holding the bgsync mutex.

The oplog application thread periodically calls tryToGoLiveAsSecondary(), which acquires the global lock in shared (S) mode, and then calls getMaintenanceMode() on the replication coordinator, which schedules and waits for a callback on the replication executor.

The oplog producer thread locks the bgsync mutex (BackgroundSync::_mutex), and then tries to acquire the global lock in intent exclusive (IX) mode, blocking behind the oplog application thread, whose S-mode hold on the global lock conflicts with IX.

A third thread runs setMaintenanceMode, which blocks in the replication executor trying to clear the sync source in the producer thread, which requires the bgsync mutex.

So, the executor is blocked in the setMaintenanceModeHelper waiting for the bgsync mutex. The bgsync mutex is held by the oplog producer, which is waiting for the global lock in IX mode. That request is blocked by the oplog application thread, which holds the global lock in S mode and is waiting for a callback to run through the replication executor, completing the cycle.
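The cycle described above can be checked mechanically by modeling it as a wait-for graph. The following is a minimal illustrative sketch, not code from the server: the party labels and the `find_cycle` helper are hypothetical names chosen for this example, and each edge only restates a blocking relationship given in the description.

```python
from typing import Dict, List, Optional

# Wait-for graph: each party maps to the party it is blocked on.
# Labels are informal names for this sketch, not server identifiers.
WAITS_FOR: Dict[str, str] = {
    # Executor runs the setMaintenanceMode work and needs the bgsync mutex,
    # which the oplog producer holds.
    "replication executor": "oplog producer",
    # Producer holds the bgsync mutex and wants the global lock in IX mode,
    # which conflicts with the applier's S-mode hold.
    "oplog producer": "oplog applier",
    # Applier holds the global lock in S mode and waits for a callback
    # scheduled on the executor by getMaintenanceMode().
    "oplog applier": "replication executor",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from start; return the cycle found, or None."""
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a party that is not blocked
    # Slice from the first visit of the repeated node and close the loop.
    return path[position[node]:] + [node]


cycle = find_cycle(WAITS_FOR, "replication executor")
print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, removing any single edge leaves no cycle. For example, dropping the applier's wait on the executor makes `find_cycle` return `None`. This says nothing about which edge the actual fix removed.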

Assignee: Andy Schwerin (Inactive)
Reporter: Andy Schwerin (Inactive)
Participants: Andy Schwerin, Githook User
Votes: 0
Watchers: 4

Created: Oct 20 2014 09:54:47 PM UTC
Updated: Jul 11 2016 05:16:37 PM UTC
Resolved: Oct 21 2014 06:06:02 PM UTC
