[SERVER-28181] Deadlock involving the mutexes of oplog fetcher and replication coordinator Created: 03/Mar/17  Updated: 07/Sep/17  Resolved: 22/Mar/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.4, 3.5.5

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-27120 Increase synchronization between prod... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Sprint: Repl 2017-03-27
Participants:
Linked BF Score: 0

 Description   

When the replication coordinator stops bgsync, bgsync in turn stops the oplog fetcher if one is running. The oplog fetcher needs the current term and the last committed optime to make new requests. As a result, the two can deadlock:

  • Replication coordinator, while holding replCoord's mutex, waits on oplog fetcher's mutex to stop it.
  • Oplog fetcher, while holding its mutex, waits on replCoord's mutex to get the current term and the last committed optime.

To fix this, we need to move the current term and last committed optime out of the oplog fetcher's mutex, so that the fetcher never waits on replCoord's mutex while holding its own.
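
The lock ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `ReplMetadata`, `Fetcher`, `Coordinator`, and all of their members are hypothetical stand-ins for the real classes, and optimes are reduced to plain integers. The point it shows is that the fetcher reads the term and last committed optime before it takes its own mutex, so the two mutexes are never held in the order fetcher-then-coordinator.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the data the fetcher needs from the coordinator.
struct ReplMetadata {
    long long term;
    long long lastCommittedOpTime;
};

// Hypothetical stand-in for the oplog fetcher.
class Fetcher {
public:
    explicit Fetcher(std::function<ReplMetadata()> getMetadata)
        : _getMetadata(std::move(getMetadata)) {
        assert(_getMetadata != nullptr);
    }

    // Builds the next request. Returns false if the fetcher was shut down.
    bool buildNextRequest(ReplMetadata& out) {
        // Read coordinator state first, with no fetcher lock held.
        // The callback may block on the coordinator's mutex.
        ReplMetadata md = _getMetadata();

        // Only now take the fetcher's own mutex.
        std::lock_guard<std::mutex> lk(_mutex);
        if (_shutdown) {
            return false;
        }
        out = md;
        return true;
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _shutdown = true;
    }

private:
    std::function<ReplMetadata()> _getMetadata;
    std::mutex _mutex;
    bool _shutdown = false;
};

// Hypothetical stand-in for the replication coordinator.
class Coordinator {
public:
    ReplMetadata getMetadata() {
        std::lock_guard<std::mutex> lk(_mutex);
        return {_term, _lastCommitted};
    }

    void advance(long long term, long long lastCommitted) {
        std::lock_guard<std::mutex> lk(_mutex);
        _term = term;
        _lastCommitted = lastCommitted;
    }

    // Stops the fetcher while holding the coordinator's mutex, as in the
    // scenario from the description. This is safe only because the fetcher
    // never waits on the coordinator while holding its own mutex.
    void stopFetcher(Fetcher& fetcher) {
        std::lock_guard<std::mutex> lk(_mutex);
        fetcher.shutdown();
    }

private:
    std::mutex _mutex;
    long long _term = 1;
    long long _lastCommitted = 0;
};
```

If `buildNextRequest` instead called `_getMetadata()` after locking `_mutex`, a concurrent `stopFetcher` could leave each thread holding one mutex and waiting on the other, which is the cycle listed in the two bullets above.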



 Comments   
Comment by Githook User [ 04/Apr/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-28181 Fix the deadlock in OplogFetcher's constructor.

(cherry picked from commit 7361e94af31142ec018ccad70b0398e3a472eba5)
Branch: v3.4
https://github.com/mongodb/mongo/commit/9818bae9a907ef224972470e461a590422149df8

Comment by Githook User [ 04/Apr/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-28181 Deadlock involving the mutexes of oplog fetcher and replication coordinator

(cherry picked from commit 231e760c744013fe68fe863a7d0315148c69047a)
Branch: v3.4
https://github.com/mongodb/mongo/commit/672583baac51af7c9e3e92c658d1880fd2b035b6

Comment by Githook User [ 22/Mar/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-28181 Fix the deadlock in OplogFetcher's constructor.
Branch: master
https://github.com/mongodb/mongo/commit/7361e94af31142ec018ccad70b0398e3a472eba5

Comment by Githook User [ 15/Mar/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-28181 Deadlock involving the mutexes of oplog fetcher and replication coordinator
Branch: master
https://github.com/mongodb/mongo/commit/231e760c744013fe68fe863a7d0315148c69047a

Comment by Siyuan Zhou [ 03/Mar/17 ]

This was found in judah.schvimer's patch build.

Generated at Thu Feb 08 04:17:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.