[SERVER-27555] ReplicationCoordinatorImpl::_memberState is read without proper locking Created: 30/Dec/16  Updated: 05/Apr/17  Resolved: 09/Mar/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.5

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-18807 rollback should drop dblock when fetc... Closed
is related to SERVER-27892 Clarify locking rules for _canAcceptN... Closed
is related to SERVER-27911 Merge all stepdown code paths Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-03-06, Repl 2017-03-27
Participants:

 Description   

ReplicationCoordinatorImpl::canAcceptWritesFor and checkCanServeReadsFor both read _memberState and make decisions on it, but neither lock _mutex or _topoMutex.



 Comments   
Comment by Spencer Brody (Inactive) [ 09/Mar/17 ]

Even though the fix adds a potential mutex acquisition into canAcceptWritesFor and checkCanServerReadsFor, I don't believe it comes with a real performance cost. The mutex acquisition can only possibly happen if those methods are called on the oplog namespace. Normal writes on the primary and secondary do not ever call into canAcceptWritesFor() for the oplog write, so the only cost from acquiring the mutex on oplog writes in canAcceptWritesFor() is for user-initiated writes to the oplog. Reads done for replication will call checkCanServerReadsFor against the oplog namespace, but usually against a node that is in PRIMARY or SECONDARY states, and the fix does not lock the mutex if the node is primary or secondary. The only time I think this change results in a mutex acquisition that wasn't there previously is for oplog reads from a node that is in RECOVERING, which shouldn't pose a practical performance problem.

Comment by Githook User [ 09/Mar/17 ]

Author:

{u'username': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@mongodb.com'}

Message: SERVER-27555 Use proper locking when reading _memberState
Branch: master
https://github.com/mongodb/mongo/commit/4e9d665b8c368f9de9d8bdd3e098197bffc3abd8

Comment by Eric Milkie [ 31/Dec/16 ]

Unfortunately, I don't believe that we can fix this by simply locking mutexes, as the perf impact will be too great. However, for canAcceptWritesFor(), we might be able to check ns.isOplog() first, and then lock the mutex and check _memberState if true.
Similarly for checkCanServeReadsFor().

Generated at Thu Feb 08 04:15:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.