[SERVER-31534] `SessionsCollectionRS` acquires locks unnecessarily Created: 12/Oct/17  Updated: 30/Oct/23  Resolved: 18/Oct/17

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.6.0-rc0
Fix Version/s: 3.6.0-rc1

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Samantha Ritter (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Platforms 2017-10-23
Participants:
Linked BF Score: 0

 Description   

All usages of dispatch in the sessions collection management utility acquire database/collection locks. My suspicion is that these locks are taken for replica set primary check further down.

Acquiring these locks has the potential of causing the code executed by DBDirectClient further down to deadlock if it happens to cause lock mode upgrade.

All of these lock usages other than findRemovedSessions already issue write commands which will fail if the node is not primary, so this is not a problem. For findRemovedSessions, the read will fail because it doesn't have the slaveOk bit set, so this is not a problem either.

Based on this I think it is safe to remove the lock acquisitions from dispatch altogether, which would simplify the code.



 Comments   
Comment by Githook User [ 18/Oct/17 ]

Author:

{'email': 'samantha.ritter@10gen.com', 'name': 'samantharitter', 'username': 'samantharitter'}

Message: SERVER-31534 Do not call callbacks under locks in SessionsCollectionRS
Branch: master
https://github.com/mongodb/mongo/commit/44c55f0c94a5e6229c3d2eeabe1708222c3173e4

Comment by Samantha Ritter (Inactive) [ 16/Oct/17 ]

I think the changes Kal proposes are fine. For findRemovedSessions, the read won't fail, because we always allow dbdirect client to run find on secondaries (here). So, we may do a stale read because our server transitions from primary to secondary under the call to dispatch. However, this is ok, because findRemovedSessions is only called in two places:

From the logical session cache background job:

  • stale positives do nothing
  • stale negatives could cause us to kill a cursor local to this server when we wouldn't otherwise, but this seems ok, because it still means the session went 30 minutes without being used (and was then immediately used on the new primary, before we saw the write).

From the transaction reaper:

  • stale positives do nothing
  • stale negatives also do nothing, because the write that the reaper runs immediately after calling findRemovedSessions will fail with a NotMaster error.
Generated at Thu Feb 08 04:27:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.