[SERVER-2645] Potential distributed lock forcing inconsistency Created: 01/Mar/11  Updated: 12/Jul/16  Resolved: 02/Mar/11

Status: Closed
Project: Core Server
Component/s: Concurrency, Sharding
Affects Version/s: None
Fix Version/s: 1.9.0

Type: Bug
Priority: Major - P3
Reporter: Greg Studer
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

Description

When a distributed lock is forced:

From distlock.cpp:

log() << "dist_lock forcefully taking over from: " << o << " elapsed minutes: " << elapsed << endl;
conn->update( _ns , _id , BSON( "$set" << BSON( "state" << 0 ) ) );

the update takes into account only the name of the lock, not its current state or the unique ts value associated with every active lock entry. If processes interleave such that:

Process 0 crashes while holding the lock
Process 1 detects forcing required
Process 2 detects forcing required
Process 1 forces Process 0's lock by name, then creates and acquires a new lock with the same name
Process 2 forces Process 1's lock by name, which is bad because Process 1 is still using that lock
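
The interleaving above can be replayed against an in-memory stand-in for the locks collection. This is only a sketch: LockDoc, LockCollection, forceByName, tryAcquire, replayInterleaving and the lock name "balancer" are hypothetical names chosen for illustration, not code from distlock.cpp, and a std::map stands in for the real collection.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one lock document. The field names "state"
// and "ts" follow the ticket; the struct itself is illustrative only.
struct LockDoc {
    int state;      // 0 = free, nonzero = held
    long long ts;   // unique value assigned on each acquisition
};

// Assumption: a map keyed by lock name stands in for the locks collection.
using LockCollection = std::map<std::string, LockDoc>;

// Mimics the name-only update: clears the lock no matter who holds it now.
void forceByName(LockCollection& locks, const std::string& name) {
    locks[name].state = 0;
}

// Acquires the lock only if it is free. Returns true on success.
bool tryAcquire(LockCollection& locks, const std::string& name, long long newTs) {
    LockDoc& doc = locks[name];   // a missing entry is created zeroed (free)
    if (doc.state != 0) return false;
    doc.state = 1;
    doc.ts = newTs;
    return true;
}

// Replays the five steps. Returns true if Process 2 ends up releasing
// the lock that Process 1 legitimately holds.
bool replayInterleaving() {
    LockCollection locks;
    tryAcquire(locks, "balancer", 100);   // Process 0 acquires, then crashes
    // Processes 1 and 2 both see the stale entry (ts 100) and decide to force.
    forceByName(locks, "balancer");       // Process 1 forces Process 0's lock
    bool p1Holds = tryAcquire(locks, "balancer", 101);  // Process 1 now holds it
    forceByName(locks, "balancer");       // Process 2 acts on its stale decision
    return p1Holds && locks["balancer"].state == 0;
}
```

In this model replayInterleaving() returns true: Process 1 acquired the lock, yet the entry is marked free again after Process 2's name-only force.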

Fix: Match on the ts value and state as well as the lock name

conn->update( _ns , BSON( "_id" << _id["_id"].String() << "state" << o["state"].numberInt() << "ts" << o["ts"] ), BSON( "$set" << BSON( "state" << 0 ) ) );
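
The effect of adding state and ts to the update query is that of a compare-and-set: the lock is released only if the entry still looks the way it did when the forcing process inspected it. A minimal in-memory sketch of that behaviour follows. LockDoc, LockCollection and forceIfUnchanged are hypothetical names for illustration, a std::map stands in for the locks collection, and ts is modelled as an integer purely for simplicity.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one lock document. The field names "state"
// and "ts" follow the ticket; the struct itself is illustrative only.
struct LockDoc {
    int state;      // 0 = free, nonzero = held
    long long ts;   // unique value assigned on each acquisition (assumed integer here)
};

// Assumption: a map keyed by lock name stands in for the locks collection.
using LockCollection = std::map<std::string, LockDoc>;

// Clears the lock only if the stored entry still matches the state and ts
// that the caller observed earlier. Returns true if the entry was updated.
bool forceIfUnchanged(LockCollection& locks, const std::string& name,
                      const LockDoc& observed) {
    LockCollection::iterator it = locks.find(name);
    if (it == locks.end()) return false;              // no such lock
    if (it->second.state != observed.state) return false;
    if (it->second.ts != observed.ts) return false;   // lock was re-acquired since
    it->second.state = 0;
    return true;
}
```

With this guard, a process that observed the crashed holder's entry can no longer release a lock that another process has re-acquired in the meantime, because the re-acquired entry carries a different ts and the match fails.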

Difficult to reproduce without new test cases that modify the lock timeout.

