Details
Bug
Resolution: Done
Major - P3
None
None
ALL
Description
When a distributed lock is forced, distlock.cpp runs:
log() << "dist_lock forcefully taking over from: " << o << " elapsed minutes: " << elapsed << endl;
conn->update( _ns , _id , BSON( "$set" << BSON( "state" << 0 ) ) );
The update matches only on the name of the lock, not on its current state or on the unique ts value associated with every active lock entry. If processes interleave as follows:
Process 0 crashes while holding the lock.
Process 1 detects that forcing is required.
Process 2 detects that forcing is required.
Process 1 forces Process 0's lock by name, then creates and acquires a new lock with the same name.
Process 2 forces Process 1's lock by name, which is wrong because Process 1 is still using that lock.
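The interleaving above can be replayed against a tiny in-memory model. This is a sketch under stated assumptions, not the distlock.cpp code: the std::map standing in for the locks collection, the helpers forceByName and tryAcquire, the lock name "balancer", and the integer ts values are all hypothetical (in the real collection ts is not an integer). forceByName mirrors the name-only update, and the replay ends with both Process 1 and Process 2 believing they acquired the lock.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for one lock document, keyed by lock name.
struct LockDoc {
    int state;            // 0 = unlocked, nonzero = held
    std::string who;      // current holder
    std::uint64_t ts;     // hypothetical integer stand-in for the per-acquisition ts
};

using LockTable = std::map<std::string, LockDoc>;

// Mirrors the original forcing logic: reset state by lock name only.
void forceByName(LockTable& locks, const std::string& name) {
    locks[name].state = 0;  // matches no matter who holds the lock or which ts it has
}

// Acquisition succeeds only when the lock is currently free.
bool tryAcquire(LockTable& locks, const std::string& name,
                const std::string& who, std::uint64_t ts) {
    LockDoc& doc = locks[name];  // value-initialized (state 0) if absent
    if (doc.state != 0) return false;
    doc = LockDoc{1, who, ts};
    return true;
}

struct RaceOutcome {
    bool p1Acquired;
    bool p2Acquired;
    std::string finalHolder;
};

// Replays the interleaving from the report, sequentially, in one thread.
RaceOutcome replayNameOnlyRace() {
    LockTable locks;
    tryAcquire(locks, "balancer", "p0", 100);  // process 0 takes the lock, then crashes
    // Processes 1 and 2 both observe the stale p0 entry and decide to force it.
    forceByName(locks, "balancer");                       // process 1 forces p0's lock
    bool p1 = tryAcquire(locks, "balancer", "p1", 101);  // process 1 now holds it
    forceByName(locks, "balancer");                       // process 2 wipes p1's live lock
    bool p2 = tryAcquire(locks, "balancer", "p2", 102);  // process 2 also "holds" it
    return {p1, p2, locks.at("balancer").who};
}
```

Calling replayNameOnlyRace() reports both p1Acquired and p2Acquired as true, while the stored document names only p2, so Process 1 keeps working under a lock it no longer owns.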
Fix: also check the state and ts value, so the update only matches the exact lock entry that was observed when forcing was decided:
conn->update( _ns , BSON( "_id" << _id["_id"].String() << "state" << o["state"].numberInt() << "ts" << o["ts"] ), BSON( "$set" << BSON( "state" << 0 ) ) );
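The effect of adding state and ts to the update query can be sketched with the same kind of in-memory model. Again this is a hypothetical illustration, not the server code: forceIfUnchanged, the lock name, and the integer ts values are assumptions, and the model runs single-threaded, whereas the real conditional update relies on the server applying query and update as one operation. Here the return value of forceIfUnchanged stands in for whether the conditional update matched a document.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for one lock document, keyed by lock name.
struct LockDoc {
    int state;            // 0 = unlocked, nonzero = held
    std::string who;      // current holder
    std::uint64_t ts;     // hypothetical integer stand-in for the per-acquisition ts
};

using LockTable = std::map<std::string, LockDoc>;

bool tryAcquire(LockTable& locks, const std::string& name,
                const std::string& who, std::uint64_t ts) {
    LockDoc& doc = locks[name];  // value-initialized (state 0) if absent
    if (doc.state != 0) return false;
    doc = LockDoc{1, who, ts};
    return true;
}

// Sketch of the fix: reset only if the entry still has the state and ts
// that were observed when the caller decided to force the lock.
bool forceIfUnchanged(LockTable& locks, const std::string& name,
                      int seenState, std::uint64_t seenTs) {
    auto it = locks.find(name);
    if (it == locks.end()) return false;
    LockDoc& doc = it->second;
    if (doc.state != seenState || doc.ts != seenTs) return false;  // already taken over
    doc.state = 0;
    return true;
}

struct GuardedOutcome {
    bool p1Forced, p1Acquired, p2Forced, p2Acquired;
    std::string finalHolder;
};

GuardedOutcome replayGuardedRace() {
    LockTable locks;
    tryAcquire(locks, "balancer", "p0", 100);      // process 0 takes the lock, then crashes
    const LockDoc seen = locks.at("balancer");    // processes 1 and 2 both observe this entry
    bool f1 = forceIfUnchanged(locks, "balancer", seen.state, seen.ts);  // matches ts 100
    bool p1 = tryAcquire(locks, "balancer", "p1", 101);
    bool f2 = forceIfUnchanged(locks, "balancer", seen.state, seen.ts);  // ts is now 101: no match
    bool p2 = tryAcquire(locks, "balancer", "p2", 102);                  // lock still held by p1
    return {f1, p1, f2, p2, locks.at("balancer").who};
}
```

With the guard, Process 2's stale force no longer matches once Process 1 has written a new ts, so Process 1 keeps the lock and Process 2's acquisition attempt fails.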
This is difficult to reproduce without new test cases that modify the lock timeout.