[SERVER-2645] Potential distributed lock forcing inconsistency Created: 01/Mar/11 Updated: 12/Jul/16 Resolved: 02/Mar/11
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 1.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Description |
When a distributed lock is forced, from distlock.cpp:

    log() << "dist_lock forcefully taking over from: " << o << " elapsed minutes: " << elapsed << endl;

the update takes into account only the name of the lock, not its current state or the unique ts value associated with every active lock entry. If processes interleave such that Process 0 crashes with the lock held, other processes can each read the stale lock document and attempt to force it; because the forcing update matches on _id alone, a later force can reset a lock that an earlier forcer has already re-acquired, leaving two processes believing they hold the same lock.

Fix: check for the ts value and state in the force update, so it applies only if the lock document has not changed since it was read:

    conn->update( _ns ,
                  BSON( "_id" << _id["_id"].String()
                        << "state" << o["state"].numberInt()
                        << "ts" << o["ts"] ),
                  BSON( "$set" << BSON( "state" << 0 ) ) );

Difficult to reproduce without new test cases which modify the lock timeout.
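Below is a minimal sketch of the check-and-set force described above, not the actual server code. It assumes the legacy C++ driver header (client/dbclient.h); the function name forceUnlock and the parameters conn, ns, id, and o are hypothetical stand-ins for the members the real DistributedLock code uses. Reading the n count from getLastErrorDetailed shows how a caller can detect that the conditional update matched nothing and must re-read the lock document:

    // Sketch only, assuming the legacy MongoDB C++ driver.
    #include "client/dbclient.h"
    #include <string>
    using namespace mongo;

    // Attempt to force-unlock a lock whose document 'o' was read earlier.
    // Returns true only if the lock was still in the observed state.
    bool forceUnlock( DBClientConnection& conn,
                      const std::string& ns,   // lock collection, e.g. config.locks
                      const BSONObj& id,       // { _id: <lock name> }
                      const BSONObj& o ) {     // lock document as last read
        // Match on state and ts as well as _id: if another process has
        // already forced (and possibly re-acquired) this lock, its ts or
        // state will differ and the update will match no document.
        conn.update( ns,
                     BSON( "_id" << id["_id"].String()
                           << "state" << o["state"].numberInt()
                           << "ts" << o["ts"] ),
                     BSON( "$set" << BSON( "state" << 0 ) ) );

        // n == 0 means the lock changed between our read and the force;
        // the caller must re-read it rather than assume the lock is free.
        BSONObj le = conn.getLastErrorDetailed();
        return le["n"].numberInt() > 0;
    }

The effect is to turn the force into an atomic compare-and-set on the lock document, which is the same pattern the fix in the description applies.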