Core Server / SERVER-2645

Potential distributed lock forcing inconsistency

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 1.9.0
    • None
    • Concurrency, Sharding
    • None
    • ALL

    Description

      When a distributed lock is forced:

      From distlock.cpp:

      log() << "dist_lock forcefully taking over from: " << o << " elapsed minutes: " << elapsed << endl;
      conn->update( _ns , _id , BSON( "$set" << BSON( "state" << 0 ) ) );

      The update takes into account only the name of the lock, not its current state or the unique ts value associated with every active lock entry. This is a problem if processes interleave as follows:

      Process 0 crashes while holding the lock
      Process 1 detects that forcing is required
      Process 2 detects that forcing is required
      Process 1 forces Process 0's lock by name, then creates and acquires a new lock with the same name
      Process 2 forces Process 1's lock by name, which is wrong because Process 1 is still using that lock
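
The interleaving above can be replayed deterministically against an in-memory model. The following is only a sketch under stated assumptions: `LockDoc`, `LockStore`, `forceByName`, `tryAcquire`, `replayRace`, and the lock name "lockA" are hypothetical stand-ins invented for illustration. They are not part of distlock.cpp or the MongoDB driver, and the map merely imitates a single document in the locks collection.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for one lock document.
struct LockDoc {
    int state;        // 0 = free, 1 = held
    long long ts;     // unique value assigned on each acquisition
    std::string who;  // current holder, for illustration only
};

using LockStore = std::map<std::string, LockDoc>;

// Imitates the name-only update: clears the lock regardless of who holds it now.
void forceByName(LockStore& store, const std::string& name) {
    auto it = store.find(name);
    if (it != store.end()) it->second.state = 0;
}

// Acquires the lock only if it is free; returns whether the caller now holds it.
bool tryAcquire(LockStore& store, const std::string& name,
                const std::string& who, long long ts) {
    LockDoc& doc = store[name];  // a missing entry is created with state == 0
    if (doc.state != 0) return false;
    doc = LockDoc{1, ts, who};
    return true;
}

// Replays the interleaving; returns true if both Process 1 and Process 2
// end up believing they acquired the same lock.
bool replayRace() {
    LockStore store;
    tryAcquire(store, "lockA", "p0", 1);  // Process 0 takes the lock, then crashes
    assert(store["lockA"].who == "p0");

    // Processes 1 and 2 have both read the stale entry and decided to force.
    forceByName(store, "lockA");                     // Process 1 forces
    bool p1 = tryAcquire(store, "lockA", "p1", 2);   // Process 1 now holds the lock
    forceByName(store, "lockA");                     // Process 2 forces on its stale decision
    bool p2 = tryAcquire(store, "lockA", "p2", 3);   // Process 2 also acquires

    return p1 && p2;
}
```

In this model `replayRace()` returns true: Process 2's force clears an entry that Process 1 acquired after Process 2 made its decision, so both processes consider themselves the holder.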

      Fix: match on the ts value and state as well as the name:

      conn->update( _ns , BSON( "_id" << _id["_id"].String() << "state" << o["state"].numberInt() << "ts" << o["ts"] ), BSON( "$set" << BSON( "state" << 0 ) ) );
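
The effect of the conditional update can be illustrated with a compare-and-set over an in-memory model. This is a minimal sketch, not the actual implementation: `LockDoc`, `LockStore`, `forceIfUnchanged`, `replayWithCheck`, and the lock name "lockA" are hypothetical names introduced here. How the real caller learns whether the update matched a document is outside the scope of this sketch.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for one lock document.
struct LockDoc {
    int state;     // 0 = free, 1 = held
    long long ts;  // unique value assigned on each acquisition
};

using LockStore = std::map<std::string, LockDoc>;

// Clears the lock only if the stored entry still matches the snapshot the
// caller observed when it decided to force. Returns whether it was cleared.
bool forceIfUnchanged(LockStore& store, const std::string& name,
                      const LockDoc& seen) {
    auto it = store.find(name);
    if (it == store.end()) return false;
    if (it->second.state != seen.state || it->second.ts != seen.ts) return false;
    it->second.state = 0;
    return true;
}

// Replays the same interleaving with the conditional force.
bool replayWithCheck() {
    LockStore store;
    store["lockA"] = LockDoc{1, 1};    // Process 0 holds the lock, then crashes

    LockDoc seenBy1 = store["lockA"];  // Process 1 detects forcing required
    LockDoc seenBy2 = store["lockA"];  // Process 2 detects forcing required

    bool forced1 = forceIfUnchanged(store, "lockA", seenBy1);  // matches, cleared
    store["lockA"] = LockDoc{1, 2};    // Process 1 acquires a new lock with a new ts
    bool forced2 = forceIfUnchanged(store, "lockA", seenBy2);  // stale ts, no match

    // Process 1's lock must survive Process 2's attempt.
    return forced1 && !forced2 &&
           store["lockA"].state == 1 && store["lockA"].ts == 2;
}
```

Here Process 2's force is a no-op because the ts it observed no longer matches the stored entry, so Process 1 keeps the lock.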

      This is difficult to reproduce without new test cases that modify the lock timeout.

          People

            greg_10gen Greg Studer