  1. Core Server
  2. SERVER-10494

Deadlock holding global write lock when mongod started without --fork

Details

    • Bug
    • Resolution: Done
    • Blocker - P1
    • None
    • 2.5.1
    • Concurrency
    • None
    • Amazon Linux 64-bit, Jenkins CI slaves.
    • Linux
    • Steps to reproduce:

      Start a 3-node replica set with mongod version 2.5.1 using our rs_manager tool. rs_manager starts mongods as subprocesses, calls replSetInitiate, and then exits. If you pass --fork to the subprocesses, the hang does not occur.

      Run PyMongo's unittest suite against the replica set. The Python and PyMongo versions don't matter.
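
      The setup above can be sketched roughly as follows. This is not rs_manager's actual code, which is not shown in this ticket. The helper names (mongod_argv, repl_config), the ports, the data directories, and the replica set name "repl0" are all assumptions made for illustration. The sketch only builds the command lines and the replSetInitiate config document; the lines that would launch the processes are left commented out.

```python
def mongod_argv(port, dbpath, repl_set="repl0", fork=False):
    """Build a mongod command line (hypothetical helper, not rs_manager's)."""
    argv = ["mongod", "--port", str(port), "--dbpath", dbpath,
            "--replSet", repl_set]
    if fork:
        # mongod needs a log destination when it forks into the background.
        argv += ["--fork", "--logpath", dbpath + "/mongod.log"]
    return argv


def repl_config(ports, repl_set="repl0", host="localhost"):
    """Build the config document passed to replSetInitiate."""
    members = [{"_id": i, "host": "%s:%d" % (host, port)}
               for i, port in enumerate(ports)]
    return {"_id": repl_set, "members": members}


# Assumed ports and data directories, chosen only for this example.
ports = [27017, 27018, 27019]
commands = [mongod_argv(port, "/tmp/rs%d" % i) for i, port in enumerate(ports)]
config = repl_config(ports)

# To launch for real (requires mongod on PATH, existing data directories,
# and PyMongo installed):
#   import subprocess, pymongo
#   procs = [subprocess.Popen(cmd) for cmd in commands]
#   pymongo.MongoClient("localhost", ports[0]).admin.command(
#       "replSetInitiate", config)
```

      Calling mongod_argv(..., fork=True) yields the variant that, per this report, does not hang.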


    Description

      Running the PyMongo unittest suite leads to a hang, with some op holding the global write lock. The server responds to currentOp and rs.status, but any operation requiring a lock hangs indefinitely.

      The point in the suite where we reach this hang is unpredictable, but it most commonly occurs in tests that run tens or hundreds of Python threads.

      Server 2.5.0 never hangs; 2.5.1 almost always hangs at some point in PyMongo's test run. A successful run lasts about 3 minutes and executes about 515 tests. Interestingly, 2.5.1 doesn't hang when started with --fork.

      Below are example ops holding the write lock at the point where 2.5.1 hangs. In one test run:

      		{
      			"opid" : 30145,
      			"active" : true,
      			"secs_running" : 265,
      			"op" : "query",
      			"ns" : "pymongo_test",
      			"query" : {
      				"create" : "test",
      				"capped" : true,
      				"size" : 1000
      			},
      			"client" : "127.0.0.1:57030",
      			"desc" : "conn1509",
      			"threadId" : "0x7f5f76c81700",
      			"connectionId" : 1509,
      			"locks" : {
      				"^" : "w",
      				"^pymongo_test" : "W"
      			},
      			"waitingForLock" : false,
      			"numYields" : 0,
      			"lockStats" : {
      				"timeLockedMicros" : {
       
      				},
      				"timeAcquiringMicros" : {
      					"r" : NumberLong(0),
      					"w" : NumberLong(3)
      				}
      			}
      		}

      In another:

      		{
      			"opid" : 35786,
      			"active" : true,
      			"secs_running" : 377,
      			"op" : "insert",
      			"ns" : "pymongo-pooling-tests.unique",
      			"insert" : {
       
      			},
      			"client" : "127.0.0.1:54746",
      			"desc" : "conn497",
      			"threadId" : "0x7f9612829700",
      			"connectionId" : 497,
      			"locks" : {
      				"^" : "w",
      				"^pymongo-pooling-tests" : "W"
      			},
      			"waitingForLock" : false,
      			"msg" : "index: (1/3) external sort",
      			"numYields" : 0,
      			"lockStats" : {
      				"timeLockedMicros" : {
       
      				},
      				"timeAcquiringMicros" : {
      					"r" : NumberLong(0),
      					"w" : NumberLong(3)
      				}
      			}
      		}

      Backtraces from the latter run:

      https://gist.github.com/ajdavis/275df7f2967ba63bb9ea
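
      To pick such ops out of a longer currentOp listing, a small filter along these lines may help. It is only a sketch: the function name lock_holders is made up here, and it assumes the "inprog" array from currentOp has already been loaded as a list of Python dicts. The first sample entry is abridged from the second op above; the second entry is a hypothetical blocked reader added for contrast.

```python
def lock_holders(inprog):
    """Return (opid, op, ns, secs_running) for active ops that hold a
    write lock ("w" or "W") and are not themselves waiting for a lock."""
    holders = []
    for op in inprog:
        modes = op.get("locks", {}).values()
        holds_write = any(mode in ("w", "W") for mode in modes)
        if op.get("active") and holds_write and not op.get("waitingForLock"):
            holders.append((op["opid"], op.get("op"), op.get("ns"),
                            op.get("secs_running")))
    return holders


sample = [
    # Abridged from the insert op reported in this ticket.
    {"opid": 35786, "active": True, "secs_running": 377, "op": "insert",
     "ns": "pymongo-pooling-tests.unique",
     "locks": {"^": "w", "^pymongo-pooling-tests": "W"},
     "waitingForLock": False},
    # Hypothetical blocked reader, not taken from the ticket.
    {"opid": 1, "active": True, "secs_running": 300, "op": "query",
     "ns": "pymongo_test.test", "locks": {"^": "r"},
     "waitingForLock": True},
]

print(lock_holders(sample))
# prints [(35786, 'insert', 'pymongo-pooling-tests.unique', 377)]
```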

      Attachments

        1. repro-0-backtrace.txt
          122 kB
        2. repro-0-currentOp.txt
          2 kB
        3. repro-0-pymongo-tests.txt
          28 kB
        4. repro-1-backtrace.txt
          35 kB
        5. repro-1-currentOp.txt
          2 kB
        6. repro-1-pymongo-tests.txt
          28 kB

          People

            Assignee: Unassigned
            Reporter: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Votes: 0
            Watchers: 5
