Core Server / SERVER-39187

Rerunning commitTransaction on a new mongos blocks forever

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.1.8
    • Affects Version/s: 4.1.7
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2019-02-11

      Rerunning commitTransaction on a new mongos, with the recoveryToken added in SERVER-37344, blocks forever. It also seems to leave the cluster in a state where it cannot accept any writes (even to other databases), although the shard still reports itself as primary. In addition, neither the shard server nor the config server shuts down normally; both need to be killed with SIGKILL.

      To reproduce, start a sharded cluster with at least two mongoses (my cluster has one config server and a one-node shard), then run the repro script, reproHangingCommit.js:

      $ mongo reproHangingCommit.js
      MongoDB shell version v4.0.1
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 4.1.7
      WARNING: shell and server versions do not match
      Starting transaction on mongos #1: {
      	"insert" : "test",
      	"documents" : [
      		{
      			"_id" : ObjectId("5c4a55e0542fbbcc137ad1cd")
      		}
      	],
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"startTransaction" : true
      }
      Commit transaction on mongos #1: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      Commit transaction on mongos #2: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      // Hangs forever waiting for the commit on mongos #2
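The attached script is not reproduced in this report. As a rough sketch of the same sequence, the commands printed above can be replayed from the legacy mongo shell. This is not the contents of reproHangingCommit.js: the mongos addresses (localhost:27017 and localhost:27018) and the helper names buildInsertCommand and buildCommitCommand are hypothetical, and the shard id "demo-set-0" is taken from the output above. The cluster calls are guarded so the file only talks to a server when Mongo, UUID, NumberLong and ObjectId exist as shell globals.

```javascript
// Hypothetical helper: the insert that starts the transaction on mongos #1,
// mirroring the "Starting transaction on mongos #1" output.
function buildInsertCommand(doc, lsid, txnNumber) {
  return {
    insert: "test",
    documents: [doc],
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    startTransaction: true,
  };
}

// Hypothetical helper: commitTransaction carrying the recoveryToken
// (SERVER-37344) that lets a different mongos locate the transaction.
function buildCommitCommand(lsid, txnNumber, shardId) {
  return {
    commitTransaction: 1,
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    recoveryToken: { shardId: shardId },
  };
}

// Only runs inside the legacy mongo shell; assumed mongos addresses.
if (typeof Mongo !== "undefined") {
  const lsid = { id: UUID() };
  const txnNumber = NumberLong(1);
  const mongos1 = new Mongo("localhost:27017").getDB("test");
  const mongos2 = new Mongo("localhost:27018").getDB("test");

  printjson(mongos1.runCommand(buildInsertCommand({ _id: ObjectId() }, lsid, txnNumber)));
  // commitTransaction is an admin command.
  printjson(mongos1.adminCommand(buildCommitCommand(lsid, txnNumber, "demo-set-0")));
  // Per this report, on 4.1.7 the retry on a second mongos never returns.
  printjson(mongos2.adminCommand(buildCommitCommand(lsid, txnNumber, "demo-set-0")));
}
```

The two builders are kept separate from the guarded block so the command shapes can be checked without a running cluster.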
      

      db.currentOp() reports an ongoing coordinateCommitTransaction command that never ends. I've attached an example currentOp output at the bottom of the repro script.
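To spot the stuck coordinator without reading the full currentOp output, the inprog array returned by db.currentOp() can be filtered for operations whose command is coordinateCommitTransaction. This is a minimal sketch: the helper name findCommitCoordinatorOps is hypothetical, and it assumes it is run in the mongo shell connected to the node where the operation was observed, with each entry carrying the usual command, opid and secs_running fields.

```javascript
// Hypothetical helper: keep only currentOp entries whose command is
// coordinateCommitTransaction.
function findCommitCoordinatorOps(inprog) {
  return inprog.filter(
    (op) =>
      op.command != null &&
      Object.prototype.hasOwnProperty.call(op.command, "coordinateCommitTransaction")
  );
}

// Only runs inside the mongo shell, where `db` and `printjson` are globals.
if (typeof db !== "undefined") {
  findCommitCoordinatorOps(db.currentOp().inprog).forEach((op) =>
    printjson({ opid: op.opid, secs_running: op.secs_running, command: op.command })
  );
}
```

If the hang reproduces, rerunning this a few times should show the same opid with a growing secs_running value.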

      Attachments: reproHangingCommit.js (4 kB, Shane Harvey)

      Assignee: Matthew Saltz (Inactive), matthew.saltz@mongodb.com
      Reporter: Shane Harvey, shane.harvey@mongodb.com
      Votes: 0
      Watchers: 9
