Core Server / SERVER-39187

Rerunning commitTransaction on a new mongos blocks forever


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.1.7
    • Fix Version/s: 4.1.8
    • Component/s: Sharding
    • Labels:
      None

      Description

      Rerunning commitTransaction with the recoveryToken (added in SERVER-37344) on a new mongos blocks forever. It also seems to leave the cluster in a state where it cannot accept any writes (even to other databases), although the shard still reports itself as primary. In addition, neither the shard server nor the config server shuts down normally; both need to be killed with SIGKILL.

      To reproduce, start a sharded cluster with at least two mongoses (my cluster has one config server and a one-node shard). Run the repro script: reproHangingCommit.js

      $ mongo reproHangingCommit.js
      MongoDB shell version v4.0.1
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 4.1.7
      WARNING: shell and server versions do not match
      Starting transaction on mongos #1: {
      	"insert" : "test",
      	"documents" : [
      		{
      			"_id" : ObjectId("5c4a55e0542fbbcc137ad1cd")
      		}
      	],
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"startTransaction" : true
      }
      Commit transaction on mongos #1: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      Commit transaction on mongos #2: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      // Hangs forever waiting for the commit on mongos #2
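
      The attached reproHangingCommit.js is not reproduced in this ticket text, so the following is only a rough sketch of a script that would issue the three commands shown above. The helper names (buildStartInsert, buildCommit, runRepro) are hypothetical, the second mongos address in the usage comment is a placeholder, and it is an assumption that mongos returns the recoveryToken in the response to the insert.

```javascript
// Hypothetical reconstruction of the command sequence; not the attached script.

// First statement of the transaction: an insert that starts it.
function buildStartInsert(lsid, txnNumber, doc) {
  return {
    insert: "test",
    documents: [doc],
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    startTransaction: true
  };
}

// commitTransaction carrying the recoveryToken.
function buildCommit(lsid, txnNumber, recoveryToken) {
  return {
    commitTransaction: 1,
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    recoveryToken: recoveryToken
  };
}

// testDb1 and adminDb1 belong to mongos #1, adminDb2 to mongos #2.
// Each argument only needs a runCommand(cmd) method, as shell DB objects have.
function runRepro(testDb1, adminDb1, adminDb2, lsid, txnNumber, doc) {
  const insertRes = testDb1.runCommand(buildStartInsert(lsid, txnNumber, doc));
  // Assumption: the statement response includes the recoveryToken.
  const commit = buildCommit(lsid, txnNumber, insertRes.recoveryToken);
  const first = adminDb1.runCommand(commit);
  // On 4.1.7 this second call is the one reported to block forever.
  const second = adminDb2.runCommand(commit);
  return { first: first, second: second };
}

// Possible usage in the mongo shell (the second address is a placeholder):
//   const m1 = new Mongo("127.0.0.1:27017");
//   const m2 = new Mongo("<second mongos host:port>");
//   runRepro(m1.getDB("test"), m1.getDB("admin"), m2.getDB("admin"),
//            { id: UUID() }, NumberLong(1), { _id: ObjectId() });
```

      The same lsid and txnNumber are reused for both commits, which is what makes the second call a retry of the first rather than a new transaction.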
      

      db.currentOp() reports an ongoing coordinateCommitTransaction command that never ends. I've attached an example currentOp output at the bottom of the repro script.
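
      As a rough illustration of how the stuck operation could be picked out of the currentOp output, the sketch below filters the inprog array for coordinateCommitTransaction commands. The helper name findCoordinateCommitOps is hypothetical, and the sketch assumes each entry exposes command, opid and secs_running, as db.currentOp() results normally do.

```javascript
// Return a summary of in-progress operations whose command is
// coordinateCommitTransaction. Input is the document returned by db.currentOp().
function findCoordinateCommitOps(currentOpResult) {
  const inprog = currentOpResult.inprog || [];
  return inprog.filter(function (op) {
    return op.command !== undefined && op.command !== null &&
      Object.prototype.hasOwnProperty.call(op.command, "coordinateCommitTransaction");
  }).map(function (op) {
    return { opid: op.opid, secs_running: op.secs_running };
  });
}

// Possible usage in the mongo shell, on the node that reports the operation:
//   printjson(findCoordinateCommitOps(db.currentOp()));
```

      A secs_running value that keeps growing across repeated calls would be consistent with the hang described here.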
