Core Server / SERVER-39187

Rerunning commitTransaction on a new mongos blocks forever

Details

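    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.1.7
    • Fix Version/s: 4.1.8
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2019-02-11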

    Description

      Rerunning commitTransaction on a new mongos, with the recoveryToken added in SERVER-37344, blocks forever. It also seems to put the cluster into a state where it cannot accept any writes (even to other databases), although the shard still reports itself as primary. In addition, neither the shard server nor the config server shuts down normally; both need to be killed with SIGKILL.

      To reproduce, start a sharded cluster with at least two mongoses (my cluster has one config server and a one-node shard), then run the repro script, reproHangingCommit.js:

      $ mongo reproHangingCommit.js
      MongoDB shell version v4.0.1
      connecting to: mongodb://127.0.0.1:27017
      MongoDB server version: 4.1.7
      WARNING: shell and server versions do not match
      Starting transaction on mongos #1: {
      	"insert" : "test",
      	"documents" : [
      		{
      			"_id" : ObjectId("5c4a55e0542fbbcc137ad1cd")
      		}
      	],
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"startTransaction" : true
      }
      Commit transaction on mongos #1: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      Commit transaction on mongos #2: {
      	"commitTransaction" : 1,
      	"lsid" : {
      		"id" : UUID("6f579bae-6919-4e07-ac80-fe056861b2b9")
      	},
      	"txnNumber" : NumberLong(1),
      	"autocommit" : false,
      	"recoveryToken" : {
      		"shardId" : "demo-set-0"
      	}
      }
      // Hangs forever waiting for the commit on mongos #2
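
The attached reproHangingCommit.js is not reproduced on this page. As a rough sketch of what the sequence above implies, the command documents could be built as follows. The helper names (buildStartTxnInsert, buildCommit), the "test" database, and the second mongos port are assumptions for illustration; only the command fields and the recoveryToken value are taken from the output above. The shell calls are left as comments because they need a live sharded cluster.

```javascript
// Hypothetical helpers; only the command fields come from the output above.

// First statement of the transaction: an insert that carries startTransaction.
function buildStartTxnInsert(lsid, txnNumber, doc) {
  return {
    insert: "test",
    documents: [doc],
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    startTransaction: true,
  };
}

// commitTransaction carrying the recoveryToken, as sent to both mongoses.
function buildCommit(lsid, txnNumber, recoveryToken) {
  return {
    commitTransaction: 1,
    lsid: lsid,
    txnNumber: txnNumber,
    autocommit: false,
    recoveryToken: recoveryToken,
  };
}

// In the mongo shell (port 27018 for the second mongos is a guess):
//   const mongos1 = new Mongo("127.0.0.1:27017");
//   const mongos2 = new Mongo("127.0.0.1:27018");
//   const lsid = { id: UUID() };
//   mongos1.getDB("test").runCommand(
//       buildStartTxnInsert(lsid, NumberLong(1), { _id: ObjectId() }));
//   // recoveryToken value copied from the output above.
//   const commit = buildCommit(lsid, NumberLong(1), { shardId: "demo-set-0" });
//   mongos1.getDB("admin").runCommand(commit);
//   mongos2.getDB("admin").runCommand(commit);  // the call that hangs in this report
```

The same commit document is sent unchanged to both mongoses; the only difference between the second and third steps is which mongos receives it.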
      

      db.currentOp() reports an ongoing coordinateCommitTransaction command that never ends. I've attached an example currentOp output at the bottom of the repro script.
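
To spot the stuck operation without reading the whole currentOp output, a small filter along these lines may help. This is a sketch, not part of the attached script: the helper name findCommitCoordinators and the sample data are made up, and it assumes only the usual currentOp result shape, an inprog array whose entries carry a command document.

```javascript
// Hypothetical helper: pick out in-progress coordinateCommitTransaction
// operations from a currentOp-style result ({ inprog: [...] }).
function findCommitCoordinators(currentOpResult) {
  const ops = currentOpResult.inprog || [];
  return ops.filter(function (op) {
    return Boolean(op.command) &&
      Object.prototype.hasOwnProperty.call(op.command, "coordinateCommitTransaction");
  });
}

// Made-up sample shaped like currentOp output; not taken from the attachment.
const sample = {
  inprog: [
    { opid: 101, secs_running: 2, command: { isMaster: 1 } },
    { opid: 102, secs_running: 900, command: { coordinateCommitTransaction: 1 } },
  ],
};

// Print the opids of the matching operations.
const stuck = findCommitCoordinators(sample);
console.log(stuck.map(function (op) { return op.opid; }));
```

In the shell, the same helper could be called as findCommitCoordinators(db.currentOp()) on whichever node the original db.currentOp() was run against.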

            People

              Assignee: Matthew Saltz (Inactive) (matthew.saltz@mongodb.com)
              Reporter: Shane Harvey (shane.harvey@mongodb.com)
              Votes: 0
              Watchers: 9
