Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-65633

withTxnAndAutoRetry() retries entire transaction when commitTransaction fails with FailedToSatisfyReadPreference

    • ALL
    • Sharding 2022-05-02
    • 3

      The changes from 50caf49 as part of SERVER-62377 added the following logic to the withTxnAndAutoRetry() helper function. Setting hasTransientError causes a new transaction to be started. However, FailedToSatisfyReadPreference doesn't guarantee the commitTransaction command wasn't sent because mongos may have retried the commitTransaction command internally due a retryable error and then encountered the FailedToSatisfyReadPreference error.

      // FailedToSatisfyReadPreference errors are not retryable.
      // However, they should be because if there is no primary, there should be one soon.
      // TODO SERVER-60706: Make FailedToSatisfyReadPreference a transient error
      if (e.code == ErrorCodes.FailedToSatisfyReadPreference) {
          print("Retrying transaction due to a FailedToSatisfyReadPreference error.");
          hasTransientError = true;
          continue;
      }
      

      The withTxnAndAutoRetry() helper function must retry only the commitTransaction command to learn the definitive result of the transaction. Upon seeing an error labeled with TransientTransactionError may the withTxnAndAutoRetry() helper function retry the entire transaction. Retrying the commitTransaction command on FailedToSatisfyReadPreference is perhaps best done inside commitTransactionWithKilledSessionRetries().


      This has been observed to cause a spurious duplicate key error due to the mongo shell when running the create_and_drop_collection.js FSM workload in the concurrency_sharded_multi_stmt_txn_with_stepdowns Evergreen task.

      [j0:s0] | 2022-03-25T16:07:09.162+00:00 D3 EXECUTOR 22607   [conn1685] "Scheduling remote command request","attr":{"request":"RemoteCommand 25962 -- target:[localhost:20003] db:test18_fsmdb0 cmd:{ insert: \"fsmcoll0\", bypassDocumentValidation: false, ordered: true, documents: [ { _id: 14.0 } ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000'), Timestamp(0, 0) ], databaseVersion: { uuid: UUID(\"f2af1ae1-5f1a-41eb-b2ab-40f6eb21b128\"), timestamp: Timestamp(1648224429, 9), lastMod: 1 }, lsid: { id: UUID(\"f5ff2e85-d579-4529-919c-675912543218\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 33, readConcern: { level: \"snapshot\", atClusterTime: Timestamp(1648224429, 9) }, startTransaction: true, coordinator: true, autocommit: false }"}
      [j0:s0] | 2022-03-25T16:07:09.181+00:00 D3 EXECUTOR 22607   [conn1685] "Scheduling remote command request","attr":{"request":"RemoteCommand 25964 -- target:[localhost:20003] db:admin cmd:{ commitTransaction: 1, writeConcern: { w: \"majority\", wtimeout: 300321, provenance: \"clientSupplied\" }, coordinator: true, autocommit: false, txnNumber: 33, lsid: { id: UUID(\"f5ff2e85-d579-4529-919c-675912543218\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) } }"}
      ...
      [j0:s0] | 2022-03-25T16:07:09.194+00:00 D2 ASIO     22597   [conn1685] "Request finished with response","attr":{"requestId":25964,"isOK":true,"response":"{ readOnly: false, writeConcernError: { code: 11602, codeName: \"InterruptedDueToReplStateChange\", errmsg: \"operation was interrupted\", errInfo: { writeConcern: { w: \"majority\", wtimeout: 300321, provenance: \"clientSupplied\" } } }, ok: 1.0, topologyVersion: { processId: ObjectId('623de6b9393bc43269e4007c'), counter: 95 }, lastCommittedOpTime: Timestamp(1648224429, 11), $clusterTime: { clusterTime: Timestamp(1648224429, 13), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $configTime: Timestamp(1648224429, 10), $topologyTime: Timestamp(1648223955, 17), operationTime: Timestamp(1648224429, 12) }"}
      ...
      [j0:s0] | 2022-03-25T16:07:24.195+00:00 D1 ASSERT   23074   [conn1685] "User assertion","attr":{"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set shard-rs0","file":"src/mongo/s/transaction_router.cpp","line":207}
      [j0:s0] | 2022-03-25T16:07:24.195+00:00 D3 TXN      22897   [conn1685] "Implicitly aborting transaction on all participant shards","attr":{"sessionId":{"id":{"$uuid":"f5ff2e85-d579-4529-919c-675912543218"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":33,"txnRetryCounter":0,"numParticipantShards":1,"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set shard-rs0"}
      ...
      [j0:s0] | 2022-03-25T16:07:31.747+00:00 D3 EXECUTOR 22607   [conn1685] "Scheduling remote command request","attr":{"request":"RemoteCommand 26007 -- target:[localhost:20004] db:test18_fsmdb0 cmd:{ insert: \"fsmcoll0\", bypassDocumentValidation: false, ordered: true, documents: [ { _id: 14.0 } ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000'), Timestamp(0, 0) ], databaseVersion: { uuid: UUID(\"f2af1ae1-5f1a-41eb-b2ab-40f6eb21b128\"), timestamp: Timestamp(1648224429, 9), lastMod: 1 }, lsid: { id: UUID(\"f5ff2e85-d579-4529-919c-675912543218\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 34, readConcern: { level: \"snapshot\", atClusterTime: Timestamp(1648224451, 2) }, startTransaction: true, coordinator: true, autocommit: false }"}
      [fsm_workload_test:create_and_drop_collection] [tid:0] Retrying transaction due to a FailedToSatisfyReadPreference error.
      ...
      [fsm_workload_test:create_and_drop_collection] uncaught exception: Error: 1 thread with tids [0] threw
      [fsm_workload_test:create_and_drop_collection]         	"errmsg" : "E11000 duplicate key error collection: test18_fsmdb0.fsmcoll0 index: _id_ dup key: { _id: 14.0 } found value: RecordId(1)",
      [fsm_workload_test:create_and_drop_collection]         	"op" : {
      [fsm_workload_test:create_and_drop_collection]         		"_id" : 14
      [fsm_workload_test:create_and_drop_collection]         	}
      

      https://logkeeper.mongodb.org/lobster/build/bdd28368e04d3a19371ebfc49c1dbc1d/test/623de8a2be07c479035d686c#bookmarks=0%2C8060%2C8124%2C8167%2C10092%2C10093%2C11685%2C11696%2C15388%2C15394%2C15395%2C15396%2C15397%2C16446&f~=100~Scheduling%20remote%20command%20request.%2A%28insert%7CcommitTransaction%29&l=1

            Assignee:
            andrew.witten@mongodb.com Andrew Witten (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: