  1. Core Server
  2. SERVER-40596

Test that change streams and retryable writes fail gracefully when reading oplog entries from >= 2 versions ago

Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • None
    • Replication
    • None
    • Query Execution

    Description

      See the discussion below. We should add tests for the scenario where a user upgrades across two major versions in quick succession while machinery such as change streams or retryable writes still needs to read oplog entries written two versions earlier.

      Original Description.

      sharded_txn_downgrade_cluster.js expects retryable writes to read oplog entries written under a higher FCV after a downgrade. When I tried to remove the "h" field from the oplog format in FCV 4.2, the test failed with the following error:

      assert: command failed: {
      	"n" : 0,
      	"writeErrors" : [
      		{
      			"index" : 0,
      			"code" : 40414,
      			"codeName" : "Location40414",
      			"errmsg" : "BSON field 'OplogEntryBase.h' is missing but a required field"
      		}
      	],
      	"ok" : 1,
      	"operationTime" : Timestamp(1553956389, 1),
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1553956423, 1),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	}
      }
      _getErrorWithCode@src/mongo/shell/utils.js:25:13
      doassert@src/mongo/shell/assert.js:18:14
      _assertCommandWorked@src/mongo/shell/assert.js:584:17
      assert.commandWorked@src/mongo/shell/assert.js:674:16
      assertMultiShardRetryableWriteCanBeRetried@jstests/multiVersion/libs/sharded_txn_upgrade_downgrade_cluster_shared.js:93:5
      @jstests/multiVersion/sharded_txn_downgrade_cluster.js:76:5
      @jstests/multiVersion/sharded_txn_downgrade_cluster.js:12:2
      

      Oplog entries in TransactionHistoryIterator are parsed with the IDL parser, which failed on the missing "h" field. This implies we cannot introduce any oplog format change that breaks parsing by the older version's IDL parser. Removing a field is less of a problem, because we could make the field optional for one release and remove it in the following release, assuming users don't upgrade or downgrade across multiple releases in quick succession. Adding new fields to the oplog entries used by retryable writes, however, would be a problem.
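
      To make the failure mode concrete, here is a minimal sketch of a strict, schema-driven parser. It is a toy model, not MongoDB's IDL parser: `parseStrict`, `oldBinarySchema`, `tryParse`, the error strings, and every field other than "h" are assumptions made for illustration. It shows that a strict parser built against the older format rejects both an entry that dropped a required field and an entry that gained a field it does not know about.

```javascript
// Toy stand-in for a strict, schema-driven parser (NOT MongoDB's IDL parser).
// The schema and error strings are hypothetical; only "h" comes from the ticket.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function parseStrict(schema, doc) {
  // Every required field in the schema must be present in the document.
  for (const [name, spec] of Object.entries(schema)) {
    if (spec.required && !hasOwn(doc, name)) {
      throw new Error(`field '${name}' is missing but required`);
    }
  }
  // Strict mode: every field in the document must be declared in the schema.
  for (const name of Object.keys(doc)) {
    if (!hasOwn(schema, name)) {
      throw new Error(`field '${name}' is unknown`);
    }
  }
  return { ...doc };
}

// Schema as an older binary might define it (illustrative only).
const oldBinarySchema = {
  ts: { required: true },
  h: { required: true },
  op: { required: true },
};

// Returns "ok" on success, or the parse error message on failure.
function tryParse(doc) {
  try {
    parseStrict(oldBinarySchema, doc);
    return "ok";
  } catch (e) {
    return e.message;
  }
}

console.log(tryParse({ ts: 1, h: 0, op: "i" }));           // entry in the old format
console.log(tryParse({ ts: 2, op: "i" }));                 // newer entry with "h" removed
console.log(tryParse({ ts: 3, h: 0, op: "i", extra: 1 })); // newer entry with an added field
```

      The second call models the failure reported above. The third models the harder case, where a newer version adds a field the older parser has never seen.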

      It would be better to preserve the ability to add new fields to the oplog format. There are three options:
      1. Make the OplogEntryBase IDL not strict.
      2. Add the ability to parse an IDL type in a non-strict way.
      3. Disallow retryable writes across an FCV boundary.
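
      As a rough illustration of options 1 and 2, the sketch below extends a toy schema parser with an optional field and a per-call `strict` flag. Everything here (`parseEntry`, `transitionalSchema`, the `strict` option, the error strings, and all field names except "h") is hypothetical and does not reflect the real IDL API. Option 3 is a policy change and is not modeled.

```javascript
// Toy parser with optional fields and a non-strict mode (NOT MongoDB's IDL API).
function parseEntry(schema, doc, { strict = true } = {}) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const out = {};
  // Copy declared fields; only required ones must be present.
  for (const [name, spec] of Object.entries(schema)) {
    if (has(doc, name)) {
      out[name] = doc[name];
    } else if (spec.required) {
      throw new Error(`missing required field '${name}'`);
    }
  }
  // In strict mode, undeclared fields are an error; otherwise they are dropped.
  if (strict) {
    for (const name of Object.keys(doc)) {
      if (!has(schema, name)) {
        throw new Error(`unknown field '${name}'`);
      }
    }
  }
  return out;
}

// Transitional schema: "h" is kept but optional for one release (illustrative).
const transitionalSchema = {
  ts: { required: true },
  h: { required: false },
  op: { required: true },
};

// An entry without "h" now parses, even in strict mode.
const withoutH = parseEntry(transitionalSchema, { ts: 2, op: "i" });

// An entry with an unrecognized field parses only when strictness is relaxed.
const withExtra = parseEntry(
  transitionalSchema,
  { ts: 3, h: 0, op: "i", extra: 1 },
  { strict: false }
);

console.log(withoutH, withExtra);
```

      Making the strictness a per-call choice (option 2) would let callers such as retryable writes tolerate newer entries while other callers keep strict validation. Relaxing the schema itself (option 1) applies the tolerance everywhere.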

      Change streams may have the same issue unless they parse the oplog entries manually. CC charlie.swanson.

          People

            backlog-query-execution Backlog - Query Execution
            siyuan.zhou@mongodb.com Siyuan Zhou