  1. Core Server
  2. SERVER-40596

Test that change streams and retryable writes fail gracefully when reading oplog entries from >= 2 versions ago

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Query Execution

      See the discussion below. We should add tests for the scenario in which a user upgrades through two major versions in quick succession while machinery such as change streams or retryable writes still needs to read oplog entries written two versions earlier.

      Original Description.

      sharded_txn_downgrade_cluster.js expects retryable writes to read oplog entries written under a higher FCV after a downgrade. When I tried to remove the "h" field from the oplog format in FCV 4.2, the test failed with the following error:

      assert: command failed: {
      	"n" : 0,
      	"writeErrors" : [
      		{
      			"index" : 0,
      			"code" : 40414,
      			"codeName" : "Location40414",
      			"errmsg" : "BSON field 'OplogEntryBase.h' is missing but a required field"
      		}
      	],
      	"ok" : 1,
      	"operationTime" : Timestamp(1553956389, 1),
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1553956423, 1),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	}
      }
      _getErrorWithCode@src/mongo/shell/utils.js:25:13
      doassert@src/mongo/shell/assert.js:18:14
      _assertCommandWorked@src/mongo/shell/assert.js:584:17
      assert.commandWorked@src/mongo/shell/assert.js:674:16
      assertMultiShardRetryableWriteCanBeRetried@jstests/multiVersion/libs/sharded_txn_upgrade_downgrade_cluster_shared.js:93:5
      @jstests/multiVersion/sharded_txn_downgrade_cluster.js:76:5
      @jstests/multiVersion/sharded_txn_downgrade_cluster.js:12:2
      

      TransactionHistoryIterator parses oplog entries with the IDL parser, which failed on the missing "h" field. This implies that we cannot introduce any oplog format change that fails the old IDL parsing. Removing a field is less of a problem: we could keep the field optional for one release and remove it in the following release, assuming users don't upgrade or downgrade across multiple releases in quick succession. However, adding new fields to the oplog entries used by retryable writes would be a problem.
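
      The two-release removal plan can be sketched with a toy model. This is not MongoDB's IDL parser: `parse_strict`, `can_parse`, the three `SCHEMA_*` tables, and the field subset are all hypothetical names chosen for illustration. The sketch assumes a strict parser rejects both a missing required field and an unknown field.

```python
# Toy model of strict, schema-driven parsing (hypothetical, not the real IDL code).

class ParseError(Exception):
    pass


def parse_strict(entry, required, optional=frozenset()):
    """Reject entries that lack a required field or carry an unknown one."""
    for field in required:
        if field not in entry:
            raise ParseError(f"field '{field}' is missing but a required field")
    for field in entry:
        if field not in required and field not in optional:
            raise ParseError(f"field '{field}' is unknown")
    return dict(entry)


# Illustrative schemas for three consecutive releases.
SCHEMA_N = {"required": {"ts", "op", "ns", "h"}, "optional": set()}  # "h" required
SCHEMA_N1 = {"required": {"ts", "op", "ns"}, "optional": {"h"}}      # "h" optional
SCHEMA_N2 = {"required": {"ts", "op", "ns"}, "optional": set()}      # "h" removed


def can_parse(entry, schema):
    """Return True if the entry parses under the given release's schema."""
    try:
        parse_strict(entry, schema["required"], schema["optional"])
        return True
    except ParseError:
        return False


entry_with_h = {"ts": 1, "op": "i", "ns": "test.c", "h": 0}
entry_without_h = {"ts": 2, "op": "i", "ns": "test.c"}
```

      In this model, adjacent releases can always read each other's entries through the intermediate schema, but `SCHEMA_N` rejects `entry_without_h` and `SCHEMA_N2` rejects `entry_with_h`, which is the two-version gap this ticket asks us to test.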

      It would be better to preserve the ability to add new fields to the oplog format. There are three options:
      1. Make the OplogEntryBase IDL non-strict.
      2. Add the ability to parse an IDL in a non-strict way.
      3. Disallow retryable writes across an FCV boundary.
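
      A minimal sketch of what option 2 could look like, assuming a per-call `strict` flag. `parse_entry`, `KNOWN_FIELDS`, and `newField` are hypothetical names; none of this is the actual IDL API.

```python
# Hypothetical per-call strictness switch; not the real IDL parser interface.

class ParseError(Exception):
    pass


KNOWN_FIELDS = {"ts", "op", "ns"}  # illustrative subset of oplog entry fields


def parse_entry(entry, strict=True):
    """Parse an entry; in non-strict mode, unknown fields are silently ignored."""
    missing = KNOWN_FIELDS - set(entry)
    if missing:
        raise ParseError(f"missing required fields: {sorted(missing)}")
    unknown = set(entry) - KNOWN_FIELDS
    if unknown and strict:
        raise ParseError(f"unknown fields: {sorted(unknown)}")
    # Keep only the fields this binary understands.
    return {name: entry[name] for name in KNOWN_FIELDS}


# An entry written by a newer binary that added a field.
newer_entry = {"ts": 1, "op": "i", "ns": "test.c", "newField": True}
```

      Under these assumptions, `parse_entry(newer_entry)` raises `ParseError`, while `parse_entry(newer_entry, strict=False)` returns the three known fields. A caller such as the retryable-writes history reader could opt into the lenient mode without loosening the schema for every other consumer, which is the difference from option 1.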

      Change streams may have the same issue unless they parse the oplog entries manually. CC charlie.swanson.

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Votes:
            0
            Watchers:
            7