[SERVER-53298] Tests that call "getMore" can fail in replica_sets_multi_stmt_txn_jscore_passthrough suite Created: 09/Dec/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: A. Jesse Jiryu Davis Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

 Description   

Although most (all?) of the other multi_stmt_txn passthrough suites exclude tests tagged "requires_getmore", the replica_sets_multi_stmt_txn_jscore_passthrough suite does not, and we must manually exclude all tests that use the "getMore" command from it. If we do not, the test errors with: "assert: [getMore] != [getMore] are equal", here:

https://github.com/mongodb/mongo/blob/783e113bbb1bfa83630222de5b74fe95530692f0/jstests/libs/override_methods/network_error_and_txn_override.js#L643-L642

Consider excluding all requires_getmore tests from this suite, and see if there are other multi_stmt_txn passthrough suites which should also exclude them.



 Comments   
Comment by A. Jesse Jiryu Davis [ 10/Dec/20 ]

Thanks, that sounds like a possible solution. For now, I'm leaving this ticket on the backlog, and I'll manually exclude list_indexes_invalidation.js from replica_sets_multi_stmt_txn_jscore_passthrough.yml as part of SERVER-52545. Whoever resolves this ticket should consider un-excluding it.

Comment by Max Hirschhorn [ 09/Dec/20 ]

The listIndexes command cannot be run in a multi-document transaction so it'll be run outside of one. I think the issue is that the network_error_and_txn_override.js override doesn't know the cursor the getMore command is being run on came from outside a multi-document transaction. (A contradiction of the third bullet point in my earlier comment.)

I see the changes from b3c2613 as part of SERVER-41183 made it possible to run getMore on change stream aggregation cursors outside of the multi-document transaction. Perhaps we could add an exemption for listIndexes, listCollections, listDatabases, etc. too and avoid excluding them from the replica_sets_multi_stmt_txn_jscore_passthrough.yml test suite when they specify a batchSize?

Comment by A. Jesse Jiryu Davis [ 09/Dec/20 ]

Sorry, I should've mentioned that it's the list_indexes_invalidation.js test I'm having trouble with. I noticed in the course of SERVER-52545 that that test incorrectly calls {listIndexes: 1, batchSize: 2}, and hence it does a one-shot query and doesn't call getMore. Once I fixed it to call {listIndexes: 1, cursor: {batchSize: 2}}, it executed a getMore, and triggered the error "assert: [getMore] != [getMore] are equal".

Would another means of fixing this be to call an additional dummy command after getMore, just to make sure it isn't the transaction's final command?

Comment by Max Hirschhorn [ 09/Dec/20 ]

jesse, my recollection is that it should be safe to run getMore commands in the replica_sets_multi_stmt_txn_jscore_passthrough.yml test suite because we never expect a transient transaction error or network error to occur. It sounds like you're seeing failures when trying to run some tests that do getMores under that suite. I think it could be helpful to post some explicit steps to reproduce. There's ~30 tests which are tagged as requires_getmore that currently run (and pass) in the replica_sets_multi_stmt_txn_jscore_passthrough.yml test suite.

https://github.com/mongodb/mongo/blob/783e113bbb1bfa83630222de5b74fe95530692f0/jstests/libs/override_methods/network_error_and_txn_override.js#L643-L642

637
if (commandSupportsTransaction && !isSystemDotProfile &&
638
    driverSession.getSessionId() !== null && !isCommandNonTxnGetMore(cmdName, cmdObj)) {
639
    if (isNested()) {
640
        // Nested commands should never start a new transaction.
641
    } else if (ops.length === 0) {
642
        // We should never end a transaction on a getMore.
643
        assert.neq(cmdName, "getMore", cmdObj);
644
        startNewTransaction(conn, cmdObj);
645
    } else if (cmdName === "getMore") {
646
        // If the command is a getMore, we cannot consider ending the transaction.
647
    } else if (ops.length >= maxOpsInTransaction) {
648
        logMsgFull('setupTransactionCommand',
649
                    `Committing transaction ${txnOptions.txnNumber} on session` +
650
                        ` ${tojsononeline(lsid)} because we have hit max ops length`);
651
        commitTransaction(conn, lsid, txnOptions.txnNumber);
652
        startNewTransaction(conn, cmdObj);
653
    }
654
    continueTransaction(conn, dbName, cmdName, cmdObj);

I also think the assertion is trying to express

  • we cannot be attempting to start a new transaction from running the getMore command because getMore requires the cursor created in the active transaction
  • it should be an impossible situation because we would never call commitTransaction() before running the getMore command
  • it is furthermore impossible to be running a getMore command without having earlier cursor-generating command (e.g. find, aggregate) in the transaction already (ops.length === 0)
Generated at Thu Feb 08 05:30:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.