Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-55676

[SBE][replica_sets] Tests trigger fatal assertion: NotPrimaryOrSecondary: Oplog collection reads are not allowed while in the rollback or startup state

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Query Execution 2021-04-19

      Description

      In these tests either the ReplCoordExtern-1/2, BackgroundSync or ReplWriterWorker thread hits this fatal assertion. Most of them have this failure with a stack trace involving TransactionHistoryIterator:

      [ReplWriterWorker-0] "Fatal assertion","attr":{"msgid":31145,"error":"NotPrimaryOrSecondary: Oplog collection reads are not allowed while in the rollback or startup state.","file":"src/mongo/db/transaction_history_iterator.cpp","line":139}                               
      [ReplWriterWorker-0] "\n\n***aborting after fassert() failure\n\n"
      [ReplWriterWorker-0] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}
      [ReplWriterWorker-0] "BACKTRACE","attr":{"bt":{"backtrace":[{"a":"7F3C396B0572","b":"7F3C394BD000","o":"1F3572","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.630","s+":"212"},{"a":"7F3C396B1FA9","b":
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C396B0572","b":"7F3C394BD000","o":"1F3572","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.630","s+":"212"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C396B1FA9","b":"7F3C394BD000","o":"1F4FA9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C396AD4DF","b":"7F3C394BD000","o":"1F04DF","s":"abruptQuit","s+":"6F"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C37ACBD80","b":"7F3C37AB9000","o":"12D80","s":"funlockfile","s+":"50"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C373AA93F","b":"7F3C37373000","o":"3793F","s":"gsignal","s+":"10F"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C37394C95","b":"7F3C37373000","o":"21C95","s":"abort","s+":"127"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C395DCCA2","b":"7F3C394BD000","o":"11FCA2","s":"_ZN5mongo35fassertFailedWithStatusWithLocationEiRKNS_6StatusEPKcj","s+":"178"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C351E652A","b":"7F3C351B6000","o":"3052A","s":"_ZN5mongo26TransactionHistoryIterator17nextFatalOnErrorsEPNS_16OperationContextE.cold.702","s+":"30"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C338609CA","b":"7F3C33816000","o":"4A9CA","s":"_ZN5mongo40_readTransactionOperationsFromOplogChainEPNS_16OperationContextERKNS_4repl10OplogEntryERKSt6vectorIPS3_SaIS7_EEb","s+":"16A"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C338617B2","b":"7F3C33816000","o":"4B7B2","s":"_ZN5mongo39readTransactionOperationsFromOplogChainEPNS_16OperationContextERKNS_4repl10OplogEntryERKSt6vectorIPS3_SaIS7_EE","s+":"32"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C338621D7","b":"7F3C33816000","o":"4C1D7","s":"_ZN5mongo22applyCommitTransactionEPNS_16OperationContextERKNS_4repl10OplogEntryENS2_16OplogApplication4ModeE","s+":"147"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C3384818C","b":"7F3C33816000","o":"3218C","s":"_ZNSt17_Function_handlerIFN5mongo6StatusEPNS0_16OperationContextERKNS0_4repl10OplogEntryENS4_16OplogApplication4ModeEENS4_12_GLOBAL__N_1UlS3_S7_S9_E22_EE9_M_invokeERKSt9
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C33854751","b":"7F3C33816000","o":"3E751","s":"_ZN5mongo4repl19applyCommand_inlockEPNS_16OperationContextERKNS0_10OplogEntryENS0_16OplogApplication4ModeE","s+":"B51"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C355AB360","b":"7F3C35581000","o":"2A360","s":"_ZN5mongo4repl17OplogApplierUtils37applyOplogEntryOrGroupedInsertsCommonEPNS_16OperationContextERKNS0_26OplogEntryOrGroupedInsertsENS0_16OplogApplication4ModeESt8functio
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C3559DD0A","b":"7F3C35581000","o":"1CD0A","s":"_ZN5mongo4repl31applyOplogEntryOrGroupedInsertsEPNS_16OperationContextERKNS0_26OplogEntryOrGroupedInsertsENS0_16OplogApplication4ModeE","s+":"BA"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C355A07EC","b":"7F3C35581000","o":"1F7EC","s":"_ZNSt17_Function_handlerIFN5mongo6StatusEPNS0_16OperationContextERKNS0_4repl26OplogEntryOrGroupedInsertsENS4_16OplogApplication4ModeEEPSA_E9_M_invokeERKSt9_Any_dataOS3_S
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C355A9CBE","b":"7F3C35581000","o":"28CBE","s":"_ZN5mongo4repl17OplogApplierUtils21applyOplogBatchCommonEPNS_16OperationContextEPSt6vectorIPKNS0_10OplogEntryESaIS7_EENS0_16OplogApplication4ModeEbSt8functionIFNS_6Statu
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C3559D8E6","b":"7F3C35581000","o":"1C8E6","s":"_ZN5mongo4repl16OplogApplierImpl24applyOplogBatchPerWorkerEPNS_16OperationContextEPSt6vectorIPKNS0_10OplogEntryESaIS7_EEPS4_INS_16MultikeyPathInfoESaISB_EE","s+":"F6"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C3559C2B5","b":"7F3C35581000","o":"1B2B5","s":"_ZZN5mongo15unique_functionIFvNS_6StatusEEE8makeImplIZNS_4repl16OplogApplierImpl16_applyOplogBatchEPNS_16OperationContextESt6vectorINS5_10OplogEntryESaISA_EEEUlT_E3_EEDa
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C314C6AA2","b":"7F3C314B5000","o":"11AA2","s":"_ZN5mongo10ThreadPool4Impl10_doOneTaskEPSt11unique_lockINS_12latch_detail5LatchEE","s+":"122"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C314C6D43","b":"7F3C314B5000","o":"11D43","s":"_ZN5mongo10ThreadPool4Impl13_consumeTasksEv","s+":"83"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C314C7835","b":"7F3C314B5000","o":"12835","s":"_ZN5mongo10ThreadPool4Impl17_workerThreadBodyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE","s+":"B5"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C314C7B50","b":"7F3C314B5000","o":"12B50","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_10ThreadPool4Impl25_startWorkerThread_inlockEvEUlvE2_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C328F63CF","b":"7F3C328E9000","o":"D3CF","s":"execute_native_thread_routine","s+":"F"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C37AC12DE","b":"7F3C37AB9000","o":"82DE","s":"start_thread","s+":"FE"}}
      [ReplWriterWorker-0] "Frame","attr":{"frame":{"a":"7F3C3746FA63","b":"7F3C37373000","o":"FCA63","s":"clone","s+":"43"}}
      

      Sometimes it happens during background sync while rolling back to recover to a checkpoint, or during initial sync, or when oplog batches are being applied.

      The full list of tests failing:

      • initial_sync_test_fixture_test.js
      • prepare_failover_rollback_commit.js
      • initial_sync_replicates_prepare_received_during_another_initial_sync.js
      • rollback_recovery_commit_transaction_before_stable_timestamp.js
      • commit_transaction_initial_sync_data_already_applied.js
      • initial_sync_fetch_from_oldest_active_transaction_timestamp.js
      • reconstruct_prepared_transactions_initial_sync_index_build.js
      • rollback_reconstructs_transactions_prepared_before_stable.js
      • initial_sync_fetch_from_oldest_active_transaction_timestamp_no_oplog_application.js
      • initial_sync_reset_oldest_timestamp_after_failed_attempt.js
      • initial_sync_commit_prepared_transaction.js
      • recover_committed_aborted_prepared_transactions.js
      • initial_sync_preserves_active_txns.js
      • reconstruct_prepared_transactions_initial_sync_no_oplog_application.js
      • reconstruct_prepared_transactions_initial_sync_change_oldest_active_txn_timestamp.js
      • reconstruct_prepared_transactions_initial_sync.js
      • reconstruct_prepared_transactions_initial_sync_on_oplog_seed.js
      • initial_sync_update_missing_doc_with_prepare.js
      • recover_prepared_txn_with_multikey_write_initial_sync.js

      Tests observed failing with the same symptom in concurrency passthrough suites:

      • jstests/concurrency/fsm_workloads/multi_statement_transaction_atomicity_isolation_repeated_reads.js in concurrency_sharded_multi_stmt_txn_kill_primary (task)
      • jstests/concurrency/fsm_workloads/update_check_index.js in concurrency_sharded_multi_stmt_txn_kill_primary and concurrency_sharded_multi_stmt_txn_with_stepdowns (task, task)
      • jstests/concurrency/fsm_workloads/update_multifield_multiupdate.js in concurrency_sharded_multi_stmt_txn_with_stepdowns (task)
      • jstests/concurrency/fsm_workloads/multi_statement_transaction_simple_majority_writes.js in concurrency_sharded_multi_stmt_txn_with_stepdowns (task)
      • jstests/concurrency/fsm_workloads/multi_statement_transaction_kill_sessions_atomicity_isolation.js in¬†concurrency_sharded_with_stepdowns (task)

      The same asseer was also observed in initial_sync_fuzzer (task).

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              mindaugas.malinauskas Mindaugas Malinauskas
              Reporter:
              kyle.suarez Kyle Suarez
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: