Core Server / SERVER-59775

ReshardingDonorOplogIterator triggers an fassert() when it continues to run in member state SECONDARY following a stepdown

Details

    • Bug
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • 5.0.0
    • 5.0.4, 5.1.0-rc0
    • Sharding
    • None
    • Fully Compatible
    • ALL
    • v5.0
    • Sharding 2021-09-06, Sharding 2021-09-20
    • 1

    Description

      By design, when a node steps down, PrimaryOnlyService cancels the cancellation token for its Instances and shuts down their task executor. However, a task that is already running can continue to run briefly in member state SECONDARY. ReshardingDonorOplogIterator reads from the oplog buffer collection locally using the default RecoveryUnit::ReadSource of kNoTimestamp. As a result, the node hits the fassert() in AutoGetCollectionForReadBase (msgid 4728700 in the log below).
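The window described above can be illustrated with a small, single-threaded toy model. This is only a sketch under stated assumptions: `ToyToken` and `runSteps` are hypothetical names invented for illustration and are not the server's CancellationToken or task executor. The point it tries to show is that a token consulted only between units of work cannot stop a unit that has already started.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a cancellation token; not the MongoDB class.
struct ToyToken {
    bool canceled = false;
};

// Runs up to `steps` units of work. The token is consulted only between
// steps, so a step that has already started still performs its read even if
// `onStepStart` cancels the token (simulating a stepdown arriving mid-task).
std::vector<std::string> runSteps(ToyToken& token,
                                  int steps,
                                  const std::function<void(int)>& onStepStart) {
    std::vector<std::string> events;
    for (int i = 0; i < steps; ++i) {
        if (token.canceled) {
            events.push_back("step " + std::to_string(i) + " skipped: canceled");
            break;
        }
        onStepStart(i);  // a simulated stepdown may cancel the token here
        events.push_back("step " + std::to_string(i) + " read buffer" +
                         (token.canceled ? " after cancellation" : ""));
    }
    return events;
}
```

If the callback cancels the token while step 1 is starting, step 1 still records a read "after cancellation" and only step 2 is skipped, which loosely mirrors a task reading the oplog buffer after the node has already become SECONDARY.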

      Moreover, ReshardingDonorOplogIterator relies on the guarantee that, once it has been notified via awaitInsert(), it can read the write committed by the ReshardingOplogFetcher thread. This means RecoveryUnit::ReadSource::kNoOverlap isn't a suitable alternative. Instead, we'll have ReshardingDonorOplogIterator use ShouldNotConflictWithSecondaryBatchApplicationBlock.
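A rough model of the proposed fix, assuming the fatal check depends on member state, read source, and whether the operation is marked as conflicting with secondary batch application. Everything here (`ToyLocker`, `ToyShouldNotConflictBlock`, `wouldTripFassert`) is a hypothetical stand-in written for illustration; it is not the code in db_raii.cpp, and the exact condition in the server may differ.

```cpp
// Toy stand-ins; none of these are the real MongoDB types.
enum class MemberState { kPrimary, kSecondary };
enum class ReadSource { kNoTimestamp, kNoOverlap };

struct ToyLocker {
    // Assumed default: operations conflict with secondary batch application.
    bool shouldConflictWithSecondaryBatchApplication = true;
};

// RAII guard modeled on the idea behind
// ShouldNotConflictWithSecondaryBatchApplicationBlock: clear the flag for the
// lifetime of the guard and restore the previous value on destruction.
class ToyShouldNotConflictBlock {
public:
    explicit ToyShouldNotConflictBlock(ToyLocker& locker)
        : _locker(locker),
          _original(locker.shouldConflictWithSecondaryBatchApplication) {
        _locker.shouldConflictWithSecondaryBatchApplication = false;
    }
    ~ToyShouldNotConflictBlock() {
        _locker.shouldConflictWithSecondaryBatchApplication = _original;
    }
    ToyShouldNotConflictBlock(const ToyShouldNotConflictBlock&) = delete;
    ToyShouldNotConflictBlock& operator=(const ToyShouldNotConflictBlock&) = delete;

private:
    ToyLocker& _locker;
    bool _original;
};

// Returns true when this toy model would hit the fatal assertion: an
// untimestamped read on a secondary by an operation still marked as
// conflicting with secondary batch application.
bool wouldTripFassert(MemberState state, ReadSource source, const ToyLocker& locker) {
    return state == MemberState::kSecondary &&
           source == ReadSource::kNoTimestamp &&
           locker.shouldConflictWithSecondaryBatchApplication;
}
```

In this model, an untimestamped read on a secondary trips the check by default, stops tripping it while a `ToyShouldNotConflictBlock` is in scope, and trips it again once the guard is destroyed. The read source stays kNoTimestamp throughout, which is the property the iterator needs in order to see the write it was just notified about.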


      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.607+00:00 I  REPL     21358   [ReplCoord-1] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  STORAGE  4728700 [ReshardingRecipientService-1] "Reading from replicated collection on a secondary without read timestamp or PBWM lock","attr":{"collection":"config.localReshardingOplogBuffer.cea06672-2ba3-4d95-8b23-a4cfc596f4df.shard1"}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  ASSERT   23089   [ReshardingRecipientService-1] "Fatal assertion","attr":{"msgid":4728700,"file":"src/mongo/db/db_raii.cpp","line":334}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  ASSERT   23090   [ReshardingRecipientService-1] "\n\n***aborting after fassert() failure\n\n"
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.608+00:00 F  CONTROL  4757800 [ReshardingRecipientService-1] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:22.614+00:00 I  REPL     5123007 [ReplCoord-1] "Interrupting PrimaryOnlyService due to stepDown","attr":{"service":"ReshardingRecipientService","numInstances":1,"numOperationContexts":3}
      ...
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"557020E99237","b":"55700D006000","o":"13E93237","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj","s+":"D7"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13DA0B","b":"55700D006000","o":"12137A0B","s":"_ZN5mongo28AutoGetCollectionForReadBaseINS_25AutoGetCollectionLockFreeENS_32AutoGetCollectionForReadLockFree13EmplaceHelperEEC1EPNS_16OperationContextERKS3_b","s+":"15FB"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13F8D2","b":"55700D006000","o":"121398D2","s":"_ZN5boost15optional_detail13optional_baseIN5mongo28AutoGetCollectionForReadBaseINS2_25AutoGetCollectionLockFreeENS2_32AutoGetCollectionForReadLockFree13EmplaceHelperEEEE9constructIJRPNS2_16OperationContextERS6_RbEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"42"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F135399","b":"55700D006000","o":"1212F399","s":"_ZN5mongo12_GLOBAL__N_138acquireCollectionAndConsistentSnapshotIZNS_32AutoGetCollectionForReadLockFreeC1EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tEE3$_1ZNS2_C1ES4_S7_S8_S9_E3$_2ZNS2_C1ES4_S7_S8_S9_E3$_3EEDaS4_bRNS_24CollectionCatalogStasherET_T0_T1_","s+":"199"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F134DD4","b":"55700D006000","o":"1212EDD4","s":"_ZN5mongo32AutoGetCollectionForReadLockFreeC1EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tE","s+":"1F4"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F13ECFF","b":"55700D006000","o":"12138CFF","s":"_ZN5mongo35AutoGetCollectionForReadCommandBaseINS_32AutoGetCollectionForReadLockFreeEEC2EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tENS_16AutoStatsTracker7LogModeE","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F1402EF","b":"55700D006000","o":"1213A2EF","s":"_ZN5boost15optional_detail13optional_baseIN5mongo39AutoGetCollectionForReadCommandLockFreeEE9constructIJRPNS2_16OperationContextERKNS2_21NamespaceStringOrUUIDERNS2_25AutoGetCollectionViewModeERNS2_6Date_tERNS2_16AutoStatsTracker7LogModeEEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F1368FB","b":"55700D006000","o":"121308FB","s":"_ZN5mongo44AutoGetCollectionForReadCommandMaybeLockFreeC2EPNS_16OperationContextERKNS_21NamespaceStringOrUUIDENS_25AutoGetCollectionViewModeENS_6Date_tENS_16AutoStatsTracker7LogModeE","s+":"8B"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E4493CF","b":"55700D006000","o":"114433CF","s":"_ZN5boost15optional_detail13optional_baseIN5mongo44AutoGetCollectionForReadCommandMaybeLockFreeEE9constructIJRPNS2_16OperationContextERKNS2_21NamespaceStringOrUUIDENS2_25AutoGetCollectionViewModeENS2_6Date_tENS2_16AutoStatsTracker7LogModeEEEEvNS_11optional_ns15in_place_init_tEDpOT_","s+":"4F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.157+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E43C9C7","b":"55700D006000","o":"114369C7","s":"_ZN5mongo28CommonMongodProcessInterface40attachCursorSourceToPipelineForLocalReadEPNS_8PipelineE","s+":"4F7"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E58982B","b":"55700D006000","o":"1158382B","s":"_ZN5mongo17shardVersionRetryIZNS_19sharded_agg_helpers22attachCursorToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEEE3$_4EEDaPNS_16OperationContextEPNS_12CatalogCacheENS_15NamespaceStringENS_10StringDataEOT_","s+":"37B"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E5890BD","b":"55700D006000","o":"115830BD","s":"_ZN5mongo19sharded_agg_helpers22attachCursorToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEE","s+":"53D"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701E3FA8DF","b":"55700D006000","o":"113F48DF","s":"_ZN5mongo27ShardServerProcessInterface28attachCursorSourceToPipelineEPNS_8PipelineENS_20ShardTargetingPolicyEN5boost8optionalINS_7BSONObjEEE","s+":"5F"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F488150","b":"55700D006000","o":"12482150","s":"_ZN5mongo20DocumentSourceLookUp13buildPipelineERKNS_8DocumentE","s+":"E90"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F48581B","b":"55700D006000","o":"1247F81B","s":"_ZN5mongo20DocumentSourceLookUp12unwindResultEv","s+":"AAB"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F483B56","b":"55700D006000","o":"1247DB56","s":"_ZN5mongo20DocumentSourceLookUp9doGetNextEv","s+":"F6"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701CFC28BC","b":"55700D006000","o":"FFBC8BC","s":"_ZN5mongo14DocumentSource7getNextEv","s+":"21C"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701F53C6EE","b":"55700D006000","o":"125366EE","s":"_ZN5mongo8Pipeline7getNextEv","s+":"DE"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701D698F3C","b":"55700D006000","o":"10692F3C","s":"_ZN5mongo28ReshardingDonorOplogIterator10_fillBatchERNS_8PipelineE","s+":"AC"}}
      [js_test:resharding_fuzzer-79234-1630149184498-1] d20022| 2021-08-28T11:16:23.158+00:00 I  CONTROL  31445   [ReshardingRecipientService-1] "Frame","attr":{"frame":{"a":"55701D699E2C","b":"55700D006000","o":"10693E2C","s":"_ZN5mongo28ReshardingDonorOplogIterator12getNextBatchESt10shared_ptrINS_8executor12TaskExecutorEENS_17CancellationTokenENS_33CancelableOperationContextFactoryE","s+":"54C"}}
      

      https://evergreen.mongodb.com/lobster/build/c6979c6e3c82b5fa2586cea47ff21636/test/612a1acec2ab686fd51b1f68#bookmarks=0%2C37457%2C37460%2C37506%2C37797%2C160261%2C160414&f~=100~d20022%5C%7C

            People

              max.hirschhorn@mongodb.com Max Hirschhorn