Core Server / SERVER-61473

Resharding coordinator calls ReshardingMetrics::onCompletion() multiple times on transient errors, leading to config server crash

Details

    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Sharding 2021-11-29
    • 1

    Description

      The ReshardingCoordinator calls ReshardingMetrics::onCompletion() within its resharding::WithAutomaticRetry blocks, so a transient error that triggers a retry causes onCompletion() to be called more than once.

      SERVER-56923 commented out an invariant that guarded against ReshardingMetrics::onCompletion() being called multiple times. Until the TODO comment can be addressed by refactoring the ReshardingMetrics class altogether, we should make ReshardingMetrics::onCompletion() itself safe to call multiple times.
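
      One way to make a completion hook safe to call repeatedly is to treat the per-operation state as the guard: the first call records the outcome and clears the state, and later calls return early. The following is a minimal sketch of that idea only. MetricsSketch, OperationState, OperationStatus, and canceledCountAfterRetries are hypothetical names invented for illustration and do not reflect the real ReshardingMetrics class or the actual fix.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified stand-ins; these are NOT the real MongoDB types.
enum class OperationStatus { kSuccess, kFailure, kCanceled };

// Per-operation state that exists only while an operation is in progress.
struct OperationState {
    int documentsCopied = 0;
};

class MetricsSketch {
public:
    void onStart() {
        _current.emplace();
    }

    // Idempotent: only the first call after onStart() has any effect.
    void onCompletion(OperationStatus status) {
        if (!_current) {
            // Already completed (or never started). A retried caller lands
            // here instead of touching an empty optional.
            return;
        }
        switch (status) {
            case OperationStatus::kSuccess:
                ++_succeeded;
                break;
            case OperationStatus::kFailure:
                ++_failed;
                break;
            case OperationStatus::kCanceled:
                ++_canceled;
                break;
        }
        _current.reset();
    }

    int succeeded() const { return _succeeded; }
    int failed() const { return _failed; }
    int canceled() const { return _canceled; }

private:
    std::optional<OperationState> _current;
    int _succeeded = 0;
    int _failed = 0;
    int _canceled = 0;
};

// Simulates a retry block that reports cancellation once per attempt.
inline int canceledCountAfterRetries(int attempts) {
    MetricsSketch metrics;
    metrics.onStart();
    for (int i = 0; i < attempts; ++i) {
        metrics.onCompletion(OperationStatus::kCanceled);
    }
    return metrics.canceled();
}
```

      With this guard, canceledCountAfterRetries(3) counts a single cancellation no matter how many times the retry loop repeats the call.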

      {"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x2d0"}}
      {"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
      

      (gdb) bt
      #0  mongo::ReshardingMetrics::onCompletion (this=0x557fbfb95b00, role=role@entry=mongo::ReshardingMetrics::kCoordinator, status=mongo::ReshardingOperationStatusEnum::kCanceled, runningOperationEndTime=...) at src/third_party/boost/boost/optional/optional.hpp:1453
      #1  0x00007fd16d7cb97a in mongo::markCompleted (status=...) at src/mongo/db/s/resharding/resharding_coordinator_service.cpp:1023
      #2  0x00007fd16d7e8df0 in mongo::ReshardingCoordinatorService::ReshardingCoordinator::<lambda(const auto:58&)>::operator()<std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > >(const std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > &) const (__closure=<optimized out>, coordinatorDocsChangedOnDisk=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/atomic_base.h:512
      #3  0x00007fd16d7fe9a5 in mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::callRegularVoid<mongo::ReshardingCoordinatorService::ReshardingCoordinator::_awaitAllParticipantShardsDone(const std::shared_ptr<mongo::executor::ScopedTaskExecutor>&)::<lambda(const auto:58&)> > (args#0=..., f=..., isVoid=...) at src/mongo/util/functional.h:158
      #4  mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::SpecificImpl::call (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:159
      #5  mongo::unique_function<void (std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::operator()(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >) const (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:109
      

            People

              max.hirschhorn@mongodb.com Max Hirschhorn