[SERVER-61473] Resharding coordinator calls ReshardingMetrics::onCompletion() multiple times on transient errors, leading to config server crash
Created: 15/Nov/21  Updated: 29/Oct/23  Resolved: 16/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 5.1.0
Fix Version/s: 5.2.0, 5.0.5, 5.1.1

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-56739 Rewrite resharding metrics duration c... Closed
is caused by SERVER-57153 Support co-existing donors/recipients... Closed
Related
related to SERVER-61483 Resharding coordinator fails to recov... Closed
is related to SERVER-56923 Temporarily comment out resharding me... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Sprint: Sharding 2021-11-29
Participants:
Story Points: 1

 Description   

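The ReshardingCoordinator calls ReshardingMetrics::onCompletion() within its resharding::WithAutomaticRetry blocks, so a transient error that triggers a retry causes onCompletion() to be called again. The repeated call crashes the config server with the segmentation fault shown below.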

SERVER-56923 had commented out an invariant related to ReshardingMetrics::onCompletion() being called multiple times. Until the TODO comment can be addressed by refactoring the ReshardingMetrics class altogether, we should make ReshardingMetrics::onCompletion() itself safe to call multiple times.
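
As a rough illustration of the proposal, here is a minimal, self-contained sketch of an idempotent completion hook being invoked from a retried block. It is not the actual patch: SketchMetrics and runWithRetry are hypothetical stand-ins for ReshardingMetrics and resharding::WithAutomaticRetry, and the sketch assumes the running operation's state lives in an optional that is cleared on completion (the backtrace frame in optional.hpp is consistent with that, but the ticket does not confirm it).

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for ReshardingMetrics; not the real MongoDB class.
class SketchMetrics {
public:
    using Clock = std::chrono::steady_clock;

    void onStart(Clock::time_point now) {
        _current.emplace(Operation{now, std::nullopt});
    }

    // Safe to call more than once: a repeated call finds no running
    // operation and returns early instead of touching an empty optional.
    void onCompletion(Clock::time_point now) {
        if (!_current) {
            return;
        }
        _current->end = now;
        _lastCompleted = *_current;
        _current.reset();
        ++_completedCount;
    }

    int completedCount() const { return _completedCount; }
    bool hasRunningOperation() const { return _current.has_value(); }

private:
    struct Operation {
        Clock::time_point start;
        std::optional<Clock::time_point> end;
    };

    std::optional<Operation> _current;        // engaged only while running
    std::optional<Operation> _lastCompleted;  // snapshot of the last finished operation
    int _completedCount = 0;
};

// Hypothetical retry helper: re-runs the whole body after a transient
// (here: std::runtime_error) failure, up to maxAttempts attempts.
template <typename Body>
void runWithRetry(int maxAttempts, Body&& body) {
    assert(maxAttempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            body(attempt);
            return;
        } catch (const std::runtime_error&) {
            if (attempt == maxAttempts) {
                throw;
            }
        }
    }
}
```

If the body passed to runWithRetry calls onCompletion() and then throws on its first attempt, the second attempt calls onCompletion() again; with the early return, the completed count stays at one and nothing is dereferenced through an empty optional.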

{"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x2d0"}}
{"t":{"$date":"2021-11-14T15:37:37.617+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}

(gdb) bt
#0  mongo::ReshardingMetrics::onCompletion (this=0x557fbfb95b00, role=role@entry=mongo::ReshardingMetrics::kCoordinator, status=mongo::ReshardingOperationStatusEnum::kCanceled, runningOperationEndTime=...) at src/third_party/boost/boost/optional/optional.hpp:1453
#1  0x00007fd16d7cb97a in mongo::markCompleted (status=...) at src/mongo/db/s/resharding/resharding_coordinator_service.cpp:1023
#2  0x00007fd16d7e8df0 in mongo::ReshardingCoordinatorService::ReshardingCoordinator::<lambda(const auto:58&)>::operator()<std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > >(const std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> > &) const (__closure=<optimized out>, coordinatorDocsChangedOnDisk=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/atomic_base.h:512
#3  0x00007fd16d7fe9a5 in mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::callRegularVoid<mongo::ReshardingCoordinatorService::ReshardingCoordinator::_awaitAllParticipantShardsDone(const std::shared_ptr<mongo::executor::ScopedTaskExecutor>&)::<lambda(const auto:58&)> > (args#0=..., f=..., isVoid=...) at src/mongo/util/functional.h:158
#4  mongo::unique_function<void(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::SpecificImpl::call (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:159
#5  mongo::unique_function<void (std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >)>::operator()(std::vector<mongo::ReshardingCoordinatorDocument, std::allocator<mongo::ReshardingCoordinatorDocument> >) const (args#0=..., this=<optimized out>) at src/mongo/util/functional.h:109



 Comments   
Comment by Githook User [ 16/Nov/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61473 Make ReshardingMetrics::onCompletion() idempotent.

ReshardingMetrics::onCompletion() can be called multiple times within
the resharding::WithAutomaticRetry blocks of the ReshardingCoordinator.

(cherry picked from commit 5d18bd88c941964e19622282cd040eadbb0db23d)
Branch: v5.0
https://github.com/mongodb/mongo/commit/6c3707f9e2b460efc3641d83d9c6138204e1c1ba

Comment by Githook User [ 16/Nov/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61473 Make ReshardingMetrics::onCompletion() idempotent.

ReshardingMetrics::onCompletion() can be called multiple times within
the resharding::WithAutomaticRetry blocks of the ReshardingCoordinator.

(cherry picked from commit 5d18bd88c941964e19622282cd040eadbb0db23d)
Branch: v5.1
https://github.com/mongodb/mongo/commit/2921ff2372c9d05f86eb7614a07b731cd5b5b544

Comment by Githook User [ 15/Nov/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-61473 Make ReshardingMetrics::onCompletion() idempotent.

ReshardingMetrics::onCompletion() can be called multiple times within
the resharding::WithAutomaticRetry blocks of the ReshardingCoordinator.
Branch: master
https://github.com/mongodb/mongo/commit/5d18bd88c941964e19622282cd040eadbb0db23d
