- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: 5.0.0, 5.1.0-rc0
- Component/s: None
- Service Arch
- ALL
- 1
PrimaryOnlyService::onStepUp() has logic to wait until all Instances constructed in a previous term have finished executing before it constructs any new Instances in a higher term.
    // This ensures that all instances from previous term have joined.
    for (auto& instance : savedInstances) {
        instance.second.waitForCompletion();
    }
The logic in PrimaryOnlyService::onStepUp() only applies to Instances which are still tracked in PrimaryOnlyService::_activeInstances. When the state document for the Instance is removed, the Instance is also removed from PrimaryOnlyService::_activeInstances. However, PrimaryOnlyServiceOpObserver::onDelete() also runs on secondaries as part of oplog application, so a node that has stepped down can stop tracking an Instance that is still executing.
This leads to a situation where an Instance can be constructed in term 7 while an (untracked) Instance with a different ID from term 5 is still running, i.e. the future returned by its run() has not yet become ready. I suspect the solution here is to have PrimaryOnlyServiceOpObserver check whether the current node is primary when handling the delete/drop, before removing the ActiveInstance from the map.
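The suspected fix can be illustrated with a small standalone model. This is only a sketch under stated assumptions: ToyService, onStateDocDeleted(), unfinishedFromEarlierTerms() and stepUpStillTracksOldInstance() are hypothetical names invented for illustration, not the real PrimaryOnlyService or PrimaryOnlyServiceOpObserver code, and the model ignores threading and futures entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for an Instance; not the real MongoDB type.
struct Instance {
    long long term;  // term in which the Instance was constructed
    bool finished;   // whether the future returned by run() would be ready
};

// Hypothetical toy model of the tracking map discussed in this ticket.
class ToyService {
public:
    void addInstance(const std::string& id, long long term) {
        _activeInstances[id] = Instance{term, false};
    }

    // Models the op observer's delete hook. With the guard enabled, the entry
    // is only untracked when this node is primary.
    void onStateDocDeleted(const std::string& id, bool isPrimary, bool guardOnPrimary) {
        if (guardOnPrimary && !isPrimary) {
            return;  // secondary oplog application: keep tracking the Instance
        }
        _activeInstances.erase(id);
    }

    // Models what step-up can see: the number of still-tracked, unfinished
    // Instances from earlier terms that it would have to wait on.
    int unfinishedFromEarlierTerms(long long newTerm) const {
        int count = 0;
        for (const auto& entry : _activeInstances) {
            if (entry.second.term < newTerm && !entry.second.finished) {
                ++count;
            }
        }
        return count;
    }

private:
    std::map<std::string, Instance> _activeInstances;
};

// Replays the scenario from the description: an Instance from term 5 is still
// running when its state document delete is applied on a non-primary node,
// and the node then steps up in term 7.
bool stepUpStillTracksOldInstance(bool guardOnPrimary) {
    ToyService svc;
    svc.addInstance("A", 5);
    svc.onStateDocDeleted("A", /*isPrimary=*/false, guardOnPrimary);
    return svc.unfinishedFromEarlierTerms(7) == 1;
}
```

Without the guard, the term 5 Instance is no longer visible at step-up, so nothing waits on it; with the guard, it is still tracked. The sketch leaves open when an entry retained this way would eventually be removed (for example, once the Instance completes), which is part of what triage would need to decide.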
Acceptance criteria: Investigate the root cause and propose possible solutions for triage.
- is depended on by: SERVER-57686 We need test coverage that runs resharding in the face of elections (Closed)
- is related to: SERVER-54460 Resharding may delete the state document before fully completing (Closed)