- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Fully Compatible
Reproduced with jstests/sharding/delete_range_deletion_tasks_on_stepup_after_drop_collection.js.
The test steps the catalog shard down and back up while a chunk migration that is expected to fail is in progress. After step-up, the former primary becomes primary again, ReplicationCoordinatorImpl::signalDrainComplete() is invoked, and it never completes until the test ends.
The side effect is that _makeHelloResponse() keeps answering "i am secondary", so the consumer of the Hello reply drops it.
There is a logical deadlock in the chunk migration logic that resumes on step-up:
1. ReplicationCoordinatorExternalStateImpl::onTransitionToPrimary
2. ReplicationCoordinatorExternalStateImpl::_shardingOnTransitionToPrimaryHook()
3. ShardingStateRecovery::recover()
4. // Need to fetch the latest uptime from the config server, so do a logging write
ShardingLogging::get(opCtx)->logChangeChecked(..., kMajorityWriteConcern)
At this point the problem is already visible: a majority write concern during step-up, before writes are allowed, can never be satisfied...
5. ShardingLogging::_log()
6. Grid::get(opCtx)->catalogClient()->insertConfigDocument(..., kMajorityWriteConcern)
7. ShardLocal::_runCommand()
In my opinion, the write of the recovery document in step 4 (done to fetch the latest uptime) cannot use majority write concern. A local write is sufficient when the node is already primary.