Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.0-rc1
Affects Version/s: None
Component/s: Sharding
Labels: None

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2017-10-23
Linked BF Score: 0
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

If a shard primary steps down, it triggers the ShardServerCatalogCacheLoader::onStepDown function, which interrupts the OperationContextGroup used by the ShardServerCatalogCacheLoader with the PrimarySteppedDown error code. From then on, every operation context added to this group is automatically interrupted, until resetInterrupt() is called in the ShardServerCatalogCacheLoader::onStepUp hook if and when the node transitions back to primary.
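
To make the interrupt semantics concrete, here is a minimal Python model of the behavior described above. It is a sketch, not MongoDB code: the real classes are C++, and the names below (OperationContextGroup, make_operation_context, reset_interrupt, CatalogCacheLoaderModel) are hypothetical stand-ins that only capture what this issue states, namely that step-down interrupts the group, contexts added afterwards start out interrupted, and step-up resets the group's interrupt.

```python
import threading


class PrimarySteppedDown(Exception):
    """Stand-in for the PrimarySteppedDown error code."""


class OperationContext:
    def __init__(self):
        self._interrupt = None  # None means "not interrupted"

    def mark_killed(self, code):
        self._interrupt = code

    def check_for_interrupt(self):
        # Raise the stored error, as an interrupted operation would.
        if self._interrupt is not None:
            raise self._interrupt("operation was interrupted")


class OperationContextGroup:
    """Hypothetical model: a group that can interrupt all current and future members."""

    def __init__(self):
        self._lock = threading.Lock()
        self._contexts = set()
        self._interrupt = None

    def make_operation_context(self):
        ctx = OperationContext()
        with self._lock:
            # A context added while the group is interrupted is killed immediately.
            if self._interrupt is not None:
                ctx.mark_killed(self._interrupt)
            self._contexts.add(ctx)
        return ctx

    def interrupt(self, code):
        with self._lock:
            self._interrupt = code
            for ctx in self._contexts:
                ctx.mark_killed(code)

    def reset_interrupt(self):
        # Only clears the group flag; contexts already killed stay killed in this model.
        with self._lock:
            self._interrupt = None


class CatalogCacheLoaderModel:
    """Hypothetical stand-in for the loader's step-down/step-up hooks."""

    def __init__(self):
        self.group = OperationContextGroup()

    def on_step_down(self):
        self.group.interrupt(PrimarySteppedDown)

    def on_step_up(self):
        self.group.reset_interrupt()
```

After on_step_down(), any context obtained from loader.group raises PrimarySteppedDown on its first interrupt check, and this remains true until on_step_up() runs.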

The problem is that when a secondary that used to be primary receives a command that forces it to refresh its routing table, it adds an operation context to the loader's OperationContextGroup and calls _runSecondaryGetChunksSince, which attempts to send a "forceRoutingTableRefresh" command to the primary. Because the operation context was automatically interrupted as soon as it was added to the group, this call fails with PrimarySteppedDown, and the primary never actually receives the refresh command.

I think ~~SERVER-30148~~ will fix this, because the ShardingState's refresh logic shouldn't use an operation context from the ShardServerCatalogCacheLoader's interrupted OperationContextGroup.
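
The failure path and the suggested direction can be sketched as follows. This is a hypothetical Python model under stated assumptions, not the actual fix: secondary_get_chunks_since stands in for _runSecondaryGetChunksSince, FakePrimary records the "forceRoutingTableRefresh" requests it receives, and refresh_via_independent_context only illustrates the idea that the refresh should use an operation context that does not come from the loader's interrupted group.

```python
class PrimarySteppedDown(Exception):
    """Stand-in for the PrimarySteppedDown error code."""


class OperationContext:
    def __init__(self, killed_with=None):
        self.killed_with = killed_with

    def check_for_interrupt(self):
        if self.killed_with is not None:
            raise self.killed_with("operation was interrupted")


class InterruptibleGroup:
    """Minimal stand-in: new contexts inherit the group's current interrupt."""

    def __init__(self):
        self.interrupt_code = None

    def make_operation_context(self):
        return OperationContext(self.interrupt_code)


class FakePrimary:
    """Hypothetical primary that records refresh requests it receives."""

    def __init__(self):
        self.refresh_requests = []

    def force_routing_table_refresh(self, nss):
        self.refresh_requests.append(nss)


def secondary_get_chunks_since(ctx, primary, nss):
    # Hypothetical stand-in for _runSecondaryGetChunksSince: the interrupt check
    # happens before the request is sent, so a killed context never reaches the primary.
    ctx.check_for_interrupt()
    primary.force_routing_table_refresh(nss)
    return "refreshed"


def refresh_via_loader_group(group, primary, nss):
    # Buggy path: the context comes from the loader's (possibly interrupted) group.
    return secondary_get_chunks_since(group.make_operation_context(), primary, nss)


def refresh_via_independent_context(primary, nss):
    # Suggested direction: use a context that is not a member of the loader's group.
    return secondary_get_chunks_since(OperationContext(), primary, nss)
```

With the group's interrupt set to PrimarySteppedDown (the node was primary and stepped down), refresh_via_loader_group raises before the primary sees anything, while refresh_via_independent_context delivers the refresh request.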

Assignee: Dianna Hohensee (Inactive)
Reporter: Jack Mulrow
Participants: Dianna Hohensee, Githook User, Jack Mulrow, Kaloian Manassiev
Votes: 0
Watchers: 5

Created: Oct 11 2017 09:19:45 PM UTC
Updated: Oct 30 2023 11:12:52 PM UTC
Resolved: Oct 19 2017 08:37:05 PM UTC