Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.2, 3.5.2
Component/s: Sharding
Labels: None
Operating System: ALL
Sprint: Sharding 2017-03-27, Sharding 2017-04-17
Linked BF Score: 0
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The ShardRegistry::reload call spawns a thread to refresh the list of shards from the config server. Because this thread runs with its own OperationContext, it ends up calling ReplicationCoordinatorImpl::waitUntilOpTimeForRead without any timeout.

As a result, the shutdown sequence gets stuck: replication cannot make progress and advance the opTime because the server is shutting down, and the reload operation cannot proceed because it is waiting for the opTime to advance.

The underlying reason is that replication is the last entry in the shutdown sequence. In the scenario above that entry is never reached, so waitUntilOpTimeForRead remains blocked permanently.
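The hang described above comes down to a wait that has neither a deadline nor an interruption path. The following is a minimal sketch of a wait that has both, written with standard C++ primitives only. `OpTimeWaiter`, `waitUntilOpTime`, `advanceTo`, `interrupt`, and `shutdownDemo` are hypothetical names chosen for illustration; they are not MongoDB's actual classes or signatures, and the opTime is simplified to a plain integer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an opTime wait. Not MongoDB's real API.
class OpTimeWaiter {
public:
    enum class Result { kReached, kInterrupted, kTimedOut };

    // Blocks until the current opTime reaches `target`, interrupt() is
    // called, or `timeout` elapses, whichever happens first.
    Result waitUntilOpTime(std::uint64_t target, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        const bool satisfied = _cv.wait_for(lk, timeout, [&] {
            return _interrupted || _opTime >= target;
        });
        if (!satisfied) {
            return Result::kTimedOut;
        }
        return _opTime >= target ? Result::kReached : Result::kInterrupted;
    }

    // Called when replication makes progress; the opTime never moves backwards.
    void advanceTo(std::uint64_t opTime) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (opTime > _opTime) {
                _opTime = opTime;
            }
        }
        _cv.notify_all();
    }

    // Called by a shutdown step to release every current and future waiter.
    void interrupt() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _interrupted = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _opTime = 0;
    bool _interrupted = false;
};

// Usage sketch: once shutdown has interrupted the waiter, a wait for an
// opTime that will never arrive returns promptly instead of hanging.
inline void shutdownDemo() {
    OpTimeWaiter waiter;
    waiter.advanceTo(10);
    waiter.interrupt();  // what an early shutdown step would do
    const auto result = waiter.waitUntilOpTime(42, std::chrono::seconds(30));
    assert(result == OpTimeWaiter::Result::kInterrupted);
    (void)result;
}
```

In the scenario from this ticket, the reload thread would play the role of the caller of `waitUntilOpTime`, and an early shutdown step would call `interrupt()` before waiting for that thread to finish. Either the interrupt or the deadline alone would be enough to break the cycle; the sketch shows both so that the wait cannot depend solely on replication making progress.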

Duplicates: SERVER-27691 ServiceContext::setKillAllOperations should be replaced with an operation that interrupts running operations (Closed)

Assignee: Kaloian Manassiev
Reporter: Kaloian Manassiev
Participants: Judah Schvimer, Kaloian Manassiev
Votes: 0
Watchers: 6

Created: Feb 02 2017 10:50:30 PM UTC
Updated: Apr 05 2017 04:57:39 PM UTC
Resolved: Apr 05 2017 04:57:39 PM UTC
Confidence Status Last Update: 05/Apr/17 4:56 PM
