Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: 4.1 Desired
Affects Version/s: None
Component/s: Sharding, Testing Infrastructure
Labels:

Assigned Teams:

Cluster Scalability
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Given the changes from ~~SERVER-34665~~, which expose a Mongo.prototype._markHostAsFailed() function that calls ReplicaSetMonitor::failedHost(), it shouldn't be necessary to use multiple retry attempts as a way to wait for the ReplicaSetMonitor to discover that a new primary has been elected, because retargeting can be triggered explicitly. The auto_retry_on_network_error.js override could then use this mechanism rather than setting kMaxNumRetries=3, and TestData.overrideRetryAttempts=3 could similarly be removed from the YAML suite definition.
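A rough sketch of the idea, not the actual override code: on a network error, mark the host as failed so the next attempt is retargeted, then retry once. The stub connection, the `looksLikeNetworkError()` helper, the `isNetworkError` flag, and the argument passed to `_markHostAsFailed()` are all assumptions made for illustration; the real signature of `_markHostAsFailed()` is not stated in this ticket.

```javascript
// Hypothetical helper: in this sketch, network errors carry an isNetworkError flag.
function looksLikeNetworkError(err) {
    return err instanceof Error && err.isNetworkError === true;
}

// Runs `runCommand(conn)` and, on a network error, explicitly marks the host as
// failed before making a single retry, instead of retrying several times while
// waiting for the monitor to notice the new primary by itself.
function runCommandWithRetarget(conn, runCommand) {
    try {
        return runCommand(conn);
    } catch (err) {
        if (!looksLikeNetworkError(err)) {
            throw err;
        }
        // Assumption: the function takes the failed host as its argument.
        conn._markHostAsFailed(conn.host);
        return runCommand(conn);
    }
}

// Stub connection used only to exercise the sketch; it is not the shell's Mongo object.
function makeStubConn() {
    return {
        host: "node0:27017",
        failedHosts: [],
        _markHostAsFailed(host) {
            this.failedHosts.push(host);
            // Pretend that retargeting found the newly elected primary.
            this.host = "node1:27017";
        },
    };
}

const stubConn = makeStubConn();
const retargetResult = runCommandWithRetarget(stubConn, (conn) => {
    if (conn.host === "node0:27017") {
        const err = new Error("socket closed by old primary");
        err.isNetworkError = true;
        throw err;
    }
    return {ok: 1, host: conn.host};
});
```

In this simulation a single retry is enough because the retargeting happens before the second attempt rather than as a side effect of repeated failures.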

Note: ~~SERVER-34608~~ describes a case where, after the shell receives an InterruptedDueToReplStateChange error response, an "isMaster" command could still observe ismaster=true and therefore cause server selection to pick a node that is still in the midst of stepping down. We could avoid decrementing the numRetries counter when the error response is InterruptedDueToReplStateChange: the first retry (i.e. the second attempt) would wait for the stepdown to finish, and the mongo shell would then observe a network error. A second retry (i.e. a third attempt) would be successfully targeted at whichever node is then elected the new primary.
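The counting rule in this note could be sketched as follows. This is a simplified simulation under stated assumptions, not the override's real logic: `runWithRetries()` is a hypothetical name, errors are modeled as thrown exceptions with a `codeName` or `isNetworkError` property, and there is no cap on how many InterruptedDueToReplStateChange responses are tolerated, which a real implementation would likely want.

```javascript
// Retries `attemptFn` on network errors, spending one unit of the retry budget
// each time. An InterruptedDueToReplStateChange error triggers a retry without
// decrementing the budget (the proposal in the note above).
function runWithRetries(attemptFn, maxNumRetries) {
    let numRetries = maxNumRetries;
    let attempts = 0;
    while (true) {
        attempts++;
        try {
            return {value: attemptFn(attempts), attempts: attempts};
        } catch (err) {
            if (err.codeName === "InterruptedDueToReplStateChange") {
                // Do not charge the retry budget for a stepdown interruption.
                continue;
            }
            if (err.isNetworkError === true && numRetries > 0) {
                numRetries--;
                continue;
            }
            throw err;
        }
    }
}

// Simulated sequence from the note: stepdown interruption, then a network
// error once the stepdown finishes, then success against the new primary.
const stepdownOutcome = runWithRetries((attemptNumber) => {
    if (attemptNumber === 1) {
        const err = new Error("operation was interrupted");
        err.codeName = "InterruptedDueToReplStateChange";
        throw err;
    }
    if (attemptNumber === 2) {
        const err = new Error("connection closed");
        err.isNetworkError = true;
        throw err;
    }
    return "ok";
}, 1);
```

With a budget of one retry, the simulated operation still succeeds on the third attempt, because the first failure was not charged against the budget.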

depends on

SERVER-36128 ReplicationCoordinatorImpl::fillIsMasterForReplSet should return isMaster:false while in shutdown

Closed

SERVER-34665 The mongo shell should retry writes on a WriteConcernFailure error response from the server

Closed

is duplicated by

SERVER-35225 retryOnNetworkErrors does not subtract from number of retries

Closed

is related to

SERVER-34608 Drivers may still see ismaster=true from primary in midst of stepping down immediately after operations are killed with InterruptedDueToReplStateChange

Closed

Assignee: [DO NOT USE] Backlog - Cluster Scalability
Reporter: Max Hirschhorn
Participants: [DO NOT USE] Backlog - Cluster Scalability, Max Hirschhorn
Votes: 0
Watchers: 2

Created: Apr 25 2018 03:28:05 AM UTC
Updated: Dec 12 2023 03:50:26 PM UTC
