Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.10
Affects Version/s: None
Component/s: Sharding
Labels:
- code-and-test

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Sharding 2016-09-19
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Status as of Oct 07, 2016

ISSUE DESCRIPTION AND IMPACT
As a workaround for SERVER-23192, MongoDB 3.2.10 introduced an option where a node never stops monitoring a replica set once it has started, no matter how long the set appears to be down. Using this option means you can encounter problems if you remove a shard and then add back a shard with the same replica set name.

This parameter is set to false by default, and can be set by executing the following command:

db.adminCommand( {setParameter: 1, 'timeOutMonitoringReplicaSets': true} )
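Before changing the parameter, it can help to read its current value on the node. The sketch below only builds the command documents; the helper names buildSetTimeOutMonitoring and buildGetTimeOutMonitoring are hypothetical and not part of MongoDB. It assumes the standard getParameter command accepts the same parameter name as setParameter.

```javascript
// Hypothetical helper: builds the setParameter document shown in this ticket.
function buildSetTimeOutMonitoring(enabled) {
  return { setParameter: 1, timeOutMonitoringReplicaSets: enabled };
}

// Hypothetical helper: builds a getParameter document to read the current value.
// Assumption: getParameter accepts the same parameter name.
function buildGetTimeOutMonitoring() {
  return { getParameter: 1, timeOutMonitoringReplicaSets: 1 };
}

// In the mongo shell, connected to the node in question:
//   db.adminCommand(buildGetTimeOutMonitoring())
//   db.adminCommand(buildSetTimeOutMonitoring(true))
```

Keeping the documents in small functions makes it easier to run the same commands against each affected node without retyping the parameter name.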

DIAGNOSIS AND AFFECTED VERSIONS
This option is included in MongoDB 3.2.10 and subsequent releases of MongoDB 3.2. Please note that it is not included in MongoDB 3.4.

REMEDIATION AND WORKAROUNDS
If the operator wishes to re-add the shard using different hosts at a later date, the operator has two choices:

- Restart all the affected nodes.
- Toggle the timeOutMonitoringReplicaSets server parameter introduced in SERVER-25516 from false to true on each affected node. Once the shard is discovered, switch timeOutMonitoringReplicaSets back to false. This process usually takes about two minutes.
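The toggle option could be scripted per node roughly as follows. This is a minimal sketch, not an official procedure: toggleMonitoringTimeout is a hypothetical helper, adminDb is assumed to be any object exposing adminCommand (such as db in the mongo shell), and the fixed wait stands in for "once the shard is discovered", which the ticket does not describe how to detect.

```javascript
// Hypothetical helper: set the flag to true, wait, then restore it to false.
// adminDb: object with adminCommand(doc), e.g. `db` in the mongo shell.
// waitFn:  function called with a delay in milliseconds, e.g. the shell's sleep.
// waitMs:  how long to wait; the ticket mentions about two minutes.
function toggleMonitoringTimeout(adminDb, waitFn, waitMs) {
  var enable = adminDb.adminCommand({ setParameter: 1, timeOutMonitoringReplicaSets: true });
  if (!enable || enable.ok !== 1) {
    throw new Error("could not set timeOutMonitoringReplicaSets to true");
  }
  var restore;
  try {
    // Assumption: a fixed wait is enough for the new shard to be discovered.
    waitFn(waitMs);
  } finally {
    // Always switch the parameter back, even if the wait fails.
    restore = adminDb.adminCommand({ setParameter: 1, timeOutMonitoringReplicaSets: false });
  }
  if (!restore || restore.ok !== 1) {
    throw new Error("could not set timeOutMonitoringReplicaSets back to false");
  }
  return true;
}

// In the mongo shell, on each affected node:
//   toggleMonitoringTimeout(db, sleep, 2 * 60 * 1000)
```

Restoring the value in a finally block is a design choice so that a node is not left with the parameter set to true if the script is interrupted during the wait.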

Original description

As a workaround for SERVER-23192 on 3.2 we can introduce an option where we never stop monitoring a replica set once we've started, no matter how long it appears to be down for. Using this option means you can encounter problems if you remove a shard then add back a shard with the same replica set name.

related to

SERVER-23192 mongos and shards will become unusable if contact is lost with all CSRS config server nodes for more than 30 consecutive failed attempts to contact

Closed

Assignee: Andy Schwerin
Reporter: Spencer Brody (Inactive)
Participants: Andy Schwerin, Githook User, Spencer Brody
Votes: 0
Watchers: 6

Created: Aug 09 2016 09:04:28 PM UTC
Updated: Apr 03 2019 03:19:11 PM UTC
Resolved: Sep 12 2016 11:00:41 PM UTC
Confidence Status Last Update: 08/Sep/16 10:06 PM
