[SERVER-26719] Improve logging when config server does not support CSRS Created: 21/Oct/16  Updated: 06/Dec/22  Resolved: 21/Mar/18

Status: Closed
Project: Core Server
Component/s: Logging, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Akira Kurogane Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-23192 mongos and shards will become unusabl... Closed
Assigned Teams:
Sharding
Sprint: Sharding 2016-11-21, Sharding 2016-12-12, Sharding 2017-01-02, Sharding 2017-02-13, Sharding 2017-03-06
Participants:
Case:

 Description   

See comment below for additional details

Original Description

If an SCCC config server using MMAPv1 is restarted with a replica set name but without configsvrMode: "sccc", it becomes a replica set node but cannot take the primary role, because MMAPv1 does not support read concern "majority" (committed reads). It therefore goes into the REMOVED state as its first and only replica set state change.

Without waiting for a primary to appear in the new CSRS, the shard mongod nodes and the mongos instances that connect to that config server switch over to the replica-set-using CatalogManager and kill their legacy CatalogManager. There is no valid primary to read from, so they run into the condition described in SERVER-23192.

Put another way: if someone follows the SCCC -> CSRS migration documentation (link) and makes the single mistake at step #3 of failing to:

[set] the --configsvrMode option to the legacy config server mode Sync Cluster Connection Config (sccc),

then they will silently run into SERVER-23192 after 30 seconds, and that state persists until the mongod and mongos nodes are restarted, even if the config server is subsequently restarted with --configsvrMode="sccc".



 Comments   
Comment by Akira Kurogane [ 21/Oct/16 ]

Related request:

When the config server determines, in the following code, that it must take the REMOVED replica set state, could it log a warning such as "CSRS config server mode without storage engine read-concern support is not possible. This node is now removing itself from the replica set."?

MemberState TopologyCoordinatorImpl::getMemberState() const {
    if (_selfIndex == -1) {
        if (_rsConfig.isInitialized()) {
            return MemberState::RS_REMOVED;
        }
        return MemberState::RS_STARTUP;
    }
 
    if (_rsConfig.isConfigServer()) {
        if (_options.configServerMode == CatalogManager::ConfigServerMode::NONE) {
            return MemberState::RS_REMOVED;
        }
        if (_options.configServerMode == CatalogManager::ConfigServerMode::CSRS) {
            invariant(_storageEngineSupportsReadCommitted != ReadCommittedSupport::kUnknown);
            if (_storageEngineSupportsReadCommitted == ReadCommittedSupport::kNo) {
                return MemberState::RS_REMOVED;
            }
        }
    } else {
        if (_options.configServerMode != CatalogManager::ConfigServerMode::NONE) {
            return MemberState::RS_REMOVED;
        }
    }
 
    if (_role == Role::leader) {
        invariant(_currentPrimaryIndex == _selfIndex);
        return MemberState::RS_PRIMARY;
    }
    const MemberConfig& myConfig = _selfConfig();
    if (myConfig.isArbiter()) {
        return MemberState::RS_ARBITER;
    }
    if (((_maintenanceModeCalls > 0) || (_hasOnlyAuthErrorUpHeartbeats(_hbdata, _selfIndex))) &&
        (_followerMode == MemberState::RS_SECONDARY)) {
        return MemberState::RS_RECOVERING;
    }
    return _followerMode;
}
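
One way the requested warning could be derived is sketched below. This is only an illustration of the idea, not the server's implementation: the enums are simplified stand-ins for the types in the snippet above, removedReason and warnIfRemoved are hypothetical helper names, and every message other than the CSRS one proposed in this comment is invented wording. It mirrors only the configuration checks that lead to RS_REMOVED.

```cpp
#include <cassert>
#include <ostream>
#include <string>

// Simplified, hypothetical stand-ins for the types used in the snippet above.
enum class ConfigServerMode { NONE, SCCC, CSRS };
enum class ReadCommittedSupport { kUnknown, kNo, kYes };

// Returns the reason the configuration checks would force RS_REMOVED,
// or an empty string if none of them applies.
std::string removedReason(bool rsConfigIsConfigServer,
                          ConfigServerMode mode,
                          ReadCommittedSupport support) {
    if (rsConfigIsConfigServer) {
        if (mode == ConfigServerMode::NONE) {
            // Hypothetical wording, not taken from the ticket.
            return "Replica set config is for a config server, but this node "
                   "was not started as one.";
        }
        if (mode == ConfigServerMode::CSRS) {
            // Mirrors the invariant in the original snippet.
            assert(support != ReadCommittedSupport::kUnknown);
            if (support == ReadCommittedSupport::kNo) {
                // Wording proposed in this comment.
                return "CSRS config server mode without storage engine "
                       "read-concern support is not possible.";
            }
        }
        return "";
    }
    if (mode != ConfigServerMode::NONE) {
        // Hypothetical wording, not taken from the ticket.
        return "This node was started as a config server, but the replica "
               "set config is not for one.";
    }
    return "";
}

// Writes a warning to `out` when a removal reason exists.
// Returns true if a warning was written.
bool warnIfRemoved(std::ostream& out, bool rsConfigIsConfigServer,
                   ConfigServerMode mode, ReadCommittedSupport support) {
    const std::string reason =
        removedReason(rsConfigIsConfigServer, mode, support);
    if (reason.empty()) {
        return false;
    }
    out << "WARNING: " << reason
        << " This node is now removing itself from the replica set.\n";
    return true;
}
```

For the scenario in the description, and assuming the node ends up in CSRS mode with MMAPv1 reporting kNo, a call such as warnIfRemoved(std::cerr, true, ConfigServerMode::CSRS, ReadCommittedSupport::kNo) would write the proposed warning. Returning the reason separately from the state keeps the state computation free of side effects, so a caller could log once on the state transition rather than on every call.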

Generated at Thu Feb 08 04:13:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.