[SERVER-20025] mongos fails to start if CSRS primary is not ismaster Created: 19/Aug/15  Updated: 25/Jan/17  Resolved: 29/Sep/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.1.7
Fix Version/s: 3.1.9

Type: Bug Priority: Major - P3
Reporter: Jonathan Abrahams Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js    
Issue Links:
Depends
depends on SERVER-19855 Operations that convey shard version ... Closed
depends on SERVER-20494 Enable reading from CSRS secondaries Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

#!/bin/bash
dbRoot=/tmp
mongo=$(which mongo)
mongod=$(which mongod)
mongos=$(which mongos)
cfgPort=29017
mongosPort=30000
hostname=$(hostname)
replset="configServers"
mongoCmds=""
members="[ "
configServers=
storageEngine="wiredTiger"
 
waitForPrimary="assert.soon(function() { return rs.isMaster().primary })"
 
# Start mongods
for i in $(seq 0 2)
do
    port=$((cfgPort+i))
    dbpath="$dbRoot/config$i"
    rm -fr "$dbpath"
    mkdir -p "$dbpath"
    logpath="mongod-config$i.log"
    rm -f "${logpath}"*
 
    # Save each mongod configsvr
    cfgSrvr[$i]="$mongod \
        --storageEngine $storageEngine \
        --smallfiles \
        --configsvr \
        --port $port \
        --dbpath $dbpath \
        --replSet $replset \
        --logpath $logpath \
        --fork"    
 
    # Start each mongod configsvr
    ${cfgSrvr[$i]}
    members="$members {_id: $i, host: '$hostname:$port'},"
    if [ ! -z "$configServers" ]; then
        configServers="$configServers,$hostname:$port"
    else
        configServers="$hostname:$port"
    fi
done
members="$members ]"
 
# Initiate the replica set as configsvr (CSRS)
mongoCmds="var cfg={_id: '$replset', configsvr: true, members: $members};
    print('initializing with', tojson(cfg));
    print('rs.initiate', tojson(rs.initiate(cfg)));
    $waitForPrimary;
    printjson(rs.status());"
echo "$mongoCmds" | $mongo --port $cfgPort
 
# Start the mongos
mongosCmd="$mongos \
    --port $mongosPort \
    --configdb configServers/$configServers \
    --fork \
    --logpath mongos.log"
 
$mongosCmd

Sprint: Sharding 8 (08/28/15), Sharding 9 (09/18/15), Sharding A (10/09/15)
Participants:

 Description   

mongos fails to start if the config server replica set (CSRS) member reported as primary does not yet have ismaster set to true (see SERVER-20017). The CSRS is still starting up in this state, so mongos should retry rather than fail immediately:

{
	"hosts" : [
		"rhel64.mongotest.com:29017",
		"rhel64.mongotest.com:29018",
		"rhel64.mongotest.com:29019"
	],
	"setName" : "configServers",
	"setVersion" : 1,
	"ismaster" : false,
	"secondary" : true,
	"primary" : "rhel64.mongotest.com:29017",
	"me" : "rhel64.mongotest.com:29017",
	"electionId" : ObjectId("55d38ce30000000000000000"),
	"configsvr" : 1,
	"maxBsonObjectSize" : 16777216,
	"maxMessageSizeBytes" : 48000000,
	"maxWriteBatchSize" : 1000,
	"localTime" : ISODate("2015-08-18T19:52:03.160Z"),
	"maxWireVersion" : 4,
	"minWireVersion" : 0,
	"ok" : 1
}
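
The isMaster output above reports a "primary" host while "ismaster" is still false, so a wait condition that only tests rs.isMaster().primary can return too early. A minimal sketch of a stricter check follows; is_writable_primary is a hypothetical helper, not part of the original script, and it assumes the isMaster document is printed in the shell's usual printjson layout.

```shell
# Hypothetical helper: reads isMaster output on stdin and succeeds only
# when the "ismaster" field is true. The "primary" field is ignored.
is_writable_primary() {
    grep -Eq '"ismaster"[[:space:]]*:[[:space:]]*true'
}
```

It could be fed from the shell, for example with: $mongo --quiet --port $cfgPort --eval 'printjson(db.isMaster())' | is_writable_primary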

2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] starting new replica set monitor for replica set configServers with seeds
2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] rhel64.mongotest.com:29017
2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] ,
2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] rhel64.mongotest.com:29018
2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] ,
2015-08-18T19:52:03.764+0000 I NETWORK  [mongosMain] rhel64.mongotest.com:29019
2015-08-18T19:52:03.764+0000 I NETWORK  [ReplicaSetMonitorWatcher] starting
2015-08-18T19:52:03.810+0000 W NETWORK  [mongosMain] No primary detected for set configServers
2015-08-18T19:52:03.810+0000 E SHARDING [mongosMain] Error initializing sharding system: NotMaster No master found for set configServers
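
Until mongos retries on its own, the startup can be retried from the calling script. The sketch below is an operator-side workaround, not the server fix; retry is a hypothetical helper, and it assumes that mongos started with --fork exits non-zero when initialization fails.

```shell
# Hypothetical helper: run a command until it succeeds or the attempt
# budget is used up. Usage: retry <attempts> <delay-seconds> <command...>
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

With the reproduction script's variables, the last line could become: retry 10 2 $mongosCmd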



 Comments   
Comment by Spencer Brody (Inactive) [ 20/Jan/16 ]

tung@misfit.com Can you elaborate on what you were doing when you encountered this?

Worth noting is that you cannot start up a mongos for the first time in a new cluster unless the config server has a primary. Existing clusters should have no problem starting new mongoses, however, even if there is no config server primary, so long as at least one config server is up.

Comment by Tung Nguyen [ 20/Jan/16 ]

Looks like this is happening in 3.2.1 when the config cluster is already ready. Is anyone else experiencing the same issue?

Comment by Spencer Brody (Inactive) [ 29/Sep/15 ]

Confirmed that this was fixed by SERVER-20494, and added a jstest.

Comment by Githook User [ 29/Sep/15 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-20025 Test that starting mongos with a config server primary works
Branch: master
https://github.com/mongodb/mongo/commit/05e9080be96f05fcf4ed74b996308edd14f2cec1

Comment by Randolph Tan [ 19/Aug/15 ]

Yes. I will rerun the attached test.js I wrote to make sure it passes once SERVER-19855 is resolved.

Comment by Andy Schwerin [ 19/Aug/15 ]

So this will go away once SERVER-19855 is resolved?

Comment by Randolph Tan [ 19/Aug/15 ]

This is because all operations in the catalog manager have their read preference set to PrimaryOnly. I confirmed that mongos starts once I switched it to nearest in the code. I also confirmed that you can query config collections as long as you pass slaveOk true or a read preference that allows reading from secondaries. SERVER-19855 will switch the read preference.
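
As a rough illustration of the secondary read described above, the sketch below queries a config collection directly on one config server with slaveOk enabled. query_config_version and the MONGO override are hypothetical names, and the choice of config.version is an assumption; any config collection would do.

```shell
# Hypothetical helper: read config.version from the config server on the
# given port, allowing the read even if that member is a secondary.
query_config_version() {
    local port=$1
    local js='rs.slaveOk(); printjson(db.getSiblingDB("config").version.findOne());'
    ${MONGO:-mongo} --quiet --port "$port" --eval "$js"
}
```

For example: query_config_version 29018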

Comment by Andy Schwerin [ 19/Aug/15 ]

Actually, looks like we've separately confirmed this bug exists even for already initialized clusters.

Comment by Andy Schwerin [ 19/Aug/15 ]

jonathan.abrahams, when a new sharded cluster is started, the first mongos (or some early mongos) initializes the config collections on the config servers. I suspect that the bug you've reported here only applies to a mongos contacting a not-yet-initialized cluster. To help me understand the scope of this bug, could you try the following?

  1. Start a three node config server replica set
  2. Once there's a primary, start a mongos
  3. Shut down two of the config server replica set members
  4. Wait for the remaining member to become a secondary
  5. Try to start another mongos.

Does the mongos started in the last step stay up, or does it shut down?
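
Steps 3 to 5 above could be scripted roughly as follows, assuming steps 1 and 2 were already done with the reproduction script from this ticket. The function names, port 30001, and mongos2.log are illustrative choices, not taken from the report.

```shell
#!/bin/bash
# Sketch only: defaults mirror the reproduction script; override as needed.
mongo=${mongo:-mongo}
mongos=${mongos:-mongos}
cfgPort=${cfgPort:-29017}
host=${host:-$(hostname)}

# Step 3: shut down the config servers listening on the given ports.
shutdown_members() {
    local port
    for port in "$@"; do
        # The shell may report a dropped connection; that is expected here.
        $mongo --port "$port" \
            --eval 'db.getSiblingDB("admin").shutdownServer({force: true})' || true
    done
}

# Step 4: block until the surviving member reports itself as a secondary.
wait_for_secondary() {
    $mongo --port "$1" \
        --eval 'assert.soon(function() { return db.isMaster().secondary === true; })'
}

# Step 5: start a second mongos against the degraded config server set.
start_second_mongos() {
    $mongos --port 30001 \
        --configdb "configServers/$host:$cfgPort,$host:$((cfgPort+1)),$host:$((cfgPort+2))" \
        --fork --logpath mongos2.log
}
```

A run would then look like: shutdown_members 29018 29019 && wait_for_secondary 29017 && start_second_mongos; echo "exit status: $?"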
