[SERVER-27070] Multiple mongos created when CSRS is not available during startup
Created: 16/Nov/16  Updated: 28/Nov/16  Resolved: 28/Nov/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.5
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Darshan Shah
Assignee: Kelsey Schubert
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongos_chi-ppe-oas001.log    
Issue Links:
Duplicate
Operating System: ALL
Steps To Reproduce:

In a sharded cluster running MongoDB 3.2.5 with WT, prevent the CSRS nodes from starting up, then start the mongos processes anyway.

Participants:

 Description   

On a sharded cluster running MongoDB 3.2.5 with WT, multiple mongos processes are created when the CSRS nodes are not up and ready at startup.
While starting the cluster, if the CSRS nodes are not available, I see multiple mongos processes that just hang around without serving any clients.

[oasprod@chi-ppe-oas001 ~]$ ps -aef | grep mongos
oasprod   8728     1  0 13:41 ?        00:00:00 /oas_alias/mongodb/mongodb/bin/mongos --fork --port 29101 --logpath /data_mount/mongodata/mongos/mongos.log --pidfilepath /data_mount/mongodata/mongos/mongos.pid --configdb csReplSet/bxb-ppe-oas002:29102,chi-ppe-oas019:29102,bxb-ppe-oas012:29102 --quiet
oasprod   8729  8728  0 13:41 ?        00:00:01 /oas_alias/mongodb/mongodb/bin/mongos --fork --port 29101 --logpath /data_mount/mongodata/mongos/mongos.log --pidfilepath /data_mount/mongodata/mongos/mongos.pid --configdb csReplSet/bxb-ppe-oas002:29102,chi-ppe-oas019:29102,bxb-ppe-oas012:29102 --quiet
oasprod  22278 16931  0 13:57 pts/2    00:00:00 grep --color=auto mongos
[oasprod@chi-ppe-oas001 ~]$

My expectation is that the mongos process should simply fail instead of exhibiting this behavior.
Attaching the log file from the hanging mongos process.

I have also raised the same issue on the mongodb-user Google group:
https://groups.google.com/forum/#!searchin/mongodb-user/mongos%7Csort:relevance/mongodb-user/b5K9enNl0Cs/PuNszaSQBgAJ



 Comments   
Comment by Kelsey Schubert [ 28/Nov/16 ]

Hi darshan.shah@interactivedata.com,

As Kevin explained on the mongodb-user group, there is no MongoDB process that spawns itself multiple times. It is expected that a mongos will launch if the CSRS is not available. For further discussion of this behavior, please post to the mongodb-user group.

Thank you,
Thomas

Comment by Darshan Shah [ 17/Nov/16 ]

Here is the code snippet of the simple python script that I wrote to start the sharded cluster:

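import subprocess

# NOTE: `exe` (the binary name, e.g. "mongos") is not defined in this snippet;
# it is assumed to be set elsewhere in the script.
def startMongos(server, port, logPath, pidPath, options, name):
  cmd = ""
  try:
    wt = " --wiredTigerCacheSizeGB 20 "
    cmd = "ssh " + server + " \"numactl --interleave=all /mongodb/bin/" + exe + " --fork --quiet --port " + str(port) + " --logpath " + logPath + " --pidfilepath " + pidPath + options + wt + " \""
    # universal_newlines=True returns text rather than bytes, so the substring test below works
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=True, universal_newlines=True)
    status = ""
    if "child process started successfully, parent exiting" in output:
      # Report the second line of the --fork output as the status
      lines = output.splitlines()
      status = lines[1]
    else:
      status = repr(output)
    print(name, ";", status.strip())
  except Exception as e:
    print(name, "Error in ", cmd, "\n", repr(e))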

So the script is definitely not trying to start multiple instances of mongos.
And when this happens, I cannot connect to the mongos from another shell - connection is refused.

Comment by Andy Schwerin [ 16/Nov/16 ]

The mongos node stays up trying to contact the config servers indefinitely, to simplify the process of starting up whole clusters. That, at least, is as designed. I don't know why your script is launching multiple mongos nodes. Where did you get it?

I believe that mongos should not return from --fork before it starts listening on the assigned port, but I'm not sure that it does. If it does, the way to check that mongos is ready is to wait for mongos --fork to return, and then connect to the port and wait for an answer.
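
The readiness check described above could be sketched roughly as follows. This is a minimal illustration, not part of the reporter's script: the function name wait_for_port and its parameters are hypothetical, and a successful TCP connect only shows that something is listening on the port. Confirming that mongos actually answers would need a real command sent through a driver.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port accepts connections,
    not that the server behind it can serve queries.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused, timed out)
            # until something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the host and port from this ticket, a startup script might call wait_for_port("chi-ppe-oas001", 29101) after the mongos --fork command returns, and treat a False result as a failed start.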

samantha.ritter, I believe, made some changes to the fork-and-listen behavior in the last six months. Perhaps she can supply more detail.
