The crux of the problem: when I try to add a user over a localhost connection to mongos, I get:
mongos> db.addUser('z','z')
{
"user" : "z",
"readOnly" : false,
"pwd" : "899fa315594cebad2592f18d1ef50f79",
"_id" : ObjectId("504670f2705dfdc07c42070a")
}
Tue Sep 04 21:21:54 uncaught exception: couldn't add user: SyncClusterConnection::insert prepare failed: ShardRole0:20000:
ShardRole1:20000:{ errmsg: "need to login", ok: 0.0 }
ShardRole2:20000:{ errmsg: "need to login", ok: 0.0 }

At this point, we are running with security mode enabled (e.g., KeyFile switch), but have not yet added any admin users anywhere.
Here are some key details:
- This happened with MongoDB for Windows 2008plus versions 2.1.1 and 2.2.0
- Our topology is three servers, each running a MongoS, a MongoD config server, and a MongoD replica set member:
- ShardServer0: MongoS, MongoD Config, MongoD replica
- ShardServer1: MongoS, MongoD Config, MongoD replica
- ShardServer2: MongoS, MongoD Config, MongoD replica
- All replica instances belong to a single ReplicaSet, "rs"
- This error occurs with or without a Shard first being added.
- The problem ONLY seems to surface when the servers are separate physical machines or separate virtual machine instances. In other words, if all three sets of processes run on one machine, this error is not encountered.
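To make the layout above concrete, here is a rough sketch that prints the kind of command lines each of the three servers would run. It is not the content of the attached scripts. The host names and config server port 20000 come from the error output; the replica and mongos ports, the data paths, and the key file path are placeholders assumed for illustration.

```javascript
// Hypothetical sketch of the three-server layout described above.
// Host names and CONFIG_PORT come from the error output; every other
// port and path below is an assumed placeholder, not taken from the scripts.
const hosts = ["ShardRole0", "ShardRole1", "ShardRole2"];
const CONFIG_PORT = 20000;              // from the error output
const REPLICA_PORT = 27018;             // assumed
const MONGOS_PORT = 27017;              // assumed
const KEY_FILE = "C:\\mongo\\keyfile";  // assumed

// mongos receives all config servers as one comma-separated host:port list.
function configDbString(hostNames, port) {
  return hostNames.map(h => h + ":" + port).join(",");
}

// Each server runs the same three processes, all started with the key file.
function commandsFor(host) {
  return [
    "mongod --configsvr --port " + CONFIG_PORT +
      " --dbpath C:\\data\\config --keyFile " + KEY_FILE,
    "mongod --replSet rs --port " + REPLICA_PORT +
      " --dbpath C:\\data\\rs --keyFile " + KEY_FILE,
    "mongos --port " + MONGOS_PORT +
      " --configdb " + configDbString(hosts, CONFIG_PORT) +
      " --keyFile " + KEY_FILE,
  ];
}

for (const host of hosts) {
  console.log("# " + host);
  commandsFor(host).forEach(cmd => console.log(cmd));
}
```

In this layout, the mongos on one server reaches two of the three config servers over the network rather than over localhost, which matches the two hosts that answer "need to login" in the output above.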
Reproducing the Problem:
We have built a set of command line batch scripts for Windows that replicate the problem (see the attachments).
The scripts just need to be copied next to your Mongo binaries. For example, here's how we run it (follow the prompts at the command line for each):
- ShardServer0: execute "RunA.bat"
- ShardServer1: execute "RunB.bat"
- ShardServer2: execute "RunC.bat"
RunA.bat is identical to RunB.bat and RunC.bat except that it also creates a MongoS process and attempts to add a user.
If you try it out, be sure to update the folder paths, IP addresses, and ports for your environment by modifying all three BAT scripts and the initShardABC.js script.
For comparison purposes, if you just run RunShardInfrastructureWithAuth.bat, it will run everything on a single machine and you will not encounter this error.
Duplicates: SERVER-6591 Localhost authentication exception doesn't work right on sharded cluster (Closed)