[SERVER-5082] Queries with slaveOk() do not reauthenticate when credentials change and an admin user exists. Created: 24/Feb/12  Updated: 15/Aug/12  Resolved: 22/May/12

Status: Closed
Project: Core Server
Component/s: Internal Client, Replication, Security
Affects Version/s: 2.0.2, 2.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rob LaRubbio Assignee: Ben Becker
Resolution: Duplicate Votes: 4
Labels: authentication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu, C++ driver


Attachments: File authQueryTest.lua     File authTest.cpp     File authTest.lua     File primary.log     File query.lua     File query.lua     File query.lua     File query.lua     File querySlaveOkTest.cpp     File querySlaveOkTest.cpp     File secondary.log    
Issue Links:
Related
is related to SERVER-5405 mongos does not send reads to seconda... Closed
Operating System: ALL
Participants:

 Description   

I don't have a reproducible test case, but this issue was seen in production with two different user/database combinations on two distinct occasions. We cannot run with slave_ok in production because of it. Each time, the issue persisted until slave_ok was removed as an option to query.

We are using the luamongo (https://github.com/moai/luamongo) driver, which is a thin wrapper around the C++ driver. We never saw the issue on MongoDB 1.8; we first ran into it after updating to 2.0.2. Originally we thought it was because we were still using a Lua driver compiled against the 1.8.X C++ driver, but the issue continued after upgrading to the 2.0.2 version of the driver.

When a query is sent to a secondary, this is what is returned:

[{"code":10057,"$err":"unauthorized db:<db name> lock type:-1 client:50.18.245.77"}]

Any Lua query, even a basic one like the following, will generate the error:

local cursor = mongodb:query('mydb.datastore', '{key:"value"}', nil, nil, nil, 4)
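
Since luamongo is a thin wrapper around the C++ driver, the same call can be expressed directly against the driver. A minimal sketch, assuming the 2.0-era C++ driver API (the host names, database, and credentials here are placeholders, not the real production values):

#include "client/dbclient.h"
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<mongo::HostAndPort> servers;
    servers.push_back( mongo::HostAndPort( "primary:27017" ) );
    servers.push_back( mongo::HostAndPort( "secondary:27017" ) );

    mongo::DBClientReplicaSet conn( "setname", servers );
    conn.connect();

    std::string errmsg;
    conn.auth( "mydb", "user", "password", errmsg );

    // QueryOption_SlaveOk corresponds to passing 4 as the Lua options argument
    std::auto_ptr<mongo::DBClientCursor> cur =
        conn.query( "mydb.datastore", QUERY( "key" << "value" ),
                    0, 0, 0, mongo::QueryOption_SlaveOk );
    while ( cur->more() )
        std::cout << cur->next() << std::endl;
    return 0;
}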



 Comments   
Comment by Tad Marshall [ 25/Jun/12 ]

Setting Backport to No because this ticket was marked as a duplicate of SERVER-5405 and SERVER-5405 has already been backported.

Comment by Ben Becker [ 22/May/12 ]

This appears to have been fixed by SERVER-5405 and commit 06e99af8. The attached Lua tests work as expected when run against master.

Comment by Ben Becker [ 03/Apr/12 ]

It looks like we can replace the call to replSetGetStatus with isMaster to avoid requiring admin credentials for each secondary connection. checkSlave() may also need to reauthenticate even if the same slave was the last one selected.
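
A rough sketch of that idea, as a hypothetical helper assuming the 2.0-era driver API (this is not the actual patch): isMaster can be run without authentication, unlike replSetGetStatus, which needs admin credentials on each member.

#include "client/dbclient.h"

// Hypothetical role check using the unauthenticated isMaster command instead
// of replSetGetStatus (which requires admin credentials).
bool isSecondary( mongo::DBClientConnection& conn ) {
    mongo::BSONObj res;
    conn.runCommand( "admin", BSON( "ismaster" << 1 ), res );
    return res["secondary"].trueValue();
}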

Comment by Ben Becker [ 02/Apr/12 ]

The issue appears to be in the replica set secondary connection logic: we never attempt to reauthenticate when using a secondary. Existing behavior: on the second query, the auth message is sent over connection #6, and the query itself is sent over connection #7. Note that neither of these is actually sent to the secondary server; both go to the primary.

Mon Apr  2 14:36:32 [initandlisten] connection accepted from 127.0.0.1:57061 #6
Mon Apr  2 14:36:32 [conn6] runQuery called a.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn6] run command a.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn6] command a.$cmd command: { getnonce: 1 } ntoreturn:1 reslen:65 0ms
Mon Apr  2 14:36:32 [conn6] runQuery called a.$cmd { authenticate: 1, nonce: "7aa7df454cfe5633", user: "a", key: "c813a42e61134e954becaecce7734522" }
Mon Apr  2 14:36:32 [conn6] run command a.$cmd { authenticate: 1, nonce: "7aa7df454cfe5633", user: "a", key: "c813a42e61134e954becaecce7734522" }
Mon Apr  2 14:36:32 [conn6] Accessing: a for the first time
Mon Apr  2 14:36:32 [conn6]  authenticate: { authenticate: 1, nonce: "7aa7df454cfe5633", user: "a", key: "c813a42e61134e954becaecce7734522" }
Mon Apr  2 14:36:32 [conn6] command a.$cmd command: { authenticate: 1, nonce: "7aa7df454cfe5633", user: "a", key: "c813a42e61134e954becaecce7734522" } ntoreturn:1 reslen:37 33ms
Mon Apr  2 14:36:32 [conn6] runQuery called admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] run command admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] command admin.$cmd command: { getlasterror: 1 } ntoreturn:1 reslen:83 0ms
Mon Apr  2 14:36:32 [initandlisten] connection accepted from 127.0.0.1:57062 #7
Mon Apr  2 14:36:32 [conn7] runQuery called a.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn7] run command a.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn7] command a.$cmd command: { getnonce: 1 } ntoreturn:1 reslen:65 0ms
Mon Apr  2 14:36:32 [conn7] runQuery called a.$cmd { authenticate: 1, nonce: "52892b3be63f2894", user: "a", key: "72bbdb48e3e3088a7d565afbe9bf2a6c" }
Mon Apr  2 14:36:32 [conn7] run command a.$cmd { authenticate: 1, nonce: "52892b3be63f2894", user: "a", key: "72bbdb48e3e3088a7d565afbe9bf2a6c" }
Mon Apr  2 14:36:32 [conn7]  authenticate: { authenticate: 1, nonce: "52892b3be63f2894", user: "a", key: "72bbdb48e3e3088a7d565afbe9bf2a6c" }
Mon Apr  2 14:36:32 [conn7] command a.$cmd command: { authenticate: 1, nonce: "52892b3be63f2894", user: "a", key: "72bbdb48e3e3088a7d565afbe9bf2a6c" } ntoreturn:1 reslen:37 0ms
Mon Apr  2 14:36:32 [conn7] runQuery called a.values { a: 10 }
Mon Apr  2 14:36:32 [conn7] query a.values reslen:20 0ms
Mon Apr  2 14:36:32 [conn6] runQuery called admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] run command admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] command admin.$cmd command: { getlasterror: 1 } ntoreturn:1 reslen:83 0ms
Mon Apr  2 14:36:32 [conn6] runQuery called b.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn6] run command b.$cmd { getnonce: 1 }
Mon Apr  2 14:36:32 [conn6] command b.$cmd command: { getnonce: 1 } ntoreturn:1 reslen:65 0ms
Mon Apr  2 14:36:32 [conn6] runQuery called b.$cmd { authenticate: 1, nonce: "cfb1a1bc1a27a5cf", user: "b", key: "f0f0cb7421d6824097367bd0583f5fe2" }
Mon Apr  2 14:36:32 [conn6] run command b.$cmd { authenticate: 1, nonce: "cfb1a1bc1a27a5cf", user: "b", key: "f0f0cb7421d6824097367bd0583f5fe2" }
Mon Apr  2 14:36:32 [conn6] Accessing: b for the first time
Mon Apr  2 14:36:32 [conn6]  authenticate: { authenticate: 1, nonce: "cfb1a1bc1a27a5cf", user: "b", key: "f0f0cb7421d6824097367bd0583f5fe2" }
Mon Apr  2 14:36:32 [conn6] command b.$cmd command: { authenticate: 1, nonce: "cfb1a1bc1a27a5cf", user: "b", key: "f0f0cb7421d6824097367bd0583f5fe2" } ntoreturn:1 reslen:37 21ms
Mon Apr  2 14:36:32 [conn6] runQuery called admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] run command admin.$cmd { getlasterror: 1 }
Mon Apr  2 14:36:32 [conn6] command admin.$cmd command: { getlasterror: 1 } ntoreturn:1 reslen:83 0ms
Mon Apr  2 14:36:32 [conn7] runQuery called b.values { a: 10 }
Mon Apr  2 14:36:32 [conn7] User Assertion: 10057:unauthorized db:b lock type:-1 client:127.0.0.1
Mon Apr  2 14:36:32 [conn7] assertion 10057 unauthorized db:b lock type:-1 client:127.0.0.1 ns:b.values query:{ a: 10 }
Mon Apr  2 14:36:32 [conn7] query b.values exception: unauthorized db:b lock type:-1 client:127.0.0.1 code:10057 reslen:93 0ms

I tested a quick workaround to confirm the cause of the bug; with it, query.lua successfully authenticates:

diff --git a/client/dbclient_rs.cpp b/client/dbclient_rs.cpp
index c57a52d..512c001 100644
--- a/client/dbclient_rs.cpp
+++ b/client/dbclient_rs.cpp
@@ -544,8 +544,10 @@ namespace mongo {
         HostAndPort h = _monitor->getSlave( _slaveHost );
 
         if ( h == _slaveHost && _slave ) {
-            if ( ! _slave->isFailed() )
+            if ( ! _slave->isFailed() ) {
+                _auth( _slave.get() );
                 return _slave.get();
+            }
             _monitor->notifySlaveFailure( _slaveHost );
             _slaveHost = _monitor->getSlave();
         } 

Comment by Rob LaRubbio [ 29/Mar/12 ]

Got it. Thanks for the update and thanks for your work on this, I appreciate it.

Comment by siddharth.singh@10gen.com [ 29/Mar/12 ]

Sorry, I should have been more explicit. I tried to reproduce it multiple times over several days and saw it happen only once. I'm afraid I don't have a good answer as to why I can't reproduce it consistently. It would be great if I could see it happen reliably, because then I could fix it.

When the database collection is empty I see this:

Thu Mar 29 16:09:59 starting new replica set monitor for replica set setname with seed of 127.0.0.1:27017,127.0.0.1:27018
Thu Mar 29 16:09:59 successfully connected to seed 127.0.0.1:27017 for replica set setname
Thu Mar 29 16:09:59 changing hosts to { 0: "127.0.0.1:27017", 1: "127.0.0.1:27018" } from setname/
Thu Mar 29 16:09:59 trying to add new host 127.0.0.1:27017 to replica set setname
Thu Mar 29 16:09:59 successfully connected to new host 127.0.0.1:27017 in replica set setname
Thu Mar 29 16:09:59 trying to add new host 127.0.0.1:27018 to replica set setname
Thu Mar 29 16:09:59 successfully connected to new host 127.0.0.1:27018 in replica set setname
Thu Mar 29 16:09:59 replica set monitor for replica set setname started, address is setname/127.0.0.1:27017,127.0.0.1:27018
Thu Mar 29 16:09:59 [ReplicaSetMonitorWatcher] starting
No results for a
No results for a

After I add a document to the values collection to both database a and b, I see this:

Thu Mar 29 16:11:30 starting new replica set monitor for replica set setname with seed of 127.0.0.1:27017,127.0.0.1:27018
Thu Mar 29 16:11:30 successfully connected to seed 127.0.0.1:27017 for replica set setname
Thu Mar 29 16:11:30 changing hosts to { 0: "127.0.0.1:27017", 1: "127.0.0.1:27018" } from setname/
Thu Mar 29 16:11:30 trying to add new host 127.0.0.1:27017 to replica set setname
Thu Mar 29 16:11:30 successfully connected to new host 127.0.0.1:27017 in replica set setname
Thu Mar 29 16:11:30 trying to add new host 127.0.0.1:27018 to replica set setname
Thu Mar 29 16:11:30 successfully connected to new host 127.0.0.1:27018 in replica set setname
Thu Mar 29 16:11:30 replica set monitor for replica set setname started, address is setname/127.0.0.1:27017,127.0.0.1:27018
Thu Mar 29 16:11:30 [ReplicaSetMonitorWatcher] starting
A { _id: ObjectId('4f74c1ec28ae1eeb03b3ed89'), a: 10.0 }
B { _id: ObjectId('4f74c1f028ae1eeb03b3ed8a'), a: 10.0 }

In case you were curious, I have also attached the server logs (primary.log and secondary.log) captured while running the client.

I am going to continue trying to reproduce this and will update the ticket as soon as I find something. I have also removed the Lua tag from the issue.

Comment by Rob LaRubbio [ 29/Mar/12 ]

Also, one other request: can we remove "Lua (luamongo)" from the environment, since I can reproduce it directly from C++ code without using any Lua? Thanks.

Comment by Rob LaRubbio [ 29/Mar/12 ]

I'm not sure I completely understand your comment. When you say "reliably reproduce", does that mean you've seen the issue sporadically but not consistently? What happened when you ran the latest version of the attached C++ code? I can try again with 2.0.4, but in my latest repro cases I'm using a completely clean server with downloads directly from 10gen (and no Lua), so I'd really like to understand why it happens reliably for me but you can't reproduce it.

Comment by siddharth.singh@10gen.com [ 29/Mar/12 ]

Hi Rob,

I haven't been able to reliably reproduce this. I tried taking a diff between the 1.8.5 and 2.0.2 code to see what changed; the diff, however, is big and it's difficult to reverse engineer the issue from it. I also asked another engineer to try to reproduce this, but he didn't see it happen either. You can try our recently released 2.0.4, or check out mongo (r2.1.0 tag) from our master branch on GitHub, and see if that helps.

Comment by Rob LaRubbio [ 28/Mar/12 ]

I wanted to see if you are able to reproduce this issue with the latest attached c++ code. Thanks.

Comment by Rob LaRubbio [ 19/Mar/12 ]

It's possible I forgot to delete that line before adding it to the ticket. The servers were up and running in a replica set, since I would connect to the shell and run rs.status() before running the test to make sure all servers had joined. I don't have the output anymore, but I had two servers in the set: one was primary and the other secondary.

Comment by siddharth.singh@10gen.com [ 19/Mar/12 ]

Hi,

In the repro steps you sent me, there was this step:

  1. In a third terminal
    mongo admin
    > db.addUser('admin', 'admin')

mongo admin --port 27018
> db.addUser('admin', 'admin')

After that, you kill the servers and restart them as a replica set. Can you confirm that you executed the above step? You should not have been able to initiate the replica set; you should have received an error saying "errmsg" : "couldn't initiate : member 127.0.0.1:27018 has data already, cannot initiate set. All members except initiator must be empty".

I figure you might have written that by mistake, but I just want to make sure. Meanwhile I will dig into the API calls.

Comment by Rob LaRubbio [ 19/Mar/12 ]

The client code you added to this ticket reproduces the error for me. When I first ran the code, it threw an exception before getting to the line that has the issue:

Mon Mar 19 19:34:06 starting new replica set monitor for replica set setname with seed of 127.0.0.1:27017,127.0.0.1:27018
Mon Mar 19 19:34:06 successfully connected to seed 127.0.0.1:27017 for replica set setname
Mon Mar 19 19:34:06 changing hosts to { 0: "ip-10-171-135-64:27017", 1: "ip-10-171-135-64:27018" } from setname/
Mon Mar 19 19:34:06 trying to add new host ip-10-171-135-64:27017 to replica set setname
Mon Mar 19 19:34:06 successfully connected to new host ip-10-171-135-64:27017 in replica set setname
Mon Mar 19 19:34:06 trying to add new host ip-10-171-135-64:27018 to replica set setname
Mon Mar 19 19:34:06 successfully connected to new host ip-10-171-135-64:27018 in replica set setname
Mon Mar 19 19:34:06 successfully connected to seed 127.0.0.1:27018 for replica set setname
Mon Mar 19 19:34:06 replica set monitor for replica set setname started, address is setname/ip-10-171-135-64:27017,ip-10-171-135-64:27018
Mon Mar 19 19:34:06 [ReplicaSetMonitorWatcher] starting
terminate called after throwing an instance of 'mongo::UserException'
  what():  DBClientCursor next() called but more() is false
Aborted

I modified the code so it first checked more() on the cursor, and when I reran it I reproduced the error:

Mon Mar 19 19:36:15 starting new replica set monitor for replica set setname with seed of 127.0.0.1:27017,127.0.0.1:27018
Mon Mar 19 19:36:15 successfully connected to seed 127.0.0.1:27017 for replica set setname
Mon Mar 19 19:36:15 changing hosts to { 0: "ip-10-171-135-64:27017", 1: "ip-10-171-135-64:27018" } from setname/
Mon Mar 19 19:36:16 trying to add new host ip-10-171-135-64:27017 to replica set setname
Mon Mar 19 19:36:16 successfully connected to new host ip-10-171-135-64:27017 in replica set setname
Mon Mar 19 19:36:16 trying to add new host ip-10-171-135-64:27018 to replica set setname
Mon Mar 19 19:36:16 successfully connected to new host ip-10-171-135-64:27018 in replica set setname
Mon Mar 19 19:36:16 successfully connected to seed 127.0.0.1:27018 for replica set setname
Mon Mar 19 19:36:16 replica set monitor for replica set setname started, address is setname/ip-10-171-135-64:27017,ip-10-171-135-64:27018
Mon Mar 19 19:36:16 [ReplicaSetMonitorWatcher] starting
No results for a
	 B { $err: "unauthorized db:b lock type:-1 client:127.0.0.1", code: 10057 }

The only change to the code you added to the ticket was checking more() before calling next():

    if ( autocursor->more() ) {
      cout << "\t A " << autocursor->next() << endl;
    } else {
      cout << "No results for a" << endl;
    }

Comment by Rob LaRubbio [ 17/Mar/12 ]

Thanks, but stop working on the weekend. I'll give the above a try shortly.

Comment by siddharth.singh@10gen.com [ 17/Mar/12 ]

Hi Rob,

With the steps that you reported above I was able to reproduce the behavior. To determine whether the issue is in the C++ driver or in the server, I went ahead and wrote a C++ client (querySlaveOkTest.cpp) and tested it against the database. The client code works fine, which makes me believe that the issue is with the Lua wrapper.

Can you please run this client in your environment? Configure your replica set pretty much the same way you did above, run it, and let me know if it fails.

This link might be of help : http://www.mongodb.org/pages/viewpage.action?pageId=133415#C%2B%2BTutorial-WritingClientCode

Thanks.

Comment by Rob LaRubbio [ 13/Mar/12 ]

I was able to reproduce this using several different setups. The issue occurred in all configurations.

All servers are running Ubuntu.

The first attempt used a 64-bit 11.04 client and a 32-bit (by mistake) 11.10 server running 2.0.3 installed via apt-get. The driver was the one we use in production (compiled from a tarball downloaded 1/24/12). This reproduced the problem.

The second attempt pulled luamongo master from git and the latest version of the C++ driver. Same servers, and it reproduced the problem.

The third attempt moved to a new 64-bit 11.10 server. I pulled luamongo master, the latest tarball of the driver, and the 2.0.2 tarball of mongo on the server, and built everything locally. It still exhibits the issue.

Here are the detailed steps I took:

Create EC2 machine with ami from ubuntu (ami-6da8f128 http://cloud-images.ubuntu.com/releases/11.10/release/)

# Download mongodb and driver
wget http://downloads.mongodb.org/cxx-driver/mongodb-linux-x86_64-v2.0-latest.tgz
wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.0.2.tgz
 
# Extract each
tar xvzf mongodb-linux-x86_64-2.0.2.tgz
tar xvzf mongodb-linux-x86_64-v2.0-latest.tgz
 
# Add mongo bin to PATH
export PATH=$PATH:~/mongodb-linux-x86_64-2.0.2/bin
 
# Grab dev tools and dependencies (May need to run apt-get update to download all)
sudo apt-get -y install tcsh scons libpcre++-dev libboost-dev libreadline-dev libboost-program-options-dev libboost-thread-dev libboost-filesystem-dev libboost-date-time-dev gcc g++ git lua5.1-dev make
 
# Grab latest luamongo (will need to add your github ssh key)
git clone git@github.com:moai/luamongo
 
# Compile mongo driver
cd mongo-cxx-driver-v2.0
sudo scons install
 
# Install where lua can load it
sudo cp libmongoclient.* /usr/lib
 
cd ~
 
# Compile luamongo
cd luamongo
sudo make
 
# Install luamongo
sudo mkdir -p /usr/lib/lua/5.1
sudo mv mongo.so /usr/lib/lua/5.1/
 
# Install lua
sudo apt-get install lua5.1
 
# Verify it's installed (Shouldn't see an error.  CTRL-C to exit)
lua -l mongo
 
# Set up servers and replica set
mkdir db1 db2
echo "thisismykey" > keyfile
chmod 600 keyfile
 
# Start the mongods in separate terminals
mongod --dbpath ~/db1 --port 27017
mongod --dbpath ~/db2 --port 27018
 
# In a third terminal 
mongo admin
> db.addUser('admin', 'admin')
 
mongo admin --port 27018
> db.addUser('admin', 'admin')
 
# Kill and restart each server
mongod --dbpath ~/db1 --port 27017 --replSet setname --auth --keyFile ~/keyfile
mongod --dbpath ~/db2 --port 27018 --replSet setname --auth --keyFile ~/keyfile
 
# In a third terminal (You'll need to update the server names)
mongo admin -u admin -p admin
cfg = {
	"_id" : "setname",
	"members" : [
		{
			"_id" : 0,
			"host" : "ip-10-171-135-64:27017",
			"votes" : 2
		},
		{
			"_id" : 1,
			"host" : "ip-10-171-135-64:27018"
		}
	]
}
rs.initiate(cfg)
 
# Add the users for the test
mongo admin -u admin -p admin
> use a
> db.addUser('a', 'a')
> use b
> db.addUser('b', 'b')
 
# Run the test script
lua query.lua

I'm attaching the query.lua I used. You should be able to run it without changing it.

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

I am running Ubuntu 11.10 on a 64-bit machine. The Mongo version is v2.0.2, the GCC version is 4.6.1, and the Lua version is 5.1.5 along with compat-5.1r5.

I compiled the latest C++ driver (http://dl.mongodb.org/dl/cxx-driver) to build the Lua binding against.

I compiled mongo from source, but it shouldn't make any difference if you just pick up the binaries for your machine from http://www.mongodb.org/downloads and run them.

I then followed the appropriate sections from http://www.mongodb.org/display/DOCS/Security+and+Authentication#SecurityandAuthentication-ReplicaSetandShardingAuthentication to set up replica set authentication. I then added users for specific databases, started the replica set, and executed the Lua script.

Let me know how this goes.

On a separate note, the digestPassword handling in the Lua binding definitely needs some cleanup to make sure it is not doing the opposite of what it claims to do.
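
For reference, the flag the Lua binding forwards maps to the digestPassword parameter of the C++ driver's auth() call, where true means the password is plain text and the driver digests it. A minimal sketch of both usages, assuming the 2.0-era driver API (the availability of mongo::createPasswordDigest in the client headers is an assumption; user "a" with password "a" is just the test account from this ticket):

#include "client/dbclient.h"
#include <string>

// Sketch: the two digestPassword modes of the driver's auth() call.
void authBothWays( mongo::DBClientConnection& conn ) {
    std::string errmsg;

    // digestPassword = true (the default): pwd is the clear-text password.
    conn.auth( "a", "a", "a", errmsg, true );

    // digestPassword = false: pwd must already be the MD5 digest the server
    // stores, i.e. md5( "<username>:mongo:<password>" ).
    std::string digest = mongo::createPasswordDigest( "a", "a" );
    conn.auth( "a", "a", digest, errmsg, false );
}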

Comment by Rob LaRubbio [ 12/Mar/12 ]

Can you let me know more about your setup? Our production system is running Ubuntu 11.04. The mongodb we are running is 2.0.2, installed from the 10gen package repository (package mongodb-10gen). I'm not aware of anything custom on the server, so when setting up a test system I'd probably end up with the same setup as our production system. If you have pointers on what would be considered a clean system, I can set that up.

Comment by Rob LaRubbio [ 12/Mar/12 ]

Correct, I was only looking at test case 1.

I don't think json has any effect on the issue; I'm just using it as a tool to inspect and print out Lua tables. In our case, when we issue the query we get back a Lua table that contains the error message. You can also view it with this code:

      local r = q:next()
      -- When the auth is successful if no record is found then r is nil.  Convert to empty table
      r = r or {} 
      print("Code: " .. tostring(r.code))
      print("Err: " .. tostring(r['$err']))

If you are able to iterate over the results and those results contain the expected data, not the error table, then there is something else different between our setups. I'm not sure what it could be, so if you have ideas I can collect that information for you.

Also, in our setup we auth with digested passwords, so we set digestPassword=false in the auth call. However, I just reproduced the issue using plain-text, non-digested passwords.

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

Hi Rob,

I included and printed the json and it shows everything as expected. I am even able to iterate over the cursor and print the results. As much as I want to see it break (so that I can fix it), it doesn't; everything works as expected. I understand that this is reproducible in your production environment, but can you please try to isolate this behavior on a separate test mongodb installation?

Production environments typically have complex, customized setups, and isolating any such behavior becomes hard, especially for someone like me who doesn't know the behind-the-scenes setup. It would be great if you could reproduce this behavior on a clean mongodb test install.

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

Point (1) is not correct. If you look at Test Case 2 and Test Case 3 in the code, I do authenticate the users against different databases.

As for point (2), I was actually going a step further and printing the values from the cursor, just to be very sure. If there had been an issue, iterating over the cursor would not have returned any results. I removed that piece of code to minimize the output for you.

I will go ahead, include json, and report back. I doubt it's going to make any difference, but I want to make sure to cover all the bases.

Comment by Rob LaRubbio [ 12/Mar/12 ]

Your script reproduces the error, but I notice two things that you need to adjust:

1) All of your users are authenticating against the same db. For us all of our users are in different dbs.

2) The query returns a cursor, so calling assert will not show the error. You need to inspect the return value. That's why I include json.

This will not show the error:

local q = assert(mongo:query('values', '{a:10}', nil, nil, nil, 4, nil))

Since mongo:query doesn't raise an error.

This will show you the error:

local q = assert(mongo:query('values', '{a:10}', nil, nil, nil, 4, nil))
print(json.encode(q:next()))

That print will return this:

{"code":10057,"$err":"unauthorized db:db2 lock type:-1 client:50.18.15.22"}

Instead of the object I expected it to load.

So if I modify your script so it connects to our production servers and make the two changes mentioned above, it reproduces the issue.

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

Yes, I did. I ran it (with the appropriate username/password) and it worked fine. I extended the script to test even more cases, all of which worked fine.

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

I agree that a digestPassword problem would stop authentication completely. I still wanted to point it out to make sure you are aware of it.

Getting back to your issue, please see the attached script (authQueryTest.lua). Test Cases 2 and 3 are how I believe you are using it. All the tests succeed for me and I am not able to repro the case.

Could you play with this script (perhaps on some test/prototype machine with mongo on it) and see if you can break it? If it does break, please feel free to report it on this thread. I did a quick search on JIRA to see if anybody else had reported a similar issue, but nothing came up.

Comment by Rob LaRubbio [ 12/Mar/12 ]

Also, I just want to check whether you've looked at the query.lua script I attached, which I used to reproduce the issue.

Comment by Rob LaRubbio [ 12/Mar/12 ]

One additional point: after we upgraded the database, we did not change the version of the C++ driver the Lua driver was linked against, so truly the only thing that changed was the server side. I originally thought this issue was because of the mismatch in C++ driver versions, but it persisted after I upgraded the client to match the server.

Comment by Rob LaRubbio [ 12/Mar/12 ]

If it were digestPassword, we wouldn't have been able to auth at all. I'm pretty sure this is something on the server side (I'm almost positive). We were using 1.8.5 of the server with slaveOk and the system was running fine. We then upgraded our server to 2.0.2 without changing any Lua code or the Lua driver, and we started to see this issue. The only piece of our stack that changed was the server.

In my testing, it looks like if I re-auth an existing connection that had already been authed, the auth succeeds, but if a query is then sent to the secondary it fails with an unauthorized exception. That is reproducible for me.

So here is what appears to be happening. I have two users and one connection:

User A has Database 1
User B has Database 2

They do not have access to each other's databases. I have a replica set configured.

When my app starts up I create a single connection called conn. I do not immediately auth the connection.

In a loop, when I receive an HTTP request I:
Figure out if it is user A or B
Auth conn using the user's credentials
Issue a query

The auth succeeds regardless of the user, but the query will only work for the first user to make a request. So it seems like the second call to auth allows the new user to query the primary but not the secondary.
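
A minimal C++ driver sketch of that flow, assuming the 2.0-era API (host names, databases, and credentials are placeholders taken from the test setup in this ticket, not our production values):

#include "client/dbclient.h"
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<mongo::HostAndPort> servers;
    servers.push_back( mongo::HostAndPort( "127.0.0.1:27017" ) );
    servers.push_back( mongo::HostAndPort( "127.0.0.1:27018" ) );
    mongo::DBClientReplicaSet conn( "setname", servers );
    conn.connect();

    std::string errmsg;

    // Request from user A: auth succeeds and the slaveOk query works.
    conn.auth( "a", "a", "a", errmsg );
    conn.query( "a.values", QUERY( "a" << 10 ), 0, 0, 0, mongo::QueryOption_SlaveOk );

    // Request from user B on the same connection: auth succeeds against the
    // primary, but the cached secondary connection is not reauthenticated, so
    // the slaveOk query comes back with error 10057 (unauthorized).
    conn.auth( "b", "b", "b", errmsg );
    std::auto_ptr<mongo::DBClientCursor> cur =
        conn.query( "b.values", QUERY( "a" << 10 ), 0, 0, 0, mongo::QueryOption_SlaveOk );
    while ( cur->more() )
        std::cout << cur->next() << std::endl;
    return 0;
}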

Comment by siddharth.singh@10gen.com [ 12/Mar/12 ]

Hi Rob,

There is something that caught my eye; I wanted to point it out to you and see if it could be the root cause.

If you look at the luamongo documentation (https://github.com/moai/luamongo/wiki/MongoConnection), it says that if the password is plain text, digestPassword should be set to true; otherwise it is assumed to be pre-digested.

In the code (https://github.com/moai/luamongo/blob/master/mongo_dbclient.cpp), the comment at line 117 says the opposite: it says to set digestPassword to true if the password is pre-digested.

Those seem like contradictory statements.

I tried making some sense of the code at line 131 (https://github.com/moai/luamongo/blob/master/mongo_dbclient.cpp), but I am not familiar with Lua semantics so I could not get far.

Passwords on my machine are not set in plain text.

Here's what I am seeing on my machine:

i) When I set digestPassword to false, I am not able to auth at all. Not even once.

ii) On setting it to true, everything works fine for me. Multiple users, multiple databases, multiple collections, etc. all ran as expected. I ran quite a few tests trying different combinations, and all of them passed successfully.

iii) Can you sanity-check the digestPassword field with respect to your environment and see if that is causing the issue?

Thanks.

Comment by Rob LaRubbio [ 08/Mar/12 ]

I'm close to a repro case; it looks like the issue happens if you reauth a connection. I can repro it on our production servers. I don't have an isolated version of the script, but I'll attach what I have with our usernames/passwords removed. Basically, create two users on your test cluster and modify your query script so it essentially looks like this:

create connection

auth user1
query <-- Good

auth user2
query <-- Auth fails

Comment by siddharth.singh@10gen.com [ 07/Mar/12 ]

Hi Rob,

Please see the updated script (query.lua). I am not able to reproduce the behavior; querying works successfully for me, with and without slaveOk. Are you seeing this happen on a more deterministic basis?

Comment by Rob LaRubbio [ 07/Mar/12 ]

Sorry, I didn't realize you were thinking the error came from authing the connection. For more information, and to hopefully resolve the confusion, we initially set up our connections like this:

 
-- I'm making these up, we actually load them from a config
local repl_set = "repl_set_1"
local servers = {"primary:10000","secondary:1000"}
 
mongodb = assert(mongo.ReplicaSet.New(repl_set, servers))
assert(mongodb:connect())  

The connect succeeds and we have no issue at this point. Once our app has started up, we start listening for HTTP connections and service each one. When a connection comes in, we set up our sandbox and connect to a user's database like this:

-- We first have to locate the user and their account information
local user = findUserInAdminDB(<info from request>)
 
local ret = assert(mongodb:auth({dbname=user.dbname, username=user.username, password=user.password, digestPassword=false}))
 
-- I did log the result and the arguments we sent in and they all appeared valid.  I believe the logs have been purged at this point.

That code succeeds. Eventually we load the user's code and execute it. That code runs in our Lua sandbox and eventually executes code like this:

-- Note this is calling 'mongo' not 'mongodb'
local cursor = mongo:query('datastore', '{key:"value"}')

In our sandbox we wrap 'mongodb' and expose it to users as 'mongo'. The modification is very slight. This is almost an exact copy of the sandboxing we do on top of luamongo (I've removed all methods except query):

function makeSandboxMongo(userDB)
  local mongodb = {}
 
  function mongodb:query(namespace, query, limit, skip, return_fields, options, batch_size)
    -- If we pass '4' in for options (which means slaveOk to the driver) it fails with an auth error
    -- If we remove that it succeeds
    return assert(mongo:query(user.dbname .. "." .. namespace, query, limit, skip, return_fields, options or 4, batch_size))
  end
 
  return mongodb
end
 
mongo = makeSandboxMongo(user.dbname)

So the issue isn't that the auth fails; it appears that a valid auth isn't forwarded to a secondary when the driver sends a query there via slaveOk. Sorry for the confusion.

Comment by Rob LaRubbio [ 07/Mar/12 ]

Thanks for looking into this, but it looks like you've been digging into the wrong call.

We aren't getting the error when we make the auth call; we get the error when we attempt to issue a query.

Authing the connection works fine (and we don't direct the auth to a particular server, but rather at the replica set in general).

Comment by siddharth.singh@10gen.com [ 07/Mar/12 ]

Hi Rob,

I tried reproducing this issue on my machine, but with no success. I wrote a C++ client and a Lua script (please see the attached files) hoping to see the reported behavior and fix it. Unfortunately, I could not reproduce it, and authentication worked fine when run against both the primary and the secondary.

I understand that this doesn't help you with your production issue of not being able to run with slaveOk on secondaries. My best guess at this point is that it might have something to do with the client key that you reported earlier.

Thank you for filing this ticket, though. If we see a similar issue in the future, it will be a good data point to refer to. Also, if you are able to get a reproducible test case for it, please feel free to file an updated ticket with us.

Thanks.

Comment by Rob LaRubbio [ 06/Mar/12 ]

We're using Lua 5.1.4

As for the Lua script, we are building a PaaS that game developers can use to run Lua code that serves as the backend for their games. We sandbox our Lua calls; the one for query looks like this:

function mongodb:query(namespace, query, limit, skip, return_fields, options, batch_size)
    return assert(mongo:query(prefix .. namespace, query, limit, skip, return_fields, options or 4, batch_size))
end

prefix is set to the user's database. A simple user script that was failing looked like this:

function query()
  local cursor = mongodb:query('datastore', '{key:"value"}')
end
 
function main(web, req)
	query()
end

The actual script had a little more to it, like getting a parameter from the query string and doing something with the result, but the additional code was very limited, and the above is an accurate representation of the mongodb interactions.

Going into as much detail as I can: within our backend system we create three connections to two separate mongodb replica sets. One replica set (we'll call it "master") holds our master database that we use for user accounts. The second (we'll call it "users") holds user data and code within GridFS. We recently split our master database out onto its own server, but we had the same issue when it was a separate db within a single replica set. The two connections we create to the users server have to do with a user accessing the service with a clientkey. In that case there are two logical users of the service: one is the user accessing it and the other is the user who wrote the service. In that case we auth each connection against the respective user's database.

So the basic auth flow is as follows:

1) At startup time we create three connections. We auth the connection to our master database but not the two connections to the users database.
2) When a request comes in we discover the user(s).
3) We auth one or both connections against the appropriate user's database.
4) We construct the sandbox, load the user's code, and execute it.

The connection that was failing was the one to the users database in situations without a client key. I did not see the error in cases where the system was called with a client key, but again, since I'm not able to reliably reproduce this, that might be important or just coincidental. We do cache the authentication, so if a user calls the system twice in a row we don't re-auth. Only a small percentage of uses involve the clientkey feature, so that connection was probably not being re-authed as frequently as the other. We added that caching because the log message for each auth was filling up our disk (I put a pull request in for that).

Comment by siddharth.singh@10gen.com [ 06/Mar/12 ]

Hi Rob,

I am trying to reproduce this issue. It will be helpful if you could please send me the following:

i) The version of Lua you are using.

ii) A Lua script showing exactly how you are using the luamongo binding. I understand that this is not reliably reproducible, but it will give me a good starting point.

iii) If you could also describe your authentication setup in general, that would be great.

Thanks.
