[SERVER-8039] Authentication repeatedly fails after major network issue Created: 27/Dec/12  Updated: 08/Mar/13  Resolved: 07/Jan/13

Status: Closed
Project: Core Server
Component/s: Networking, Security
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Anton V. Volokhov Assignee: Spencer Brody (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hello!
I use MongoDB 2.2.1 with 1 mongos and 2 shards, each a 3-member replica set.
We had a major network issue during which the mongos and two mongod's (one from each shard) became isolated. When the network came back up, we experienced the following problem:

Wed Dec 26 20:02:37 [conn1183] authenticate db: partition-validator

{ authenticate: 1, user: "auto", nonce: "5b3de430aaacf9a5", key: "6eb58aa336bfc1eef5779a04761f83d3" }

Wed Dec 26 20:02:37 [conn1183] auth: couldn't find user auto, partition-validator.system.users
Wed Dec 26 20:02:37 [conn1183] end connection 192.168.101.11:48742 (103 connections now open)
Wed Dec 26 20:02:47 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1356537611:1804289383' acquired, ts : 50db1fa69f69900c8fba7b3b
Wed Dec 26 20:02:47 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1356537611:1804289383' unlocked.
Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48747 #1184 (104 connections now open)
Wed Dec 26 20:02:56 [conn1184] end connection 192.168.101.11:48747 (103 connections now open)
Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48748 #1185 (104 connections now open)
Wed Dec 26 20:02:56 [conn1185] end connection 192.168.101.11:48748 (103 connections now open)
Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48749 #1186 (104 connections now open)
Wed Dec 26 20:02:56 [conn1186] authenticate db: partition-validator

{ authenticate: 1, user: "auto", nonce: "8bb8fc07201771c5", key: "d5f331c56d20afdff4c80a95db068e8b" }

Wed Dec 26 20:02:56 [conn1186] auth: couldn't find user auto, partition-validator.system.users
Wed Dec 26 20:02:56 [conn1186] end connection 192.168.101.11:48749 (103 connections now open)

The network issue started at about 19:00 and was fixed at exactly 20:00. Authentication was OK before 20:00, and also during the network failure itself.

Some logs from the interval 19:00-20:00:
Wed Dec 26 19:48:44 [conn16453835] Socket say send() errno:32 Broken pipe 192.168.101.16:27017
Wed Dec 26 19:48:44 [conn16453835] warning: could not get last error from a shard cs_mongodb-sh2/cs-mongodb02d:27017,cs-mongodb02f:27017,cs-mongodb02g:27017 :: caused by :: socket exception [SEND_ERROR] for 192.168.101.16:27017
Wed Dec 26 19:48:51 [Balancer] warning: distributed lock 'balancer/cs-mongodb01g:27000:1355404365:864101839' did not propagate properly. :: caused by :: 8017 update not consistent ns: config.locks query: { _id: "balancer", state: 0, ts: ObjectId('50db1c5f3fa487ae2f5aff90') } update: { $set: { state: 1, who: "cs-mongodb01g:27000:1355404365:864101839:Balancer:636453333", process: "cs-mongodb01g:27000:1355404365:864101839", when: new Date(1356536930010), why: "doing balance round", ts: ObjectId('50db1c61326d9641ed1f7228') } } gle1: { updatedExisting: true, n: 1, connectionId: 36369, waited: 34, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 38633, waited: 18, err: null, ok: 1.0 }

Wed Dec 26 19:48:51 [Balancer] lock update lost, lock 'balancer/cs-mongodb01g:27000:1355404365:864101839' not propagated.
Wed Dec 26 19:48:51 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1355404365:864101839' was not acquired.
Wed Dec 26 19:48:55 [conn16452947] Assertion failure cursorCache.getRef( cursor ).size() src/mongo/s/request.cpp 182
0x787d86 0x754d81 0x6e94fd 0x707996 0x6e9eaf 0x5149c1 0x776c49 0x7f9283d779ca 0x7f928311e70d
/usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x787d86]
/usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xc1) [0x754d81]
/usr/bin/mongos(_ZN5mongo7Request5replyERNS_7MessageERKSs+0x19d) [0x6e94fd]
/usr/bin/mongos(_ZN5mongo13ShardStrategy7getMoreERNS_7RequestE+0x376) [0x707996]
/usr/bin/mongos(_ZN5mongo7Request7processEi+0x14f) [0x6e9eaf]
/usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x5149c1]
/usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x429) [0x776c49]
/lib/libpthread.so.0(+0x69ca) [0x7f9283d779ca]
/lib/libc.so.6(clone+0x6d) [0x7f928311e70d]
Wed Dec 26 19:49:01 [conn16453011] Assertion failure cursorCache.getRef( cursor ).size() src/mongo/s/request.cpp 182
0x787d86 0x754d81 0x6e94fd 0x707996 0x6e9eaf 0x5149c1 0x776c49 0x7f9283d779ca 0x7f928311e70d
/usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x787d86]
/usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xc1) [0x754d81]
/usr/bin/mongos(_ZN5mongo7Request5replyERNS_7MessageERKSs+0x19d) [0x6e94fd]
/usr/bin/mongos(_ZN5mongo13ShardStrategy7getMoreERNS_7RequestE+0x376) [0x707996]
/usr/bin/mongos(_ZN5mongo7Request7processEi+0x14f) [0x6e9eaf]
/usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x5149c1]
/usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x429) [0x776c49]
/lib/libpthread.so.0(+0x69ca) [0x7f9283d779ca]
/lib/libc.so.6(clone+0x6d) [0x7f928311e70d]
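
For context, the authenticate documents in the log above are the MONGODB-CR handshake used by MongoDB 2.2. A rough mongo-shell sketch of the exchange (the password below is a placeholder, not a value from this ticket; the host is the mongos from the logs):

// Sketch of the MONGODB-CR exchange behind the logged "authenticate" documents (placeholder password).
var pv = connect("cs-mongodb01g:27000/partition-validator");  // connect through the mongos
var nonce = pv.runCommand({ getnonce: 1 }).nonce;             // server-issued challenge
var pwdHash = hex_md5("auto" + ":mongo:" + "<password>");     // same value stored as "pwd" in system.users
var key = hex_md5(nonce + "auto" + pwdHash);                  // the "key" field seen in the log
pv.runCommand({ authenticate: 1, user: "auto", nonce: nonce, key: key });

Since the failure is reported as "couldn't find user auto, partition-validator.system.users", it presumably failed at the user lookup rather than at the key comparison.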



 Comments   
Comment by Anton V. Volokhov [ 07/Jan/13 ]

Thanks for the help, anyway.

Comment by Spencer Brody (Inactive) [ 07/Jan/13 ]

Unfortunately, I don't believe we have enough information here to understand exactly what happened. If this happens again, please increase the logging verbosity on the mongos and on the affected mongod by running

db.adminCommand({setParameter:1, logLevel:2});

and attach the logs to this ticket (after re-opening it) before you re-create the user.
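
(Presumably the verbosity can be returned to its default afterwards with the inverse command; this line is added here for completeness and is not part of the original request.)

db.adminCommand({setParameter: 1, logLevel: 0});  // restore default log verbosity once the logs are captured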

Comment by Anton V. Volokhov [ 07/Jan/13 ]

Sorry, the logs were rotated away yesterday due to our logrotate configuration. I only have the ones I sent in the first post.
I only know that they looked like:
Wed Dec 26 20:02:37 [conn1183] authenticate db: partition-validator

{ authenticate: 1, user: "auto", nonce: "5b3de430aaacf9a5", key: "6eb58aa336bfc1eef5779a04761f83d3" }

Wed Dec 26 20:02:37 [conn1183] auth: couldn't find user auto, partition-validator.system.users

Comment by Spencer Brody (Inactive) [ 07/Jan/13 ]

Can you please attach the full logs from the mongos and the primary of the shard in question from the time when authentication was failing?

Comment by Anton V. Volokhov [ 07/Jan/13 ]

Recreating the user via db.addUser("auto","foo") fixed the problem.
db.system.users.find():

{ "_id" : ObjectId("50a259c505684816b2f2b6e9"), "user" : "auto", "readOnly" : false, "pwd" : "d1b48490da7ff0ad12c01648ea0e7a78" }

It was exactly the same while the user was corrupted.
Now everything works fine, and I'm just trying to figure out the cause of the problem and how to avoid it in the future.
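
A minimal sketch for confirming the re-created user has reached every member of the affected shard; the hostnames are copied from the log excerpt above and may not be the right shard, and it assumes the shell can read without authenticating (or has authenticated as admin first):

// Check partition-validator.system.users directly on each replica set member, bypassing the mongos.
["cs-mongodb02d:27017", "cs-mongodb02f:27017", "cs-mongodb02g:27017"].forEach(function(host) {
    var conn = new Mongo(host);        // direct connection to one mongod
    conn.setSlaveOk();                 // allow reads from secondaries (2.2 shell)
    var users = conn.getDB("partition-validator").system.users.find().toArray();
    print(host + " -> " + tojson(users));
});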

Comment by Spencer Brody (Inactive) [ 07/Jan/13 ]

What's the current status of this? Are you still having problems authenticating with users in this database on this machine?

If so, can you attach the output of

use partition-validator;
db.system.users.find();

run against mongos as well as the node that's having the authentication issues?

Comment by Anton V. Volokhov [ 05/Jan/13 ]

Only the mongos and the isolated node were restarted. The same behavior occurred both through the mongos and against the replica set's primary. Unfortunately, I didn't test against the secondaries.

Comment by Spencer Brody (Inactive) [ 04/Jan/13 ]

Hmm, that seems strange. Have you tried restarting this node since the network problems? Are you connecting to the mongos or a mongod directly? If the latter, do you have the same problem with all nodes in the replica set, or just the one that had the network outage?

Comment by Anton V. Volokhov [ 27/Dec/12 ]

So, I couldn't log in via the shell. When I logged in as admin, I saw the user "auto" in the db.system.users collection, and the hash for this username was exactly the same as in the other databases where this user/password pair exists. Logging in to the other databases was successful.
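
The identical hashes across databases are expected: in 2.2 the stored "pwd" value depends only on the username and password, not on the database, so the same user/password pair yields the same hash everywhere. A quick shell check (password shown as a placeholder):

// db.addUser() in 2.2 stores pwd = hex_md5(username + ":mongo:" + password); it is database-independent.
var expected = hex_md5("auto" + ":mongo:" + "<password>");  // "<password>" is a placeholder
print(expected);  // should equal the "pwd" value shown in system.users for user "auto"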
