Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-8039

Authentication repeatedly fails after major network issue

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Networking, Security
    • Labels:
      None
    • ALL

      Hello!
      I use mongodb 2.2.1 with 1 mongos and 2 shards, 3 replicas each.
      We had a major network issue, when mongos and two(one from each shard) mongod's became isolated. When network was up again, we experienced following problem:

      Wed Dec 26 20:02:37 [conn1183] authenticate db: partition-validator

      { authenticate: 1, user: "auto", nonce: "5b3de430aaacf9a5", key: "6eb58aa336bfc1eef5779a04761f83d3" }

      Wed Dec 26 20:02:37 [conn1183] auth: couldn't find user auto, partition-validator.system.users
      Wed Dec 26 20:02:37 [conn1183] end connection 192.168.101.11:48742 (103 connections now open)
      Wed Dec 26 20:02:47 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1356537611:1804289383' acquired, ts : 50db1fa69f69900c8fba7b3b
      Wed Dec 26 20:02:47 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1356537611:1804289383' unlocked.
      Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48747 #1184 (104 connections now open)
      Wed Dec 26 20:02:56 [conn1184] end connection 192.168.101.11:48747 (103 connections now open)
      Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48748 #1185 (104 connections now open)
      Wed Dec 26 20:02:56 [conn1185] end connection 192.168.101.11:48748 (103 connections now open)
      Wed Dec 26 20:02:56 [mongosMain] connection accepted from 192.168.101.11:48749 #1186 (104 connections now open)
      Wed Dec 26 20:02:56 [conn1186] authenticate db: partition-validator

      { authenticate: 1, user: "auto", nonce: "8bb8fc07201771c5", key: "d5f331c56d20afdff4c80a95db068e8b" }

      Wed Dec 26 20:02:56 [conn1186] auth: couldn't find user auto, partition-validator.system.users
      Wed Dec 26 20:02:56 [conn1186] end connection 192.168.101.11:48749 (103 connections now open)

      Network started about 19:00 and was fixed exactly in 20:00, authentication was ok before 20:00, during network fail too.

      Some logs from the interval 19:00-20:00
      Wed Dec 26 19:48:44 [conn16453835] Socket say send() errno:32 Broken pipe 192.168.101.16:27017
      Wed Dec 26 19:48:44 [conn16453835] warning: could not get last error from a shard cs_mongodb-sh2/cs-mongodb02d:27017,cs-mongodb02f:27017,cs-mongodb02g:27017 :: caused by :: socket exception [SEND_ERROR] for 192.168.101.16:27017
      Wed Dec 26 19:48:51 [Balancer] warning: distributed lock 'balancer/cs-mongodb01g:27000:1355404365:864101839 did not propagate properly. :: caused by :: 8017 update not consistent ns: config.locks query
      :

      { _id: "balancer", state: 0, ts: ObjectId('50db1c5f3fa487ae2f5aff90') }

      update: { $set:

      { state: 1, who: "cs-mongodb01g:27000:1355404365:864101839:Balancer:636453333", process: "cs-mongodb01g:27000:13 55404365:864101839", when: new Date(1356536930010), why: "doing balance round", ts: ObjectId('50db1c61326d9641ed1f7228') }

      } gle1:

      { updatedExisting: true, n: 1, connectionId: 36369, waited: 34, err: nu ll, ok: 1.0 }

      gle2:

      { updatedExisting: false, n: 0, connectionId: 38633, waited: 18, err: null, ok: 1.0 }

      Wed Dec 26 19:48:51 [Balancer] lock update lost, lock 'balancer/cs-mongodb01g:27000:1355404365:864101839' not propagated.
      Wed Dec 26 19:48:51 [Balancer] distributed lock 'balancer/cs-mongodb01g:27000:1355404365:864101839' was not acquired.
      Wed Dec 26 19:48:55 [conn16452947] Assertion failure cursorCache.getRef( cursor ).size() src/mongo/s/request.cpp 182
      0x787d86 0x754d81 0x6e94fd 0x707996 0x6e9eaf 0x5149c1 0x776c49 0x7f9283d779ca 0x7f928311e70d
      /usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x787d86]
      /usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xc1) [0x754d81]
      /usr/bin/mongos(_ZN5mongo7Request5replyERNS_7MessageERKSs+0x19d) [0x6e94fd]
      /usr/bin/mongos(_ZN5mongo13ShardStrategy7getMoreERNS_7RequestE+0x376) [0x707996]
      /usr/bin/mongos(_ZN5mongo7Request7processEi+0x14f) [0x6e9eaf]
      /usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x5149c1]
      /usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x429) [0x776c49]
      /lib/libpthread.so.0(+0x69ca) [0x7f9283d779ca]
      /lib/libc.so.6(clone+0x6d) [0x7f928311e70d]
      Wed Dec 26 19:49:01 [conn16453011] Assertion failure cursorCache.getRef( cursor ).size() src/mongo/s/request.cpp 182
      0x787d86 0x754d81 0x6e94fd 0x707996 0x6e9eaf 0x5149c1 0x776c49 0x7f9283d779ca 0x7f928311e70d
      /usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x787d86]
      /usr/bin/mongos(_ZN5mongo12verifyFailedEPKcS1_j+0xc1) [0x754d81]
      /usr/bin/mongos(_ZN5mongo7Request5replyERNS_7MessageERKSs+0x19d) [0x6e94fd]
      /usr/bin/mongos(_ZN5mongo13ShardStrategy7getMoreERNS_7RequestE+0x376) [0x707996]
      /usr/bin/mongos(_ZN5mongo7Request7processEi+0x14f) [0x6e9eaf]
      /usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x5149c1]
      /usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x429) [0x776c49]
      /lib/libpthread.so.0(+0x69ca) [0x7f9283d779ca]
      /lib/libc.so.6(clone+0x6d) [0x7f928311e70d]

            Assignee:
            spencer@mongodb.com Spencer Brody (Inactive)
            Reporter:
            gabol Anton V. Volokhov
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: