Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40762

Mongos does not accept new connections after mongos restart

    • Type: Icon: Bug Bug
    • Resolution: Gone away
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Networking
    • Labels:
      None
    • ALL
      • upgrade cluster from 3.4 to 3.6 with enable FCV 3.6 followed by 2nd mongoSes roll-restart
      • set FCV to 3.4
      • restart one mongos

      After we did a MongoDB cluster upgrade from 3.4 to 3.6 we've noticed a high load on 1st shard. The load was related with high amount of updates made agains config.systems.sessions collection. To mitigate the high load on shard1 we've change FCV from 3.6 back to 3.4.

      Few days after we had an mongoS instance restarted (underlying hardware failure). We've noticed that after the reboot, although the mongos service was up and running, it wasn't handling any traffic. 

      The application is generating these log lines only if is set with debug mode on:

      Command failed with error 211 (KeyNotFound): 'Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1553758276, 1424) } with id: 6672952338308661763' on server mongos13:27017. The full response is { "ok" : 0.0, "errmsg" : "Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1553758276, 1424) } with id: 6672952338308661763", "code" : 211, "codeName" : "KeyNotFound" }
      

      These lines appear in mongoS log if we increase verbosity to 2 for network engine:

      2019-04-22T07:18:10.769+0000 D ASIO [monitoring keys for HMAC] startCommand: RemoteCommand 30519803 -- target:cfg02:27019 db:admin expDate:2019-04-22T07:18:40.769+0000 cmd:{ find: "system.keys", filter: { purpose: "HMAC", expiresAt: { $gt: Timestamp(0, 0) } }, sort: { expiresAt: 1 }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1555917490, 9), t: 13 } }, maxTimeMS: 30000 } 
      2019-04-22T07:18:10.769+0000 D ASIO [NetworkInterfaceASIO-ShardRegistry-0] Starting asynchronous command 30519803 on host cfg02:27019 
      2019-04-22T07:18:10.770+0000 D ASIO [NetworkInterfaceASIO-ShardRegistry-0] Request 30519803 finished with response: { cursor: { id: 0, ns: "admin.system.keys", firstBatch: [] }, ok: 1.0, $replData: { term: 13, lastOpCommitted: { ts: Timestamp(1555917490, 9), t: 13 }, lastOpVisible: { ts: Timestamp(1555917490, 9), t: 13 }, configVersion: 9, replicaSetId: ObjectId('59b0fb4ec5311a5aa291a03e'), primaryIndex: 0, syncSourceIndex: 0 }, $gleStats: { lastOpTime: Timestamp(0, 0), electionId: ObjectId('000000000000000000000000') } } 
      

       

        1. mgs13.log.gz
          6.66 MB
        2. test_mongo_connections.rb
          2 kB
        3. test_mongo.rb
          1 kB

            Assignee:
            daniel.hatcher@mongodb.com Danny Hatcher (Inactive)
            Reporter:
            velcea@adobe.com razvan velcea
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: