Core Server / SERVER-48709

Signing key generator thread on config server not woken up as expected

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.10, 4.4.1, 3.6.20, 4.7.0, 4.0.21
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4, v4.2, v4.0, v3.6
    • Sprint: Sharding 2020-07-27

      Description

      Hello,

      We have encountered SERVER-47553 on MongoDB instances hosted on Alibaba Cloud. We see that the mongos crash has been fixed. It was caused by an exception thrown in a destructor when ON_BLOCK_EXIT was used to call appendRequiredFieldsToResponse, which made mongos call std::terminate().

      However, the fix for SERVER-47553 only keeps mongos from crashing; the root issue looks unresolved. After we applied the patch, mongos still could not connect to the mongod server nodes, so we dug into the problem. There seems to be a bug in the monitoring-keys-for-HMAC thread on the primary node of the config server that prevents signing keys from being generated at the KeysRotationIntervalSec interval. When mongos then calls KeysCollectionManager::refreshNow to ask the config server for new signing keys, the call fails with a timeout exception, which causes this problem.

      I am sure the root cause is a bug in the "howMuchSleepNeedFor" function, which calculates the wake-up interval for the monitoring-keys-for-HMAC thread on the primary node of the config server:

      auto millisBeforeExpire = 1000 * (expiredSecs - currentSecs);

      Here expiredSecs and currentSecs are of type unsigned int, and the default wake-up interval is 90 days (7776000 seconds). After conversion to milliseconds it becomes 7776000000, which overflows, since the maximum value of a 32-bit unsigned int is 4294967295.
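
      To make the arithmetic concrete, here is a minimal standalone sketch of the overflow. It is not the server code: the two function names are hypothetical, std::uint32_t stands in for a 32-bit unsigned int, and the widened version only illustrates one possible way to avoid the wrap-around (it is not necessarily the fix that was committed). Both versions assume expiredSecs >= currentSecs.

```cpp
#include <cstdint>

// Hypothetical stand-in for the expression quoted above, with both operands
// as 32-bit unsigned integers. The multiplication is done in 32 bits, so the
// result wraps modulo 2^32 when the difference exceeds about 49.7 days.
constexpr std::uint32_t millisBeforeExpireNarrow(std::uint32_t expiredSecs,
                                                 std::uint32_t currentSecs) {
    return std::uint32_t{1000} * (expiredSecs - currentSecs);
}

// Illustrative alternative (an assumption, not the committed patch): widen to
// 64 bits before multiplying so the millisecond count cannot wrap.
constexpr std::uint64_t millisBeforeExpireWide(std::uint32_t expiredSecs,
                                               std::uint32_t currentSecs) {
    return std::uint64_t{1000} * (expiredSecs - currentSecs);
}

// 90 days expressed in seconds, as in the description.
constexpr std::uint32_t kNinetyDaysSecs = 90u * 24u * 60u * 60u;  // 7776000

// 7776000000 mod 2^32 = 7776000000 - 4294967296 = 3481032704
static_assert(millisBeforeExpireNarrow(kNinetyDaysSecs, 0) == 3481032704u,
              "32-bit multiplication wraps around");
static_assert(millisBeforeExpireWide(kNinetyDaysSecs, 0) == 7776000000ull,
              "64-bit multiplication keeps the full value");
```

      With a 90-day difference the 32-bit version yields 3481032704 ms instead of 7776000000 ms, so the computed sleep time no longer matches the intended interval.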

      This causes a serious problem, because mongos cannot reconnect to the mongod server nodes even after many restarts. A feasible workaround is to restart the config server nodes, which triggers the monitoring-keys-for-HMAC thread to generate new signing keys; mongos can reconnect successfully after that.

            People

            Assignee:
            jack.mulrow Jack Mulrow
            Reporter:
            jcli.china@gmail.com Jingcheng Li