  1. Core Server
  2. SERVER-52654

new signing keys not generated by the monitoring-keys-for-HMAC thread

    • Fully Compatible
    • ALL
    • v4.4, v4.2, v4.0, v3.6
    • Sharding 2020-12-14

      Issue Status as of Jan 7, 2021

      ISSUE DESCRIPTION AND IMPACT

      The bug causes a failure of the thread that creates new Hash-based Message Authentication Code (HMAC) signing keys every 90 days.

      New keys are still generated whenever the Config Server Replica Set (CSRS) fails over. Consequently, if the CSRS goes 90 days without a failover, operations across the sharded cluster start to fail and do not succeed again until the CSRS fails over.

      DIAGNOSIS AND AFFECTED VERSIONS

      MongoDB 4.2.2 to 4.2.11 and 4.4.0 to 4.4.2 are affected. The bug may exist in earlier versions, but in those versions mechanisms other than failover cause the CSRS primary to regenerate the HMAC keys successfully.
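
      As a rough illustration of the ranges above, the following sketch classifies a version string. The helper names parseVersion and isAffected are hypothetical and not part of any MongoDB API; the ranges are copied from this ticket, and versions outside them are simply reported as not listed.

```javascript
// Hypothetical helper: split "4.2.10" into [4, 2, 10].
function parseVersion(version) {
  const parts = version.split(".").map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error("expected a version like 4.2.10, got: " + version);
  }
  return parts;
}

// Affected ranges as stated in this ticket (inclusive):
// 4.2.2 to 4.2.11 and 4.4.0 to 4.4.2.
const AFFECTED = [
  { major: 4, minor: 2, minPatch: 2, maxPatch: 11 },
  { major: 4, minor: 4, minPatch: 0, maxPatch: 2 },
];

// Returns true only for versions inside the listed ranges.
function isAffected(version) {
  const [major, minor, patch] = parseVersion(version);
  return AFFECTED.some(
    (r) =>
      r.major === major &&
      r.minor === minor &&
      patch >= r.minPatch &&
      patch <= r.maxPatch
  );
}
```

      In a mongo shell this could be called as isAffected(db.version()), assuming the shell is connected to the node whose version matters and that the node runs a plain release version such as 4.2.10.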

      To check the expiration dates of the HMAC signing keys, connect a mongo shell to a mongos node or to the CSRS primary, authenticate as a user with admin privileges, and run the following command. The cluster will experience this issue once all the HMAC signing keys have expired.

      db.getSiblingDB("admin").system.keys.find().map(k => { return { _id: k._id, purpose: k.purpose, expiresAt: new Date(k.expiresAt.getTime()*1000) }})
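
      Since the issue only appears once every key has expired, it can help to reduce the command's output to a single answer. The sketch below is one way to do that; summarizeKeyExpiry is a hypothetical helper, and it assumes each element has an expiresAt field that is already a JavaScript Date, as produced by the command above.

```javascript
// Hypothetical helper: summarize HMAC signing key expiry.
// `keys` is an array of { _id, purpose, expiresAt } with expiresAt as a Date.
// `now` is the Date to compare against.
function summarizeKeyExpiry(keys, now) {
  if (keys.length === 0) {
    throw new Error("no signing keys supplied");
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  // The cluster is only at risk when the key that expires last has expired.
  const latest = keys.reduce(
    (max, k) => (k.expiresAt > max ? k.expiresAt : max),
    keys[0].expiresAt
  );
  return {
    allExpired: latest <= now,
    latestExpiry: latest,
    // Negative once every key has expired.
    daysUntilAllExpire: Math.floor((latest - now) / msPerDay),
  };
}

// Possible usage in the mongo shell (assumes the find().map(...) result
// from the command above was stored in a variable named `keys`):
//   summarizeKeyExpiry(keys, new Date())
```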
      

      To perform this check, the database user must have permission to query the admin.system.keys collection. To grant it, create a role with the find action on the admin.system.keys collection and grant that role to an admin user with the following commands, replacing ADMIN with the username:

      use admin;
      
      db.createRole({
        role: "query_keys",
        privileges: [
           { resource: { db: "admin", collection: "system.keys"}, actions: [ "find" ] },
        ],
        roles: [  ]
      });
      
      db.grantRolesToUser("ADMIN", ["query_keys"])
      

      REMEDIATION AND WORKAROUNDS

      The fix is included in the 3.6.22, 4.0.22, 4.2.12, and 4.4.3 production releases and later. To prevent the issue before upgrading to a fixed release, step down the CSRS primary to initiate a failover before the 90-day limit is reached.
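
      The workaround can be sketched as a small guard around the step-down. This is only an illustration: stepDownIfKeysNearExpiry is a hypothetical helper, the 14-day margin in the usage comment is an arbitrary assumption, and latestExpiry stands for the latest expiresAt value found by the key check described under DIAGNOSIS AND AFFECTED VERSIONS. The step-down itself is passed in as a callback; in the mongo shell that callback would wrap rs.stepDown().

```javascript
// Hypothetical helper: trigger a failover when the last HMAC key is close
// to expiring. `latestExpiry` and `now` are Dates, `marginDays` is the safety
// margin in days, and `stepDown` is a callback that performs the step-down.
function stepDownIfKeysNearExpiry(latestExpiry, now, marginDays, stepDown) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const remainingDays = (latestExpiry - now) / msPerDay;
  if (remainingDays <= marginDays) {
    stepDown();
    return true; // a step-down was requested
  }
  return false; // keys are still comfortably valid
}

// Possible usage in a mongo shell connected to the CSRS primary
// (the 14-day margin is an assumption, not a recommendation from the ticket):
//   stepDownIfKeysNearExpiry(latestExpiry, new Date(), 14, () => rs.stepDown());
```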

      Original Description

      I see the overflow issue SERVER-48709 is fixed, but the problem still happens after we upgraded the config server to version 4.2.10: new signing keys are not generated by the monitoring-keys-for-HMAC thread. After 90 or 180 days, when the signing keys have expired, mongos can't connect to the mongod server nodes. We have to restart the config server so that new signing keys are generated when the monitoring-keys-for-HMAC thread starts, after which mongos can connect to the mongod server nodes again.
      I think the root cause of SERVER-47553 and SERVER-48709 may be the same, but it has not been dug out. As this issue may cause unexpected downtime for our service, it's a very serious problem. I wish it could be fixed ASAP. Thanks!

            Assignee:
            jack.mulrow@mongodb.com Jack Mulrow
            Reporter:
            jcli.china@gmail.com Jingcheng Li
            Votes:
            2
            Watchers:
            29
