[SERVER-52654] new signing keys not generated by the monitoring-keys-for-HMAC thread Created: 06/Nov/20 Updated: 22/Jan/24 Resolved: 10/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.10 |
| Fix Version/s: | 4.0.22, 3.6.22, 4.4.3, 4.2.12 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Jingcheng Li | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2, v4.0, v3.6 |
| Sprint: | Sharding 2020-12-14 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Issue Status as of Jan 7, 2021

ISSUE DESCRIPTION AND IMPACT
The bug causes a failure of the thread that creates new Hash-based Message Authentication Code (HMAC) signing keys every 90 days. New keys are still generated when the Config Server Replica Set (CSRS) fails over. So, if the CSRS goes 90 days without a failover, operations across the sharded cluster will start to fail and will not succeed again until the CSRS fails over.

DIAGNOSIS AND AFFECTED VERSIONS
MongoDB 4.2.2 to 4.2.11 and 4.4.0 to 4.4.2 are affected. The bug may exist in previous versions, but mechanisms other than failover cause the CSRS primary to re-generate the HMAC keys successfully in those versions.
To check the expiration date of the HMAC signing keys, use a mongo shell to connect to a mongos node or the CSRS primary, authenticate as a user with admin privileges, and run the following command. The cluster will experience this issue when all the HMAC signing keys expire.
To perform this check, the database user must have permission to query the admin.system.keys collection. To grant this permission, create a new role with the find action on the admin.system.keys collection and grant this role to an admin user with the following commands, replacing ADMIN with the username:

REMEDIATION AND WORKAROUNDS
The fix is included in the 3.6.22, 4.0.22, 4.2.12 and 4.4.3 production releases and later. To prevent the issue before upgrading to a fixed release, step down the CSRS primary to initiate a failover before the 90-day limit is reached.

Original Description
I see the overflow issue |
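The condition in the diagnosis ("all the HMAC signing keys expire") can be sketched as a small check. This is only an illustration: it assumes the key expiry times have already been read from admin.system.keys by some other means and converted to Unix seconds, and the function name and sample values are hypothetical.

```python
from datetime import datetime, timezone


def all_keys_expired(expires_at_secs, now=None):
    """Return True only if there is at least one key and every key has expired.

    expires_at_secs: iterable of key expiry times as Unix seconds
    (hypothetical input; how they are fetched is out of scope here).
    """
    now = now or datetime.now(timezone.utc)
    now_secs = now.timestamp()
    expiries = list(expires_at_secs)
    return bool(expiries) and all(secs <= now_secs for secs in expiries)


# Hypothetical sample: two keys, checked on the issue status date.
check_time = datetime(2021, 1, 7, tzinfo=timezone.utc)
past = datetime(2020, 10, 1, tzinfo=timezone.utc).timestamp()
future = datetime(2021, 3, 1, tzinfo=timezone.utc).timestamp()

print(all_keys_expired([past, past], check_time))    # both keys expired
print(all_keys_expired([past, future], check_time))  # one key still valid
```

As long as one key is still valid the cluster is not yet in the failing state, which is why stepping down the CSRS primary before the last key expires works as a preventive measure.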
| Comments |
| Comment by Ilan M [ 13/Apr/21 ] | ||
|
"To prevent the issue before upgrading to a fixed release, step down the CSRS primary to initiate a failover before the 90 days limit is reached."
Could you clarify whether, as a workaround, we need to restart just the config primary or all config nodes? Thank you. | ||
| Comment by jun park [ 06/Apr/21 ] | ||
|
While using version 4.4.1, reads and writes stopped working and mongos could not connect. When I checked the expiration date of the key, it matched the time when mongos could no longer connect, which was also just after an incorrect query was sent to MongoDB. Could it have been triggered by the incorrect query? | ||
| Comment by Aayushi Mangal [ 01/Mar/21 ] | ||
|
Hi Jack Mulrow / jcli.china@gmail.com, how can this issue be reproduced? Could you please share the steps? I tried setting the system clock ahead, but that did not work. I would like to reproduce it on 4.2.10. | ||
| Comment by Githook User [ 10/Dec/20 ] | ||
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'} Message: (cherry picked from commit e804031ae4ea69c2cfbfcca47202fcc468d826b2) | ||
| Comment by Githook User [ 10/Dec/20 ] | ||
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'} Message: | ||
| Comment by DEokhyun Lee [ 09/Dec/20 ] | ||
|
Hi~ We had the same problem while using 4.2. It seems that this issue may also occur in 3.6 and 4.0. Thank you~ | ||
| Comment by Jingcheng Li [ 12/Nov/20 ] | ||
|
Hello, While trying to reproduce this problem, I used the pstack command to dump the call stack of the monitoring-keys-for-HMAC thread and did some text processing on the result. I noticed that the monitoring-keys-for-HMAC thread ends up using poll to sleep and wait for a wake-up event until a deadline is reached. Unfortunately, the third argument of the 'poll' system call is a signed int in units of milliseconds, and the howMuchSleepNeedFor function uses a timeout of about 90 days (7776000000 ms). Since 7776000000 overflows a signed int, the type conversion yields a negative value (-813934592), which causes an infinite sleep, so the thread is never woken up. So I think the solution is simple: adjusting the sleep interval to a value less than INT_MAX will fix this issue. FYI, thanks!
| ||
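The arithmetic in the comment above can be checked with a short sketch. It reproduces the truncation of a 90-day millisecond timeout to a 32-bit signed int, then shows one possible way to keep each wait below INT_MAX by splitting it into chunks. The helper names are hypothetical, and the chunking is an illustration of the idea proposed in the comment, not necessarily the change made in the actual fix.

```python
import ctypes

NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000  # 7776000000 ms
INT_MAX = 2**31 - 1                        # largest 32-bit signed int


def as_c_int(value):
    """Wrap a Python int to 32 bits, as a narrowing conversion to a
    32-bit signed int typically does (ctypes performs no overflow check)."""
    return ctypes.c_int32(value).value


def chunked_timeouts(total_ms, max_chunk_ms=INT_MAX):
    """Yield successive timeouts that each fit in a signed int
    and together add up to total_ms."""
    remaining = total_ms
    while remaining > 0:
        chunk = min(remaining, max_chunk_ms)
        yield chunk
        remaining -= chunk


print(as_c_int(NINETY_DAYS_MS))                # -813934592
print(list(chunked_timeouts(NINETY_DAYS_MS)))  # four waits, each <= INT_MAX
```

A negative timeout passed to poll means "wait indefinitely", which matches the observed behavior of the thread never waking up.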
| Comment by Kelsey Schubert [ 09/Nov/20 ] | ||
|
Thanks for the report, jcli.china@gmail.com. We'll investigate. |