[SERVER-48709] signing key generator thread on config server not waken up as expected Created: 11/Jun/20 Updated: 29/Oct/23 Resolved: 21/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.1, 3.6.20, 4.7.0, 4.2.10, 4.0.21 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jingcheng Li | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2, v4.0, v3.6 |
| Sprint: | Sharding 2020-07-27 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Hello, we have encountered the issue SERVER-47553 on MongoDB instances hosted on Alibaba Cloud. We noticed that the mongos crash has been fixed; it was caused by an exception thrown in a destructor when ON_BLOCK_EXIT was used to call appendRequiredFieldsToResponse, which made mongos call std::terminate(). However, the fix for SERVER-47553 only prevents mongos from crashing, and the root issue appears unresolved. After we applied the patch, mongos still could not connect to the mongod server nodes, so we investigated further.

There appears to be a bug in the monitoring-keys-for-HMAC thread on the primary node of the config server that prevents signing keys from being generated at the KeysRotationIntervalSec interval. When mongos calls KeysCollectionManager::refreshNow to ask the config server for new signing keys, the call fails with a timeout exception, which causes this problem.

I am sure the root cause is a bug in the "howMuchSleepNeedFor" function, which calculates the wake-up interval for the monitoring-keys-for-HMAC thread on the primary node of the config server:

auto millisBeforeExpire = 1000 * (expiredSecs - currentSecs);

Here expiredSecs and currentSecs are of type unsigned int, and the default wake-up interval is 90 days (7776000 seconds). After conversion to milliseconds this becomes 7776000000, which overflows because the maximum value is 4294967295.

This causes a serious problem, because mongos cannot reconnect to the mongod server nodes even after being restarted many times. A feasible workaround is to restart the config server nodes, which triggers the monitoring-keys-for-HMAC thread to generate new signing keys; mongos can then reconnect successfully.
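The arithmetic described above can be checked with a minimal C++ sketch. It is not the server's code: `wrappedMillisBeforeExpire`, `kNinetyDaysSecs`, and `kIntendedMillis` are hypothetical names introduced for illustration, and the sketch assumes a platform where unsigned int is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the expression quoted in the description.
// Both operands are 32-bit unsigned, so the product is computed modulo 2^32
// (assumption: unsigned int is 32 bits, as on mainstream platforms).
constexpr std::uint32_t wrappedMillisBeforeExpire(std::uint32_t expiredSecs,
                                                  std::uint32_t currentSecs) {
    return 1000u * (expiredSecs - currentSecs);
}

// 90 days expressed in seconds, the default interval cited in the report.
constexpr std::uint32_t kNinetyDaysSecs = 90u * 24u * 60u * 60u;

// The value the conversion to milliseconds is meant to produce.
constexpr std::uint64_t kIntendedMillis = 7776000000ULL;

static_assert(kNinetyDaysSecs == 7776000u, "90 days is 7776000 seconds");
static_assert(kIntendedMillis > std::numeric_limits<std::uint32_t>::max(),
              "the intended value does not fit in 32 bits");
static_assert(wrappedMillisBeforeExpire(kNinetyDaysSecs, 0u) ==
                  kIntendedMillis % 4294967296ULL,
              "the 32-bit product is the intended value reduced modulo 2^32");
```

For an exact 90-day gap the wrapped result is 3481032704 milliseconds, roughly 40 days, rather than the intended 7776000000.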
|
| Comments |
| Comment by Jingcheng Li [ 10/Nov/20 ] |
|
OK, I have opened a new ticket, SERVER-52654. Thanks! |
| Comment by Garaudy Etienne [ 06/Nov/20 ] |
|
jcli.china@gmail.com can you reopen the ticket or file a new ticket mentioning the lingering issue you're observing please? |
| Comment by Jingcheng Li [ 05/Nov/20 ] |
|
Hello @Carl Champain, I see the overflow issue is fixed, but the problem still happens after we upgraded the config server to version 4.2.10: new signing keys are not generated by the monitoring-keys-for-HMAC thread, and mongos can't reconnect to the mongod server nodes. I think SERVER-47553 and this issue share the same root cause, which has not yet been found. Because this issue may cause unexpected downtime for our service, it is a very serious problem, and we hope it can be fixed ASAP. Thanks! |
| Comment by Githook User [ 12/Aug/20 ] |
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}Message: (cherry picked from commit 0cb70e9577c46257798d0385b15ec6bff8dbd28d) |
| Comment by Githook User [ 21/Jul/20 ] |
|
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}Message: |
| Comment by Carl Champain (Inactive) [ 15/Jun/20 ] |
|
Thanks for the report. Kind regards, |
| Comment by Jingcheng Li [ 11/Jun/20 ] |
|
Changing the type of "millisBeforeExpire" to unsigned long long should fix it. Our instances run MongoDB 4.2.1 Community Edition.
Looking forward to your feedback. Thanks! |
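One detail of the suggested change is worth illustrating. In C++ the type of the destination variable does not affect how the right-hand side is evaluated, so at least one operand of the multiplication has to be 64-bit as well. The sketch below is an assumption-laden illustration, not the patch that was committed for this ticket: `widenedResultOnly` and `widenedOperand` are hypothetical names, and unsigned int is assumed to be 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant: only the result type is widened. The multiplication
// still happens in 32-bit unsigned arithmetic and wraps before the
// conversion to unsigned long long (assumption: unsigned int is 32 bits).
constexpr unsigned long long widenedResultOnly(std::uint32_t expiredSecs,
                                               std::uint32_t currentSecs) {
    return 1000u * (expiredSecs - currentSecs);
}

// Hypothetical variant: the constant is 64-bit, so the other operand is
// converted to unsigned long long and the product cannot wrap for any
// 32-bit difference in seconds.
constexpr unsigned long long widenedOperand(std::uint32_t expiredSecs,
                                            std::uint32_t currentSecs) {
    return 1000ULL * (expiredSecs - currentSecs);
}

static_assert(widenedResultOnly(7776000u, 0u) == 3481032704ULL,
              "widening only the destination still yields the wrapped value");
static_assert(widenedOperand(7776000u, 0u) == 7776000000ULL,
              "widening an operand yields the intended 90 days in milliseconds");
```

Under these assumptions, declaring the variable as unsigned long long helps only if the expression itself is also evaluated in 64 bits, for example by using a `1000ULL` literal or casting the difference first.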