[SERVER-40136] The background key generator can remain disabled on FCV upgrade after a downgrade
Created: 14/Mar/19  Updated: 29/Oct/23  Resolved: 18/Apr/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.11
Fix Version/s: 3.6.13

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Misha Tyulenev
Resolution: Fixed
Votes: 2
Labels: SWCW

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.6
Sprint: Sharding 2019-04-08, Sharding 2019-04-22

 Description   

The background key generator thread generates the signing keys used for cluster time validation. Its lifecycle is as follows:

  1. Unconditionally enabled when a node initializes as a shard
  2. Disabled on step-down
  3. Enabled on step-up
  4. Disabled on FCV downgrade from 3.6 to 3.4

The following problems exist with these transitions:

  • Because of (4) above, an FCV change sequence of 3.6 -> 3.4 -> 3.6 will not re-enable the key generator, so no new keys will be generated and, once the existing keys expire, the router will fail to start up
    • This is not a major problem, because keys typically last for months, and if the router stalls on start-up, this can be worked around by stepping down the config server primary (the newly elected primary re-enables the key generator on step-up)
  • Because of (1) above, a secondary replica set node will end up with the key generator running
    • This is mitigated by the fact that when that key generator tries to insert a new key, the insert fails with a NotMaster error
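
The lifecycle and both problems can be modeled as a small state machine. This is a hypothetical toy model, not MongoDB's implementation: `KeyGeneratorNode`, its methods, and the `fixed` flag are illustrative names. Setting `fixed=True` approximates the change described by the fix commit message ("enable key generator on setting FCV to 3.6"); the model assumes, as in the real server, that setFeatureCompatibilityVersion is run against the primary.

```python
from dataclasses import dataclass


@dataclass
class KeyGeneratorNode:
    """Hypothetical toy model of one node's key generator state."""

    fixed: bool = False  # True approximates "enable key generator on setting FCV to 3.6"
    is_primary: bool = False
    enabled: bool = False
    fcv: str = "3.6"

    def initialize_as_shard(self) -> None:
        # (1) Unconditionally enabled, regardless of primary/secondary role.
        self.enabled = True

    def step_down(self) -> None:
        # (2) Disabled on step-down.
        self.is_primary = False
        self.enabled = False

    def step_up(self) -> None:
        # (3) Enabled on step-up.
        self.is_primary = True
        self.enabled = True

    def set_fcv(self, version: str) -> None:
        # Assumed to be called on the primary.
        if version == "3.4" and self.fcv == "3.6":
            # (4) Disabled on FCV downgrade from 3.6 to 3.4.
            self.enabled = False
        elif version == "3.6" and self.fcv == "3.4" and self.fixed:
            # The fix: re-enable on upgrade back to 3.6.
            self.enabled = True
        self.fcv = version

    def try_insert_key(self) -> str:
        # A running generator on a secondary fails its insert with NotMaster.
        if not self.enabled:
            return "disabled"
        return "ok" if self.is_primary else "NotMaster"


def fcv_round_trip(fixed: bool) -> bool:
    """Run 3.6 -> 3.4 -> 3.6 on a primary; return whether the generator is enabled."""
    node = KeyGeneratorNode(fixed=fixed)
    node.initialize_as_shard()
    node.step_up()
    node.set_fcv("3.4")
    node.set_fcv("3.6")
    return node.enabled


if __name__ == "__main__":
    print("before fix:", fcv_round_trip(fixed=False))  # False
    print("after fix:", fcv_round_trip(fixed=True))    # True

    # Workaround: a step-up (e.g. after stepping down the primary) re-enables it.
    node = KeyGeneratorNode()
    node.initialize_as_shard()
    node.step_up()
    node.set_fcv("3.4")
    node.set_fcv("3.6")
    node.step_down()
    node.step_up()
    print("after step-down/step-up:", node.enabled)  # True

    # Problem (1): a secondary ends up with the generator running.
    secondary = KeyGeneratorNode()
    secondary.initialize_as_shard()
    print("secondary insert:", secondary.try_insert_key())  # NotMaster
```

In the real cluster the workaround elects a different node, which then runs its step-up path; the single-node model collapses that into a step-down followed by a step-up, which is enough to show why the generator comes back.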


 Comments   
Comment by Vinicius Grippa [ 28/Oct/19 ]

For future reference: the issue is no longer present in 3.6.13. I repeated the tests; it takes a while, but the key generator does start:

2019-10-28T05:28:26.968-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:27.968-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:28.969-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:29.969-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:30.969-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:31.969-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:32.969-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:33.970-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:34.970-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:35.970-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:36.970-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:37.971-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:38.971-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:39.971-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:40.971-0400 I SHARDING [mongosMain] Waiting for signing keys, sleeping for 1s and trying again.
2019-10-28T05:28:41.976-0400 I FTDC     [mongosMain] Initializing full-time diagnostic data capture with directory '/home/vinicius.grippa/data/data/mongos.diagnostic.data'
2019-10-28T05:28:41.979-0400 I NETWORK  [mongosMain] waiting for connections on port 37017

Thanks for your time.

Comment by Kaloian Manassiev [ 21/Oct/19 ]

vgrippa@gmail.com, would it be possible to create a new SERVER ticket with the exact steps that you went through and the symptoms that you experienced? This will help us evaluate whether this is a different manifestation of the same bug or a different problem altogether.

Also, please include information about the environment you ran these steps against.

Thank you in advance.

Best regards,
-Kal.

Comment by Vinicius Grippa [ 17/Oct/19 ]

Hi,

I tested version 3.6.13 and the issue still persists. The steps I used are those in the issue description:

FCV change sequence 3.6 -> 3.4 -> 3.6.

Comment by Githook User [ 18/Apr/19 ]

Author:

Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-40136 enable key generator on setting FCV to 3.6
Branch: v3.6
https://github.com/mongodb/mongo/commit/fe29b62bc82c5455ddae672277227d6e827a90b6

Comment by Kaloian Manassiev [ 18/Apr/19 ]

Doesn't the second bullet in the description (that the key generator will continue running on secondaries) apply to 4.2 as well? Would this result in unnecessary log spam, for example with NotMaster errors?

Comment by Misha Tyulenev [ 18/Apr/19 ]

The issue affects only 3.6, as it is on the 3.4 downgrade path.

Generated at Thu Feb 08 04:54:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.