[SERVER-39213] Mongos does not wait for cluster time signing keys during startup Created: 25/Jan/19  Updated: 29/Oct/23  Resolved: 10/Apr/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Bug
Priority: Major - P3
Reporter: Jack Mulrow
Assignee: Misha Tyulenev
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2019-04-08, Sharding 2019-04-22
Participants:
Case:
Linked BF Score: 6

 Description   

SERVER-30249 made mongos block during startup until it receives the keys used to sign cluster times from the config server, because it won't gossip $clusterTime to clients without them. This is skipped if the max wire version of mongos's RSM for the CSRS is less than the OP_MSG wire version or if the CSRS RSM's min and max wire version are different.

It seems the second part of this check was meant to avoid waiting on keys when the CSRS was in FCV 3.4, but it consistently fails in master (not in 4.0 or 3.6), so mongos skips waiting for keys. This means mongos is not guaranteed to gossip $clusterTime and provide causal consistency immediately after starting up, although it will as soon as it receives keys from the config server, which normally happens quickly.
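
The startup check described above can be sketched roughly as follows. This is an illustration only, not the server code: the function name, parameter names, and the OP_MSG wire version value are assumptions made for the example.

```python
# Hypothetical sketch of the startup check; all names here are illustrative.
OP_MSG_WIRE_VERSION = 6  # assumed value, used only for this example


def should_wait_for_keys(min_wire_version: int, max_wire_version: int,
                         op_msg_wire_version: int = OP_MSG_WIRE_VERSION) -> bool:
    """Return True if mongos would block at startup until it has signing keys."""
    # Skip waiting if the CSRS monitor's max wire version predates OP_MSG.
    if max_wire_version < op_msg_wire_version:
        return False
    # Skip waiting if the CSRS monitor's min and max wire versions differ.
    if min_wire_version != max_wire_version:
        return False
    return True
```

Under this sketch, any CSRS monitor state whose min and max wire versions differ returns False, which is the path the description says mongos consistently takes in master.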



 Comments   
Comment by Luke Chen [ 11/Apr/19 ]

Fixing up the fixVersion as this ticket was not included as part of the 4.1.10 release.

Comment by Githook User [ 10/Apr/19 ]

Author: Misha Tyulenev <misha@mongodb.com> (username: mikety)

Message: SERVER-39213 remove mongosWaitsForKeys from shardingtest
Branch: master
https://github.com/mongodb/mongo/commit/0bbed4c98ce45ee4ecca1675222abd6b7cdb67e9

Comment by Jack Mulrow [ 21/Mar/19 ]

Takeaways from further investigation on the impacts of this behavior:

  1. If mongos starts up without keys, it will not include cluster time metadata in any response (i.e. $clusterTime or operationTime) until it asynchronously discovers keys through a background thread. The thread refreshes every 200ms w/ linear backoff until keys are found w/ a max interval of 10 minutes, so in normal operation this period should be short. I'm not positive how a driver handles responses without clusterTime metadata, but given this has been the behavior in master for some time, it doesn't seem to have broken any driver tests.
  2. With auth on, I believe conforming drivers shouldn't experience any causal consistency violations, because either the driver has not received valid clusterTime metadata and cannot perform afterClusterTime reads, or it has received metadata from another mongos, which will be rejected by the mongos w/o keys because it cannot validate the proof included with $clusterTime.
  3. With auth off, however, I think it is possible to violate causal consistency because mongos does not return clusterTime metadata without keys, but skips validating proofs with received clusterTimes. So a client that has received valid clusterTimes at some point may perform a write against the mongos w/o keys, receive no new clusterTimes, then perform a read using the stale metadata, which will not reflect the earlier write.
  4. Without waiting for keys, it's unlikely we'll discover bugs with key generation / rotation, because of the "mongosWaitsForKeys" parameter in ShardingTest, which was added before it was decided mongos should wait for keys at startup and masked this bug by waiting for mongos to return keys in the ShardingTest constructor. If we do this ticket, I'd recommend removing that parameter as well.
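
For reference, the refresh schedule from item 1 can be sketched as below. The 200ms base and 10 minute cap come from the comment above; the function name is hypothetical, and the reading of "linear backoff" as one extra base interval per failed attempt is an assumption, not something confirmed against the server code.

```python
from itertools import islice
from typing import Iterator

BASE_INTERVAL_MS = 200              # from the comment: refresh every 200ms
MAX_INTERVAL_MS = 10 * 60 * 1000    # from the comment: max interval of 10 minutes


def refresh_intervals_ms() -> Iterator[int]:
    """Yield successive waits between key refresh attempts, in milliseconds."""
    attempt = 1
    while True:
        # Assumption: the wait grows by one base interval per failed attempt,
        # and is capped at the maximum interval.
        yield min(BASE_INTERVAL_MS * attempt, MAX_INTERVAL_MS)
        attempt += 1


first_five = list(islice(refresh_intervals_ms(), 5))
print(first_five)  # [200, 400, 600, 800, 1000]
```

Under that assumption the early retries stay well under a second apart, which is consistent with the expectation that the period without keys is short in normal operation.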
Generated at Thu Feb 08 04:51:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.