Imagine you have a node whose system clock is out of sync and ahead of the rest of the replica set. That node is elected primary, accepts a w:1 write, and returns an operationTime of 500. Before that primary commits any writes in its term, however, it crashes and a new primary is elected. The new primary is only at clusterTime 100. If the client then sends a read with afterClusterTime: 500 to the new primary, it will get the error "readConcern afterClusterTime must not be greater than clusterTime value".
Instead of getting an error, the new primary should perform a no-op write to advance its cluster time to 500. In order to do that, unsharded replica sets will need to sign and propagate a cluster time like shard servers do.
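The intended behavior can be sketched as a toy model (this is illustrative Python, not the server's C++; `Primary`, `read_after_cluster_time`, and the oplog representation are all hypothetical names): when the requested afterClusterTime is ahead of the primary's cluster time, the primary performs a no-op write to catch up instead of returning an error.

```python
# Toy model of a primary handling readConcern afterClusterTime.
# Old behavior: reject the read when afterClusterTime > clusterTime.
# New behavior: perform a no-op write that advances the clock.

class Primary:
    def __init__(self, cluster_time):
        self.cluster_time = cluster_time
        self.oplog = []

    def _noop_write(self, target_time):
        # The no-op entry is appended so the advance is replicated
        # and durable, like any other oplog write.
        self.oplog.append(("noop", target_time))
        self.cluster_time = target_time

    def read_after_cluster_time(self, after_cluster_time):
        if after_cluster_time > self.cluster_time:
            # Instead of erroring, advance the cluster time.
            self._noop_write(after_cluster_time)
        return self.cluster_time

new_primary = Primary(cluster_time=100)
# The client carries operationTime 500 from the crashed primary.
assert new_primary.read_after_cluster_time(500) == 500
assert ("noop", 500) in new_primary.oplog
```

A read at or below the current cluster time proceeds without a no-op write, so the extra write only happens in the clock-skew scenario described above.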
The implementation will use the existing machinery for generating and looking up cluster keys in a replica set.
Since the keys are stored in the admin.system.keys collection, no new database needs to be created in the replica set.
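For orientation, here is a sketch of what signing a cluster time with one of those keys could look like. The key-document fields (`_id` as the keyId, `purpose`, `key`, `expiresAt`) follow the sharded-cluster key format, and the HMAC-SHA1 construction is an assumption for illustration; treat the details as a model, not the server's exact code.

```python
# Sketch: signing a cluster time with a key document shaped like an
# entry in admin.system.keys. Field names and the HMAC-SHA1 choice
# are illustrative assumptions.
import hashlib
import hmac

key_doc = {
    "_id": 1,               # keyId, echoed in the signature
    "purpose": "HMAC",
    "key": b"\x00" * 20,    # secret shared within the replica set
    "expiresAt": (2000, 0), # (seconds, increment) timestamp
}

def sign_cluster_time(cluster_time, key_doc):
    digest = hmac.new(key_doc["key"],
                      str(cluster_time).encode(),
                      hashlib.sha1).digest()
    return {"clusterTime": cluster_time,
            "signature": {"hash": digest, "keyId": key_doc["_id"]}}

signed = sign_cluster_time(500, key_doc)
assert signed["signature"]["keyId"] == 1
```

Because the signature carries the keyId, a receiving node knows which key to look up in admin.system.keys when it validates a gossiped cluster time.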
1. Refactor KeysCollectionManager:
a. Remove KeysCollectionManagerDirect and KeysCollectionManagerZero; they do not appear to be needed anymore.
b. Move the code from KeysCollectionManagerSharded into KeysCollectionManagerImpl, so we keep an interface that may be needed later.
c. Add a KeysCollectionClient interface.
d. Add KeysCollectionClientSharded, which uses ShardingCatalogClient, and KeysCollectionClientDirect, which uses DBDirectClient, to get and insert keys.
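The split in step 1 could look roughly like the following (a Python model of the proposed C++ classes; the method names and the list/catalog stand-ins for DBDirectClient and ShardingCatalogClient are illustrative):

```python
# Model of the proposed refactor: one KeysCollectionClient interface,
# a direct implementation reading the local admin.system.keys, and a
# sharded implementation going through the config server.
from abc import ABC, abstractmethod

class KeysCollectionClient(ABC):
    """Interface for reading and writing cluster keys."""
    @abstractmethod
    def get_new_keys(self, purpose, newer_than): ...
    @abstractmethod
    def insert_new_key(self, key_doc): ...

class KeysCollectionClientDirect(KeysCollectionClient):
    """Uses a DBDirectClient-like handle to the local node."""
    def __init__(self, db):
        self.db = db  # stand-in for DBDirectClient; here just a list
    def get_new_keys(self, purpose, newer_than):
        return [k for k in self.db
                if k["purpose"] == purpose and k["expiresAt"] > newer_than]
    def insert_new_key(self, key_doc):
        self.db.append(key_doc)

class KeysCollectionClientSharded(KeysCollectionClient):
    """Uses a ShardingCatalogClient-like handle to the config server."""
    def __init__(self, catalog):
        self.catalog = catalog  # stand-in for ShardingCatalogClient
    def get_new_keys(self, purpose, newer_than):
        return self.catalog.query_keys(purpose, newer_than)
    def insert_new_key(self, key_doc):
        self.catalog.insert_key(key_doc)

store = []
direct = KeysCollectionClientDirect(store)
direct.insert_new_key({"_id": 1, "purpose": "HMAC", "expiresAt": 200})
assert direct.get_new_keys("HMAC", newer_than=100)[0]["_id"] == 1
```

KeysCollectionManagerImpl would then be parameterized only by a KeysCollectionClient, so the same key-generation and lookup code serves both replica sets and sharded clusters.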
2. In db/db.cpp: construct the KeysCollectionManager with a KeysCollectionClientDirect when mongod is started with the --replSet flag.
3. Enable/disable key generation when the mongod node becomes primary/secondary.
This is already done in ReplicationCoordinatorExternalStateImpl::_shardingOnTransitionToPrimaryHook and ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook, the same place where it currently works for the config server.
The check should make sure that the new primary is not part of a sharded cluster.
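The transition logic of step 3 can be modeled as follows (hook and method names here are simplified stand-ins for the ReplicationCoordinatorExternalStateImpl hooks, and the sharded-cluster check is represented by a boolean):

```python
# Model of step 3: transition hooks toggle key generation, and a
# replica-set primary only generates keys when it is NOT part of a
# sharded cluster (there the config server generates keys).

class KeysCollectionManager:
    def __init__(self):
        self.generating = False
    def enable_key_generation(self):
        self.generating = True
    def disable_key_generation(self):
        self.generating = False

def on_transition_to_primary(manager, is_part_of_sharded_cluster):
    # Stand-in for _shardingOnTransitionToPrimaryHook.
    if not is_part_of_sharded_cluster:
        manager.enable_key_generation()

def on_step_down(manager):
    # Stand-in for shardingOnStepDownHook.
    manager.disable_key_generation()

mgr = KeysCollectionManager()
on_transition_to_primary(mgr, is_part_of_sharded_cluster=False)
assert mgr.generating
on_step_down(mgr)
assert not mgr.generating
# Inside a sharded cluster, generation stays off on the shard primary.
on_transition_to_primary(mgr, is_part_of_sharded_cluster=True)
assert not mgr.generating
```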
4. When a replica set becomes part of a sharded cluster, its nodes must switch the client to KeysCollectionClientSharded.
The key cache must be cleared to avoid signing data with the replica set keys that are being discarded.
For clients holding a clusterTime signed with the replica set's keys, the protocol must ensure that the clusterTime they use from then on is signed by the config server. This must be part of the addShard call, which implies that mongos needs to refresh its keys right after addShard is processed by the config server.
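The client swap and cache clear of step 4 amount to the following (a minimal sketch; `KeysManager`, `on_add_shard`, and the string/dict stand-ins for the clients and cache are hypothetical):

```python
# Model of step 4: on addShard, a former replica-set node switches to
# the sharded key client and drops its cached replica-set keys so that
# stale keys never sign a new cluster time.

class KeysManager:
    def __init__(self, client, cache):
        self.client = client  # e.g. a KeysCollectionClientDirect
        self.cache = cache    # cached key documents, keyed by keyId

    def on_add_shard(self, sharded_client):
        # From now on, keys come from the config server.
        self.client = sharded_client
        # Discard replica-set keys that are being retired.
        self.cache.clear()

mgr = KeysManager(client="direct", cache={1: "replica-set-key"})
mgr.on_add_shard(sharded_client="sharded")
assert mgr.client == "sharded"
assert mgr.cache == {}
```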
5. We should not invariant on inconsistent user input: remove the invariant; in production the server should log an error and return an error to the client.
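Step 5 in miniature (a sketch, not the server's validation path; `Status` and `validate_after_cluster_time` are illustrative names): bad user input becomes a logged, returned error rather than a process-killing invariant.

```python
# Model of step 5: replace an invariant() on user-supplied input with
# validation that logs and returns an error status.
import logging

class Status:
    def __init__(self, ok, msg=""):
        self.ok, self.msg = ok, msg

def validate_after_cluster_time(after_cluster_time, current_cluster_time):
    # Old: invariant(after_cluster_time <= current_cluster_time)
    # would abort the process on malformed client input.
    if after_cluster_time > current_cluster_time:
        logging.error("readConcern afterClusterTime %s is ahead of "
                      "clusterTime %s", after_cluster_time,
                      current_cluster_time)
        return Status(False, "InvalidOptions")
    return Status(True)

assert not validate_after_cluster_time(500, 100).ok
assert validate_after_cluster_time(100, 500).ok
```

Invariants remain appropriate for internal consistency checks; the change is only about inputs a client can control.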