[SERVER-62486] Gossiping cluster time in replicaset with "--transitionToAuth" can cause KeyNotFound error Created: 10/Jan/22  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: 4.0.3, 5.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Dale Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 3
Labels: sharding-product-sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to DRIVERS-1904 Handle invalid $clusterTime documents... Backlog
Assigned Teams:
Cluster Scalability
Operating System: ALL
Steps To Reproduce:
  1. Create a 3-node replicaset with auth disabled.
    E.g. using mlaunch:

    mlaunch init --replicaset --dir ~/data/testing
    

  2. Connect to the replicaset and create a read-only user.

    use admin
    db.createUser({user: "read", pwd: "12345", roles: [ { role: "read", db: "admin" } ] })
    

  3. Generate a random keyfile.

    openssl rand -base64 768 > keyfile.txt
    

  4. Restart the replicaset processes one at a time with the --transitionToAuth flag enabled (using the keyfile for internal auth).

    # Kill the mongod process listening on port 27017, then start a new one.
    ps aux | grep 27017
    kill <pid>
    mongod --transitionToAuth --keyFile keyfile.txt --replSet replset --dbpath ~/data/testing/replset/rs1/db --logpath ~/data/testing/replset/rs1/mongod.log --port 27017 --fork
    # Repeat for the mongod processes listening on port 27018 and 27019.
    

  5. Restart each secondary process in the replicaset, removing the --transitionToAuth flag and enabling the --auth flag.
    For example, if the primary is listening on port 27017:

    # Kill the mongod process listening on port 27018, then start a new one.
    ps aux | grep 27018
    kill <pid>
    mongod --auth --keyFile keyfile.txt --replSet replset --dbpath ~/data/testing/replset/rs2/db --logpath ~/data/testing/replset/rs2/mongod.log --port 27018 --fork
    # Repeat for the mongod process listening on port 27019.
    

  6. Send a "ping" command to each mongod using the same authentication parameters and observe the mixed "$clusterTime" responses.
    For example, if the primary is listening on port 27017:

    # Returns dummy-signed $clusterTime from the primary with "keyId: 0".
    mongo --port 27017 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
    # Returns signed $clusterTime with real keyId.
    mongo --port 27018 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
    # Returns signed $clusterTime with real keyId.
    mongo --port 27019 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
    

At this point you have a working replicaset with mixed authentication requirements: using the same auth parameters, the primary returns a dummy-signed $clusterTime with keyId: 0, while the secondaries return a signed $clusterTime with a real keyId. When a single client instance connects to all replicaset nodes, this state creates a race between the client advancing its $clusterTime timestamp and the secondaries advancing theirs. To actually observe the KeyNotFound error, the secondaries' $clusterTime timestamp must be behind both the primary's and that of the client gossiping $clusterTime. In that case, a client that writes to the primary and then reads from a secondary gets a KeyNotFound error from that secondary until the secondary catches up to the client's $clusterTime timestamp.
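
The race described above can be illustrated with a toy model. This is only a sketch of the behavior as reported in this ticket, not server code: the Node and ClusterTime classes, the integer timestamps, and the key id 42 are all hypothetical, and the rule "a node only validates a gossiped $clusterTime signature when the gossiped timestamp is ahead of its own" is an assumption inferred from the observation that the error disappears once the secondary catches up.

```python
from dataclasses import dataclass
from typing import Optional, Set

DUMMY_KEY_ID = 0   # keyId seen on the dummy-signed $clusterTime in this ticket
REAL_KEY_ID = 42   # hypothetical id of a real signing key


class KeyNotFound(Exception):
    """Stands in for server error 211 (KeyNotFound)."""


@dataclass(frozen=True)
class ClusterTime:
    ts: int       # simplified logical timestamp (assumption: a plain integer)
    key_id: int   # 0 means dummy-signed


@dataclass
class Node:
    requires_auth: bool        # True models --auth, False models --transitionToAuth
    cluster_time: int
    known_key_ids: Set[int]

    def reply_time(self) -> ClusterTime:
        # Assumption from the ticket: the transitionToAuth node answers with a
        # dummy signature, the --auth nodes answer with a real one.
        key_id = REAL_KEY_ID if self.requires_auth else DUMMY_KEY_ID
        return ClusterTime(self.cluster_time, key_id)

    def write(self) -> ClusterTime:
        self.cluster_time += 1  # a write advances the node's cluster time
        return self.reply_time()

    def read(self, gossiped: Optional[ClusterTime]) -> ClusterTime:
        # Assumed rule: the signature only matters when the gossiped time
        # would advance this node's clock.
        if gossiped is not None and gossiped.ts > self.cluster_time:
            if self.requires_auth and gossiped.key_id not in self.known_key_ids:
                raise KeyNotFound(
                    f"no key with id {gossiped.key_id} for ts {gossiped.ts}"
                )
            self.cluster_time = gossiped.ts
        return self.reply_time()


def run_scenario():
    primary = Node(requires_auth=False, cluster_time=10, known_key_ids={REAL_KEY_ID})
    secondary = Node(requires_auth=True, cluster_time=10, known_key_ids={REAL_KEY_ID})

    client_time = primary.write()  # client now holds a dummy-signed time ahead of the secondary
    try:
        secondary.read(client_time)
        first = "ok"
    except KeyNotFound:
        first = "KeyNotFound"

    secondary.cluster_time = primary.cluster_time  # replication catches up
    second = secondary.read(client_time)           # gossiped time is no longer ahead
    return first, second
```

In this model, the first read after the write fails because the client's dummy-signed time is ahead of the lagging secondary, and the same read succeeds once the secondary's own time has caught up, which mirrors the intermittent nature of the failure.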

I confirmed that this reproduces on server v4.0.3 and 5.0.0, but I believe it affects every server version that supports transitionToAuth.

Sprint: Security 2022-02-07, Security 2022-02-21, Security 2022-03-07, Security 2022-05-02, Security 2022-07-25, Security 2022-08-08, Security 2022-08-22, Security 2022-09-05, Security 2022-09-19, Security 2022-10-03, Security 2022-10-17, Security 2022-10-31, Security 2022-11-14, Security 2022-11-28, Security 2022-12-12, Security 2022-12-26, Security 2023-01-09, Security 2023-01-23, Security 2023-02-06
Participants:

 Description   

With the transitionToAuth flag, it's possible to create a working replicaset with mixed authentication requirements in which, for the same client authentication parameters, the primary returns a dummy-signed $clusterTime document while the secondaries return properly signed $clusterTime documents (see this comment for more info). A client may then gossip a dummy-signed $clusterTime to nodes that require a real $clusterTime signature. If that happens while the secondaries' $clusterTime timestamp is older than the client's, the secondaries return a KeyNotFound error instead of the expected response.

See https://jira.mongodb.org/browse/DRIVERS-1904 for the related drivers ticket.



 Comments   
Comment by Ian Springer [ 15/May/23 ]

My team recently encountered this issue in one of our production clusters.

The Mongo cluster is a 30-shard v5.0.15 cluster with tlsMode set to required. X.509 authentication was fully set up but still optional (via the --transitionToAuth and --tlsAllowConnectionsWithoutCertificates command line options). We attempted to make X.509 auth required by removing those two options. After doing so, we restarted one of our mongos nodes and started seeing hundreds of queries and updates fail with KeyNotFound errors, e.g.:

 

error 211: Command failed with error 211 (KeyNotFound): 'No keys found for HMAC that is valid for time: { ts: Timestamp(1683684922, 1821) } with id: 0' on server mongos1:27017 

This resulted in a significant number of customer-facing errors. We had to shut down the affected mongos node until we determined the root cause (this issue).

We have paused our rollout of required X.509 authentication, as it does not currently appear to be possible without a huge number of query failures during the rollout.

I hope you will consider prioritizing this issue.

 

Generated at Thu Feb 08 05:55:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.