  1. Core Server
  2. SERVER-62486

Gossiping cluster time in replicaset with "--transitionToAuth" can cause KeyNotFound error

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.0.3, 5.0.0
    • Component/s: None
    • Cluster Scalability
    • ALL
    • Steps to reproduce:
      1. Create a 3-node replicaset with auth disabled.
        E.g. using mlaunch:
        mlaunch init --replicaset --dir ~/data/testing
        
      2. Connect to the replicaset and create a read-only user.
        use admin
        db.createUser({user: "read", pwd: "12345", roles: [ { role: "read", db: "admin" } ] })
        
      3. Generate a random keyfile.
        openssl rand -base64 768 > keyfile.txt
        
      4. Restart all replicaset processes with the transitionToAuth flag enabled (using the keyfile for internal auth) one at a time.
        # Kill the mongod process listening on port 27017, then start a new one.
        ps aux | grep 27017
        kill <pid>
        mongod --transitionToAuth --keyFile keyfile.txt --replSet replset --dbpath ~/data/testing/replset/rs1/db --logpath ~/data/testing/replset/rs1/mongod.log --port 27017 --fork
        # Repeat for the mongod processes listening on port 27018 and 27019.
        
      5. Restart each secondary process in the replicaset, removing the --transitionToAuth flag and enabling the --auth flag.
        For example, if the primary is listening on port 27017:
        # Kill the mongod process listening on port 27018, then start a new one.
        ps aux | grep 27018
        kill <pid>
        mongod --auth --keyFile keyfile.txt --replSet replset --dbpath ~/data/testing/replset/rs2/db --logpath ~/data/testing/replset/rs2/mongod.log --port 27018 --fork
        # Repeat for the mongod process listening on port 27019.
        
      6. Send a "ping" command to each mongod and observe the mixed "$clusterTime" responses using the same authentication parameters.
        For example, if the primary is listening on port 27017:
        # Returns dummy-signed $clusterTime from the primary with "keyId: 0".
        mongo --port 27017 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
        # Returns signed $clusterTime with real keyId.
        mongo --port 27018 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
        # Returns signed $clusterTime with real keyId.
        mongo --port 27019 -u "read" -p "12345" --authenticationDatabase "admin" --eval 'db.runCommand({ping:1})'
        

      You've now created a working replicaset with mixed authentication requirements: using the same auth parameters, the primary returns a dummy-signed $clusterTime with keyId: 0, while the secondaries return a signed $clusterTime with a real keyId. When a single client instance connects to all replicaset nodes, that state creates a race between the client advancing its $clusterTime timestamp and the secondaries advancing theirs. To actually observe the KeyNotFound error, the secondaries' $clusterTime timestamp must be behind both the primary's and that of the client gossiping $clusterTime. In that case, a client that writes to the primary and then reads from a secondary gets a KeyNotFound error from that secondary until the secondary "catches up" to the client's $clusterTime timestamp.

      Note that I reproduced this with server v4.0.3 and 5.0.0, but I believe it affects every server version that supports transitionToAuth.
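
      The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB server or driver code: the Node and Client classes, their fields, and the integer timestamps are all hypothetical, and the rule that a node checks the signature only when the gossiped time is newer than its own is inferred from the behaviour reported in this ticket.

```javascript
// Toy model of $clusterTime gossip during a transitionToAuth rollout.
// All names are hypothetical; this is NOT MongoDB server or driver code.

const DUMMY_KEY_ID = 0; // keyId the ticket observes on dummy-signed $clusterTime

class Node {
  constructor({ requiresSignature, keyId, time }) {
    this.requiresSignature = requiresSignature; // true once restarted with --auth
    this.keyId = keyId; // real signing key id known to this node
    this.time = time;   // node's cluster time (a plain integer in this model)
  }

  // $clusterTime document this node attaches to its replies.
  sign() {
    return { time: this.time, keyId: this.requiresSignature ? this.keyId : DUMMY_KEY_ID };
  }

  // Handle a read-like command carrying the client's gossiped $clusterTime (or null).
  handle(gossiped) {
    if (gossiped && gossiped.time > this.time) {
      // Assumption: the signature only matters when the gossiped time is newer.
      if (this.requiresSignature && gossiped.keyId !== this.keyId) {
        return { ok: 0, codeName: "KeyNotFound" };
      }
      this.time = gossiped.time;
    }
    return { ok: 1, $clusterTime: this.sign() };
  }

  // Handle a write: same checks as handle(), then the node's time advances.
  write(gossiped) {
    const reply = this.handle(gossiped);
    if (reply.ok) {
      this.time += 1;
      reply.$clusterTime = this.sign();
    }
    return reply;
  }

  // Replication eventually brings a lagging node up to the given time.
  replicate(time) {
    this.time = Math.max(this.time, time);
  }
}

class Client {
  constructor() {
    this.clusterTime = null; // highest $clusterTime seen so far
  }

  run(node, isWrite = false) {
    const reply = isWrite ? node.write(this.clusterTime) : node.handle(this.clusterTime);
    const ct = reply.$clusterTime;
    if (ct && (!this.clusterTime || ct.time > this.clusterTime.time)) {
      this.clusterTime = ct;
    }
    return reply;
  }
}

// Primary still runs with --transitionToAuth, secondary already runs with --auth.
const primary = new Node({ requiresSignature: false, keyId: 7, time: 10 });
const secondary = new Node({ requiresSignature: true, keyId: 7, time: 10 });
const client = new Client();

const writeReply = client.run(primary, true); // client now holds a keyId: 0 time
const early = client.run(secondary);          // secondary is behind the client
secondary.replicate(primary.time);            // secondary catches up
const late = client.run(secondary);           // gossiped time is no longer newer

console.log(writeReply.$clusterTime.keyId, early.codeName, late.ok);
```

      In this model the read from the lagging secondary fails with KeyNotFound, and the same read succeeds once the secondary has caught up, which matches the intermittent behaviour described in the steps above.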

    • Security 2022-02-07, Security 2022-02-21, Security 2022-03-07, Security 2022-05-02, Security 2022-07-25, Security 2022-08-08, Security 2022-08-22, Security 2022-09-05, Security 2022-09-19, Security 2022-10-03, Security 2022-10-17, Security 2022-10-31, Security 2022-11-14, Security 2022-11-28, Security 2022-12-12, Security 2022-12-26, Security 2023-01-09, Security 2023-01-23, Security 2023-02-06

      Using the transitionToAuth flag, it's possible to create a working replicaset with mixed authentication requirements that returns a dummy-signed $clusterTime document from the primary and properly signed $clusterTime documents from the secondaries for the same client authentication parameters (see this comment for more info). In that case, a client may gossip a dummy-signed $clusterTime to nodes that require a real $clusterTime signature. If that happens while the secondaries' $clusterTime timestamp is older than the client's, the secondaries return a KeyNotFound error instead of the expected response.

      See https://jira.mongodb.org/browse/DRIVERS-1904 for the related drivers ticket.

            Assignee:
            backlog-server-cluster-scalability [DO NOT USE] Backlog - Cluster Scalability
            Reporter:
            matt.dale@mongodb.com Matt Dale
            Votes:
            3
            Watchers:
            8

              Created:
              Updated: