Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24630

Mongos erroneously advances config optime for writes that fail write concern

    • Fully Compatible
    • ALL
    • Repl 16 (06/24/16)

      Mongos advances config optime using the last visible optime from config servers. If the write commands fail due to timeout or NotMaster, the last visible optime is ahead of the last committed and may be rolled back.

      With the following patch, majority read fails due to timeout, which should never happen.

      diff --git a/jstests/sharding/auth_no_config_primary.js b/jstests/sharding/auth_no_config_primary.js
      index 3bb1ea1..d2c4b79 100644
      --- a/jstests/sharding/auth_no_config_primary.js
      +++ b/jstests/sharding/auth_no_config_primary.js
      @@ -8,6 +8,12 @@
       (function() {
           "use strict";
      
      +    function enableFailPoint(node) {
      +        assert.commandWorked(node.getDB('admin').runCommand(
      +                                 {configureFailPoint: 'rsSyncApplyStop', mode: 'alwaysOn'}),
      +                             'Failed to configure rsSyncApplyStop failpoint.');
      +    }
      +
           var st = new ShardingTest({shards: 1, keyFile: 'jstests/libs/key1'});
      
           st.s.getDB('admin').createUser({user: 'root', pwd: 'pass', roles: ['root']});
      @@ -15,9 +21,9 @@
           var testDB = st.s.getDB('test');
           testDB.user.insert({hello: 'world'});
      
      -    // Kill all secondaries, forcing the current primary to step down.
      +    // Pause all secondaries.
           st.configRS.getSecondaries().forEach(function(secondaryConn) {
      -        MongoRunner.stopMongod(secondaryConn.port);
      +        enableFailPoint(secondaryConn);
           });
      
           // Test authenticate through a fresh connection.
      diff --git a/src/mongo/s/sharding_egress_metadata_hook.cpp b/src/mongo/s/sharding_egress_metadata_hook.cpp
      index 2040955..4565a52 100644
      --- a/src/mongo/s/sharding_egress_metadata_hook.cpp
      +++ b/src/mongo/s/sharding_egress_metadata_hook.cpp
      @@ -43,6 +43,8 @@
       #include "mongo/s/sharding_egress_metadata_hook.h"
       #include "mongo/util/net/hostandport.h"
      
      +#include "mongo/util/log.h"
      +
       namespace mongo {
      
       namespace rpc {
      @@ -114,6 +116,7 @@ Status ShardingEgressMetadataHook::_advanceConfigOptimeFromShard(ShardId shardId
      
                       const auto& replMetadata = parseStatus.getValue();
                       auto opTime = replMetadata.getLastOpVisible();
      +                log() << "XXXSZ advance config optime " << metadataObj;
                       grid.advanceConfigOpTime(opTime);
                   }
               } else {
      
      // Advance config optime, which is never commited.
      s20015| 2016-06-16T18:26:23.818-0400 I SHARDING [NetworkInterfaceASIO-ShardRegistry-0] XXXSZ advance config optime { $replData: { term: 1, lastOpCommitted: { ts: Timestamp 1466115787000|11, t: 1 }, lastOpVisible: { ts: Timestamp 1466115968000|1, t: 1 }, configVersion: 1, replicaSetId: ObjectId('576326bd769d59b46d7a042b'), primaryIndex: 0, syncSourceIndex: -1 }, $gleStats: { lastOpTime: { ts: Timestamp 1466115968000|1, t: 1 }, electionId: ObjectId('7fffffff0000000000000001') } }
      s20015| 2016-06-16T18:27:38.614-0400 I SHARDING [Balancer] Operation timed out with status ExceededTimeLimit: Operation timed out, request was RemoteCommand 85 -- target:siyuan-mbp:20013 db:config expDate:2016-06-16T18:27:38.610-0400 cmd:{ find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, maxTimeMS: 30000 }
      
      // Then a majority read fails.
      s20015| 2016-06-16T18:27:38.862-0400 I SHARDING [conn1] Operation timed out with status ExceededTimeLimit: Operation timed out, request was RemoteCommand 104 -- target:siyuan-mbp:20012 db:config expDate:2016-06-16T18:27:38.857-0400 cmd:{ find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }
      	"errmsg" : "Operation timed out, request was RemoteCommand 104 -- target:siyuan-mbp:20012 db:config expDate:2016-06-16T18:27:38.857-0400 cmd:{ find: \"databases\", filter: { _id: \"test\" }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }",
      c20012| 2016-06-16T18:27:38.869-0400 I COMMAND  [conn11] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }
      c20012| 2016-06-16T18:27:38.869-0400 I COMMAND  [conn11] command config.$cmd command: find { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 } numYields:0 reslen:92 locks:{} protocol:op_command 30011ms
      

      This is found during implementing SERVER-24574 which returns NotMaster rather than closing connections.

            Assignee:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Reporter:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: