Core Server / SERVER-28057

isMaster timeout when using multiple mongos connection pools

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • None
    • Affects Version/s: 3.2.12
    • Component/s: Networking, Sharding
    • Environment:
      CentOS 6
      MongoDB version 3.2.12
    • ALL

      Create a 3.2.12 sharded cluster with a single mongos (6 CPUs), SCCC config servers, and 10 shards, each a 3-node replica set.
      Using sysbench-mongodb, create six test collections, sbtest<1-6>, with 40,000,000 documents each, and shard them on {_id: "hashed"}.
      After balancing completes, run four instances of sysbench-mongodb against the mongos. I run sysbench-mongodb with its defaults, except for NUM_WRITER_THREADS, which I set to 256.
      After a few minutes, ASIO timeouts start appearing in the mongos log.
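
      The sharding step above can be sketched as the admin commands it implies. This is only an illustration: the database name "sbtest" and the helper name shard_commands are assumptions, not taken from the report.

```python
def shard_commands(db_name="sbtest", n_collections=6):
    """Build the admin commands that shard sbtest1..sbtestN on a hashed _id.

    db_name is an assumed database name; the report only names the
    collections sbtest<1-6>.
    """
    commands = [{"enableSharding": db_name}]
    for i in range(1, n_collections + 1):
        commands.append({
            "shardCollection": f"{db_name}.sbtest{i}",
            "key": {"_id": "hashed"},
        })
    return commands


for command in shard_commands():
    print(command)
```

      Each document would be sent to the admin database through the mongos, for example with a driver's generic command helper.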


      I am testing 3.2.12 on a 10-shard cluster (using sysbench-mongodb) and I am seeing odd behavior. With the default mongos settings, I get random ASIO timeouts for the

      { isMaster: 1 }

      command from different connection pools.

      I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Failed to connect to (node) - ExceededTimeLimit: Operation timed out
      D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Failed to execute command: RemoteCommand 23628777 -- target:(node) db:admin cmd:{ isMaster: 1 } reason: ExceededTimeLimit: Operation timed out
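
      To see whether the timeouts are spread across pools, the log can be tallied per pool. The following is a minimal sketch that assumes the line layout of the two log lines quoted above; the function name count_ismaster_timeouts is hypothetical.

```python
import re
from collections import Counter

# Extracts the pool index from thread names such as
# [NetworkInterfaceASIO-TaskExecutorPool-2-0]. The pattern is inferred from
# the two log lines quoted in this report and may differ in other versions.
POOL_RE = re.compile(r"\[NetworkInterfaceASIO-TaskExecutorPool-(\d+)-\d+\]")


def count_ismaster_timeouts(lines):
    """Return a Counter mapping pool index to the number of isMaster timeouts."""
    counts = Counter()
    for line in lines:
        # Only count failed isMaster commands, not the connect failures.
        if "isMaster" not in line or "ExceededTimeLimit" not in line:
            continue
        match = POOL_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

      Passing an open mongos log file to count_ismaster_timeouts gives one count per pool index, which makes it easy to compare runs with different taskExecutorPoolSize values.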
      

      When I set "taskExecutorPoolSize"=1, which I believe creates a single connection pool, I do not get the errors above.

      My mongos has 6 CPUs, so I assume it creates 6 connection pools by default. A smaller value such as "taskExecutorPoolSize"=2 reduces the timeouts, so it seems that the more connection pools I use, the more timeouts I get during the benchmark.
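
      For repeating the comparison at different pool sizes, the mongos command line can be generated per run. This is a sketch only: the helper name mongos_args, the config server hostnames, and the port are placeholders, not values from this report.

```python
def mongos_args(config_db, pool_size, port=27017):
    """Build a mongos argv list with taskExecutorPoolSize set at startup.

    config_db is the comma-separated SCCC config server list; the hostnames
    used in the example below are placeholders.
    """
    if pool_size < 0:
        raise ValueError("pool_size must be non-negative")
    return [
        "mongos",
        "--configdb", config_db,
        "--port", str(port),
        "--setParameter", f"taskExecutorPoolSize={pool_size}",
    ]


# Print one command line per pool size tried in the report (1 and 2).
for size in (1, 2):
    print(" ".join(mongos_args("cfg1:27019,cfg2:27019,cfg3:27019", size)))
```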

      I am trying to understand what may cause the above behavior.

      Thanks in advance,
      Antonis

            Assignee: Unassigned
            Reporter: Antonis Giannopoulos (antogiann)
            Votes: 1
            Watchers: 4
