Core Server / SERVER-30768

Primary queries using maxTimeMS cause temporary shard write unavailability if ExceededTimeLimit

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.4.19, 3.6.1, 3.7.1
    • Affects Version/s: 3.4.6, 3.6.0-rc4
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v3.6, v3.4
    • Steps To Reproduce:
      1. Set up a sharded cluster with replica sets
      2. Issue a find(query).maxTimeMS(1) command to cause a timeout, such that the Mongo shell prints:
        E QUERY    [thread1] Error: error: {
        	"ok" : 0,
        	"errmsg" : "operation exceeded time limit",
        	"code" : 50,
        	"codeName" : "ExceededTimeLimit"
        }
        
      3. At this point, MongoS logs that it marked the primary as failed:
        # 2017-08-21T21:11:05.717+0000 I NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-1-0] Marking host shard3.xyz:27017 as failed :: caused by :: ExceededTimeLimit: operation exceeded time limit
      4. Immediately issue a write command, which will fail with:
        error 133: Write failed with error code 133 and error message 'could not find host matching read preference { mode: "primary", tags: [ {} ] } for set xyz_shard3'
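      The timed-out read in step 2 corresponds to a find command with the maxTimeMS option set. A minimal sketch of the command document the shell builds (the collection name and filter below are placeholders, not from the ticket):

      ```python
      # Sketch of the server command produced by find(query).maxTimeMS(1).
      # "test" and the {"status": "active"} filter are illustrative placeholders.

      def build_find_command(collection, query, max_time_ms):
          """Build a find command document with a server-side time limit."""
          return {
              "find": collection,        # target collection name
              "filter": query,           # the query predicate
              "maxTimeMS": max_time_ms,  # server aborts the operation after this many ms
          }

      cmd = build_find_command("test", {"status": "active"}, 1)
      # A 1 ms limit makes almost any non-trivial query fail with
      # code 50 (ExceededTimeLimit), as in the shell output above.
      ```

      With a limit this tight, the server returns the ExceededTimeLimit error shown in step 2, which is what trips the MongoS host-marking behavior in step 3.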
    • Sprint: Sharding 2017-12-04, Sharding 2017-12-18

      Setup:

      Sharded cluster with replica set shards. MongoDB v3.4.6. WiredTiger with snappy.
      Collection X exists only on 1 shard (not sharded, probably not relevant).

      Problem:

      When a query fails due to a maxTimeMS timeout (which happens now and again, since we use a fairly tight limit), MongoS marks the node as failed. This is incorrect: the node is NOT failed.

      Result:

      Since the query was against the primary, and the primary is marked as failed, subsequent write operations fail due to unavailability of the primary. This lasts for a second or a few, presumably until the MongoS heartbeat monitor detects that the primary is up.

      This renders $maxTimeMS dangerous to use for primary-side queries: any timed-out query will temporarily make the shard write-unavailable.
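
      Until the underlying behavior is fixed, one client-side workaround is to treat error 133 (reported above as "could not find host matching read preference") as transient and retry the write with a short backoff while the MongoS heartbeat re-detects the primary. A hedged sketch, assuming a generic write callable; the helper name, exception class, and timings are illustrative, not part of any driver API:

      ```python
      import time

      # Error code 133 is the transient "could not find host matching read
      # preference" failure seen in this report; treated here as retryable.
      TRANSIENT_CODES = {133}

      class WriteError(Exception):
          """Stand-in for a driver write error carrying a server error code."""
          def __init__(self, code, msg):
              super().__init__(msg)
              self.code = code

      def retry_write(do_write, attempts=5, backoff_s=0.5):
          """Retry a write while MongoS reports no primary. Illustrative only."""
          for attempt in range(attempts):
              try:
                  return do_write()
              except WriteError as e:
                  if e.code not in TRANSIENT_CODES or attempt == attempts - 1:
                      raise
                  # Wait for the MongoS heartbeat to mark the primary up again.
                  time.sleep(backoff_s)
      ```

      Since the report observes that unavailability lasts "a second or a few", a backoff on that order with a handful of attempts should usually ride out the window, at the cost of added write latency during it.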

      Furthermore, it seems architecturally wrong for MongoS to mark the host as failed without triggering a failover. The MongoS "failed primary" logic is completely disconnected from the actual primary/replica failover/election logic, so when MongoS reports "no primary found" for a shard, it is not because the replica set actually lacks a primary: there is a primary, and it is healthy.
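
      The disconnect can be illustrated with a toy model: the MongoS-side failed-host cache is updated from per-operation errors, while the replica set's real primary state never changes. A minimal sketch (the class and method names are invented for illustration and do not correspond to actual server internals):

      ```python
      class ReplicaSet:
          """Actual replica-set state: the primary stays healthy throughout."""
          def __init__(self, primary):
              self.primary = primary

      class MongosHostCache:
          """MongoS-side view, updated from per-operation errors, not elections."""
          def __init__(self):
              self.failed = set()

          def on_operation_error(self, host, code_name):
              # Behavior described in the ticket: an ExceededTimeLimit on a
              # query marks the host as failed even though the node is fine.
              if code_name == "ExceededTimeLimit":
                  self.failed.add(host)

          def find_primary(self, replica_set):
              if replica_set.primary in self.failed:
                  return None  # "could not find host matching read preference primary"
              return replica_set.primary

          def on_heartbeat_success(self, host):
              # Availability returns only after the next successful heartbeat.
              self.failed.discard(host)

      rs = ReplicaSet(primary="shard3.xyz:27017")
      cache = MongosHostCache()
      cache.on_operation_error(rs.primary, "ExceededTimeLimit")
      assert cache.find_primary(rs) is None       # writes fail...
      assert rs.primary == "shard3.xyz:27017"     # ...while the real primary is healthy
      cache.on_heartbeat_success(rs.primary)
      assert cache.find_primary(rs) == "shard3.xyz:27017"
      ```

      The model makes the reported contradiction explicit: find_primary returns None purely because of the cache, while the replica set itself never lost its primary.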

      (I think that this problem applies to queries that hit replicas as well, where the replica is marked as failed, but I haven't specifically tested that.)

            Assignee:
            jack.mulrow@mongodb.com Jack Mulrow
            Reporter:
            oleg@evergage.com Oleg Rekutin
            Votes:
            3
            Watchers:
            19

              Created:
              Updated:
              Resolved: