Core Server / SERVER-30768

Primary queries using maxTimeMS cause temporary shard write unavailability if ExceededTimeLimit


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.6, 3.6.0-rc4
    • Fix Version/s: 3.4.19, 3.6.1, 3.7.1
    • Component/s: Sharding
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6, v3.4
    • Steps To Reproduce:
      1. Set up a sharded cluster with replica sets
      2. Issue a find(query).maxTimeMS(1) command to cause a timeout, such that the Mongo shell prints:

        E QUERY    [thread1] Error: error: {
        	"ok" : 0,
        	"errmsg" : "operation exceeded time limit",
        	"code" : 50,
        	"codeName" : "ExceededTimeLimit"
        }
        

      3. At this point, MongoS logs that it marked the primary as failed:

        2017-08-21T21:11:05.717+0000 I NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-1-0] Marking host shard3.xyz:27017 as failed :: caused by :: ExceededTimeLimit: operation exceeded time limit

      4. Immediately issue a write command, which will fail with:

        error 133: Write failed with error code 133 and error message 'could not find host matching read preference { mode: "primary", tags: [ {} ] } for set xyz_shard3

    • Sprint:
      Sharding 2017-12-04, Sharding 2017-12-18

      Description

      Setup:

      Sharded cluster with replica set shards. MongoDB v3.4.6. WiredTiger with snappy.
      Collection X exists only on 1 shard (not sharded, probably not relevant).

      Problem:

      When a query fails due to a maxTimeMS timeout (which happens now and again, since we are using a fairly tight limit), MongoS marks the node as failed. This is incorrect: the node is NOT failed.

      Result:

      Since the query was against the primary, and the primary is marked as failed, subsequent write operations fail due to unavailability of the primary. This lasts for a second or a few, presumably until the MongoS heartbeat monitor detects that the primary is up again.

      This renders $maxTimeMS dangerous to use for queries against the primary: any timed-out query temporarily makes the shard write-unavailable.
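The failure window described above can be modeled with a small sketch. The class and method names below are hypothetical illustrations of the reported MongoS behavior, not MongoDB source code:

```python
# Illustrative model of a mongos-side replica-set monitor that (incorrectly,
# per this report) treats an operation-local ExceededTimeLimit error as a
# host failure, producing a temporary "no primary" window for writes.

EXCEEDED_TIME_LIMIT = 50  # server error code from the report

class ReplicaSetMonitor:
    def __init__(self, primary):
        self.primary = primary
        self.failed_hosts = set()

    def mark_failed(self, host):
        # Buggy behavior described in the report: any command error,
        # including a per-operation timeout, marks the host failed.
        self.failed_hosts.add(host)

    def find_primary(self):
        if self.primary in self.failed_hosts:
            raise LookupError(
                'could not find host matching read preference '
                '{ mode: "primary" }')
        return self.primary

    def on_heartbeat_success(self, host):
        # Recovery path: a later successful heartbeat clears the mark.
        self.failed_hosts.discard(host)

monitor = ReplicaSetMonitor("shard3.xyz:27017")

# Steps 2-3: a find() exceeds maxTimeMS; mongos marks the primary failed.
monitor.mark_failed("shard3.xyz:27017")

# Step 4: an immediate write cannot locate the primary.
try:
    monitor.find_primary()
except LookupError as e:
    print("write failed:", e)

# A second or a few later, a heartbeat succeeds and writes work again.
monitor.on_heartbeat_success("shard3.xyz:27017")
print("primary:", monitor.find_primary())
```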

      Furthermore, it seems architecturally wrong for MongoS to mark the host as failed without triggering a failover: the MongoS "failed primary" bookkeeping is completely disconnected from the actual primary/replica failover/election logic. As a result, when MongoS reports "no primary found" for a shard, it is not because that replica set actually has no primary; there is a primary, and it is healthy.

      (I think that this problem applies to queries that hit replicas as well, where the replica is marked as failed, but I haven't specifically tested that.)
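One way to express the direction a fix could take is to classify error codes before marking a host failed. This is a hedged sketch only; the function name and the host-fatal code set are my illustration, not the actual patch, though the numeric codes (ExceededTimeLimit = 50, HostUnreachable = 6, NetworkTimeout = 89) match MongoDB's published error-code list:

```python
# Sketch: only errors that say something about the host itself should mark
# it failed; operation-local errors such as ExceededTimeLimit affect only
# the one command that timed out.

EXCEEDED_TIME_LIMIT = 50   # per-operation timeout (this ticket's case)
HOST_UNREACHABLE = 6       # genuine connectivity failure
NETWORK_TIMEOUT = 89       # genuine network-level timeout

HOST_FATAL_CODES = {HOST_UNREACHABLE, NETWORK_TIMEOUT}

def should_mark_host_failed(error_code):
    """Return True only for errors indicating the host is unusable."""
    return error_code in HOST_FATAL_CODES

# A maxTimeMS expiry would no longer poison the host entry:
print(should_mark_host_failed(EXCEEDED_TIME_LIMIT))  # False
print(should_mark_host_failed(HOST_UNREACHABLE))     # True
```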

        People

        • Votes: 3
        • Watchers: 19