Setup:
Sharded cluster with replica set shards. MongoDB v3.4.6. WiredTiger with snappy.
Collection X exists on only one shard (it is unsharded; probably not relevant).
Problem:
When a query fails due to a $maxTimeMS timeout (which happens now and again, since we are using a fairly tight limit), MongoS marks the node as failed. (This is incorrect: the node is NOT failed.)
Result:
Since the query was against the primary, and the primary is marked as failed, subsequent write operations fail due to unavailability of the primary. This lasts for a second or a few seconds, presumably until the MongoS heartbeat monitor detects that the primary is up.
This renders $maxTimeMS dangerous to use for primary-side queries: any timed-out query will temporarily make the shard unavailable for writes.
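A minimal reproduction sketch of the pattern described above, using PyMongo. The URI, database name, and the filter used to force a slow query are assumptions; the cluster details (MongoS address, collection X) come from the setup above. The script only talks to a cluster when the hypothetical MONGOS_URI environment variable is set, so it can be read as a recipe rather than run blindly:

```python
import os


def reproduce(uri):
    """Time out a read via $maxTimeMS, then immediately attempt a write
    through the same MongoS to observe the transient 'no primary' failure."""
    # pymongo is only needed when actually running against a cluster
    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect, ExecutionTimeout

    client = MongoClient(uri)
    coll = client["test"]["X"]  # database/collection names are assumptions

    try:
        # Deliberately tight server-side limit (1 ms) to force a timeout,
        # mirroring the "fairly tight limit" described in the report.
        list(coll.find({"field": {"$exists": True}}).max_time_ms(1))
    except ExecutionTimeout:
        pass  # expected: the query exceeded maxTimeMS; the node is healthy

    try:
        # Immediately after the timeout, a write through the same MongoS
        # intermittently fails until the next heartbeat marks the primary up.
        coll.insert_one({"probe": 1})
        return "write ok"
    except AutoReconnect as exc:
        return "write failed: %s" % exc


if __name__ == "__main__":
    uri = os.environ.get("MONGOS_URI")  # hypothetical env var
    if uri:
        print(reproduce(uri))
    else:
        print("set MONGOS_URI to run against a live sharded cluster")
```

Run in a loop against a live cluster, the write intermittently fails right after a timed-out read, which is the behavior reported here.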
Furthermore, it seems architecturally wrong for MongoS to mark the host as failed without triggering a failover. The MongoS "failed primary" logic is completely disconnected from the actual primary/replica failover and election logic, so when MongoS reports "no primary found" for a shard, it is not because the replica set actually lacks a primary (there is a primary, and it is healthy).
(I suspect this problem also applies to queries that hit secondaries, where the secondary would be marked as failed, but I haven't specifically tested that.)