[SERVER-12221] Sleep in ReplicaSetMonitor::_check is causing latency for slaveOk() queries in sharded cluster when there is no primary Created: 31/Dec/13  Updated: 10/Dec/14  Resolved: 15/Jan/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Greg Studer
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Sharded cluster with 2 shards
One shard is in read-only mode (no primary)
test.test is a sharded collection on _id


Issue Links:
Duplicate
duplicates SERVER-12284 ReplicaSetMonitor is broken Closed
is duplicated by SERVER-8690 node marked as ok before marked as se... Closed
Related
is related to SERVER-7246 Mongos cannot do slaveOk queries when... Closed
Operating System: ALL

 Description   

Since SERVER-7246, slaveOk() queries are allowed to proceed even when the shard has no primary. However, the latency of these queries can never drop below 2 seconds. To check the hosts we call ReplicaSetMonitor::_check, which makes two attempts to detect the primary and sleeps for 1 second after each one: https://github.com/mongodb/mongo/blob/v2.4/src/mongo/client/dbclient_rs.cpp#L1030
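To make the arithmetic concrete, the retry behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not the actual dbclient_rs.cpp code: `find_primary`, `HostReply`, `NoPrimaryError`, `simulate_no_primary` and the `passes`/`interval_s` parameters are hypothetical names. The two attempts with a 1-second sleep after each are taken from the description above. An injected fake sleep records the delays instead of blocking, so the sketch runs instantly while still showing that a slaveOk() query against a shard with no primary waits 2 seconds before it can be routed to a secondary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class HostReply:
    host: str
    reachable: bool
    is_master: bool


class NoPrimaryError(Exception):
    """Raised when no pass over the hosts finds a primary."""


def find_primary(
    hosts: Sequence[str],
    probe: Callable[[str], HostReply],
    sleep: Callable[[float], None] = time.sleep,
    passes: int = 2,
    interval_s: float = 1.0,
) -> str:
    """Hypothetical model of the retry loop: `passes` full passes over the
    hosts, with a fixed sleep after every pass that finds no primary
    (including the last one)."""
    for _ in range(passes):
        for host in hosts:
            reply = probe(host)
            if reply.reachable and reply.is_master:
                return reply.host
        sleep(interval_s)  # the caller pays this delay even after the final pass
    raise NoPrimaryError("no master found")


def simulate_no_primary() -> float:
    """Replays the shard state from the report using a fake clock and returns
    the seconds spent sleeping before a secondary can be used."""
    down, secondary = "ip-10-112-221-225:27011", "ip-10-9-157-53:27012"

    def probe(host: str) -> HostReply:
        # One member is unreachable; the other answers as a secondary.
        return HostReply(host, reachable=(host == secondary), is_master=False)

    slept = []
    try:
        find_primary([down, secondary], probe, sleep=slept.append)
    except NoPrimaryError:
        pass  # only now would a slaveOk() query fall back to the secondary
    return sum(slept)


if __name__ == "__main__":
    print(f"delay before secondary fallback: {simulate_no_primary():.1f} s")
```

In this model the latency floor is simply `passes * interval_s`, independent of how quickly the secondary itself responds.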

A minimum query latency of 2 seconds is undesirable in most environments, even for the edge case in which no primary is available.



 Comments   
Comment by Alexander Komyagin [ 31/Dec/13 ]

Corresponding mongos log output at log level 5:

Mon Dec 30 23:51:51.155 [conn1] Request::process begin ns: test.test msg id: 62 op: 2004 attempt: 0
Mon Dec 30 23:51:51.155 [conn1] shard query: test.test  { x: 1.0 }
Mon Dec 30 23:51:51.155 [conn1] [pcursor] creating pcursor over QSpec { ns: "test.test", n2skip: 0, n2return: 0, options: 4, query: { x: 1.0 }, fields: {} } and CInfo { v_ns: "", filter: {} }
Mon Dec 30 23:51:51.155 [conn1] [pcursor] initializing over 2 shards required by [test.test @ 4|1||52c1d2a1be349f8796e733c8]
Mon Dec 30 23:51:51.155 [conn1] [pcursor] initializing on shard sh1:sh1/ip-10-112-221-225:27011,ip-10-9-157-53:27012, current connection state is { state: {}, retryNext: false, init: false, finish: false, errored: false }
Mon Dec 30 23:51:51.155 [conn1] _check : sh1/ip-10-112-221-225:27011,ip-10-9-157-53:27012
Mon Dec 30 23:51:51.155 [conn1] trying reconnect to ip-10-112-221-225:27011
Mon Dec 30 23:51:51.157 [conn1] reconnect ip-10-112-221-225:27011 failed couldn't connect to server ip-10-112-221-225:27011
Mon Dec 30 23:51:51.157 [conn1] ReplicaSetMonitor::_checkConnection: caught exception ip-10-112-221-225:27011 socket exception [CONNECT_ERROR] for ip-10-112-221-225:27011
Mon Dec 30 23:51:51.158 [conn1] ReplicaSetMonitor::_checkConnection: ip-10-9-157-53:27012 { setName: "sh1", ismaster: false, secondary: true, hosts: [ "ip-10-9-157-53:27012", "ip-10-112-221-225:27011" ], me: "ip-10-9-157-53:27012", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1388447511158), ok: 1.0 }
Mon Dec 30 23:51:51.158 [conn1] dbclient_rs nodes[1].ok = true ip-10-9-157-53:27012
Mon Dec 30 23:51:51.158 [conn1] dbclient_rs nodes[0].ok = false ip-10-112-221-225:27011
 
 
Mon Dec 30 23:51:52.158 [conn1] ReplicaSetMonitor::_checkConnection: caught exception ip-10-112-221-225:27011 socket exception [FAILED_STATE] for ip-10-112-221-225:27011
Mon Dec 30 23:51:52.158 [conn1] ReplicaSetMonitor::_checkConnection: ip-10-9-157-53:27012 { setName: "sh1", ismaster: false, secondary: true, hosts: [ "ip-10-9-157-53:27012", "ip-10-112-221-225:27011" ], me: "ip-10-9-157-53:27012", maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, localTime: new Date(1388447512158), ok: 1.0 }
Mon Dec 30 23:51:52.158 [conn1] dbclient_rs nodes[1].ok = true ip-10-9-157-53:27012
Mon Dec 30 23:51:52.158 [conn1] dbclient_rs nodes[0].ok = false ip-10-112-221-225:27011
 
 
Mon Dec 30 23:51:53.159 [conn1] warning: No primary detected for set sh1
Mon Dec 30 23:51:53.159 [conn1] User Assertion: 10009:ReplicaSetMonitor no master found for set: sh1
Mon Dec 30 23:51:53.159 [conn1] dbclient_rs say using secondary or tagged node selection in sh1, read pref is { pref: "secondary pref", tags: [ {} ] } (primary : ip-10-112-221-225:27011, lastTagged : ip-10-9-157-53:27012)
Mon Dec 30 23:51:53.159 [conn1] dbclient_rs selecting compatible last used node ip-10-9-157-53:27012
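The timestamps in the log above show where the time goes. The short sketch below (plain Python; the three timestamps are copied by hand from the excerpt, and the event labels are paraphrases rather than log text) computes the gaps between the start of the first `_check` attempt, the second attempt, and the final secondary selection. The gaps come to 1003 ms and 1001 ms, about 2 seconds in total, which matches the two 1-second sleeps described in the issue.

```python
from datetime import datetime

# Timestamps copied from the mongos log excerpt (the date is omitted).
EVENTS = [
    ("23:51:51.155", "first _check attempt begins"),
    ("23:51:52.158", "second attempt re-checks both nodes"),
    ("23:51:53.159", "no primary detected; secondary selected"),
]


def to_dt(stamp: str) -> datetime:
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def gaps_ms(stamps):
    """Milliseconds between consecutive timestamps."""
    times = [to_dt(s) for s in stamps]
    return [round((b - a).total_seconds() * 1000) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    stamps = [s for s, _ in EVENTS]
    for (stamp, label), gap in zip(EVENTS[1:], gaps_ms(stamps)):
        print(f"{stamp}  +{gap} ms  {label}")
    print("total:", sum(gaps_ms(stamps)), "ms")
```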

Generated at Thu Feb 08 03:27:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.