I have successfully reproduced this under 2.8.0-RC4. That said, the failure is not as pronounced as under earlier versions: we seem to issue a new SSV more quickly and detect the change to the new primary sooner. I was only able to issue the failing query twice before shard versioning directed the query to the correct shard (whereas under 2.6 and the earlier tested 2.8 versions, the query kept failing until the new primary came back).
Can you repro this without that preference, when only reading from primaries?
No, primaryPreferred is a requirement. Although primaryPreferred sometimes results in a secondary read, this is a case where it is known to be a primary read. The parallel cursor that is created uses the isSecondaryQuery function within DBClientReplicaSet, which considers a primaryPreferred read preference to be a secondary read and therefore skips all of the shard versioning checks (as per SERVER-7246).
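For illustration only, here is a minimal sketch of that classification, using simplified stand-in names (ReadPref, isSecondaryQuerySketch) rather than the actual DBClientReplicaSet source. The point is just that any read preference other than a strict primary is treated as a secondary-style query, so primaryPreferred takes the secondary code path even when the node that ends up serving the read is the primary.

#include <iostream>

// Hypothetical stand-in for the server's read preference modes (illustrative only).
enum class ReadPref { Primary, PrimaryPreferred, Secondary, SecondaryPreferred, Nearest };

// Sketch of the classification described above: anything other than a strict
// "primary" read preference is treated as a secondary query, so primaryPreferred
// lands on the secondary code path even when the selected node is the primary.
bool isSecondaryQuerySketch(ReadPref pref, bool slaveOk) {
    if (slaveOk) {
        return true;  // slaveOk-style queries are always eligible for secondaries
    }
    return pref != ReadPref::Primary;
}

int main() {
    // primaryPreferred is classified as a secondary query (prints 1).
    std::cout << isSecondaryQuerySketch(ReadPref::PrimaryPreferred, false) << "\n";
}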
In other words, is this just reading from the member as a secondary before the mongos instance realizes it is now "primary", to meet the readPref? If so, this sounds more like an issue of improving the discovery of the primary on the mongos instance.
No. The mongos is aware that this query is reading from a primary, according to the logs below.
I suspect that the failure is due to the changes in SERVER-7246 in combination with the isSecondaryQuery behaviour above: we correctly issue the query to the newly discovered primary, but skip the steps involved with shard versioning. A simplified sketch of that suspected flow follows. Below the sketch is the first query issued after the failure case described above, captured at log level 5, where you can see that we correctly detect the new primary but don't issue an SSV on this connection.
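This sketch is purely illustrative; the struct and function names are hypothetical stand-ins, not the actual mongos source. It is only meant to show the suspected shape of the problem: once the query has been classified as a secondary read, the versioning branch is bypassed, so the connection to the new primary never carries a setShardVersion and the shard cannot flag a stale chunk manager on the mongos.

#include <iostream>

// Illustrative stand-in for a mongos shard connection; not the real type.
struct ShardConnectionSketch {
    bool versionSent = false;
    void setVersionIfNeeded() {
        // Would issue the SSV for this connection if it hasn't been sent yet.
        if (!versionSent) {
            std::cout << "issuing setShardVersion\n";
            versionSent = true;
        }
    }
    void sendQuery() { std::cout << "sending query to the selected node\n"; }
};

// Suspected flow: a query classified as a secondary read skips the versioning
// branch entirely, so the new primary is queried without an SSV.
void initQueryOnShardSketch(ShardConnectionSketch& conn, bool isSecondaryQuery) {
    if (!isSecondaryQuery) {
        conn.setVersionIfNeeded();
    }
    conn.sendQuery();
}

int main() {
    ShardConnectionSketch conn;
    // primaryPreferred path: classified as a secondary read, so no SSV is issued.
    initQueryOnShardSketch(conn, /*isSecondaryQuery=*/true);
}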
2015-01-05T11:12:13.121+1100 D SHARDING [conn5] Request::process begin ns: test.t2 msg id: 11 op: 2004 attempt: 0
2015-01-05T11:12:13.122+1100 D SHARDING [conn5] query: test.t2 { query: { x: 110.0 }, $readPreference: { mode: "primaryPreferred" } } ntoreturn: 0 options: 0
2015-01-05T11:12:13.122+1100 D NETWORK [conn5] creating pcursor over QSpec { ns: "test.t2", n2skip: 0, n2return: 0, options: 0, query: { query: { x: 110.0 }, $readPreference: { mode: "primaryPreferred" } }, fields: {} } and CInfo { v_ns: "", filter: {} }
2015-01-05T11:12:13.122+1100 D QUERY [conn5] [QLOG] Beginning planning...
=============================
Options = NO_TABLE_SCAN
Canonical query:
ns=test.t2 limit=0 skip=0
Tree: x == 110.0
Sort: {}
Proj: {}
=============================
2015-01-05T11:12:13.122+1100 D QUERY [conn5] [QLOG] Index 0 is kp: { x: "hashed" }
2015-01-05T11:12:13.122+1100 D QUERY [conn5] [QLOG] Predicate over field 'x'
2015-01-05T11:12:13.123+1100 D QUERY [conn5] [QLOG] Relevant index 0 is kp: { x: "hashed" }
2015-01-05T11:12:13.123+1100 D QUERY [conn5] Relevant index 0 is kp: { x: "hashed" }
2015-01-05T11:12:13.123+1100 D QUERY [conn5] [QLOG] Rated tree:
x == 110.0 || First: 0 notFirst: full path: x
2015-01-05T11:12:13.123+1100 D QUERY [conn5] [QLOG] Tagging memoID 1
2015-01-05T11:12:13.123+1100 D QUERY [conn5] [QLOG] Enumerator: memo just before moving:
2015-01-05T11:12:13.124+1100 D QUERY [conn5] [QLOG] About to build solntree from tagged tree:
x == 110.0 || Selected Index #0 pos 0
2015-01-05T11:12:13.124+1100 D QUERY [conn5] [QLOG] Planner: adding solution:
FETCH
---filter:
x == 110.0 || Selected Index #0 pos 0
---fetched = 1
---sortedByDiskLoc = 1
---getSort = [{}, ]
---Child:
------IXSCAN
---------keyPattern = { x: "hashed" }
---------direction = 1
---------bounds = field #0['x']: [-541895413742407152, -541895413742407152]
---------fetched = 0
---------sortedByDiskLoc = 1
---------getSort = [{}, ]
2015-01-05T11:12:13.124+1100 D QUERY [conn5] [QLOG] Planner: outputted 1 indexed solutions.
2015-01-05T11:12:13.124+1100 D NETWORK [conn5] initializing over 1 shards required by [test.t2 @ 15|1||5487a0872504a4716776a56c]
2015-01-05T11:12:13.124+1100 D NETWORK [conn5] initializing on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023, current connection state is { state: {}, retryNext: false, init: false, finish: false, errored: false }
2015-01-05T11:12:13.124+1100 D NETWORK [conn5] polling for status of connection to 10.8.1.229:27021, no events
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] dbclient_rs say using secondary or tagged node selection in shard02, read pref is { pref: "primary pref", tags: [ {} ] } (primary : Pixl.local:27021, lastTagged : [not cached])
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.125+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.126+1100 D NETWORK [conn5] creating new connection to:Pixl.local:27023
2015-01-05T11:12:13.126+1100 D COMMAND [ConnectBG] BackgroundJob starting: ConnectBG
2015-01-05T11:12:13.127+1100 D NETWORK [conn5] connected to server Pixl.local:27023 (10.8.1.229)
2015-01-05T11:12:13.127+1100 D NETWORK [conn5] connected connection!
2015-01-05T11:12:13.127+1100 D NETWORK [conn5] dbclient_rs selecting primary node Pixl.local:27023
2015-01-05T11:12:13.127+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.127+1100 D NETWORK [conn5] initialized query (lazily) on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023, current connection state is { state: { conn: "shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023", vinfo: "test.t2 @ 15|1||5487a0872504a4716776a56c", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
2015-01-05T11:12:13.128+1100 D NETWORK [conn5] finishing over 1 shards
2015-01-05T11:12:13.128+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:13.128+1100 D NETWORK [conn5] finishing on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023, current connection state is { state: { conn: "shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023", vinfo: "test.t2 @ 15|1||5487a0872504a4716776a56c", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: false, errored: false }
2015-01-05T11:12:13.128+1100 D NETWORK [conn5] finished on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023, current connection state is { state: { conn: "(done)", vinfo: "test.t2 @ 15|1||5487a0872504a4716776a56c", cursor: "(empty)", count: 0, done: false }, retryNext: false, init: true, finish: true, errored: false }
2015-01-05T11:12:13.128+1100 D SHARDING [conn5] Request::process end ns: test.t2 msg id: 11 op: 2004 attempt: 0 6ms
Later in the logs (after issuing 2 more queries) we see the SSV being issued correctly, returning an error, the chunk manager updating, and the query then being pointed at the correct shard.
2015-01-05T11:12:14.094+1100 D NETWORK [conn5] initializing over 1 shards required by [test.t2 @ 15|1||5487a0872504a4716776a56c]
2015-01-05T11:12:14.094+1100 D NETWORK [conn5] initializing on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023, current connection state is { state: {}, retryNext: false, init: false, finish: false, errored: false }
2015-01-05T11:12:14.094+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:14.094+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:14.094+1100 D SHARDING [conn5] setting shard version of 15|0||5487a0872504a4716776a56c for test.t2 on shard shard02:shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023
2015-01-05T11:12:14.094+1100 D SHARDING [conn5] last version sent with chunk manager iteration 0, current chunk manager iteration is 7
2015-01-05T11:12:14.095+1100 D SHARDING [conn5] setShardVersion shard02 Pixl.local:27023 test.t2 { setShardVersion: "test.t2", configdb: "Pixl.local:27024", shard: "shard02", shardHost: "shard02/Pixl.local:27021,Pixl.local:27022,Pixl.local:27023", version: Timestamp 15000|0, versionEpoch: ObjectId('5487a0872504a4716776a56c') } 7
2015-01-05T11:12:14.095+1100 D SHARDING [conn5] setShardVersion failed!
{ need_authoritative: true, ok: 0.0, errmsg: "first setShardVersion" }
2015-01-05T11:12:14.095+1100 D NETWORK [conn5] ReplicaSetMonitor::get shard02
2015-01-05T11:12:14.096+1100 D SHARDING [conn5] loading chunk manager for collection test.t2 using old chunk manager w/ version 15|1||5487a0872504a4716776a56c and 100 chunks
2015-01-05T11:12:14.096+1100 D SHARDING [conn5] major version query from 15|1||5487a0872504a4716776a56c and over 2 shards is { query: { ns: "test.t2", lastmod: { $gte: Timestamp 15000|1 } }, orderby: { lastmod: 1 } }
2015-01-05T11:12:14.096+1100 D SHARDING [conn5] found 3 new chunks for collection test.t2 (tracking 3), new version is 16|1||5487a0872504a4716776a56c
2015-01-05T11:12:14.097+1100 D SHARDING [conn5] loaded 3 chunks into new chunk manager for test.t2 with version 16|1||5487a0872504a4716776a56c
2015-01-05T11:12:14.097+1100 I SHARDING [conn5] ChunkManager: time to load chunks for test.t2: 1ms sequenceNumber: 8 version: 16|1||5487a0872504a4716776a56c based on: 15|1||5487a0872504a4716776a56c
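For completeness, here is a rough sketch of the recovery sequence the excerpt above shows, again with hypothetical names rather than the mongos source: the SSV is sent, the shard responds with need_authoritative because it has not yet received a first setShardVersion, and the mongos reloads the chunk manager from the config server (picking up version 16|1) before carrying on.

#include <iostream>
#include <string>

// Hypothetical response shape mirroring the log line above:
// { need_authoritative: true, ok: 0.0, errmsg: "first setShardVersion" }
struct SSVResponseSketch {
    bool ok;
    bool needAuthoritative;
    std::string errmsg;
};

SSVResponseSketch sendSetShardVersionSketch() {
    // Stand-in for sending the setShardVersion command shown in the log.
    return SSVResponseSketch{false, true, "first setShardVersion"};
}

void reloadChunkManagerSketch() {
    // Stand-in for reloading chunk metadata from the config server
    // ("loading chunk manager for collection test.t2 ..." in the log).
    std::cout << "reloading chunk manager and picking up the new version\n";
}

int main() {
    SSVResponseSketch res = sendSetShardVersionSketch();
    if (!res.ok && res.needAuthoritative) {
        // The mongos refreshes its metadata before repeating the SSV and
        // routing the query; only the reload itself is visible in the excerpt.
        reloadChunkManagerSketch();
    }
}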
I've attached the full 2.8.0-RC4 log.