[SERVER-6604] slaveOk reads are going to primary Created: 26/Jul/12  Updated: 15/Aug/12  Resolved: 26/Jul/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Pierre Dane Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 64 with c# driver


Operating System: Windows
Participants:

 Description   

After upgrading our two secondaries to 2.2.0-rc0, with the master still at 2.0.6, the driver started reading from the master as well as the secondaries. Switching the secondaries back to 2.0.6 resolves the problem. This could be specific to the C# driver, or it might be resolved once the master is also upgraded (we are waiting for the production release before upgrading the master).



 Comments   
Comment by Robert Stam [ 26/Jul/12 ]

Pierre, I'm marking this issue as Resolved for now. Please reopen or submit a new ticket if you would like to pursue further in the future.

Comment by Pierre Dane [ 26/Jul/12 ]

OK, thanks for looking into this, Robert - I'll wait for the next release and try to reproduce the problem. If I can, I will send you full historical logs and mongostat output for all three nodes.

Comment by Robert Stam [ 26/Jul/12 ]

I tried to reproduce this but was unable to. My test program, which does slaveOk reads, distributes the reads evenly across the secondaries. Here's my mongostat output while running the test:

C:\mongodb\mongodb-win32-x86_64-2.2.0-rc0\bin>mongostat -h Kilimanjaro:10001,Kilimanjaro:10002,Kilimanjaro:10003
 
                        insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
Kilimanjaro:10001            0      0      0      0       2       2       0  4.17g  8.52g    57m      4  test:0.0%          0       0|0     0|0   296b     2k     9 myset  PRI   12:44:10
Kilimanjaro:10002           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      1  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:10
Kilimanjaro:10003           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      1  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:10
 
Kilimanjaro:10001            0      0      0      0       0       2       0  4.17g  8.52g    57m      2     .:0.2%          0       0|0     0|0   202b     2k     9 myset  PRI   12:44:11
Kilimanjaro:10002           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:11
Kilimanjaro:10003           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:11
 
Kilimanjaro:10001            0      0      0      0       0       2       0  4.17g  8.52g    57m      9  test:0.0%          0       0|0     0|0   202b     2k     9 myset  PRI   12:44:12
Kilimanjaro:10002           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:12
Kilimanjaro:10003           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      4  test:0.0%          0       0|0     0|0   202b     2k     4 myset  SEC   12:44:12
 
Kilimanjaro:10001            2      0      0      1       8       7       0  4.17g  8.52g    57m    478  test:0.2%          0       0|0     0|0     1k     4k    10 myset  PRI   12:44:13
Kilimanjaro:10002           *2     84     *0     *0       0     4|0       0  4.36g  8.89g    42m    545  test:0.3%          0       0|0     0|0     4k    11k     5 myset  SEC   12:44:13
Kilimanjaro:10003           *2     84     *0     *0       0     5|0       0  4.36g  8.89g    42m    549  test:0.3%          0       0|0     0|0     4k    11k     5 myset  SEC   12:44:13
 
Kilimanjaro:10001            0      0      2      0       0       2       0  4.17g  8.52g    57m     38  test:0.0%          0       0|0     0|0   202b     2k    10 myset  PRI   12:44:14
Kilimanjaro:10002           *0    988     *0     *0       0     2|0       0  4.36g  8.89g    42m      9  test:0.1%          0       0|0     0|0    43k    95k     5 myset  SEC   12:44:14
Kilimanjaro:10003           *0    988     *0     *0       0     2|0       0  4.36g  8.89g    41m      9  test:0.0%          0       0|0     0|0    43k    95k     5 myset  SEC   12:44:14
 
                        insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
Kilimanjaro:10001            0      0      0      0       0       2       0  4.17g  8.52g    57m     22     .:0.2%          0       0|0     0|0   202b     2k    10 myset  PRI   12:44:15
Kilimanjaro:10002           *0   1007     *0     *0       0     2|0       0  4.36g  8.89g    40m      9  test:0.0%          0       0|0     0|0    44k    97k     5 myset  SEC   12:44:15
Kilimanjaro:10003           *0   1006     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0    44k    97k     5 myset  SEC   12:44:15
 
Kilimanjaro:10001            0      0      0      0       0       2       0  4.17g  8.52g    57m      4 local:0.0%          0       0|0     0|0   202b     2k    10 myset  PRI   12:44:17
Kilimanjaro:10002           *0   1039     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0    45k   100k     5 myset  SEC   12:44:17
Kilimanjaro:10003           *0   1039     *0     *0       0     2|0       0  4.36g  8.89g    40m      9  test:0.1%          0       0|0     0|0    45k   100k     5 myset  SEC   12:44:17
 
Kilimanjaro:10001            0      0      0      0       0       2       0  4.17g  8.52g    57m      4  test:0.0%          0       0|0     0|0   202b     2k    10 myset  PRI   12:44:18
Kilimanjaro:10002           *0    994     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0    43k    95k     5 myset  SEC   12:44:18
Kilimanjaro:10003           *0    994     *0     *0       0     2|0       0  4.36g  8.89g    40m      0  test:0.0%          0       0|0     0|0    43k    96k     5 myset  SEC   12:44:18
 
Kilimanjaro:10001            0      0      0      0       8       4       0  4.17g  8.52g    59m    471  test:0.1%          0       0|0     0|0   697b     3k    10 myset  PRI   12:44:19
Kilimanjaro:10002           *0    888     *0     *0       0     2|1       0  4.36g  8.89g    43m    653  test:0.3%          0       0|0     0|0    39k    86k     5 myset  SEC   12:44:19
Kilimanjaro:10003           *0    889     *0     *0       0     2|1       0  4.36g  8.89g    40m    659  test:0.4%          0       0|0     0|0    39k    86k     5 myset  SEC   12:44:19
 
Kilimanjaro:10001            0      0      2      0       0       2       0  4.17g  8.52g    57m     53     .:0.4%          0       0|0     0|0   202b     2k    10 myset  PRI   12:44:20
Kilimanjaro:10002           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    43m      0  test:0.0%          0       0|0     0|0   202b     2k     5 myset  SEC   12:44:20
Kilimanjaro:10003           *0     *0     *0     *0       0     2|0       0  4.36g  8.89g    40m      0     .:0.2%          0       0|0     0|0   202b     2k     5 myset  SEC   12:44:20
^C

Comment by Pierre Dane [ 26/Jul/12 ]

The reads from the secondaries to the primary during resync would be getmore commands though, wouldn't they?

Comment by Pierre Dane [ 26/Jul/12 ]

localhost:3002 *6 1411 *73 *0 0 22|0 0 24.1g 39g 19.5g 167 0 0 0|0 1|0 110k 814k 7325 tagstore SEC 12:39:40
mongo86:3002 *6 1505 *73 *0 0 22|0 0 24.1g 39.1g 19.5g 121 0 0 0|0 0|0 117k 868k 7336 tagstore SEC 12:39:40
mongo87:3002 *7 1611 1|81 *0 2 18|0 0 24.1g 41.9g 22.3g 83 0 0 0|0 0|0 125k 874k 7880 tagstore SEC 12:39:40
mongo88:3002 0 8214 85 0 81 15 0 26g 32.8g 23.4g 209 0 0 0|0 2|1 650k 6m 6636 tagstore M 12:39:40

Comment by Pierre Dane [ 26/Jul/12 ]

I only resynced one at a time, and they were both fully synced when I was seeing the issue. I was seeing reads off all three servers, and the reads were not symmetric - the load was swinging wildly.

Comment by Robert Stam [ 26/Jul/12 ]

A secondary that is doing a full resync can't be used for reads until it has caught up.

If both of your secondaries are doing a full resync at the same time, neither of them is available for reads, so the driver will send all slaveOk reads to the primary until at least one of the secondaries has caught up.

This would definitely result in a high read load on your primary: not only are all of your application reads going to the primary, but both secondaries are also issuing lots of reads against it as part of the full resync process.
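The fallback behavior described above can be sketched as follows. This is an illustrative Python sketch of the assumed selection logic, not the actual C# driver code; the member names and state strings are taken from the rs.status() output in this ticket.

```python
import random

def choose_read_member(primary, secondaries):
    """Route a slaveOk read: prefer a randomly chosen caught-up secondary,
    and fall back to the primary when no secondary is available."""
    available = [s for s in secondaries if s["stateStr"] == "SECONDARY"]
    if available:
        return random.choice(available)["name"]
    return primary["name"]

primary = {"name": "mongo88:3002", "stateStr": "PRIMARY"}

# Both secondaries mid-resync (STARTUP2): no secondary is available,
# so all slaveOk reads land on the primary.
resyncing = [
    {"name": "mongo86:3002", "stateStr": "STARTUP2"},
    {"name": "mongo87:3002", "stateStr": "STARTUP2"},
]
print(choose_read_member(primary, resyncing))  # mongo88:3002
```

With at least one member back in the SECONDARY state, reads move off the primary again, which matches the even distribution seen in Robert's mongostat test.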

Comment by Pierre Dane [ 26/Jul/12 ]

We are using v1.5 of the C# driver.

The following data was captured when upgrading just one secondary to 2.2 with a full resync, i.e. I deleted all the data and restarted the secondary. The read load on the master is not high in the mongostat output below, but when both secondaries are upgraded the load goes up into the thousands. Feel free to check MMC - you can see the read spike on the primary (mongo88).

STARTUP2> rs.status()
{
    "set" : "tagstore",
    "date" : ISODate("2012-07-26T14:39:17Z"),
    "myState" : 5,
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo86:3002",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 128,
            "optime" : { "t" : 0, "i" : 0 },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "errmsg" : "initial sync cloning db: tagstore",
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mongo87:3002",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 113,
            "optime" : { "t" : 1343308613000, "i" : 421 },
            "optimeDate" : ISODate("2012-07-26T13:16:53Z"),
            "lastHeartbeat" : ISODate("2012-07-26T14:39:16Z"),
            "pingMs" : 0
        },
        {
            "_id" : 2,
            "name" : "ship71:3002",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 113,
            "lastHeartbeat" : ISODate("2012-07-26T14:39:16Z"),
            "pingMs" : 0
        },
        {
            "_id" : 3,
            "name" : "mongo88:3002",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 113,
            "optime" : { "t" : 1343313581000, "i" : 634 },
            "optimeDate" : ISODate("2012-07-26T14:39:41Z"),
            "lastHeartbeat" : ISODate("2012-07-26T14:39:16Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

localhost:3002 *0 *0 *0 *0 0 20|0 0 11.2g 18g 732m 3679 15.1 0 1|0 2|0 1k 9k 6794 tagstore UNK 14:39:27
mongo86:3002 *0 *0 *0 *0 0 20|0 0 11.2g 18g 732m 3679 15.1 0 1|0 1|0 1k 6k 6794 tagstore UNK 14:39:27
mongo87com:3002 *12 11211 *74 *0 0 26|0 0 24.1g 26.9g 11.4g 61 0 0 0|0 1|0 864k 7m 2697 tagstore SEC 14:39:27
mongo88:3002 0 135 1073 0 1 19 0 26g 32.8g 23.7g 1258 15.3 0 0|0 1|0 191k 65k 6636 tagstore M 14:39:27

Comment by Robert Stam [ 26/Jul/12 ]

Normally the C# driver only sends slaveOk reads to the primary if no secondary is available. The C# driver uses the ismaster command to query the state of each replica set member and looks for

{ secondary : true }

in the response to determine whether a replica set member is currently a secondary. So another useful piece of information would be the output of the following command, run in the mongo shell against each secondary that is not receiving the queries you expect:

> db.isMaster()
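For illustration, the check described above can be sketched like this. This is a hypothetical Python helper showing how a driver might interpret the ismaster response, not the C# driver's actual code; the example responses are assumed.

```python
def is_usable_secondary(ismaster_response):
    """A member is eligible to serve slaveOk reads as a secondary only if
    its isMaster response contains { secondary : true }."""
    return ismaster_response.get("secondary", False) is True

# Hypothetical isMaster responses:
caught_up = {"ismaster": False, "secondary": True, "setName": "tagstore"}
resyncing = {"ismaster": False, "secondary": False, "setName": "tagstore"}

print(is_usable_secondary(caught_up))   # True
print(is_usable_secondary(resyncing))   # False
```

A member mid-resync reports secondary : false, which is why the driver stops routing slaveOk reads to it until it catches up.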

Comment by Robert Stam [ 26/Jul/12 ]

Can you also please report which version of the C# driver you are currently using? Thanks.

Comment by Pierre Dane [ 26/Jul/12 ]

Hey Scott,

Connecting directly to the replica set.

I've reverted to 2.0.6 but I'll upgrade again and post rs.status – anything else you want me to get while I upgrade for a short period?

Comment by Scott Hernandez (Inactive) [ 26/Jul/12 ]

Are you using sharding with a mongos instance, or is your .net client directly connecting to the replica set?

What does rs.status() show?

Generated at Thu Feb 08 03:12:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.