-
Type:
Bug
-
Resolution: Done
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Admin
-
None
-
ALL
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Cluster has 5 shard. A total of collection 3000+
Global environment
system: centos6.6
shard 1 primary 2 secondary 2 arb
mongos 3.0.8
mongod 3.0.8 wt engine snapp compress 1 primary 2 secondary 2 arb
mongo-configure 3.0.8 mmap engine
At the time point around 2016-05-20T20:08:36.657, it seems that the primary mongod of one shard entered a unstable situation. It blocked
almost all requests from the client for about 10 seconds (BTW, we're using mongo-java-driver 3.2.0).
And we also tried to do some simple log analysis. We tried to count each command for every seconds. (Usage is 'command=count')
We're not sure but we guess there is something wrong with our replcaSet.
Fri May 20 20:08:35 CST 2016 {count=17, delete=1, findAndModify=202, insert=335, isMaster=4, logout=1, ping=1, replSetHeartbeat=3, replSetUpdatePosition=247, saslContinue=2, saslStart=1, serverStatus=1, setShardVersion=2, top=1, update=3}
Fri May 20 20:08:36 CST 2016 {count=19, delete=1, findAndModify=137, insert=178, isMaster=2, logout=2, replSetHeartbeat=1, replSetUpdatePosition=171, saslContinue=4, saslStart=2, serverStatus=1, setShardVersion=28, update=3}
Fri May 20 20:08:37 CST 2016 {isMaster=1, replSetHeartbeat=3, setShardVersion=159}
Fri May 20 20:08:38 CST 2016 {isMaster=3, replSetHeartbeat=1, saslContinue=196, saslStart=98, setShardVersion=216}
Fri May 20 20:08:39 CST 2016 {ping=1, replSetHeartbeat=3}
Fri May 20 20:08:40 CST 2016 {isMaster=2, ping=1, replSetHeartbeat=1}
Fri May 20 20:08:41 CST 2016 {isMaster=1, replSetHeartbeat=3}
Fri May 20 20:08:42 CST 2016 {isMaster=1, replSetHeartbeat=1}
Fri May 20 20:08:43 CST 2016 {isMaster=20, replSetHeartbeat=3}
Fri May 20 20:08:44 CST 2016 {isMaster=180, ping=1, replSetHeartbeat=1}
Fri May 20 20:08:45 CST 2016 {isMaster=222, ping=1, replSetHeartbeat=3}
Fri May 20 20:08:46 CST 2016 {isMaster=175, replSetHeartbeat=1}
Fri May 20 20:08:47 CST 2016 {isMaster=161, replSetHeartbeat=3}
Fri May 20 20:08:48 CST 2016 {attempted=1, count=108, delete=7, findAndModify=476, insert=827, isMaster=199, replSetHeartbeat=1, replSetUpdatePosition=122, saslContinue=310, saslStart=609, serverStatus=1, setShardVersion=934, top=1, update=14}
Fri May 20 20:08:49 CST 2016 {count=133, delete=10, findAndModify=503, insert=537, isMaster=387, logout=1, ping=1, replSetHeartbeat=3, replSetUpdatePosition=204, saslContinue=2049, saslStart=1117, serverStatus=1, setShardVersion=3790, update=16}
We're expecting for your help. Thanks in advance.And we are also wondering to know more details of 'setShardVersion' command. What's the purpose of this command, and when will be triggered?
The following are raw messages from log file:
2016-05-20T20:08:36.647+0800 I QUERY [conn36825] query xxx.aaa query: {.... } 2ms
2016-05-20T20:08:36.648+0800 I COMMAND [conn33760] command xxx.$cmd command: findAndModify { .....} 0ms
2016-05-20T20:08:36.648+0800 I QUERY [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:879 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } 8ms
2016-05-20T20:08:36.649+0800 I QUERY [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:302 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 0ms
2016-05-20T20:08:36.650+0800 I COMMAND [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|302, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.650+0800 I COMMAND [conn163] command xxx.aaa command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|303, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.650+0800 I QUERY [conn36825] query ....... 0ms
2016-05-20T20:08:36.655+0800 I QUERY [conn36825] query ....... 0ms
2016-05-20T20:08:36.655+0800 I QUERY [conn23356] query ....... 0ms
2016-05-20T20:08:36.655+0800 I WRITE [conn26584] insert ...... 0ms
2016-05-20T20:08:36.656+0800 I COMMAND [conn26584] command bbb.xxx command: insert ... 0ms
2016-05-20T20:08:36.656+0800 I QUERY [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
2016-05-20T20:08:36.656+0800 I QUERY [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
2016-05-20T20:08:36.656+0800 I QUERY [conn33760] query bbb.xxx query: { userId: 28344562 } planSummary: IXSCAN { userId: 1.0 } ntoreturn:0 ntoskip:0 nscanned:5 nscannedObjects:5 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:5 reslen:1985 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 1ms
2016-05-20T20:08:36.657+0800 I COMMAND [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|304, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.747+0800 I COMMAND [conn34681] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.749+0800 I COMMAND [conn25365] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:289 locks:{} 0ms
2016-05-20T20:08:36.749+0800 I COMMAND [conn25816] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.751+0800 I NETWORK [conn25365] end connection 10.6.7.111:39415 (1265 connections now open)
2016-05-20T20:08:36.753+0800 I COMMAND [conn36650] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.767+0800 I COMMAND [conn31145] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.776+0800 I COMMAND [conn349] command admin.$cmd command: setShardVersion { setShardVersion: "yyy.cccc", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.776+0800 I COMMAND [conn34452] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.793+0800 I COMMAND [conn36680] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.807+0800 I COMMAND [conn36643] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.808+0800 I COMMAND [conn34662] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.823+0800 I COMMAND [conn8764] command admin.$cmd command: logout { logout: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.823+0800 I COMMAND [conn8764] command admin.$cmd command: isMaster { ismaster: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:539 locks:{} 0ms
2016-05-20T20:08:36.826+0800 I COMMAND [conn31256] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.838+0800 I COMMAND [conn19766] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.853+0800 I NETWORK [conn50144] end connection 10.6.7.115:46528 (1264 connections now open)
2016-05-20T20:08:36.854+0800 I NETWORK [initandlisten] connection accepted from 10.6.7.115:46532 #50148 (1265 connections now open)
2016-05-20T20:08:36.878+0800 I COMMAND [conn25438] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.880+0800 I COMMAND [conn50148] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: "xxx" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 22ms