Details
-
Bug
-
Resolution: Done
-
Major - P3
-
None
-
None
-
None
-
ALL
Description
Cluster has 5 shard. A total of collection 3000+
Global environment
system: centos6.6
shard 1 primary 2 secondary 2 arb
mongos 3.0.8
mongod 3.0.8 wt engine snapp compress 1 primary 2 secondary 2 arb
mongo-configure 3.0.8 mmap engine
At the time point around 2016-05-20T20:08:36.657, it seems that the primary mongod of one shard entered a unstable situation. It blocked
almost all requests from the client for about 10 seconds (BTW, we're using mongo-java-driver 3.2.0).
And we also tried to do some simple log analysis. We tried to count each command for every seconds. (Usage is 'command=count')
We're not sure but we guess there is something wrong with our replcaSet.
Fri May 20 20:08:35 CST 2016 {count=17, delete=1, findAndModify=202, insert=335, isMaster=4, logout=1, ping=1, replSetHeartbeat=3, replSetUpdatePosition=247, saslContinue=2, saslStart=1, serverStatus=1, setShardVersion=2, top=1, update=3}
|
Fri May 20 20:08:36 CST 2016 {count=19, delete=1, findAndModify=137, insert=178, isMaster=2, logout=2, replSetHeartbeat=1, replSetUpdatePosition=171, saslContinue=4, saslStart=2, serverStatus=1, setShardVersion=28, update=3}
|
Fri May 20 20:08:37 CST 2016 {isMaster=1, replSetHeartbeat=3, setShardVersion=159}
|
Fri May 20 20:08:38 CST 2016 {isMaster=3, replSetHeartbeat=1, saslContinue=196, saslStart=98, setShardVersion=216}
|
Fri May 20 20:08:39 CST 2016 {ping=1, replSetHeartbeat=3}
|
Fri May 20 20:08:40 CST 2016 {isMaster=2, ping=1, replSetHeartbeat=1}
|
Fri May 20 20:08:41 CST 2016 {isMaster=1, replSetHeartbeat=3}
|
Fri May 20 20:08:42 CST 2016 {isMaster=1, replSetHeartbeat=1}
|
Fri May 20 20:08:43 CST 2016 {isMaster=20, replSetHeartbeat=3}
|
Fri May 20 20:08:44 CST 2016 {isMaster=180, ping=1, replSetHeartbeat=1}
|
Fri May 20 20:08:45 CST 2016 {isMaster=222, ping=1, replSetHeartbeat=3}
|
Fri May 20 20:08:46 CST 2016 {isMaster=175, replSetHeartbeat=1}
|
Fri May 20 20:08:47 CST 2016 {isMaster=161, replSetHeartbeat=3}
|
Fri May 20 20:08:48 CST 2016 {attempted=1, count=108, delete=7, findAndModify=476, insert=827, isMaster=199, replSetHeartbeat=1, replSetUpdatePosition=122, saslContinue=310, saslStart=609, serverStatus=1, setShardVersion=934, top=1, update=14}
|
Fri May 20 20:08:49 CST 2016 {count=133, delete=10, findAndModify=503, insert=537, isMaster=387, logout=1, ping=1, replSetHeartbeat=3, replSetUpdatePosition=204, saslContinue=2049, saslStart=1117, serverStatus=1, setShardVersion=3790, update=16}
|
We're expecting for your help. Thanks in advance.And we are also wondering to know more details of 'setShardVersion' command. What's the purpose of this command, and when will be triggered?
The following are raw messages from log file:
2016-05-20T20:08:36.647+0800 I QUERY [conn36825] query xxx.aaa query: {.... } 2ms
|
2016-05-20T20:08:36.648+0800 I COMMAND [conn33760] command xxx.$cmd command: findAndModify { .....} 0ms
|
2016-05-20T20:08:36.648+0800 I QUERY [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:879 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } 8ms
|
2016-05-20T20:08:36.649+0800 I QUERY [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:302 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 0ms
|
2016-05-20T20:08:36.650+0800 I COMMAND [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|302, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
|
2016-05-20T20:08:36.650+0800 I COMMAND [conn163] command xxx.aaa command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|303, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
|
2016-05-20T20:08:36.650+0800 I QUERY [conn36825] query ....... 0ms
|
2016-05-20T20:08:36.655+0800 I QUERY [conn36825] query ....... 0ms
|
2016-05-20T20:08:36.655+0800 I QUERY [conn23356] query ....... 0ms
|
2016-05-20T20:08:36.655+0800 I WRITE [conn26584] insert ...... 0ms
|
2016-05-20T20:08:36.656+0800 I COMMAND [conn26584] command bbb.xxx command: insert ... 0ms
|
2016-05-20T20:08:36.656+0800 I QUERY [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
|
2016-05-20T20:08:36.656+0800 I QUERY [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
|
2016-05-20T20:08:36.656+0800 I QUERY [conn33760] query bbb.xxx query: { userId: 28344562 } planSummary: IXSCAN { userId: 1.0 } ntoreturn:0 ntoskip:0 nscanned:5 nscannedObjects:5 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:5 reslen:1985 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 1ms
|
2016-05-20T20:08:36.657+0800 I COMMAND [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|304, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
|
2016-05-20T20:08:36.747+0800 I COMMAND [conn34681] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.749+0800 I COMMAND [conn25365] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:289 locks:{} 0ms
|
2016-05-20T20:08:36.749+0800 I COMMAND [conn25816] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.751+0800 I NETWORK [conn25365] end connection 10.6.7.111:39415 (1265 connections now open)
|
2016-05-20T20:08:36.753+0800 I COMMAND [conn36650] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.767+0800 I COMMAND [conn31145] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.776+0800 I COMMAND [conn349] command admin.$cmd command: setShardVersion { setShardVersion: "yyy.cccc", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.776+0800 I COMMAND [conn34452] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.793+0800 I COMMAND [conn36680] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.807+0800 I COMMAND [conn36643] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.808+0800 I COMMAND [conn34662] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.823+0800 I COMMAND [conn8764] command admin.$cmd command: logout { logout: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
|
2016-05-20T20:08:36.823+0800 I COMMAND [conn8764] command admin.$cmd command: isMaster { ismaster: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:539 locks:{} 0ms
|
2016-05-20T20:08:36.826+0800 I COMMAND [conn31256] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.838+0800 I COMMAND [conn19766] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.853+0800 I NETWORK [conn50144] end connection 10.6.7.115:46528 (1264 connections now open)
|
2016-05-20T20:08:36.854+0800 I NETWORK [initandlisten] connection accepted from 10.6.7.115:46532 #50148 (1265 connections now open)
|
2016-05-20T20:08:36.878+0800 I COMMAND [conn25438] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
|
2016-05-20T20:08:36.880+0800 I COMMAND [conn50148] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: "xxx" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 22ms
|