[SERVER-24233] Cluster suddenly hang . Created: 20/May/16  Updated: 20/May/16  Resolved: 20/May/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ufo ofu Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File QQ图片20160520230528.png    
Issue Links:
Duplicate
is duplicated by SERVER-24232 Cluster suddenly hang Closed
Operating System: ALL
Participants:

 Description   

Cluster has 5 shard. A total of collection 3000+
Global environment
system: centos6.6
mongos 3.0.8
mongod 3.0.8 wt engine
mongo-configure 3.0.8 mmap engine
After 2016-05-20T20:08:36.657, the service appears abnormal. The following is a abnormal shard's log:

2016-05-20T20:08:36.647+0800 I QUERY    [conn36825] query xxx.aaa query: {.... } 2ms
2016-05-20T20:08:36.648+0800 I COMMAND  [conn33760] command xxx.$cmd command: findAndModify { .....} 0ms
2016-05-20T20:08:36.648+0800 I QUERY    [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:879 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 3 } } } 8ms
2016-05-20T20:08:36.649+0800 I QUERY    [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:302 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 0ms
2016-05-20T20:08:36.650+0800 I COMMAND  [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|302, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.650+0800 I COMMAND  [conn163] command xxx.aaa command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|303, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.650+0800 I QUERY    [conn36825]   query ....... 0ms
2016-05-20T20:08:36.655+0800 I QUERY    [conn36825]   query ....... 0ms
2016-05-20T20:08:36.655+0800 I QUERY    [conn23356]   query ....... 0ms
2016-05-20T20:08:36.655+0800 I WRITE    [conn26584]   insert ...... 0ms
2016-05-20T20:08:36.656+0800 I COMMAND  [conn26584] command bbb.xxx command: insert ... 0ms
2016-05-20T20:08:36.656+0800 I QUERY    [conn162] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:169950952477 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
2016-05-20T20:08:36.656+0800 I QUERY    [conn158] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1463585585000|3 } } cursorid:168949710400 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:513 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } 4ms
2016-05-20T20:08:36.656+0800 I QUERY    [conn33760] query bbb.xxx query: { userId: 28344562 } planSummary: IXSCAN { userId: 1.0 } ntoreturn:0 ntoskip:0 nscanned:5 nscannedObjects:5 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:5 reslen:1985 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 1ms
2016-05-20T20:08:36.657+0800 I COMMAND  [conn163] command admin.$cmd command: replSetUpdatePosition { replSetUpdatePosition: 1, optimes: [ { _id: ObjectId('565f116c85ef5155d15a0d14'), optime: Timestamp 1463746070000|112, memberId: 0, cfgver: 7, config: { _id: 0, host: "10.6.7.115:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('5720be08143133e67c0fb1a8'), optime: Timestamp 1463585585000|3, memberId: 1, cfgver: 7, config: { _id: 1, host: "10.6.7.116:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } }, { _id: ObjectId('56dd435072b07953b7619bcc'), optime: Timestamp 1463746116000|304, memberId: 5, cfgver: 7, config: { _id: 5, host: "10.6.7.129:47003", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } } ] } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.747+0800 I COMMAND  [conn34681] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.749+0800 I COMMAND  [conn25365] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|14, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:289 locks:{} 0ms
2016-05-20T20:08:36.749+0800 I COMMAND  [conn25816] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.751+0800 I NETWORK  [conn25365] end connection 10.6.7.111:39415 (1265 connections now open)
2016-05-20T20:08:36.753+0800 I COMMAND  [conn36650] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.767+0800 I COMMAND  [conn31145] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.776+0800 I COMMAND  [conn349] command admin.$cmd command: setShardVersion { setShardVersion: "yyy.cccc", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.776+0800 I COMMAND  [conn34452] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.793+0800 I COMMAND  [conn36680] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.807+0800 I COMMAND  [conn36643] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.808+0800 I COMMAND  [conn34662] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.823+0800 I COMMAND  [conn8764] command admin.$cmd command: logout { logout: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 0ms
2016-05-20T20:08:36.823+0800 I COMMAND  [conn8764] command admin.$cmd command: isMaster { ismaster: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:539 locks:{} 0ms
2016-05-20T20:08:36.826+0800 I COMMAND  [conn31256] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.838+0800 I COMMAND  [conn19766] command admin.$cmd command: setShardVersion { setShardVersion: "vox-exam.afentiWrongQuestionLibrary", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 644000|682, versionEpoch: ObjectId('55e46bd095f6e6a11b57c560') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.853+0800 I NETWORK  [conn50144] end connection 10.6.7.115:46528 (1264 connections now open)
2016-05-20T20:08:36.854+0800 I NETWORK  [initandlisten] connection accepted from 10.6.7.115:46532 #50148 (1265 connections now open)
2016-05-20T20:08:36.878+0800 I COMMAND  [conn25438] command admin.$cmd command: setShardVersion { setShardVersion: "xxx.bbbb", configdb: "10.6.7.112:27001,10.6.7.115:27001,10.6.7.118:27001", shard: "47003", shardHost: "47003/10.6.7.115:47003,10.6.7.116:47003,10.6.7.129:47003", version: Timestamp 5000|26, versionEpoch: ObjectId('573cc54cc19de8acaafe1323') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:146 locks:{} 0ms
2016-05-20T20:08:36.880+0800 I COMMAND  [conn50148] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: "xxx" } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 22ms



 Comments   
Comment by Kelsey Schubert [ 20/May/16 ]

Hi liang.zhang@17zuoye.com,

Thanks for your report. Please note that SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group.

Kind regards,
Thomas

Generated at Thu Feb 08 04:05:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.