[SERVER-17921] MongoDB cluster with five shards: at peak business load, the load on one server of the cluster suddenly increased abnormally. Created: 08/Apr/15  Updated: 16/Apr/15  Resolved: 16/Apr/15

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.6.7
Fix Version/s: None

Type: Question Priority: Critical - P2
Reporter: liangzhang Assignee: Unassigned
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cpu.png     Text File cpuinfo.txt     Text File crash-0407-09-mongodb.log     Text File crash-0407-09-sharding.log     Text File interrupts1.log     JPEG File proc_interrupts.jpeg     PNG File soft_irqs.png     Text File softirqs1.log     Text File top1.log     Text File vmstat1.log    

 Description   

mongos's log:

2015-04-07T19:20:42.844+0800 [conn1418] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.561+0800 [conn2562] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.563+0800 [conn2692] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.564+0800 [conn2971] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.569+0800 [conn2656] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.570+0800 [conn2376] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.574+0800 [conn2957] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.575+0800 [conn2860] warning: Primary for shard1/xxx.xxx.xxx.xxx:xxxx was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-04-07T19:20:44.578+0800 [conn347] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.579+0800 [conn1177] warning: Failed to connect to xxx.xxx.xxx.xxx:xxxx, reason: errno:106 Transport endpoint is already connected
2015-04-07T19:20:44.589+0800 [conn2669] warning: Primary for shard3/xxx.xxx.xxx.xxx:xxxx was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-04-07T19:20:44.591+0800 [conn82] warning: Primary for shard3/xxx.xxx.xxx.xxx:xxxx was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.
2015-04-07T19:20:44.600+0800 [conn2860] warning: Primary for shard1/xxx.xxx.xxx.xxx:xxxx was down before, bypassing setShardVersion. The local replica set view and targeting may be stale.

java's log:

2015-04-07T19:20:36+08:00 xxxx xxxxx [ERROR] {c.a.d.r.filter.ExceptionFilter} -  [DUBBO] Got unchecked and undeclared exception which called by xxx.xxx.xxx.xxx. service: , exception: org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket, dubbo version: 2.4.9_ZIBO_1.0.1, current host: xxx.xxx.xxx.xxx
	org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket
		at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:56)
		at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1828)
		at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1711)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1522)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1506)
		at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:532)
	Caused by: com.mongodb.MongoException$Network: Exception opening the socket
		at com.mongodb.DBPort.<init>(DBPort.java:117)
		at com.mongodb.DBPort.<init>(DBPort.java:95)
		at com.mongodb.DBPortFactory.create(DBPortFactory.java:28)
		at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:186)
		at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:183)
		at com.mongodb.ConcurrentPool.createNewAndReleasePermitIfFailure(ConcurrentPool.java:150)
		at com.mongodb.ConcurrentPool.get(ConcurrentPool.java:118)
		at com.mongodb.PooledConnectionProvider.get(PooledConnectionProvider.java:75)
		at com.mongodb.DefaultServer.getConnection(DefaultServer.java:61)
		at com.mongodb.BaseCluster$WrappedServer.getConnection(BaseCluster.java:254)
		at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:505)
		at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:448)
		at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:284)
		at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
		at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84)
		at com.mongodb.TickableDBCollectionImpl.find(TickableDBCollectionImpl.java:78)
		at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
		at com.mongodb.DBCursor._check(DBCursor.java:498)
		at com.mongodb.DBCursor._hasNext(DBCursor.java:621)
		at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
		at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1697)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1522)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1506)
		at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:532)
		at com.mongodb.DBPort.ensureOpen(DBPort.java:287)
		at com.mongodb.DBPort.<init>(DBPort.java:113)
		at com.mongodb.DBPort.<init>(DBPort.java:95)
		at com.mongodb.DBPortFactory.create(DBPortFactory.java:28)
		at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:186)
		at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:183)
		at com.mongodb.ConcurrentPool.createNewAndReleasePermitIfFailure(ConcurrentPool.java:150)
		at com.mongodb.ConcurrentPool.get(ConcurrentPool.java:118)
		at com.mongodb.PooledConnectionProvider.get(PooledConnectionProvider.java:75)
		at com.mongodb.DefaultServer.getConnection(DefaultServer.java:61)
		at com.mongodb.BaseCluster$WrappedServer.getConnection(BaseCluster.java:254)
		at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:505)
		at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:448)
		at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:284)
		at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
		at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84)
		at com.mongodb.TickableDBCollectionImpl.find(TickableDBCollectionImpl.java:78)
		at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
		at com.mongodb.DBCursor._check(DBCursor.java:498)
		at com.mongodb.DBCursor._hasNext(DBCursor.java:621)
		at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
		at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1697)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1522)
		at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1506)
		at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:532)
	java.lang.RuntimeException: org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket
org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket
	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:56)
	at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1828)
	at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1711)
	at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1522)
	at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1506)
	at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:532)
Caused by: com.mongodb.MongoException$Network: Exception opening the socket
	at com.mongodb.DBPort.<init>(DBPort.java:117)
	at com.mongodb.DBPort.<init>(DBPort.java:95)
	at com.mongodb.DBPortFactory.create(DBPortFactory.java:28)
	at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:186)
	at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:183)
	at com.mongodb.ConcurrentPool.createNewAndReleasePermitIfFailure(ConcurrentPool.java:150)
	at com.mongodb.ConcurrentPool.get(ConcurrentPool.java:118)
	at com.mongodb.PooledConnectionProvider.get(PooledConnectionProvider.java:75)
	at com.mongodb.DefaultServer.getConnection(DefaultServer.java:61)
	at com.mongodb.BaseCluster$WrappedServer.getConnection(BaseCluster.java:254)
	at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:505)
	at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:448)
	at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:284)
	at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
	at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84)
	at com.mongodb.TickableDBCollectionImpl.find(TickableDBCollectionImpl.java:78)
	at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66)
	at com.mongodb.DBCursor._check(DBCursor.java:498)
	at com.mongodb.DBCursor._hasNext(DBCursor.java:621)
	at com.mongodb.DBCursor.hasNext(DBCursor.java:657)
	at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1697)
	at com.mongodb.DBPort.ensureOpen(DBPort.java:287)
	at com.mongodb.DBPort.<init>(DBPort.java:113)
	java.lang.RuntimeException: org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket
org.springframework.dao.DataAccessResourceFailureException: Exception opening the socket; nested exception is com.mongodb.MongoException$Network: Exception opening the socket
	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:56)
	at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1828)
	at org.springframework.data.mongodb.core.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1658)
	at org.springframework.data.mongodb.core.MongoTemplate.doFindAndModify(MongoTemplate.java:1586)
	at org.springframework.data.mongodb.core.MongoTemplate.findAndModify(MongoTemplate.java:615)
	at org.springframework.data.mongodb.core.MongoTemplate.findAndModify(MongoTemplate.java:610)
	at com.voxlearning.utopia.dao.mongo.support.AbstractMongoDao.updateById(AbstractMongoDao.java:272)
	at com.voxlearning.utopia.dao.mongo.support.AbstractMongoDao.updateById(AbstractMongoDao.java:246)
Caused by: com.mongodb.MongoException$Network: Exception opening the socket
	at com.mongodb.DBPort.<init>(DBPort.java:117)
	at com.mongodb.DBPort.<init>(DBPort.java:95)
	at com.mongodb.DBPortFactory.create(DBPortFactory.java:28)
	at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:186)
	at com.mongodb.PooledConnectionProvider$ConnectionItemFactory.create(PooledConnectionProvider.java:183)
	at com.mongodb.ConcurrentPool.createNewAndReleasePermitIfFailure(ConcurrentPool.java:150)
	at com.mongodb.ConcurrentPool.get(ConcurrentPool.java:118)
	at com.mongodb.PooledConnectionProvider.get(PooledConnectionProvider.java:75)
	at com.mongodb.DefaultServer.getConnection(DefaultServer.java:61)
	at com.mongodb.BaseCluster$WrappedServer.getConnection(BaseCluster.java:254)
	at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:505)
	at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:448)
	at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:284)
	at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:269)
	at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84)
	at com.mongodb.TickableDBCollectionImpl.find(TickableDBCollectionImpl.java:78)
	at com.mongodb.DB.command(DB.java:320)
	at com.mongodb.TickableDBApiLayer.command(TickableDBApiLayer.java:56)
	at com.mongodb.DB.command(DB.java:299)
	at com.mongodb.DB.command(DB.java:374)
	at com.mongodb.DB.command(DB.java:246)
	at com.mongodb.DBCollection.findAndModify(DBCollection.java:480)
	at com.mongodb.DBCollection.findAndModify(DBCollection.java:424)
	at org.springframework.data.mongodb.core.MongoTemplate$FindAndModifyCallback.doInCollection(MongoTemplate.java:1967)
	at org.springframework.data.mongodb.core.MongoTemplate$FindAndModifyCallback.doInCollection(MongoTemplate.java:1949)
	at org.springframework.data.mongodb.core.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1654)
	at com.mongodb.DBPort.ensureOpen(DBPort.java:287)
	at com.mongodb.DBPort.<init>(DBPort.java:113)



 Comments   
Comment by Ramon Fernandez Marina [ 16/Apr/15 ]

Thanks for the update, wxiaoguang@gmail.com. Indeed, we do recommend disabling transparent huge pages when running MongoDB.

Comment by Xiaoguang Wang [ 16/Apr/15 ]

We disabled the Linux huge page feature, and the problem seems to have disappeared.
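To confirm the setting actually took effect on every host, the active mode can be read back from sysfs. The sketch below uses hypothetical helpers (`thp_status`, `selected_mode`) and assumes the standard sysfs layout: mainline kernels expose `/sys/kernel/mm/transparent_hugepage/`, while RHEL/CentOS 6 kernels (as in this report) use `/sys/kernel/mm/redhat_transparent_hugepage/`. The active choice is the one shown in square brackets, e.g. `always madvise [never]`.

```python
from pathlib import Path

# Assumed locations: mainline kernels use the first, RHEL/CentOS 6 the second.
THP_DIRS = (
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
)


def selected_mode(text):
    """Return the bracketed (active) choice of a sysfs THP setting, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_status():
    """Report the active 'enabled' and 'defrag' modes from the first THP dir found."""
    for d in THP_DIRS:
        if d.is_dir():
            return {
                name: selected_mode((d / name).read_text())
                for name in ("enabled", "defrag")
                if (d / name).exists()
            }
    return {}  # no THP directory: THP not available on this kernel


if __name__ == "__main__":
    status = thp_status()
    print(status)
    if any(mode != "never" for mode in status.values()):
        print("THP is still active on this host")
```

MongoDB's production notes recommend `never` for both `enabled` and `defrag`. Writes to these sysfs files do not survive a reboot, so the setting is usually applied from an init script; otherwise a reboot silently re-enables THP.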

Comment by Xiaoguang Wang [ 09/Apr/15 ]

The attached logs and graphs may not be from the same moment, but they all show the same problem.

The most suspicious observation is that the TLB (TLB shootdown) and LOC (local timer) interrupt counts are extremely high when the problem occurs.
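To quantify "extremely high", it helps to compare interrupt rates during an incident with a normal period. A minimal sketch, assuming the usual `/proc/interrupts` layout (a header row of `CPUn` columns, then one row per source with one counter per CPU followed by a description); `interrupt_totals` and `interrupt_rates` are hypothetical helper names. The counters are cumulative since boot, so the sketch diffs two samples instead of reading one.

```python
import time


def interrupt_totals(text, labels=("TLB", "LOC")):
    """Sum the per-CPU counters of the selected /proc/interrupts rows."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        name = name.strip()
        if name in labels:
            # The first ncpu fields are counters; the rest is the description.
            totals[name] = sum(int(f) for f in rest.split()[:ncpu])
    return totals


def interrupt_rates(path="/proc/interrupts", interval=1.0, labels=("TLB", "LOC")):
    """Interrupts per second for each label, summed over all CPUs."""
    with open(path) as f:
        before = interrupt_totals(f.read(), labels)
    time.sleep(interval)
    with open(path) as f:
        after = interrupt_totals(f.read(), labels)
    return {k: (after[k] - before[k]) / interval for k in after if k in before}
```

For example, `interrupt_rates(interval=5.0)` run on the affected host during a hang and again during a quiet period gives two directly comparable numbers per interrupt type.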

Comment by Xiaoguang Wang [ 08/Apr/15 ]

(google group post: https://groups.google.com/forum/#!topic/mongodb-user/mMkpdp2jSbw )

Some instances in our MongoDB cluster hang at random.
Our system is CentOS 6.5 x86_64, running a MongoDB 2.6.7 cluster.
Storage is on PCI-E flash cards.

When an instance hangs, system CPU usage (%sy) is very high,
and mongos appears to consume about 7000% CPU.
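The %sy figure is the share of CPU time spent in kernel mode. As a rough way to log it over time outside of `top`, the sketch below (hypothetical helpers `cpu_times`, `sys_percent`, `sample_sys_percent`) samples the aggregate `cpu` line of `/proc/stat` twice and divides the change in the system column by the change in user, nice, system, idle, iowait, irq, softirq and steal combined; as with top's %sy, irq and softirq time are not counted as system time. For the per-process number, top by default expresses %CPU relative to a single logical CPU, so roughly 7000% for mongos corresponds to about 70 CPUs' worth of time.

```python
import time


def cpu_times(text):
    """Parse user..steal jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        if line.startswith("cpu "):  # aggregate line; per-CPU lines are 'cpu0', ...
            return [int(v) for v in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def sys_percent(before, after):
    """Percentage of elapsed CPU time spent in kernel mode between two samples."""
    delta = [a - b for a, b in zip(cpu_times(after), cpu_times(before))]
    total = sum(delta)
    return 100.0 * delta[2] / total if total else 0.0  # index 2 = system


def sample_sys_percent(interval=1.0):
    """Sample /proc/stat twice, `interval` seconds apart, and return %sy."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return sys_percent(before, after)
```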

Normally, there are very few slow queries.
The cluster hangs first, and then queries become slow.
When the hang becomes severe, all connections between mongos and the application, and between mongos and mongod, are dropped.

Rebooting the machine seems to make the problem go away for a while, but after some time it returns and becomes progressively more serious.

Generated at Thu Feb 08 03:46:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.