[SERVER-24407] too many WT Cache Eviction cause server hang Created: 06/Jun/16  Updated: 14/Jul/16  Resolved: 08/Jun/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: YANG Chenghu Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File diagnostic1.png     File mongo.log.tgz     Text File pstack.log     PNG File screenshot-1.png     Text File top.log    
Issue Links:
Duplicate
duplicates WT-2560 Stuck trying to update oldest transac... Closed
Operating System: ALL
Steps To Reproduce:

Each document is about 500 bytes.
Run 70+ mongoimport processes concurrently.

Wait until the WiredTiger cache is full and eviction begins.

Watch the operation counters with mongostat, and capture the mongod thread stacks with pstack.
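
The load described in these steps could be approximated with a small script like the one below. This is only a sketch under stated assumptions: the document schema (`seq`, `payload`), the database and collection names, the input file name, and the helper names are all hypothetical and do not come from the reporter. Only `mongoimport` and its `--port`, `--db`, `--collection`, and `--file` options are taken as given.

```python
import json
import random
import string
import subprocess


def make_document(seq, target_bytes=500):
    """Return one JSON line of target_bytes bytes (hypothetical schema)."""
    doc = {"seq": seq, "payload": ""}
    overhead = len(json.dumps(doc))
    # Pad with ASCII letters so the serialized line reaches the target size.
    pad = max(0, target_bytes - overhead)
    doc["payload"] = "".join(random.choices(string.ascii_lowercase, k=pad))
    return json.dumps(doc)


def write_input_file(path, count):
    """Write `count` documents, one JSON document per line."""
    with open(path, "w") as fh:
        for seq in range(count):
            fh.write(make_document(seq) + "\n")


def build_import_commands(workers=70, port=27017, db="test",
                          coll="evict_repro", path="docs.json"):
    """Build one mongoimport command line per concurrent worker.

    The db, coll, and path defaults are placeholders, not values from the ticket.
    """
    return [
        ["mongoimport", "--port", str(port), "--db", db,
         "--collection", coll, "--file", path]
        for _ in range(workers)
    ]


def launch(commands):
    """Start all importers at once and wait for them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling `write_input_file("docs.json", 1_000_000)` and then `launch(build_import_commands())` would start 70 importers against a local mongod on port 27017. The document count needed to fill a 10G cache is not stated in the ticket and would have to be found by experiment.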

Participants:

 Description   

Many processes insert into MongoDB. When the WiredTiger cache is full, eviction is triggered, even inside MongoDB insert threads.

But eviction is too slow and there is heavy contention between threads: CPU usage is high, the server hangs, and it cannot serve any requests.

Thread 145 (Thread 0x7fb86d464700 (LWP 31157)):
#0  0x0000000001aa35bb in __wt_txn_update_oldest ()
#1  0x0000000001a49a8d in __wt_evict ()
#2  0x0000000001a46a9f in __evict_page ()
#3  0x0000000001a490f6 in __wt_cache_eviction_worker ()
#4  0x0000000001a963b8 in __session_begin_transaction ()
#5  0x00000000010a513c in mongo::WiredTigerRecoveryUnit::_txnOpen(mongo::OperationContext*) ()
#6  0x00000000010a52f7 in mongo::WiredTigerRecoveryUnit::getSession(mongo::OperationContext*) ()
#7  0x00000000010a53ca in mongo::WiredTigerCursor::WiredTigerCursor(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, bool, mongo::OperationContext*) ()
#8  0x000000000109aa11 in mongo::WiredTigerRecordStore::insertRecords(mongo::OperationContext*, std::vector<mongo::Record, std::allocator<mongo::Record> >*, bool) ()
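
To judge how many threads are stuck in the same place as the one above, the pstack output can be summarized by innermost frame. The following is a rough sketch, not an official tool. It assumes the gdb-style layout shown in this trace (a `Thread N (...)` header followed by `#n  0xADDR in function ()` lines) and tolerates an optional leading line number. The function name `innermost_frames` is made up for illustration.

```python
import re
from collections import Counter

# Header such as: "Thread 145 (Thread 0x7fb86d464700 (LWP 31157)):"
THREAD_RE = re.compile(r"^\s*(?:\d+\s+)?Thread\s+(\d+)\s+\(")
# Frame such as: "#0  0x0000000001aa35bb in __wt_txn_update_oldest ()"
FRAME_RE = re.compile(r"^\s*(?:\d+\s+)?#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+([^\s(]+)")


def innermost_frames(pstack_text):
    """Count threads by the function found in frame #0.

    Frames printed without an address (no "0x... in") are not matched,
    so such threads are simply left out of the tally.
    """
    counts = Counter()
    in_thread = False
    for line in pstack_text.splitlines():
        if THREAD_RE.match(line):
            in_thread = True
            continue
        match = FRAME_RE.match(line)
        if match and in_thread and match.group(1) == "0":
            counts[match.group(2)] += 1
            in_thread = False
    return counts
```

Running `innermost_frames(open("pstack.log").read()).most_common(5)` on the attached file would list the five most common innermost functions. A large count for `__wt_txn_update_oldest` would be consistent with the contention described in this ticket.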



 Comments   
Comment by Ramon Fernandez Marina [ 08/Jun/16 ]

We're closing this ticket as a duplicate of WT-2560, which has been fixed in 3.2.7.

Regards,
Ramón.

Comment by Ramon Fernandez Marina [ 07/Jun/16 ]

ych.tiger@gmail.com, a colleague points out that this could be a manifestation of WT-2560, which was fixed in 3.2.7. Can you please upgrade to MongoDB 3.2.7 and see if the problem you're observing is fixed?

Thanks,
Ramón.

Comment by Yaoxing Zhang [ 06/Jun/16 ]

Attached the mongodb.log and diagnostic.data

Comment by Yaoxing Zhang [ 06/Jun/16 ]

By "server hang" I think ych.tiger@gmail.com means that no requests can be processed. I attached an image showing that when eviction happens, all CRUD counters drop to 0.
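
The symptom described here (every CRUD counter at 0) can also be detected from periodic samples of the cumulative `opcounters` section of `serverStatus`. The sketch below assumes the samples have already been collected as `(timestamp, opcounters)` pairs. The function name and the sample layout are illustrative assumptions, and it deliberately makes no driver calls.

```python
# CRUD fields of the serverStatus "opcounters" section (cumulative since startup).
CRUD_KEYS = ("insert", "query", "update", "delete")


def stalled_intervals(samples):
    """Return (start, end) timestamp pairs during which no CRUD counter advanced.

    `samples` is a time-ordered list of (timestamp, opcounters_dict) pairs.
    """
    stalls = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if all(c1[key] - c0[key] == 0 for key in CRUD_KEYS):
            stalls.append((t0, t1))
    return stalls
```

An interval is reported only when all four counters are flat. A server that is idle because no clients are connected would look the same, so this check is meaningful only while the import load is running.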

Comment by Yaoxing Zhang [ 06/Jun/16 ]

CPU: E7-4807 @ 1.87 GHz, 48 cores
Memory: 252G
WiredTigerCache: 10G
Hard Disk: tried various kinds of drives (SSD, SAS, with or without RAID 5/10); same result
Filesystem: XFS
Deployment: tried standalone, replica set, and sharded cluster; same result

numactl --interleave=all mongod -f mongoDB.config --dbpath /data/web-mongodb/db_27017 --logpath /data/web-mongodb/log_27017/mongodb.log --port 27017 --wiredTigerCacheSizeGB 10

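systemLog:
  verbosity: 0 # log verbosity
  logAppend: true # append to the log
  logRotate: "rename" # logRotate behavior
  destination: "file" # log format
processManagement:
  fork: true # run in the background
net:
  maxIncomingConnections: 819 # maximum number of connections
setParameter:
  failIndexKeyTooLong: false # fields exceeding the length limit are not indexed and no error is returned
storage:
  indexBuildRetry: true # whether to rebuild indexes when the instance restarts after a crash
  journal:
    enabled: false # enable the journal
  directoryPerDB: true # one directory per DB
  syncPeriodSecs: 100000 # checkpoint interval (s)
  engine: "wiredTiger" # storage engine
  wiredTiger:
    engineConfig:
      statisticsLogDelaySecs: 0 # disable statistics logging
      directoryForIndexes: true # store data and indexes in separate directories
    collectionConfig:
      blockCompressor: snappy # data compression algorithm
    indexConfig:
      prefixCompression: true # enable index prefix compression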

Comment by Alexander Gorrod [ 06/Jun/16 ]

yaoxing.zhang@mongodb.com In order to determine whether the issue is a MongoDB bug we will need:

  • More information about exactly what you mean by "cause server hang" - I don't see any evidence of a hang
  • A full mongod log file from the deployment
  • An archive of the diagnostic.data directory that will be a sub-directory of the MongoDB database directory
  • Details of the hardware being used
  • Details of the MongoDB configuration options being used
  • Information about the type of MongoDB deployment (standalone, replica set, sharded, etc)

Comment by Yaoxing Zhang [ 06/Jun/16 ]

User also mentioned that this problem only happens after upgrading to 3.2.6. 3.0 used to work fine.
