- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.3.5
- Component/s: WiredTiger
- Environment: HW has a fast SSD (~100k IOPS), 24 cores / 48 HW threads, CentOS 4.0.9, 256G RAM, MongoDB 3.3.5 built from source and using jemalloc
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
I am running linkbench via the scripts described below. The test does a load, then 24 hours of query tests with 16 concurrent clients, then a sequence of 1-hour query tests with concurrency starting at 1 client and increasing to 48.
The test was run with mmapv1, WiredTiger (zlib & snappy), and RocksDB (zlib & snappy). The RocksDB tests are still in progress, so I don't know whether they will hit the same problem. The WiredTiger+snappy test finished. The WiredTiger+zlib test appears to have hung with 44 concurrent clients. Given that the server has 48 HW threads, I wonder whether contention on spinlocks is the problem.
By "hang" I mean that QPS has dropped from ~1500 to something close to zero. I don't have mongostat on these servers, I will try to install it after creating this bug. Looking at PMP, to be attached, shows all threads in eviction code. Neither the mongod log file, nor the client output files have been updated for 2 hours so I call this a hang. The "secs_running" attribute in db.currentOp() output shows ~10,000 seconds for all queries.
This is the QPS for each number of concurrent clients (x = no result; the zlib test hung at 44 clients):

concurrent clients   1     4     8     12    16    20    24    28    32    36    40    44    48
snappy               1137  4281  4032  3246  3199  3038  2918  2815  2802  2839  2722  2751  2732
zlib                 651   2400  2312  2085  2014  1847  1878  1826  1802  1465  1556  x     x