Details
- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Fix Version/s: None
- Affects Version/s: 3.0.5
- Component/s: None
- Environment: CentOS 6.6 (linux 2.6.32-504.23.4.el6.x86_64)
- Operating System: Linux
Description
We recently upgraded our sharded cluster from 2.4.6 to 3.0.5. We also moved to WiredTiger with snappy compression by doing an initial sync into an empty data folder.
There are 3 shards (2 replicas, 1 arbiter per shard). One shard is on an average SSD (100-300MB/s) and the others are on regular HDDs. Currently we're using ext4, but we will migrate to xfs shortly, when we move the rest of the shards to the SSD (thanks to 10x compression!). The total dataset size is around 300GB, 600GB including indexes (:|).
We have 26 collections across two main databases. Both databases are part of the same cluster. We have 6 sharded collections (15M ... 4.5B records), and the rest of the collections are too small to be sharded and are hosted only on Shard #1.
Everything is running in ESXi. THP is disabled and ulimits are set as described in the docs.
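In case it's useful, here is a minimal mongo shell sketch for double-checking the storage engine and the per-collection block compressor on a shard primary (the db/collection names are the same placeholders as in the rest of this report):

// Storage engine this mongod is running (we expect "wiredTiger").
db.serverStatus().storageEngine

// The creationString in stats() should contain "block_compressor=snappy".
db.getSiblingDB("<db>").getCollection("<25k_collection>").stats().wiredTiger.creationString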
We're seeing very strange behaviour: after a certain point, one collection suddenly pops up in mongotop:
ns                              total    read   write
<db>.<25k_collection>          1656ms  1654ms     1ms
<other_db>.<empty_collection>    31ms     0ms    31ms
<db>.<2_records_collection>      17ms    17ms     0ms
local.oplog.rs                   12ms    12ms     0ms
<db>.<20k_records>                6ms     3ms     3ms
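As far as I understand, mongotop gets these numbers from the "top" admin command; here is a minimal shell sketch to pull the raw counters for the hot namespace (placeholder name as above), in case the raw values are useful:

// Per-namespace counters that mongotop aggregates; times are reported in microseconds.
var t = db.getSiblingDB("admin").runCommand({ top: 1 });
printjson(t.totals["<db>.<25k_collection>"]);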
This causes a 2x increase in response time for the related app endpoints.
This collection is hosted on Shard #1. The 25k_collection's average workload is 10-15 rps: reads (45%), updates (45%) and a small number of inserts (10%).
This collection is pretty small; here are the stats for the collection:
{
    "ns" : "<db>.<25k_collection>",
    "count" : 53436,
    "size" : 25299163,
    "avgObjSize" : 473,
    "storageSize" : 6418432,
    "capped" : false,
    "wiredTiger" : {
        "metadata" : {
            "formatVersion" : 1
        },
        "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.16=(addr=\"01c5ce81e4122eed45c5cf81e4d0eea3b9c5d081e4629af3d2808080e361cfc0e3622fc0\",order=16,time=1440549414,size=6443008,write_gen=848)),checkpoint_lsn=(2656,6794880),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=26,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
        "type" : "file",
        "uri" : "statistics:table:collection-23-8167943771317719544",
        "LSM" : {
            "bloom filters in the LSM tree" : 0,
            "bloom filter false positives" : 0,
            "bloom filter hits" : 0,
            "bloom filter misses" : 0,
            "bloom filter pages evicted from cache" : 0,
            "bloom filter pages read into cache" : 0,
            "total size of bloom filters" : 0,
            "sleep for LSM checkpoint throttle" : 0,
            "chunks in the LSM tree" : 0,
            "highest merge generation in the LSM tree" : 0,
            "queries that could have benefited from a Bloom filter that did not exist" : 0,
            "sleep for LSM merge throttle" : 0
        },
        "block-manager" : {
            "file allocation unit size" : 4096,
            "blocks allocated" : 74,
            "checkpoint size" : 6443008,
            "allocations requiring file extension" : 3,
            "blocks freed" : 31,
            "file magic number" : 120897,
            "file major version number" : 1,
            "minor version number" : 0,
            "file bytes available for reuse" : 24576,
            "file size in bytes" : 6418432
        },
        "btree" : {
            "btree checkpoint generation" : 3422,
            "column-store variable-size deleted values" : 0,
            "column-store fixed-size leaf pages" : 0,
            "column-store internal pages" : 0,
            "column-store variable-size leaf pages" : 0,
            "pages rewritten by compaction" : 0,
            "number of key/value pairs" : 0,
            "fixed-record size" : 0,
            "maximum tree depth" : 4,
            "maximum internal page key size" : 368,
            "maximum internal page size" : 4096,
            "maximum leaf page key size" : 3276,
            "maximum leaf page size" : 32768,
            "maximum leaf page value size" : 67108864,
            "overflow pages" : 0,
            "row-store internal pages" : 0,
            "row-store leaf pages" : 0
        },
        "cache" : {
            "bytes read into cache" : 162112625609,
            "bytes written from cache" : 458425,
            "checkpoint blocked page eviction" : 0,
            "unmodified pages evicted" : 4983876,
            "page split during eviction deepened the tree" : 0,
            "modified pages evicted" : 3,
            "data source pages selected for eviction unable to be evicted" : 2385,
            "hazard pointer blocked page eviction" : 2385,
            "internal pages evicted" : 0,
            "pages split during eviction" : 1,
            "in-memory page splits" : 0,
            "overflow values cached in memory" : 0,
            "pages read into cache" : 4989851,
            "overflow pages read into cache" : 0,
            "pages written from cache" : 46
        },
        "compression" : {
            "raw compression call failed, no additional data available" : 0,
            "raw compression call failed, additional data available" : 0,
            "raw compression call succeeded" : 0,
            "compressed pages read" : 4984383,
            "compressed pages written" : 18,
            "page written failed to compress" : 0,
            "page written was too small to compress" : 28
        },
        "cursor" : {
            "create calls" : 510398,
            "insert calls" : 356,
            "bulk-loaded cursor-insert calls" : 0,
            "cursor-insert key and value bytes inserted" : 154017,
            "next calls" : 214506,
            "prev calls" : 1,
            "remove calls" : 0,
            "cursor-remove key bytes removed" : 0,
            "reset calls" : 12156163367,
            "search calls" : 12147296837,
            "search near calls" : 834,
            "update calls" : 0,
            "cursor-update value bytes updated" : 0
        },
        "reconciliation" : {
            "dictionary matches" : 0,
            "internal page multi-block writes" : 0,
            "leaf page multi-block writes" : 9,
            "maximum blocks required for a page" : 0,
            "internal-page overflow keys" : 0,
            "leaf-page overflow keys" : 0,
            "overflow values written" : 0,
            "pages deleted" : 0,
            "page checksum matches" : 6,
            "page reconciliation calls" : 43,
            "page reconciliation calls for eviction" : 0,
            "leaf page key bytes discarded using prefix compression" : 0,
            "internal page key bytes discarded using suffix compression" : 9
        },
        "session" : {
            "object compaction" : 0,
            "open cursor count" : 510398
        },
        "transaction" : {
            "update conflicts" : 0
        }
    },
    "nindexes" : 10,
    "totalIndexSize" : 4096000,
    "indexSizes" : {
        "_id_" : 782336,
        "ancestry_1" : 335872,
        "named_path_1" : 638976,
        "instance_id_1" : 761856,
        "acl.read_1" : 307200,
        "acl.create_1" : 253952,
        "acl.update_1" : 253952,
        "acl.destroy_1" : 253952,
        "acl.audit_1" : 253952,
        "acl.grant_1" : 253952
    },
    "ok" : 1,
    "$gleStats" : {
        "lastOpTime" : Timestamp(0, 0),
        "electionId" : ObjectId("55db2213018f752b5e378de9")
    }
}
As you can see, the "bytes read into cache" value is pretty insane: ~162GB read for a ~6MB collection.
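Here is a rough mongo shell loop that can be used to watch how fast that counter grows (placeholder names as above; the sampling interval and iteration count are arbitrary):

// Print the per-second delta of "bytes read into cache" for the collection.
var coll = db.getSiblingDB("<db>").getCollection("<25k_collection>");
var prev = coll.stats().wiredTiger.cache["bytes read into cache"];
for (var i = 0; i < 10; i++) {
    sleep(1000);
    var cur = coll.stats().wiredTiger.cache["bytes read into cache"];
    print(new Date().toISOString() + "  +" + (cur - prev) + " bytes read into cache");
    prev = cur;
}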
Here is the mongostat output:
insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn    set repl     time
    *0   219     19     *0       1     5|0     3.2   90.5       0  9.2G 7.5G   0|0   3|0   38k   283k  142 shard1  PRI 03:54:28
     1   162     15     *0       0     1|0     3.2   90.6       0  9.2G 7.5G   0|0   2|2   27k   223k  142 shard1  PRI 03:54:29
    *0   258     27     *0       1     5|0     3.2   90.7       0  9.2G 7.5G   0|0   5|0   46k   208k  142 shard1  PRI 03:54:30
    *0   287     18     *0       1     6|0     3.2   90.8       0  9.2G 7.5G   0|0   4|0   43k   311k  142 shard1  PRI 03:54:31
    *0   242     26     *0       1     2|0     3.2   90.8       0  9.2G 7.5G   0|0   5|0   44k   323k  142 shard1  PRI 03:54:32
    *0   101     13     *0       1     3|0     3.2   90.8       0  9.2G 7.5G   0|1   5|1   18k   105k  142 shard1  PRI 03:54:33
    *0   206     45     *0       1     6|0     3.2   90.9       0  9.2G 7.5G   0|1   2|0   41k   324k  142 shard1  PRI 03:54:34
    *0   191     23     *0       0    11|0     3.2   91.0       0  9.2G 7.5G   0|1   3|0   36k   275k  142 shard1  PRI 03:54:35
    *0   272     20     *0       1     2|0     3.2   91.0       0  9.2G 7.5G   0|0   6|0   44k   170k  142 shard1  PRI 03:54:36
    *0   825     17     *0       1     5|0     3.2   91.1       0  9.2G 7.5G   0|0   1|1  146k   372k  142 shard1  PRI 03:54:37
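The "% dirty" / "% used" columns above should correspond to the server-wide WiredTiger cache counters; a quick shell sketch to read them directly:

// Server-wide WiredTiger cache usage (the data mongostat derives "% used" / "% dirty" from).
var c = db.serverStatus().wiredTiger.cache;
print("configured : " + c["maximum bytes configured"]);
print("in cache   : " + c["bytes currently in the cache"]);
print("dirty      : " + c["tracked dirty bytes in the cache"]);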
We're also seeing a 2x increase in our map/reduce processing times: the average before the upgrade was around 4s per request, and now it's about 8s.
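For reference, a simplified way to time a single map/reduce run from the shell (the map/reduce functions below are made-up placeholders, not our real job):

// Time a mapReduce from the shell; map/reduce bodies are placeholders.
var start = new Date();
db.getSiblingDB("<db>").getCollection("<25k_collection>").mapReduce(
    function () { emit(this._id, 1); },                     // placeholder map
    function (key, values) { return Array.sum(values); },   // placeholder reduce
    { out: { inline: 1 } }
);
print("mapReduce took " + (new Date() - start) + " ms");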
Also, on some shards I'm seeing a "Failed: BSONObj size: 40950496 (0x270DAE0) is invalid. Size must be between 0 and 16793600(16MB)" error when starting mongotop.
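I assume mongotop is driven by the "top" command, so the oversized BSONObj could be the command response itself; a quick way to check its size from the shell:

// Size of the raw "top" command response (mongotop's data source, as far as I know).
var res = db.getSiblingDB("admin").runCommand({ top: 1 });
print("top response size: " + Object.bsonsize(res) + " bytes");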
Thanks!