Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20149

WiredTiger read performance

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major - P3 Major - P3
    • None
    • 3.0.5
    • WiredTiger
    • None
    • Centos 6.6 (linux 2.6.32-504.23.4.el6.x86_64)
    • Linux

    Description

      We're recently upgraded our sharded cluster from 2.4.6 to 3.0.5. We also moved to WT with snappy compression, by doing initial sync with empty data folder.

      There 3 shards (2 replicas, 1 arbiter per shard). One shard is on average SSD (100-300MB/s) and others are on regular hdds. Currently we're using ext4, but we will migrate to xfs shortly, when we will move rest of the shard to the SSD (thanks to 10x compression!). Total dataset size is around 300GB, 600Gb including indexes (:|).

      We have 26 collections on two main databases. Both databases are the part of the same cluster. We have 6 sharded collection (15M ... 4.5B records), and rest of the collections are too small to be sharded and hosted only on Shard #1.

      Everything is running in ESXi. THP are disabled, ulimits are set as described in docs.

      We're seeing very strange behaviour: after certain point one collection is suddenly pops up in mongotop:

      s                                total     read      write
      <db>.<25k_collection>            1656ms    1654ms    1ms
      <other_db>.<empty_collection>      31ms      0ms       31ms
      <db>.<2_records_collection>      17ms      17ms      0ms
      local.oplog.rs                   12ms      12ms      0ms
      <db>.<20k_records>             6ms       3ms       3ms
      

      This cause 2x increase in response time for related app endpoints.

      This collection is hosted on Shard #1.25k_collection average workload is 10-15rps: reads (45%), updates (45%) and small number of inserts (10%).

      This collection is pretty small, here the stats from collection:

      {
              "ns" : "<db>.<25k_collection>",
              "count" : 53436,
              "size" : 25299163,
              "avgObjSize" : 473,
              "storageSize" : 6418432,
              "capped" : false,
              "wiredTiger" : {
                      "metadata" : {
                              "formatVersion" : 1
                      },
                      "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.16=(addr=\"01c5ce81e4122eed45c5cf81e4d0eea3b9c5d081e4629af3d2808080e361cfc0e3622fc0\",order=16,time=1440549414,size=6443008,write_gen=848)),checkpoint_lsn=(2656,6794880),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=26,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
                      "type" : "file",
                      "uri" : "statistics:table:collection-23-8167943771317719544",
                      "LSM" : {
                              "bloom filters in the LSM tree" : 0,
                              "bloom filter false positives" : 0,
                              "bloom filter hits" : 0,
                              "bloom filter misses" : 0,
                              "bloom filter pages evicted from cache" : 0,
                              "bloom filter pages read into cache" : 0,
                              "total size of bloom filters" : 0,
                              "sleep for LSM checkpoint throttle" : 0,
                              "chunks in the LSM tree" : 0,
                              "highest merge generation in the LSM tree" : 0,
                              "queries that could have benefited from a Bloom filter that did not exist" : 0,
                              "sleep for LSM merge throttle" : 0
                      },
                      "block-manager" : {
                              "file allocation unit size" : 4096,
                              "blocks allocated" : 74,
                              "checkpoint size" : 6443008,
                              "allocations requiring file extension" : 3,
                              "blocks freed" : 31,
                              "file magic number" : 120897,
                              "file major version number" : 1,
                              "minor version number" : 0,
                              "file bytes available for reuse" : 24576,
                              "file size in bytes" : 6418432
                      },
                      "btree" : {
                              "btree checkpoint generation" : 3422,
                              "column-store variable-size deleted values" : 0,
                              "column-store fixed-size leaf pages" : 0,
                              "column-store internal pages" : 0,
                              "column-store variable-size leaf pages" : 0,
                              "pages rewritten by compaction" : 0,
                              "number of key/value pairs" : 0,
                              "fixed-record size" : 0,
                              "maximum tree depth" : 4,
                              "maximum internal page key size" : 368,
                              "maximum internal page size" : 4096,
                              "maximum leaf page key size" : 3276,
                              "maximum leaf page size" : 32768,
                              "maximum leaf page value size" : 67108864,
                              "overflow pages" : 0,
                              "row-store internal pages" : 0,
                              "row-store leaf pages" : 0
                      },
                      "cache" : {
                              "bytes read into cache" : 162112625609,
                              "bytes written from cache" : 458425,
                              "checkpoint blocked page eviction" : 0,
                              "unmodified pages evicted" : 4983876,
                              "page split during eviction deepened the tree" : 0,
                              "modified pages evicted" : 3,
                              "data source pages selected for eviction unable to be evicted" : 2385,
                              "hazard pointer blocked page eviction" : 2385,
                              "internal pages evicted" : 0,
                              "pages split during eviction" : 1,
                              "in-memory page splits" : 0,
                              "overflow values cached in memory" : 0,
                              "pages read into cache" : 4989851,
                              "overflow pages read into cache" : 0,
                              "pages written from cache" : 46
                      },
                      "compression" : {
                              "raw compression call failed, no additional data available" : 0,
                              "raw compression call failed, additional data available" : 0,
                              "raw compression call succeeded" : 0,
                              "compressed pages read" : 4984383,
                              "compressed pages written" : 18,
                              "page written failed to compress" : 0,
                              "page written was too small to compress" : 28
                      },
                      "cursor" : {
                              "create calls" : 510398,
                              "insert calls" : 356,
                              "bulk-loaded cursor-insert calls" : 0,
                              "cursor-insert key and value bytes inserted" : 154017,
                              "next calls" : 214506,
                              "prev calls" : 1,
                              "remove calls" : 0,
                              "cursor-remove key bytes removed" : 0,
                              "reset calls" : 12156163367,
                              "search calls" : 12147296837,
                              "search near calls" : 834,
                              "update calls" : 0,
                              "cursor-update value bytes updated" : 0
                      },
                      "reconciliation" : {
                              "dictionary matches" : 0,
                              "internal page multi-block writes" : 0,
                              "leaf page multi-block writes" : 9,
                              "maximum blocks required for a page" : 0,
                              "internal-page overflow keys" : 0,
                              "leaf-page overflow keys" : 0,
                              "overflow values written" : 0,
                              "pages deleted" : 0,
                              "page checksum matches" : 6,
                              "page reconciliation calls" : 43,
                              "page reconciliation calls for eviction" : 0,
                              "leaf page key bytes discarded using prefix compression" : 0,
                              "internal page key bytes discarded using suffix compression" : 9
                      },
                      "session" : {
                              "object compaction" : 0,
                              "open cursor count" : 510398
       
                      },
                      "transaction" : {
                              "update conflicts" : 0
                      }
              },
              "nindexes" : 10,
              "totalIndexSize" : 4096000,
              "indexSizes" : {
                      "_id_" : 782336,
                      "ancestry_1" : 335872,
                      "named_path_1" : 638976,
                      "instance_id_1" : 761856,
                      "acl.read_1" : 307200,
                      "acl.create_1" : 253952,
                      "acl.update_1" : 253952,
                      "acl.destroy_1" : 253952,
                      "acl.audit_1" : 253952,
                      "acl.grant_1" : 253952
              },
              "ok" : 1,
              "$gleStats" : {
                      "lastOpTime" : Timestamp(0, 0),
                      "electionId" : ObjectId("55db2213018f752b5e378de9")
              }
      }
      

      As you can see "bytes read into cache" value is pretty insane: 162Gb for 6Mb collection.

      Here the mongostat

      insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn         set repl     time
          *0   219     19     *0       1     5|0     3.2   90.5       0  9.2G 7.5G   0|0   3|0   38k   283k  142 shard1  PRI 03:54:28
           1   162     15     *0       0     1|0     3.2   90.6       0  9.2G 7.5G   0|0   2|2   27k   223k  142 shard1  PRI 03:54:29
          *0   258     27     *0       1     5|0     3.2   90.7       0  9.2G 7.5G   0|0   5|0   46k   208k  142 shard1  PRI 03:54:30
          *0   287     18     *0       1     6|0     3.2   90.8       0  9.2G 7.5G   0|0   4|0   43k   311k  142 shard1  PRI 03:54:31
          *0   242     26     *0       1     2|0     3.2   90.8       0  9.2G 7.5G   0|0   5|0   44k   323k  142 shard1  PRI 03:54:32
          *0   101     13     *0       1     3|0     3.2   90.8       0  9.2G 7.5G   0|1   5|1   18k   105k  142 shard1  PRI 03:54:33
          *0   206     45     *0       1     6|0     3.2   90.9       0  9.2G 7.5G   0|1   2|0   41k   324k  142 shard1  PRI 03:54:34
          *0   191     23     *0       0    11|0     3.2   91.0       0  9.2G 7.5G   0|1   3|0   36k   275k  142 shard1  PRI 03:54:35
          *0   272     20     *0       1     2|0     3.2   91.0       0  9.2G 7.5G   0|0   6|0   44k   170k  142 shard1  PRI 03:54:36
          *0   825     17     *0       1     5|0     3.2   91.1       0  9.2G 7.5G   0|0   1|1  146k   372k  142 shard1  PRI 03:54:37
      

      Also we're seeing 2x increase in our map/reduce processing times. Average before upgrade was around 4s for request, and now it's about 8s.

      Also, on some shard I'm seeing " Failed: BSONObj size: 40950496 (0x270DAE0) is inv alid. Size must be between 0 and 16793600(16MB)" error, when starting mongotop.

      Thanks!

      Attachments

        1. db04.zip
          427 kB
        2. gdbmon.py
          3 kB
        3. serverStatus_306.txt
          59 kB
        4. serverStatus.txt
          76 kB
        5. ss.log.zip
          308 kB
        6. trace_11Sep15.zip
          1.15 MB
        7. trace.tar.gz
          1.94 MB

        Activity

          People

            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            yopp Alex
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: