[SERVER-20149] WiredTiger read performance Created: 26/Aug/15  Updated: 18/Sep/15  Resolved: 18/Sep/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alex Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 6.6 (linux 2.6.32-504.23.4.el6.x86_64)


Attachments: Zip Archive db04.zip     File gdbmon.py     Text File serverStatus.txt     Text File serverStatus_306.txt     Zip Archive ss.log.zip     File trace.tar.gz     Zip Archive trace_11Sep15.zip    
Operating System: Linux
Participants:

 Description   

We recently upgraded our sharded cluster from 2.4.6 to 3.0.5. We also moved to WiredTiger with snappy compression by doing an initial sync into an empty data folder.

There are 3 shards (2 replica set members plus 1 arbiter per shard). One shard is on an average SSD (100-300 MB/s) and the others are on regular HDDs. Currently we're using ext4, but we will migrate to XFS shortly, when we move the rest of the shards to SSD (thanks to 10x compression!). Total dataset size is around 300 GB, or 600 GB including indexes (:|).

We have 26 collections in two main databases. Both databases are part of the same cluster. We have 6 sharded collections (15M to 4.5B records each); the rest of the collections are too small to be sharded and are hosted only on Shard #1.

Everything is running in ESXi. THP is disabled and ulimits are set as described in the docs.

We're seeing very strange behaviour: after a certain point, one collection suddenly pops up in mongotop:

ns                               total      read     write
<db>.<25k_collection>           1656ms    1654ms       1ms
<other_db>.<empty_collection>     31ms       0ms      31ms
<db>.<2_records_collection>       17ms      17ms       0ms
local.oplog.rs                    12ms      12ms       0ms
<db>.<20k_records>                 6ms       3ms       3ms

This causes a 2x increase in response time for the related app endpoints.

This collection is hosted on Shard #1. The 25k_collection's average workload is 10-15 rps: reads (45%), updates (45%), and a small number of inserts (10%).

This collection is pretty small; here are the stats for the collection:

{
        "ns" : "<db>.<25k_collection>",
        "count" : 53436,
        "size" : 25299163,
        "avgObjSize" : 473,
        "storageSize" : 6418432,
        "capped" : false,
        "wiredTiger" : {
                "metadata" : {
                        "formatVersion" : 1
                },
                "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.16=(addr=\"01c5ce81e4122eed45c5cf81e4d0eea3b9c5d081e4629af3d2808080e361cfc0e3622fc0\",order=16,time=1440549414,size=6443008,write_gen=848)),checkpoint_lsn=(2656,6794880),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=26,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
                "type" : "file",
                "uri" : "statistics:table:collection-23-8167943771317719544",
                "LSM" : {
                        "bloom filters in the LSM tree" : 0,
                        "bloom filter false positives" : 0,
                        "bloom filter hits" : 0,
                        "bloom filter misses" : 0,
                        "bloom filter pages evicted from cache" : 0,
                        "bloom filter pages read into cache" : 0,
                        "total size of bloom filters" : 0,
                        "sleep for LSM checkpoint throttle" : 0,
                        "chunks in the LSM tree" : 0,
                        "highest merge generation in the LSM tree" : 0,
                        "queries that could have benefited from a Bloom filter that did not exist" : 0,
                        "sleep for LSM merge throttle" : 0
                },
                "block-manager" : {
                        "file allocation unit size" : 4096,
                        "blocks allocated" : 74,
                        "checkpoint size" : 6443008,
                        "allocations requiring file extension" : 3,
                        "blocks freed" : 31,
                        "file magic number" : 120897,
                        "file major version number" : 1,
                        "minor version number" : 0,
                        "file bytes available for reuse" : 24576,
                        "file size in bytes" : 6418432
                },
                "btree" : {
                        "btree checkpoint generation" : 3422,
                        "column-store variable-size deleted values" : 0,
                        "column-store fixed-size leaf pages" : 0,
                        "column-store internal pages" : 0,
                        "column-store variable-size leaf pages" : 0,
                        "pages rewritten by compaction" : 0,
                        "number of key/value pairs" : 0,
                        "fixed-record size" : 0,
                        "maximum tree depth" : 4,
                        "maximum internal page key size" : 368,
                        "maximum internal page size" : 4096,
                        "maximum leaf page key size" : 3276,
                        "maximum leaf page size" : 32768,
                        "maximum leaf page value size" : 67108864,
                        "overflow pages" : 0,
                        "row-store internal pages" : 0,
                        "row-store leaf pages" : 0
                },
                "cache" : {
                        "bytes read into cache" : 162112625609,
                        "bytes written from cache" : 458425,
                        "checkpoint blocked page eviction" : 0,
                        "unmodified pages evicted" : 4983876,
                        "page split during eviction deepened the tree" : 0,
                        "modified pages evicted" : 3,
                        "data source pages selected for eviction unable to be evicted" : 2385,
                        "hazard pointer blocked page eviction" : 2385,
                        "internal pages evicted" : 0,
                        "pages split during eviction" : 1,
                        "in-memory page splits" : 0,
                        "overflow values cached in memory" : 0,
                        "pages read into cache" : 4989851,
                        "overflow pages read into cache" : 0,
                        "pages written from cache" : 46
                },
                "compression" : {
                        "raw compression call failed, no additional data available" : 0,
                        "raw compression call failed, additional data available" : 0,
                        "raw compression call succeeded" : 0,
                        "compressed pages read" : 4984383,
                        "compressed pages written" : 18,
                        "page written failed to compress" : 0,
                        "page written was too small to compress" : 28
                },
                "cursor" : {
                        "create calls" : 510398,
                        "insert calls" : 356,
                        "bulk-loaded cursor-insert calls" : 0,
                        "cursor-insert key and value bytes inserted" : 154017,
                        "next calls" : 214506,
                        "prev calls" : 1,
                        "remove calls" : 0,
                        "cursor-remove key bytes removed" : 0,
                        "reset calls" : 12156163367,
                        "search calls" : 12147296837,
                        "search near calls" : 834,
                        "update calls" : 0,
                        "cursor-update value bytes updated" : 0
                },
                "reconciliation" : {
                        "dictionary matches" : 0,
                        "internal page multi-block writes" : 0,
                        "leaf page multi-block writes" : 9,
                        "maximum blocks required for a page" : 0,
                        "internal-page overflow keys" : 0,
                        "leaf-page overflow keys" : 0,
                        "overflow values written" : 0,
                        "pages deleted" : 0,
                        "page checksum matches" : 6,
                        "page reconciliation calls" : 43,
                        "page reconciliation calls for eviction" : 0,
                        "leaf page key bytes discarded using prefix compression" : 0,
                        "internal page key bytes discarded using suffix compression" : 9
                },
                "session" : {
                        "object compaction" : 0,
                        "open cursor count" : 510398
 
                },
                "transaction" : {
                        "update conflicts" : 0
                }
        },
        "nindexes" : 10,
        "totalIndexSize" : 4096000,
        "indexSizes" : {
                "_id_" : 782336,
                "ancestry_1" : 335872,
                "named_path_1" : 638976,
                "instance_id_1" : 761856,
                "acl.read_1" : 307200,
                "acl.create_1" : 253952,
                "acl.update_1" : 253952,
                "acl.destroy_1" : 253952,
                "acl.audit_1" : 253952,
                "acl.grant_1" : 253952
        },
        "ok" : 1,
        "$gleStats" : {
                "lastOpTime" : Timestamp(0, 0),
                "electionId" : ObjectId("55db2213018f752b5e378de9")
        }
}

As you can see "bytes read into cache" value is pretty insane: 162Gb for 6Mb collection.

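A quick way to watch that counter is to sample db.collection.stats() twice and diff the values (the database and collection names below are placeholders for ours):

// Sketch only: measure how fast "bytes read into cache" grows for this collection
var coll = db.getSiblingDB("<db>").getCollection("<25k_collection>");
var before = coll.stats().wiredTiger.cache["bytes read into cache"];
sleep(60 * 1000);  // wait one minute
var after = coll.stats().wiredTiger.cache["bytes read into cache"];
print("bytes read into cache over 60s: " + (after - before));
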
Here is the mongostat output:

insert query update delete getmore command % dirty % used flushes vsize  res qr|qw ar|aw netIn netOut conn         set repl     time
    *0   219     19     *0       1     5|0     3.2   90.5       0  9.2G 7.5G   0|0   3|0   38k   283k  142 shard1  PRI 03:54:28
     1   162     15     *0       0     1|0     3.2   90.6       0  9.2G 7.5G   0|0   2|2   27k   223k  142 shard1  PRI 03:54:29
    *0   258     27     *0       1     5|0     3.2   90.7       0  9.2G 7.5G   0|0   5|0   46k   208k  142 shard1  PRI 03:54:30
    *0   287     18     *0       1     6|0     3.2   90.8       0  9.2G 7.5G   0|0   4|0   43k   311k  142 shard1  PRI 03:54:31
    *0   242     26     *0       1     2|0     3.2   90.8       0  9.2G 7.5G   0|0   5|0   44k   323k  142 shard1  PRI 03:54:32
    *0   101     13     *0       1     3|0     3.2   90.8       0  9.2G 7.5G   0|1   5|1   18k   105k  142 shard1  PRI 03:54:33
    *0   206     45     *0       1     6|0     3.2   90.9       0  9.2G 7.5G   0|1   2|0   41k   324k  142 shard1  PRI 03:54:34
    *0   191     23     *0       0    11|0     3.2   91.0       0  9.2G 7.5G   0|1   3|0   36k   275k  142 shard1  PRI 03:54:35
    *0   272     20     *0       1     2|0     3.2   91.0       0  9.2G 7.5G   0|0   6|0   44k   170k  142 shard1  PRI 03:54:36
    *0   825     17     *0       1     5|0     3.2   91.1       0  9.2G 7.5G   0|0   1|1  146k   372k  142 shard1  PRI 03:54:37

We're also seeing a 2x increase in our map/reduce processing times. The average before the upgrade was around 4s per request, and now it's about 8s.

Also, on some shards I'm seeing a "Failed: BSONObj size: 40950496 (0x270DAE0) is invalid. Size must be between 0 and 16793600(16MB)" error when starting mongotop.

Thanks!



 Comments   
Comment by Ramon Fernandez Marina [ 18/Sep/15 ]

yopp, I don't think there's enough information on this ticket to point to a bug in the server, and since we keep the SERVER project for reporting bugs in (or feature suggestions for) the MongoDB server I'm going to close this ticket. If you can provide evidence of a bug in mongod please post it here so we can reopen this ticket.

For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.

Comment by Alex [ 17/Sep/15 ]

Hi.

Do you need any additional information that can help with this issue?

Thanks!

Comment by Alex [ 11/Sep/15 ]

I've managed to catch another severe slowdown (see trace_11Sep15.zip for ss.log and gdbmon.log).

Here is the mongotop output:

                 ns     total      read    write    2015-09-11T16:54:39+08:00
 <db>.<collection1>    2781ms    2781ms      0ms
     local.oplog.rs     102ms       0ms    101ms
               <db>      57ms       0ms     57ms
 <db>.<collection2>       5ms       5ms      0ms
 <db>.<collection3>       5ms       2ms      2ms
 <db>.<collection4>       3ms       3ms      0ms
 <db>.<collection5>       2ms       2ms      0ms
 <db>.<collection6>       1ms       1ms      0ms
                          0ms       0ms      0ms
 admin.system.roles       0ms       0ms      0ms

Here is the query for <collection1> from the log:

2015-09-11T16:42:28.053+0800 I QUERY    [conn90] query <db>.<collection1> query: { $query: { <attribute>: "<string>" }, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:44547 nscannedObjects:44547 keyUpdates:0 writeConflicts:0 numYields:348 nreturned:1 reslen:452 locks:{ Global: { acquireCount: { r: 698 } }, Database: { acquireCount: { r: 349 } }, Collection: { acquireCount: { r: 349 } } } 716ms
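
For reference, that plan (IXSCAN on { _id: 1 }, scanning 44547 keys to return 1 document) can be reproduced in the shell with explain(); the database, collection and field names below are placeholders:

// Sketch only: confirm which plan the logged query uses
db.getSiblingDB("<db>").getCollection("<collection1>")
  .find({ "<attribute>": "<string>" })
  .sort({ _id: 1 })
  .explain("executionStats")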

And for example this:

2015-09-11T16:42:57.555+0800 I COMMAND  [conn1125] command admin.$cmd command: replSetHeartbeat { replSetHeartbeat: "shard1", pv: 1, v: 19, from: "dbconfig05", fromId: 0, checkEmpty: false } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:205 locks:{} 138ms

According to top, CPU usage was 180-250%. After a server restart with the same workload, CPU usage is 20-25%.

Comment by Alex [ 08/Sep/15 ]

Just to be clear about what's going on inside our app and what issues we are seeing:

We have two stages in our app: Sync and Push

  • Sync: mainly reads/updates on two 50k collections (one with small documents, the second with huge documents).
  • Data push: mainly inserts into the 4.5B collection (small documents), with some updates to the 50k collections.

The first stage is very simple: the app reads a document and then updates it. A few apps do this at the same time, but they update different documents, so concurrent document-level lock contention is very unlikely.

The second stage is a bit more complicated. We insert 0.5-1.5k documents into a sharded collection and then run a couple of heavy "aggregation" map/reduces against the new data, with output to other sharded collections (a rough sketch is below). All apps push data to the same collection, but document-level intersections are unlikely as well. Each app runs one map/reduce at a time, but there are a few apps running simultaneously.
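
Roughly, the push-stage map/reduce has this shape (the collection names, batch id, and map/reduce functions below are placeholders, not our real code):

// Sketch of the push-stage job: reduce a freshly inserted batch into a sharded output collection
var batchId = 12345;  // placeholder batch identifier
db.getCollection("<4.5B collection>").mapReduce(
    function () { emit(this.group_key, 1); },               // map (placeholder)
    function (key, values) { return Array.sum(values); },   // reduce (placeholder)
    {
        query: { s: batchId },                              // only the documents we just pushed
        out: { reduce: "<100M collection>", sharded: true } // merge into an existing sharded collection
    }
);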

On any failure, the app goes back to the sync stage. For this cluster, the normal workload is all apps pushing data, with very little read access to the cluster.

What we're seeing after the upgrade to 3.0 with WT:

Sync stage: at some point, sync operation speed degrades significantly (2-3x slower).
Symptoms:

  1. Backend response time increases. Normally mongo queries account for less than 5% of response time, but we're seeing mongo take up 40-80% of total request time.
  2. We're seeing sync-specific collections in mongotop with 250-1500ms read/write times.
  3. Slow queries are logged to mongo.log, even though no slow-query logging/profiling is enabled.
  4. After a while mongotop can't be started, failing with a "Failed: BSONObj size: 21312475 (0x14533DB) is invalid. Size must be between 0 and 16793600(16MB)" error message.
  5. Not always reproducible: increased oplog lag for some secondaries in the cluster. We're seeing the oplog collection in mongotop on the affected nodes as well.
  6. count() on some collections can take seconds.
  7. mongod average CPU usage goes from 0.1-0.2 cores to 0.3-0.5, with peaks of 1.5-1.8 cores.

Workarounds (see the sketch after this list):

  1. Run reIndex() on all affected collections.
  2. Restart the node.

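The reIndex() workaround is literally just this, run against each collection that shows up in mongotop (the collection names are placeholders):

// Sketch only: rebuild indexes on the collections affected by the slowdown
["<25k_collection>", "<50k_collection>"].forEach(function (name) {
    printjson(db.getCollection(name).reIndex());
});
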
Some queries logged by mongo:

update <db>.<collection> query: { _id: ObjectId('4fee0d5927ac586067000001') } update: { $set: { <2 attributes> } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 }, acquireWaitCount: { w: 2 }, timeAcquiringMicros: { w: 59431832 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 7327ms

getmore local.oplog.rs query: { ts: { $gte: Timestamp 1440751583000|5 } } cursorid:15020793523 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 1460 } }, Database: { acquireCount: { r: 730 } }, oplog: { acquireCount: { r: 730 } } } 10223ms

query <db>.<collection2> query: { $query: { attachable_id: ObjectId('5326953dd7f15b1ffc03f4e1') }, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:8 nscannedObjects:8 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 }, acquireWaitCount: { r: 2 }, timeAcquiringMicros: { r: 22481080 } }, Collection: { acquireCount: { r: 2 } } } 713ms

As you can see "Database" lock acquire times are nuts. Also, from the WT release notes is seems like locking should be on the document level, not the collection/database, right?

Push stage: at some point, push operation speed degrades to the point of being unusable (20-50x slower).
Symptoms:

  1. Insert performance degrades over time.
  2. Map/reduce takes more and more time for the same amount of data, going from 2-8s to more than 120s.
  3. A lot of lock-related messages in the mongos logs.

Workaround:

  1. Restart the server.

We're seeing some degradation in processing when more than one app is pushing data. But from what you said, it seems like the app is spending more time waiting for the collection drop to be committed at the checkpoint than actually doing work. Combined with the fact that performance degrades over time, the waiting time increases until eventually the cluster just can't process queries in a reasonable time. Can we increase the checkpoint frequency?

We're seeing the same issue on all of our deployments, but smaller deployments with less data and less load degrade much more slowly (weeks) than larger ones with more load (days, sometimes hours).

Also, we're still running the cluster with the shard balancer disabled (SERVER-20150), because chunk migration locks the whole collection for a very long time and map/reduce can't write to it.

Is there anything else that I can collect from the affected nodes? We have MMS on one cluster, if that helps in any way.

Comment by Michael Cahill (Inactive) [ 08/Sep/15 ]

yopp, thank you for the additional information.

Was there a slow collection during the 40 minutes covered by the ss.log file you uploaded? I can't see anything particularly bad going on in the statistics. The bytes read into cache also looks reasonable (less than 500KB/s at peak).

One issue I can see is that there are collection drop operations occurring, and the drops are sometimes blocked for 1-2 seconds. That is probably because the drops are happening concurrent with checkpoints, which take up to 3 seconds to complete for your workload. I don't know whether your application is explicitly dropping collections or databases or if these are temporary collections holding intermediate results.
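
If it helps, checkpoint duration can be read out of serverStatus with something like the following (a sketch; the exact statistic names may differ slightly between versions):

// Report WiredTiger checkpoint timing from serverStatus
var txn = db.serverStatus().wiredTiger.transaction;
print("checkpoint currently running: " + txn["transaction checkpoint currently running"]);
print("most recent checkpoint (ms):  " + txn["transaction checkpoint most recent time (msecs)"]);
print("max checkpoint time (ms):     " + txn["transaction checkpoint max time (msecs)"]);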

You mentioned "this issue affects all collection in mongo, including oplog". How did you determine that?

I would like to keep this ticket focussed on slow read operations with WiredTiger. If you would like help with the sharding messages, please open a separate ticket.

Comment by Alex [ 07/Sep/15 ]

Also, we're seeing a lot of this:

2015-09-08T04:22:08.216+0800 I SHARDING [conn11] lock update lost, lock 'collection/xxx' not propagated.

As I said before, map/reduce performance has degraded significantly.

I've checked the config server, but it seems fine. From time to time I can see the "config.locks" collection in mongotop with 1-3ms read/write times.

Comment by Alex [ 07/Sep/15 ]

Also attached db04.zip from another server, but gdbmon.py failed after a while with:

^error,msg="Cannot find new threads: debugger service failed"
Possibly a permissions issue tracing the process?

Comment by Alex [ 07/Sep/15 ]

I got a chance to do some profiling. You can find the results in the attachment.

Comment by Alex [ 06/Sep/15 ]

We haven't had a maintenance window this weekend to do the profiling.

But we also found out that this issue affects all collections in mongo, including the oplog, so secondaries are falling significantly behind the primaries (13h in about a week). Same symptoms: high CPU load, the oplog collection showing up in mongotop, and a server restart temporarily fixes the issue.

Comment by Alexander Gorrod [ 03/Sep/15 ]

Thanks - that would be very helpful. Note that I made a minor update to the commands - please use the most recent version.

Comment by Alex [ 02/Sep/15 ]

Okay, I'll do that over the weekend.

Thanks!

Comment by Alexander Gorrod [ 02/Sep/15 ]

The problems you are seeing are not related to WT-2076 - the code changes that ticket refers to aren't included in the 3.0 release of MongoDB.

Sorry for the slow turnaround - I haven't been able to understand the behavior so far, and I've been hoping an explanation would come to me. More information might point us towards the cause. To that end:

I've attached a tool to this ticket, gdbmon.py, that will use gdb to collect stack trace samples at regular intervals; this should give us a clearer picture of what's going on inside mongod. Please be aware that using this tool will have some performance (and potentially even functional) impact on the mongod in question; but since it is not performing well as it is, this may be a risk you are willing to accept. If you are willing to do so, while the shard is in the low-performing state please run the following two data collection scripts in parallel:

mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(5000)}" >ss.log &
python gdbmon.py $(pidof mongod) 5 >gdbmon.log &

This will collect serverStatus time series and in addition stack trace samples at 5-second intervals. Stop the commands after about 20 minutes and upload ss.log and gdbmon.log for us to review.

Comment by Alex [ 01/Sep/15 ]

Can this be related to WT-2076?

Comment by Alex [ 01/Sep/15 ]

Over time we're seeing a major map/reduce speed regression. After 2 days it's more than 10x slower: it was 2-8s, and now it's 30-80s according to the application log. And it keeps slowly getting worse.

Comment by Alex [ 30/Aug/15 ]

So after 24h these collections popped up again. reIndex() helps for a while, but it did not resolve the issue completely. Right now all primaries are running from SSD, which improved request times a bit, but performance is still lower than before the upgrade.

Regarding our sharded map/reduces, the numbers are a bit odd:

2015-08-30T18:44:34.956+0800 I COMMAND  [conn2786] command app.$cmd command: mapReduce { mapreduce: "<4.5B collection>", map: <js>, reduce: <js>, out: "tmp.mrs.<name>_1440931474_8382", shardedFirstPass: true, splitInfo: 67108864 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:233 locks:{ Global: { acquireCount: { r: 35, w: 14, W: 3 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 637 } }, Database: { acquireCount: { r: 7, w: 6, R: 2, W: 11 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 2048 } }, Collection: { acquireCount: { r: 7, w: 4, W: 3 } }, oplog: { acquireCount: { w: 2 } } } 142ms
 
2015-08-30T18:44:39.957+0800 I COMMAND  [conn2786] command app.$cmd command: mapreduce.shardedfinish { mapreduce.shardedfinish: { mapreduce: "<4.5B collection>", map: <js>, reduce: <js> out: { reduce: "<100M collection>", sharded: true }, query: { s: 34091395 } }, inputDB: "app", shardedOutputCollection: "tmp.mrs.<4.5B collection>_1440931474_8382", shards: { shard1/db01:27017,db04:27017: { result: "tmp.mrs.<4.5B collection>_1440931474_8382", splitKeys: [], timeMillis: 136, counts: { input: 0, emit: 0, reduce: 0, output: 0 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1440931474000|25, electionId: ObjectId('55e18827d504f4ca83106786') } }, shard2/db02:27017,db05:27017: { result: "tmp.mrs.<4.5B collection>_1440931474_8382", splitKeys: [], timeMillis: 135, counts: { input: 9, emit: 9, reduce: 0, output: 9 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1440931474000|58, electionId: ObjectId('55e150ee5a56a2c624d9bd36') } }, shard3/db03:27017,db06:27017: { result: "tmp.mrs.<4.5B collection>_1440931474_8382", splitKeys: [], timeMillis: 75, counts: { input: 9, emit: 9, reduce: 0, output: 9 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1440931474000|42, electionId: ObjectId('55e2c8273f22980c76624cbe') } } }, shardCounts: { shard1/db01:27017,db04:27017: { input: 0, emit: 0, reduce: 0, output: 0 }, shard2/db02:27017,db05:27017: { input: 9, emit: 9, reduce: 0, output: 9 }, shard3/db03:27017,db06:27017: { input: 9, emit: 9, reduce: 0, output: 9 } }, counts: { emit: 18, input: 18, output: 18, reduce: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:36984 locks:{ Global: { acquireCount: { r: 18, w: 9, W: 1 } }, Database: { acquireCount: { r: 4, w: 6, W: 4 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 1625 } }, Collection: { acquireCount: { r: 4, w: 2, W: 1 } }, oplog: { acquireCount: { w: 4 } } } 1832ms

The total runtime reported for all shards is ~400ms, but the total map/reduce time is 1832ms. How can we check where the rest of the time is spent? I suspect we're facing the same issue here. I can also see the map/reduce temporary collections on all shards, with 200-400ms read times.

Just in case: our indexes and working set on the shards currently don't fit in memory.

About posting mongo.log: I just found out that even with the profiler disabled and normal log levels, mongo writes some queries to the log (like the map/reduce above; perhaps it's logging all queries that take more than ~100ms). So I can't post it as-is, but if you tell me what to grep for, I can give you that.

Cheers!

Comment by Alex [ 29/Aug/15 ]

We've upgraded our cluster to 3.0.6, but this issue is not resolved by the upgrade.

After a while, problematic collections still pop up in mongotop. But I've found that calling reIndex() on a problematic collection immediately resolves the issue: CPU load goes back to normal and there are no time-consuming collections in mongotop. It seems to have the same effect as a server restart. After that, everything runs at a reasonable speed, but eventually the collection pops up again. I will check mongostat over the weekend and keep you updated.

I've checked the stats on the problematic collection and the cache numbers are normal, but check out the new serverStatus in the attachment:

"bytes read into cache": 1311874815271

That's almost 1.3 TB of data read into the cache. How can that be possible if the server has been running for just a couple of hours? Or is this number persisted between server restarts?
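
In the meantime, sampling the counter twice gives a rate, which is easier to reason about than the raw total (a sketch, run in the mongo shell on the affected node):

// Server-wide read-into-cache rate over a 10 second window
var c1 = db.serverStatus().wiredTiger.cache["bytes read into cache"];
sleep(10 * 1000);
var c2 = db.serverStatus().wiredTiger.cache["bytes read into cache"];
print("read into cache: " + ((c2 - c1) / 10 / 1e6).toFixed(1) + " MB/s");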

Thanks.

Comment by Alex [ 28/Aug/15 ]

By the way, might this eviction thing also be causing SERVER-20150?

Comment by Alex [ 28/Aug/15 ]

What we have done so far:

1. We replaced one node on shard #1 with a new VM on SSD
2. The node did an initial sync
3. After the sync we upgraded mongo to 3.0.6
4. We promoted the node to primary

It seems like processing time has improved, and the stats show a reasonable amount of cache traffic.

We'll migrate the rest of the shard to SSD over the weekend, and we will upgrade the whole shard to 3.0.6 at the same time.

I'll come back with results.

Comment by Alexander Gorrod [ 28/Aug/15 ]

Thanks for the additional information. Could you please upload the mongod log file for shard #1?

We made a number of improvements to the eviction algorithm between the 3.0.5 and 3.0.6 releases. Could you try upgrading and report whether the issue is resolved?

Comment by Alex [ 27/Aug/15 ]

Sure. I've attached it for you.

Comment by Alexander Gorrod [ 27/Aug/15 ]

Could you please post the output of db.serverStatus as well?
