[SERVER-20149] WiredTiger read performance Created: 26/Aug/15 Updated: 18/Sep/15 Resolved: 18/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alex | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | CentOS 6.6 (Linux 2.6.32-504.23.4.el6.x86_64) |
| Attachments: | |
| Operating System: | Linux |
| Participants: |
| Description |
|
We recently upgraded our sharded cluster from 2.4.6 to 3.0.5 and moved to WiredTiger with snappy compression by doing an initial sync into an empty data folder. There are 3 shards (2 replica set members and 1 arbiter per shard). One shard is on an average SSD (100-300MB/s) and the others are on regular HDDs. Currently we're using ext4, but we will migrate to xfs shortly, when we move the rest of the shards to the SSD (thanks to 10x compression!). Total dataset size is around 300GB, 600GB including indexes (:|). We have 26 collections across two main databases; both databases are part of the same cluster. We have 6 sharded collections (15M ... 4.5B records), and the rest of the collections are too small to be sharded and are hosted only on Shard #1. Everything is running in ESXi. THP is disabled and ulimits are set as described in the docs. We're seeing very strange behaviour: after a certain point one collection suddenly pops up in mongotop:
This causes a 2x increase in response time for the related app endpoints. This collection is hosted on Shard #1. The average workload on this collection (25k_collection) is 10-15 rps: reads (45%), updates (45%) and a small number of inserts (10%). The collection is pretty small; here are its stats:
As you can see, the "bytes read into cache" value is pretty insane: 162GB for a 6MB collection. Here is the mongostat output:
Also we're seeing a 2x increase in our map/reduce processing times. The average before the upgrade was around 4s per request, and now it's about 8s. Also, on some shards I'm seeing a "Failed: BSONObj size: 40950496 (0x270DAE0) is invalid. Size must be between 0 and 16793600(16MB)" error when starting mongotop. Thanks! |
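The per-collection cache counters quoted above come from the collection's stats. A minimal sketch of how to read them in the mongo shell (the database name is a placeholder; 25k_collection is the collection named in the report):

```js
// Sketch: read the WiredTiger cache counters for a single collection.
// "mydb" is a placeholder database name.
var s = db.getSiblingDB("mydb").getCollection("25k_collection").stats();
print("collection data size (bytes): " + s.size);
printjson({
    "bytes read into cache":    s.wiredTiger.cache["bytes read into cache"],
    "bytes written from cache": s.wiredTiger.cache["bytes written from cache"]
});
```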
| Comments |
| Comment by Ramon Fernandez Marina [ 18/Sep/15 ] | |||||||||||||||||||||
|
yopp, I don't think there's enough information on this ticket to point to a bug in the server, and since we keep the SERVER project for reporting bugs in (or feature suggestions for) the MongoDB server, I'm going to close this ticket. If you can provide evidence of a bug in mongod, please post it here so we can reopen this ticket. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience; a question like this, involving more discussion, would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources. Regards, | |||||||||||||||||||||
| Comment by Alex [ 17/Sep/15 ] | |||||||||||||||||||||
|
Hi. Do you need any additional information that can help with this issue? Thanks! | |||||||||||||||||||||
| Comment by Alex [ 11/Sep/15 ] | |||||||||||||||||||||
|
I've managed to catch another severe slowdown (see trace_11Sep15.zip for ss.log and gdbmon.log). Here is the mongotop output:
Here is the query for <collection1> from the log:
And, for example, this:
According to top, CPU usage was 180-250%. After a server restart, with the same workload, CPU usage is 20-25%. | |||||||||||||||||||||
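A hedged sketch, using standard shell helpers rather than anything from the ticket's attachments, of how long-running operations can be snapshotted while the CPU is pegged, to see which namespaces and operation types are involved:

```js
// Sketch: list operations that have been active for more than one second.
db.currentOp({ active: true, secs_running: { $gte: 1 } }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns + "  " + op.secs_running + "s");
});
```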
| Comment by Alex [ 08/Sep/15 ] | |||||||||||||||||||||
|
Just to be clear about what's going on inside our app, and what issues we are seeing. We have two stages in our app: Sync and Push.
The first stage is very simple: the app reads a document and then updates it. A few apps do this at the same time, but they update different documents, so concurrent document-level locking is very unlikely.
The second stage is a bit more complicated. We insert 0.5-1.5k documents into a sharded collection and then run a couple of heavy "aggregation" map/reduces against the new data, with output to other sharded collections. All apps push data to the same collection, but document-level intersection is unlikely as well. One app runs one map/reduce at a time, but there are a few apps running simultaneously. On any failure the app goes back to the sync stage. The normal workload for this cluster is all apps pushing data, with very little read access to the cluster.
What we're seeing after the upgrade to 3.0 with WT:
Sync stage: at some point the speed of sync operations degrades significantly (2-3x slower).
Some queries logged by mongo:
As you can see, the "Database" lock acquire times are nuts. Also, from the WT release notes it seems like locking should be at the document level, not the collection/database level, right?
Push stage: at some point the speed of push operations degrades to unusable (20-50x slower).
We're seeing some degradation in processing when more than one app is pushing data. But from what you said, it seems like the app is spending more time waiting for collection deletions to be committed at the checkpoint than actually doing anything. Combined with the fact that performance degrades over time, waiting time increases until eventually the cluster just can't process queries in a reasonable time. Can we increase the checkpoint interval?
We're seeing the same issue on all of our deployments, but smaller deployments with less data and less load degrade much more slowly (weeks) than larger ones with more load (days, sometimes hours). Also, we're still running the cluster with the shard balancer disabled.
Is there anything else that I can collect from the affected nodes? We have MMS on one cluster, if this helps in any way. | |||||||||||||||||||||
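A hypothetical sketch of the Push stage described above (collection names, the batch field, and the map/reduce functions are placeholders, not the application's actual code):

```js
// Hypothetical Push stage: insert a batch of documents, then run a heavy
// map/reduce over just that batch, merging results into a sharded collection.
var docs = [];
for (var i = 0; i < 1000; i++) {
    docs.push({ batchId: 42, key: "k" + (i % 100), value: 1 });
}
db.pushed_data.insert(docs);

db.pushed_data.mapReduce(
    function () { emit(this.key, this.value); },            // map
    function (key, values) { return Array.sum(values); },   // reduce
    {
        query: { batchId: 42 },                              // only the new batch
        out: { reduce: "aggregated_data", sharded: true }    // sharded output collection
    }
);
```

Each map/reduce with non-inline output creates and later drops temporary tmp.mr.* collections, which relates to the drop-versus-checkpoint blocking noted in the adjacent comment.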
| Comment by Michael Cahill (Inactive) [ 08/Sep/15 ] | |||||||||||||||||||||
|
yopp, thank you for the additional information. Was there a slow collection during the 40 minutes covered by the ss.log file you uploaded? I can't see anything particularly bad going on in the statistics. The bytes read into cache also look reasonable (less than 500KB/s at peak).
One issue I can see is that there are collection drop operations occurring, and the drops are sometimes blocked for 1-2 seconds. That is probably because the drops are happening concurrently with checkpoints, which take up to 3 seconds to complete for your workload. I don't know whether your application is explicitly dropping collections or databases, or if these are temporary collections holding intermediate results.
You mentioned "this issue affects all collection in mongo, including oplog". How did you determine that?
I would like to keep this ticket focussed on slow read operations with WiredTiger. If you would like help with the sharding messages, please open a separate ticket. | |||||||||||||||||||||
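A sketch of how checkpoint durations can be watched from serverStatus; the stat names below are assumed from the WiredTiger section of 3.0-era builds, not quoted from the ticket:

```js
// Sketch: report WiredTiger checkpoint activity. Long-running checkpoints
// can block concurrent collection/database drops.
var txn = db.serverStatus().wiredTiger.transaction;
printjson({
    "checkpoints completed":               txn["transaction checkpoints"],
    "most recent checkpoint time (msecs)": txn["transaction checkpoint most recent time (msecs)"],
    "max checkpoint time (msecs)":         txn["transaction checkpoint max time (msecs)"],
    "checkpoint currently running":        txn["transaction checkpoint currently running"]
});
```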
| Comment by Alex [ 07/Sep/15 ] | |||||||||||||||||||||
|
Also, we're seeing a lot of this:
As I said before, map/reduce performance has degraded significantly. I've checked the config server, but it seems fine. From time to time I can see the "config.locks" collection in mongotop with 1-3ms read/write times. | |||||||||||||||||||||
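The balancer and some sharded metadata operations take distributed locks recorded in config.locks; a brief sketch (3.0-era schema, not from the ticket) of listing the ones currently held:

```js
// Sketch: on a mongos, list distributed locks that are currently held.
// A long-held lock here can stall sharded metadata operations.
db.getSiblingDB("config").locks.find({ state: { $gt: 0 } }).forEach(function (l) {
    print(l._id + "  held by " + l.who + "  since " + l.when + "  why: " + l.why);
});
```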
| Comment by Alex [ 07/Sep/15 ] | |||||||||||||||||||||
|
Also attached db04.zip from another server, but gdbmon.py failed after a while with:
| |||||||||||||||||||||
| Comment by Alex [ 07/Sep/15 ] | |||||||||||||||||||||
|
I got a chance to run the profiling. You can find the results in the attachment. | |||||||||||||||||||||
| Comment by Alex [ 06/Sep/15 ] | |||||||||||||||||||||
|
We didn't get a maintenance window this weekend to run the profiling. But we also found out that this issue affects all collections in mongod, including the oplog, so secondaries are falling significantly behind the primaries (13h in about a week). Same symptoms: high CPU load, the oplog collection showing up in mongotop, and a server restart temporarily fixes the issue. | |||||||||||||||||||||
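A brief sketch, using standard shell helpers, of how the secondary lag on each shard's replica set can be measured:

```js
// Sketch: show how far each secondary is behind in applying the oplog.
rs.printSlaveReplicationInfo();

// Or compute the lag from rs.status() directly:
var st = rs.status();
var primary = st.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
st.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
        print(m.name + " lag: " + (primary.optimeDate - m.optimeDate) / 1000 + "s");
    }
});
```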
| Comment by Alexander Gorrod [ 03/Sep/15 ] | |||||||||||||||||||||
|
Thanks - that would be very helpful. Note that I made a minor update to the commands - please use the most recent version. | |||||||||||||||||||||
| Comment by Alex [ 02/Sep/15 ] | |||||||||||||||||||||
|
Okay, I'll do that over the weekend. Thanks! | |||||||||||||||||||||
| Comment by Alexander Gorrod [ 02/Sep/15 ] | |||||||||||||||||||||
|
The problems you are seeing are not related to the ticket you linked.
Sorry for the slow turnaround - I haven't been able to understand the behavior so far; I've been hoping an explanation will come to me. More information might point us towards the cause. To that end: I've attached a tool to this ticket, gdbmon.py, that will use gdb to collect stack trace samples at regular intervals; this should give us a clearer picture of what's going on inside mongod. Please be aware that using this tool will have some performance (and potentially even functional) impact on the mongod in question; but since it is not performing well as it is, this may be a risk you are willing to accept. If you are willing to do so, while the shard is in the low-performing state please run the following two data collection scripts in parallel:
This will collect serverStatus time series and in addition stack trace samples at 5-second intervals. Stop the commands after about 20 minutes and upload ss.log and gdbmon.log for us to review. | |||||||||||||||||||||
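The actual collection commands attached to the ticket are not preserved in this export; the following is only an illustrative sketch of the serverStatus half, assuming the 5-second interval described above. gdbmon.py is run separately against the mongod process.

```js
// Illustrative only -- not the original script from the ticket.
// Print a full serverStatus document every 5 seconds; redirect the shell's
// output to ss.log and stop after about 20 minutes.
while (true) {
    print("=== " + new Date().toISOString() + " ===");
    printjson(db.serverStatus());
    sleep(5000);  // milliseconds
}
```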
| Comment by Alex [ 01/Sep/15 ] | |||||||||||||||||||||
|
Can this be related to | |||||||||||||||||||||
| Comment by Alex [ 01/Sep/15 ] | |||||||||||||||||||||
|
Over time we're seeing a major map/reduce speed regression. After 2 days it's more than 10x slower: it was 2-8s, and now it's 30-80s according to the application log. And it keeps slowly getting worse. | |||||||||||||||||||||
| Comment by Alex [ 30/Aug/15 ] | |||||||||||||||||||||
|
So after 24h these collections popped up again. reIndex() helps for a while, but did not resolve the issue completely. Right now all primaries are running from SSD, which improved request times a bit, but performance is still lower than before the upgrade. Regarding our sharded map/reduces, the numbers are a bit odd:
The total runtime for all shards is reported as ~400ms, but the total map/reduce time is 1832ms. How can we check where the other ~1200ms is spent? I suspect we're facing the same issue here. I can see map/reduce temporary collections in mongostat on all shards as well, with 200-400ms read times. Just in case: our indexes and working set on the shards currently don't fit in memory. About posting mongo.log: I just found out that even with the profiler disabled and normal log levels, mongo writes some queries to the log (like the map/reduce above, so perhaps it logs all queries that took more than ~100ms). So I can't post it as-is, but if you tell me what to grep for, I can give you that. Cheers! | |||||||||||||||||||||
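One way to see where the unaccounted ~1200ms goes is the map/reduce verbose option, which asks the server to include a timing breakdown in the result. This is a hedged sketch with placeholder names; the exact shape of the timing section may differ for sharded output:

```js
// Sketch: run the map/reduce with verbose timing and inspect the result.
var res = db.pushed_data.mapReduce(
    function () { emit(this.key, this.value); },
    function (key, values) { return Array.sum(values); },
    {
        out: { reduce: "aggregated_data", sharded: true },
        verbose: true   // include timing details in the result
    }
);
print("total reported time (ms): " + res.timeMillis);
if (res.timing) { printjson(res.timing); }   // per-phase timings, when provided
```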
| Comment by Alex [ 29/Aug/15 ] | |||||||||||||||||||||
|
We upgraded our cluster to 3.0.6, but the issue is not resolved by the upgrade: after a while problematic collections are still popping up in mongotop. But I've found out that calling reIndex() on a problematic collection immediately resolves the issue: CPU load goes back to normal and there are no time-consuming collections in mongotop. It seems to have the same effect as a server restart. During that time everything runs at a reasonable speed, but eventually the collection pops up again. I will check mongostat during the weekend and I'll keep you updated. I've checked the stats on the problematic collection, and the cache numbers are normal, but check out the new serverStatus in the attachment: "bytes read into cache": 1311874815271. That's almost 1.3TB of data read into the cache. How can that be possible if the server has been running for just a couple of hours? Or is this number persisted between server restarts? Thanks. | |||||||||||||||||||||
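Counters in the wiredTiger section of serverStatus are in-memory statistics that start from zero when mongod starts; they are not persisted across restarts. A small sketch relating the cumulative counter above to process uptime:

```js
// Sketch: compare "bytes read into cache" with process uptime to get an
// average read-into-cache rate since the last restart.
var ss = db.serverStatus();
var bytesRead = ss.wiredTiger.cache["bytes read into cache"];
// NumberLong values may need conversion before arithmetic in the shell.
if (bytesRead.toNumber) { bytesRead = bytesRead.toNumber(); }
print("uptime (s):            " + ss.uptime);
print("bytes read into cache: " + bytesRead);
print("average rate (MB/s):   " + (bytesRead / ss.uptime / 1024 / 1024).toFixed(1));
```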
| Comment by Alex [ 28/Aug/15 ] | |||||||||||||||||||||
|
By the way, might this eviction behaviour also cause | |||||||||||||||||||||
| Comment by Alex [ 28/Aug/15 ] | |||||||||||||||||||||
|
What we have done so far:
1. We replaced one node on shard #1 with a new VM on SSD.
Processing time seems to have improved, and the stats are showing a reasonable amount of cache traffic. We'll migrate the rest of the shard to SSD over the weekend, and we will also upgrade the shard to 3.0.6. I'll come back with results. | |||||||||||||||||||||
| Comment by Alexander Gorrod [ 28/Aug/15 ] | |||||||||||||||||||||
|
Thanks for the additional information. Could you please upload the mongod log file for shard #1? We made a number of improvements to the eviction algorithm between the 3.0.5 and 3.0.6 releases. Could you try upgrading and report whether the issue is resolved? | |||||||||||||||||||||
| Comment by Alex [ 27/Aug/15 ] | |||||||||||||||||||||
|
Sure. I've attached it for you. | |||||||||||||||||||||
| Comment by Alexander Gorrod [ 27/Aug/15 ] | |||||||||||||||||||||
|
Could you please post the output of db.serverStatus as well? |