[SERVER-48685] very high IO utilization after upgrade from mongo 3.6 to 4.0.18 Created: 10/Jun/20 Updated: 23/Jun/20 Resolved: 17/Jun/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.18 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kay Agahd | Assignee: | Dmitry Agranat |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | | ||
| Issue Links: | | ||
| Operating System: | ALL | ||
| Participants: | | ||
| Case: | (copied to CRM) | ||
| Description |
|
We upgraded our sharded cluster from mongo 3.6 to 4.0.18 on Linux Stretch. Since then, IO utilization is almost always at 100%. Almost all monitored metrics look worse than before the upgrade. I'll attach a screenshot where you can see that IO utilization went up to an almost constant 100% at around 0:00, when we upgraded the shard to mongodb 4.0.18. We have not yet set FCV to 4.0, to allow our deployment to run without enabling the new features for a burn-in period and to keep the likelihood of a downgrade minimal, as suggested in mongodb's documentation. Any help is greatly appreciated because this issue is becoming critical! |
| Comments |
| Comment by Kay Agahd [ 23/Jun/20 ] | |
|
Hi Dima, this is just to confirm that adding the shard key to the above-mentioned index solved the problem. We identified other indexes to be adapted in the same manner, but the above-mentioned index was clearly the dominant one. Now we are upgrading again from mongodb v3.6 to 4.0. Utilization of both I/O and CPU looks as good as it did on v3.6. Thanks! | |
| Comment by Dmitry Agranat [ 17/Jun/20 ] | |
|
I am not sure this is the only query you'll need to address, but the query I've noticed is a dominant one and is a known issue. I think you can follow the same method of looking for queries which perform significantly more index entry scans, or which perform differently in your testing/staging environment (a sketch of one way to do this follows this comment). Since we've identified that this is a duplicate of a known issue, I am closing this ticket as a duplicate. Regards, | |
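For illustration, a minimal mongo-shell sketch of that method, assuming the profiler is enabled (db.setProfilingLevel(1) or 2) on the affected database; the database name, collection, and threshold below are hypothetical placeholders, not taken from the actual deployment:

```js
// Look for profiled operations whose index-entry scans are out of
// proportion to the number of documents they return.
db = db.getSiblingDB("offerstore");   // hypothetical database name

db.system.profile.aggregate([
  { $match: { keysExamined: { $gt: 10000 } } },   // hypothetical threshold
  { $project: {
      ns: 1, op: 1, millis: 1,
      keysExamined: 1, docsExamined: 1, nreturned: 1,
      // index entries scanned per document returned
      ratio: { $divide: [ "$keysExamined", { $max: [ "$nreturned", 1 ] } ] }
  } },
  { $sort: { ratio: -1 } },
  { $limit: 20 }
]);
```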
| Comment by Kay Agahd [ 16/Jun/20 ] | |
|
Hi Dima, thank you for the suggested workaround. Is this query the only one responsible for the high hardware utilization? In other words, do we only need to adapt this index (and its hint) to be ready to upgrade to mongodb 4.0? | |
| Comment by Dmitry Agranat [ 16/Jun/20 ] | |
|
I suspect the change in the reported behavior has to do with a fix that went into 4.0. Let's have a look at one of the count operations in 4.0.18 (I redacted some fields and IPs, but you can find this command in your logs based on the timestamp):
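The original log excerpt did not survive formatting; for illustration only, here is a hedged mongo-shell way to reproduce the plan of such a count on 4.0 (the collection, predicate fields, and hint are hypothetical placeholders, not the redacted originals):

```js
// Explain the count with executionStats to see which stages the winning
// plan uses and how many index entries / documents it examines.
db.runCommand({
  explain: {
    count: "offers",                             // hypothetical collection
    query: { productId: 12345, active: true },   // hypothetical predicate
    hint: "productId_1_active_1"                 // hypothetical hint in use
  },
  verbosity: "executionStats"
});
// In the output, check the winning plan stages: on a sharded collection whose
// index does not contain the shard key, expect SHARDING_FILTER -> FETCH ->
// IXSCAN (with totalDocsExamined close to totalKeysExamined) instead of a
// covered COUNT_SCAN.
```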
Please note that it is an IXSCAN rather than a COUNT_SCAN, despite the fact that the index covers both of the query predicate fields. That is because the database now has to FETCH each document in order to apply the SHARDING_FILTER stage. The solution is to add the shard key field(s) to the index (and update the hints to use it), for example as sketched below.
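A minimal sketch of that change, assuming a hypothetical collection, index, and shard key field; the names are placeholders, not the actual schema:

```js
// Append the shard key (assumed here to be { customerId: 1 }) to the existing
// compound index so the SHARDING_FILTER stage can be answered from the index
// without fetching documents.
db.offers.createIndex(
  { productId: 1, active: 1, customerId: 1 },   // hypothetical fields + shard key
  { background: true }                          // avoids blocking on 4.0-era index builds
);

// Update hinted operations to use the new index.
db.offers.find({ productId: 12345, active: true })
  .hint({ productId: 1, active: 1, customerId: 1 })
  .count();
```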
Since we are just creating a new compound index, this should not have much of an effect on 3.6: the optimizer will simply ignore the trailing shard key field(s) in the index, so it should not cause any problems. As usual, we recommend testing any change before implementing it in production. Thanks, | |
| Comment by Kay Agahd [ 15/Jun/20 ] | |
|
Hi Dima, thanks for the insights, they are very interesting! | |
| Comment by Dmitry Agranat [ 15/Jun/20 ] | |
|
Hi kay.agahd@idealo.de, thank you for providing the requested information. Here is the full sentence which was cut off in the middle: "This usually happens when the workload is shifting to a different part of the dataset that is not in FS/WT cache or when there is a significant change in QueryPlan / optimizer." The most significant change that we see between 3.6 and 4.0 is the ratio between totalDocsExamined and totalKeysExamined. While we return the same number of documents in both versions, in 4.0 we perform significantly more index entry scans. Usually this would indicate faster fetching, but that is not the case here. Looking at the log, I can see you are using hints, which might have helped in 3.6 but might now be making things worse. There are also several queries/aggregations using $in, $ne, sorts and merge operators around which we had some optimizations in 4.0, but again, perhaps using hints is counter-productive with these optimizations. I am going to have a closer look at queries/aggregations which have more keysExamined in 4.0 than in 3.6. Once identified, we'll need to collect the query plan for each to see exactly in which stage we perform differently between 3.6 and 4.0 and why (and perhaps remove the hints); a sketch of such a per-query comparison follows this comment. Thanks, | |
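For illustration, a minimal mongo-shell sketch of such a per-query comparison, run against the same collection on 3.6 and 4.0 (or with and without the hint); the collection, predicate, and index name are hypothetical placeholders:

```js
// Collect executionStats for the same query with and without its hint and
// compare how many index entries and documents each plan examines.
function summarize(explainOutput) {
  var s = explainOutput.executionStats;
  return {
    totalKeysExamined: s.totalKeysExamined,
    totalDocsExamined: s.totalDocsExamined,
    nReturned: s.nReturned,
    executionTimeMillis: s.executionTimeMillis
  };
}

var query = { productId: 12345, active: true };   // hypothetical predicate

var hinted = db.offers.find(query)
  .hint("productId_1_active_1")                   // hypothetical hint
  .explain("executionStats");
var unhinted = db.offers.find(query).explain("executionStats");

printjson({ hinted: summarize(hinted), unhinted: summarize(unhinted) });
```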
| Comment by Kay Agahd [ 11/Jun/20 ] | |
|
Hi Dima, I also uploaded logs and diagnostic data for shard 5. Maybe it's useful to see how the old hardware behaved compared to the newly added hardware (shards 1 to 4). One question: you did not finish your sentence: "This usually happens when you This explains..." | |
| Comment by Kay Agahd [ 11/Jun/20 ] | |
|
Hi Dima, many thanks for your investigation! I uploaded the requested file named "shard01-02b.diagnostic.tgz" just a few minutes ago. Btw., a few days ago I also opened a ticket related to a problem with CloudManager during this upgrade: https://support.mongodb.com/case/00663258 I point this out because I described there what we were doing: replacing hardware and, at the same time, upgrading from 3.6 to v4.0. Since the newly added hardware is much more powerful, we removed 5 of the 9 shards. The remaining 4 shards are running on our new hardware. The last shard running on old hardware (shard 5) was removed just a few hours ago. I mention this because you may see more load than usual in the logs due to the shard draining process; at least we noticed noticeably more slow operations during the draining phase. Now it's over and all metrics look fine. To your question why we have enableMajorityReadConcern:false: we try to keep latency low and don't mind if we don't always get the most up-to-date data. Also, besides PSS we also have PSA and even PSSAA replSets running, so we rolled out enableMajorityReadConcern:false everywhere.
| |
| Comment by Dmitry Agranat [ 11/Jun/20 ] | |
|
Hi kay.agahd@idealo.de, thank you for providing the requested information. I can see the issue you are describing. For some reason, in 4.0.18 you are reading most of the data from disk, while in 3.6.18 you were reading most of the data from the WT cache. This usually happens when the workload shifts to a different part of the dataset that is not in the FS/WT cache, or when there is a significant change in the query plan / optimizer. This explains the difference in I/O utilization but does not explain the change in read source (a sketch of how to sample the relevant WiredTiger cache counters follows this comment). Before marker A is 4.0.18, after A is 3.6.18. Since the uploaded data does not cover the time when the cluster was running on 3.6.18 before the upgrade to 4.0.18 (we do have a couple of hours of a 3.6.18 run after the downgrade from 4.0.18, but it's too short for the comparison I'd like to make), is it possible to upload just the diagnostic.data from mongo-offerstore-de01-02? This will give us a larger comparison window between 3.6.18 and 4.0.18. In addition, is there a reason you have enableMajorityReadConcern:false? Thanks, | |
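For reference, a minimal mongo-shell sketch of how one could watch the read source on a mongod directly, under the assumption that these standard WiredTiger cache statistics are present in your version (the diagnosis above was made from diagnostic.data, which records the same serverStatus counters):

```js
// Sample the WiredTiger read counters twice and compare the deltas; a high
// rate of "bytes read into cache" relative to the configured cache size
// suggests the working set is being read from disk rather than served from
// the WT cache.
function wtReadStats() {
  var wt = db.serverStatus().wiredTiger;
  return {
    bytesReadIntoCache: wt.cache["bytes read into cache"],
    pagesReadIntoCache: wt.cache["pages read into cache"],
    bytesCurrentlyInCache: wt.cache["bytes currently in the cache"],
    maxCacheBytes: wt.cache["maximum bytes configured"]
  };
}

var before = wtReadStats();
sleep(60 * 1000);   // wait one minute between samples
var after = wtReadStats();

printjson({
  bytesReadIntoCachePerMinute: after.bytesReadIntoCache - before.bytesReadIntoCache,
  pagesReadIntoCachePerMinute: after.pagesReadIntoCache - before.pagesReadIntoCache,
  cacheFillRatio: after.bytesCurrentlyInCache / after.maxCacheBytes
});
```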
| Comment by Kay Agahd [ 10/Jun/20 ] | |
|
Hi Dima, I've uploaded the requested files for shard1 and shard2. Shard1 is the primary shard (we have only 1 database running on this cluster). Thanks! | |
| Comment by Dmitry Agranat [ 10/Jun/20 ] | |
|
Hi kay.agahd@idealo.de, thank you for the report. Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from one of the servers in question and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, | |
| Comment by Kay Agahd [ 10/Jun/20 ] | |
|
Since the metrics looked so bad, we downgraded all routers and shards to mongodb v3.6.18. This immediately reduced the IO load. All other metrics now also look much better than with v4.0.18 (see the screenshot of the primary of the first shard). I intentionally made the same server the primary after the downgrade as before the upgrade, so that we can compare the metrics in the dashboards more easily. The downgrade took place from just after 12:00 (German time) until now. | |
| Comment by Kay Agahd [ 10/Jun/20 ] | |
|
The above description was probably defaced by the upload of the picture. Since I can't edit it, I post it here again:
|