[SERVER-48685] very high IO utilization after upgrade from mongo 3.6 to 4.0.18 Created: 10/Jun/20  Updated: 23/Jun/20  Resolved: 17/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Dmitry Agranat
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Bildschirmfoto 2020-06-10 um 11.10.04.png     PNG File PrimaryAfterDowngrade.png     PNG File PrimaryAfterUpgrade.png     PNG File disk_reads.png    
Issue Links:
Duplicate
is duplicated by SERVER-3645 Sharded collection counts (on primary... Closed
Operating System: ALL
Participants:
Case:

 Description   

We upgraded our sharded cluster from mongo 3.6 to 4.0.18 on Linux Stretch. Since then, IO utilization is almost always at 100%! Almost all monitored metrics look worse than before the upgrade. I'll attach a screenshot where you can see that IO utilization went up to an almost constant 100% at around 0:00, when we upgraded the shard to mongodb 4.0.18.

We have not yet set FCV to 4.0, to allow our deployment to run without enabling these features for a burn-in period and to ensure the likelihood of downgrade is minimal, as suggested in mongodb's documentation.

Any help is greatly appreciated because this issue is becoming critical!



 Comments   
Comment by Kay Agahd [ 23/Jun/20 ]

Hi Dima,

this is just to confirm that adding the shard key to the above-mentioned index solved the problem. We did identify other indexes that need to be adapted in the same manner, but the above-mentioned index was clearly the dominant one.

We are now upgrading from mongodb v3.6 to 4.0. Utilization of both I/O and CPU looks good again, just as in v3.6.

Thanks! 

Comment by Dmitry Agranat [ 17/Jun/20 ]

Hi kay.agahd@idealo.de,

I am not sure this is the only query you'll need to address, but the query I've noticed is a dominant one and is a known issue. I think you can follow the same method of looking for queries which examine significantly more index entries, or which perform differently, in your testing/staging environment.
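
For example, if the database profiler is enabled on a shard, one rough way to spot such queries is to look for operations that examine far more index keys than they return. A minimal sketch in the mongo shell (the threshold below is arbitrary):

// Capture operations slower than 100 ms (profiling level 1 = slow ops only).
db.setProfilingLevel(1, 100)

// Later, list the worst offenders by number of index keys examined.
db.system.profile.find(
    { keysExamined: { $gt: 100000 } },
    { op: 1, ns: 1, command: 1, keysExamined: 1, docsExamined: 1, nreturned: 1, millis: 1 }
).sort({ keysExamined: -1 }).limit(20)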

Since we've identified this is a duplicate of SERVER-3645, I will go ahead and close this case. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the MongoDB Community Forums or on Stack Overflow with the mongodb tag.

Regards,
Dima

Comment by Kay Agahd [ 16/Jun/20 ]

Hi Dima,

thank you for the suggested workaround. Is this query the only one responsible for the high hardware utilization? That is, do we need to adapt only this index (and hint) to be ready to upgrade to mongodb 4.0?

Comment by Dmitry Agranat [ 16/Jun/20 ]

Hi kay.agahd@idealo.de,

I suspect the change in the reported behavior has to do with the fix for SERVER-3645. I'll link this case as a duplicate of SERVER-3645.

Let's have a look at one of the count operations in 4.0.18 (I redacted some fields and IPs but you can find this command based on the timestamp):

2020-06-10T06:25:07.198+0200 I COMMAND  [conn290260] command *redacted* command: count { count: "offer", query: { *redacted*: "9701:cDBD3KWcZF4Wuym2NP6pkQ", *redacted*: { $gt: 0 } }, hint: { $hint: "*redacted_indexA*_1_*redacted_indexB*_1" }, shardVersion: [ Timestamp(66312, 0), ObjectId('591c6ebedb98d406d8ffccde') ], lsid: { id: UUID("99941558-606c-4ab6-b110-0a9d2215a140"), uid: BinData(0, E6EBA640EF06F78A6F3C8F547BC18071345FAF3D2C75432E7C80F4EA7281C1D1) }, $readPreference: { mode: "primaryPreferred" }, $clusterTime: { clusterTime: Timestamp(1591763054, 1149), signature: { hash: BinData(0, A8625D2FEDAEBE1D4513D1E3A031B9CA1FBA1027), keyId: 6796312822776070829 } }, $client: { driver: { name: "mongo-java-driver|legacy", version: "3.11.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.10.0-1062.12.1.el7.x86_64" }, platform: "Java/AdoptOpenJDK/11.0.6+10", mongos: { host: "*redacted*:27017", client: "*redacted*:35786", version: "3.6.12" } }, $configServerState: { opTime: { ts: Timestamp(1591763051, 2362), t: 49 } }, $db: "*redacted*" } planSummary: IXSCAN { *redacted*: 1, *redacted*: 1 } keysExamined:231414 docsExamined:231414 numYields:4872 reslen:340 locks:{ Global: { acquireCount: { r: 4873 } }, Database: { acquireCount: { r: 4873 } }, Collection: { acquireCount: { r: 4873 } } } storage:{ data: { bytesRead: 2808582539, timeReadingMicros: 49775559 } } protocol:op_msg 52629ms

Please note here that it is an IXSCAN rather than a COUNT_SCAN, despite the fact that the index covers both of the query predicate fields. That is because the database now has to FETCH each document in order to apply the SHARDING_FILTER stage.

The solution is to add the sharding fields to the index (and update the hints to use it).
Let's say that the shard key for the collection is some userId field. If so, then for the above log entry you would need to change the current index to { redacted_indexA: 1, redacted_indexB: 1, userId: 1 } and make sure that you update the hint(s) to use that index as well. That way the database can determine from the index alone whether or not a document is owned by the shard, without having to FETCH the document to get that field.
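
As a minimal sketch in the mongo shell (the real field names are redacted above, so indexA, indexB and userId below are placeholders for your actual fields and shard key; "offer" is the collection from the logged command):

// Build the new compound index with the shard key as the trailing field.
db.offer.createIndex({ indexA: 1, indexB: 1, userId: 1 })

// Point the count's hint at the new index (by name here, as in the logged command),
// so the shard filter can be answered from the index alone, without a FETCH.
db.offer.count(
    { indexA: "9701:cDBD3KWcZF4Wuym2NP6pkQ", indexB: { $gt: 0 } },
    { hint: "indexA_1_indexB_1_userId_1" }
)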

"These optimizations must also work well for mongodb v3.6 because we cannot modify queries and/or indexes at the same time we upgrade to v4.0."

Since we are just creating a new compound index, this should not have much of an effect on 3.6. In 3.6 the optimizer will simply ignore the trailing shard key field(s) in the index, and this should not cause any problems.

As usual, we recommend testing any change before implementing in production.

Thanks,
Dima

Comment by Kay Agahd [ 15/Jun/20 ]

Hi Dima,

thanks for the insights, they are very interesting!
Yes, we are explicitly using hints because we encountered performance issues when they were not used (already in former mongodb versions). Without hints, some shards sometimes chose a different, sub-optimal index, so the query took very long on those shards and the client received the response less quickly than usual.
As we know which index performs best for a given query, we hint it to be sure that no other index will be used. I'm eager to know why mongodb v4.0 behaves differently than previous versions with regard to index usage. I'd be grateful if you could shed some light on this once you have identified the queries in question. Then we need to know how to optimize either the queries or the indexes so that the hardware is not overloaded after upgrading to mongodb v4.0. These optimizations must also work well for mongodb v3.6 because we cannot modify queries and/or indexes at the same time we upgrade to v4.0.
Thanks!

Comment by Dmitry Agranat [ 15/Jun/20 ]

Hi kay.agahd@idealo.de, thank you for providing the requested information.

Here is the full sentence which was cut in the middle: "This usually happens when the workload is shifting to a different part of the dataset that is not in FS/WT cache or when there is a significant change in QueryPlan / optimizer."

The most significant change that we see between 3.6 and 4.0 is the ratio between totalDocsExamined and totalKeysExamined. While we return the same number of documents in both versions, in 4.0 we examine significantly more index entries. Usually this is an indication of faster fetching, but that is not the case here.

By looking at the log, I can see you are using hints, which might have helped in 3.6 but might now be making things worse. There are also several queries/aggregations using $in, $ne, sorts and merge operators, around which we made some optimizations in 4.0, but again, perhaps using hints is counter-productive with these optimizations.

I am going to have a closer look at queries/aggregations which have more keysExamined in 4.0 than in 3.6. Once identified, we'll need to collect the query plan for each to see exactly in which stage 3.6 and 4.0 perform differently and why (and perhaps remove the hints).
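
A sketch of how such a plan could be collected, wrapping the count from the log above in the explain command (field and index names are placeholders for the redacted ones):

// Run this through a mongos on both the 3.6 and the 4.0 cluster and compare
// the per-shard winning plan stages (COUNT_SCAN vs. IXSCAN + FETCH/SHARDING_FILTER)
// as well as executionStats.totalKeysExamined and totalDocsExamined.
db.runCommand({
    explain: {
        count: "offer",
        query: { indexA: "9701:cDBD3KWcZF4Wuym2NP6pkQ", indexB: { $gt: 0 } },
        hint: "indexA_1_indexB_1"
    },
    verbosity: "executionStats"
})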

Thanks,
Dima

Comment by Kay Agahd [ 11/Jun/20 ]

Hi Dima, 

I also uploaded logs and diagnostic data for shard 5. Maybe it's useful to see how the old hardware behaved compared to the newly added hardware (shards 1 to 4).

One question: You did not finish your sentence: "This usually happens when you This explains..."

Comment by Kay Agahd [ 11/Jun/20 ]

Hi Dima,

many thanks for your investigation!

I've uploaded the requested file named "shard01-02b.diagnostic.tgz" just a few minutes ago.

Btw. some days before I opened also a ticket related to a problem with CloudManager during this upgrade: https://support.mongodb.com/case/00663258

I point this out because I described there what we were doing: replacing hardware and at the same time upgrading from 3.6 to v4.0. Since the newly added hardware is much more powerful, we removed 5 of the 9 shards. The remaining 4 shards are running on our new hardware. The last shard running on old hardware (shard 5) was removed just a few hours ago.

I mention this because you may see more load than usual in the logs due to the shard draining process. At least we noticed considerably more slow operations during the draining phase. It's over now and all metrics look fine.

To your question why we have enableMajorityReadConcern:false:

We try to keep latency low. We don't mind if we don't always get the most recent data. Also, besides PSS we also have PSA and even PSSAA replSets running, so we rolled out enableMajorityReadConcern:false everywhere.


Comment by Dmitry Agranat [ 11/Jun/20 ]

Hi kay.agahd@idealo.de, thank you for providing the requested information. I can see the issue you are describing. For some reason, in 4.0.18 you are reading most of the data from disk while in 3.6.18 you are reading most of the data from the WT cache. This usually happens when the workload is shifting to a different part of the dataset that is not in FS/WT cache or when there is a significant change in QueryPlan / optimizer. This explains the difference in i/o utilization but does not explain the read source change.

In the attached chart, before point A the node is running 4.0.18 and after A it is 3.6.18.
This strongly indicates that the workload must have shifted to a different part of the dataset that is not in the WT/FS cache, and from this point we are reading mainly from disk.

Since the uploaded data does not cover the time when the cluster was running on 3.6.18 before the upgrade to 4.0.18 (we do have a couple of hours of the 3.6.18 run after the downgrade from 4.0.18, but that is too short for the comparison I'd like to make), is it possible to upload just the diagnostic.data from mongo-offerstore-de01-02? This would give us a larger comparison window between 3.6.18 and 4.0.18.

In addition, is there a reason you have enableMajorityReadConcern:false?

Thanks,
Dima

Comment by Kay Agahd [ 10/Jun/20 ]

Hi Dima,

I've uploaded the requested files for shard1 and shard2. Shard1 is the primary shard (we have only one database running on this cluster).

Thanks!

Comment by Dmitry Agranat [ 10/Jun/20 ]

Hi kay.agahd@idealo.de, thank you for the report.

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from one of the servers in question and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by Kay Agahd [ 10/Jun/20 ]

Since the metrics looked so bad, we downgraded all routers and shards to mongodb v3.6.18. This immediately reduced the IO load. All other metrics now look much better than with v4.0.18 (see the screenshot of the primary of the first shard). I intentionally made the same server the primary after the downgrade as before the upgrade, so that we can compare the metrics in the dashboards more easily. The downgrade took place from just after 12:00 (German time) until now.

Comment by Kay Agahd [ 10/Jun/20 ]

The above description was probably garbled by the upload of the picture. Since I can't edit it, I'm posting it here again:

We upgraded our sharded cluster from mongo 3.6 to 4.0.18 on Linux Stretch. Since then, IO utilization is almost always at 100%! Almost all monitored metrics look worse than before the upgrade. I'll attach a screenshot where you can see that IO utilization went up to an almost constant 100% at around 0:00, when we upgraded the shard to mongodb 4.0.18.

We have not yet set FCV to 4.0, to allow our deployment to run without enabling these features for a burn-in period and to ensure the likelihood of downgrade is minimal, as suggested in mongodb's documentation.

Any help is greatly appreciated because this issue is becoming critical!
