[SERVER-39219] Insert Performance with MongoDB 3.4.10 and MongoDB 4.0.5 by mongostat Created: 28/Jan/19  Updated: 27/Oct/23  Resolved: 13/Mar/23

Status: Closed
Project: Core Server
Component/s: Performance, Storage, WiredTiger
Affects Version/s: 4.0.5
Fix Version/s: 4.3 Desired

Type: Question Priority: Major - P3
Reporter: Dennis Assignee: Backlog - Storage Engines Team
Resolution: Gone away Votes: 0
Labels: customer-mgmt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 3.4.10.txt     Text File 4.0.5.txt     PNG File cache.png     PNG File ops_manager_20190829.png    
Assigned Teams:
Storage Engines
Sprint: Storage Engines 2019-06-17, Storage Engines 2019-07-01, Storage Engines 2019-08-12
Participants:
Story Points: 5

 Description   

When we insert 700 million documents with MongoDB 3.4.10, it takes 4.5 hours, but after upgrading to MongoDB 4.0.5 it takes 6.5 hours. As time goes on, inserting the same data takes longer and longer. Where can I check the settings to improve performance?

In MongoDB 3.4.10 the cache dirty ratio stays under 6%, but in MongoDB 4.0.5 it climbs to 20%. We installed MongoDB with the default settings and disabled NUMA. The related content is attached, thanks a lot!
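
For reference, the dirty-cache ratio described above can be sampled while the insert job runs; a minimal pymongo sketch (the connection string is a placeholder, not taken from this ticket):

    # Minimal sketch: sample the WiredTiger dirty-cache ratio from serverStatus.
    # The connection string below is a placeholder, not the reporter's deployment.
    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    while True:  # Ctrl-C to stop
        cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
        dirty = cache["tracked dirty bytes in the cache"]
        limit = cache["maximum bytes configured"]
        print("cache dirty: %.1f%%" % (100.0 * dirty / limit))
        time.sleep(10)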



 Comments   
Comment by Sulabh Mahajan [ 13/Mar/23 ]

I am closing this issue as it is no longer relevant to the currently supported releases. Please open a new ticket if this applies to a more recent release.

Comment by Dennis [ 29/Aug/19 ]

Hi Bruce,

We have updated the information for this issue. We observe that as memory usage grows, inserts slow down; however, after the MongoDB service is restarted, insert performance returns to normal. Please refer to the attachment for the Ops Manager information.

Thanks,
Dennis

Comment by Dennis [ 25/Apr/19 ]

Hi Bruce,

Thanks for your assistance. If there are any further questions, please don't hesitate to let me know. Thanks.

Thanks,
Dennis

Comment by Bruce Lucas (Inactive) [ 24/Apr/19 ]

Hi Dennis,

Sorry for the delay. Unfortunately we don't have any recommendations yet. The cache.png attachment reflects some internal investigation we are doing with the storage engine team; we'll update you with the outcome of that discussion.

Bruce

Comment by Dennis [ 24/Apr/19 ]

Hi Bruce,

Thanks for the updated comment and for providing that data.
Based on cache.png, can you give me some suggestions? Please feel free to tell me if there are any settings that need to change. Thank you.

Thanks,
Dennis

Comment by Dennis [ 15/Apr/19 ]

Hi Daniel,

Sorry to trouble you again!
If you have any suggestions for improvement, please contact me. Thank you.

Thanks,
Dennis

Comment by Dennis [ 22/Mar/19 ]

Hi Danny,

Thanks for the updated comment.
We have run the 3.4.10 case again and uploaded new diagnostic.data (file name: diagnostic.data_3.4.10_PSS_20190322.zip) at the private link you provided. Let me know if you find anything. Thanks a lot!

Thanks,
Dennis

Comment by Danny Hatcher (Inactive) [ 19/Mar/19 ]

Hello Dennis,

I apologize for not responding earlier. It appears that we are still missing the diagnostic.data from the secondary for the second 3.4.10 test. However, it may not be necessary if it no longer exists. We are still investigating the issue and will reach out to you again soon.

Thanks,

Danny

Comment by Dennis [ 19/Mar/19 ]

Hi Bruce,

I am just wondering whether you have received my previous attachment. Sorry for any inconvenience this may cause. Let me know if you need anything else. Thanks a lot!

Best,
Dennis

Comment by Dennis [ 08/Mar/19 ]

Hi Daniel,

Thanks for the updated comment.
As requested in your comment, I have re-uploaded the diagnostic.data ( diagnostic.data_3.4.10_PSS_20190308.zip, diagnostic.data_4.0.5_PSS_201900308.zip ) at the private link you provided. Let me know if you need anything else. Thank you again for your attention.

Best,
Dennis

Comment by Danny Hatcher (Inactive) [ 07/Mar/19 ]

Hello Dennis,

Thanks for your patience; we are still looking into this. I see that the most recent uploads do not include secondary diagnostic.data for the second 3.4.10 test run. Do you still have those files available?

Danny

Comment by Dennis [ 27/Feb/19 ]

Hi Bruce,

Sorry to trouble you again!
I have uploaded our diagnostic.data ( diagnostic.data_3.4.10_PSS_20190220.zip, diagnostic.data_4.0.5_PSS_20190220.zip ) at the private link you provided. If you have any suggestions for improvement, please do not hesitate to contact me. Looking forward to your favorable response.

Best,
Dennis

Comment by Dennis [ 20/Feb/19 ]

Hi Bruce,

Thanks for the updated comment.

As requested in your comment, I have re-uploaded the diagnostic.data ( diagnostic.data_3.4.10_PSS_20190220.zip, diagnostic.data_4.0.5_PSS_20190220.zip ) at the private link you provided.
Thank you again for your attention.

Thanks,
Dennis

Comment by Bruce Lucas (Inactive) [ 19/Feb/19 ]

Hi Dennis,

Thanks for uploading the new data.

In the 4.0.5 case we see the secondaries sometimes lagging, sometimes significantly (tens of minutes). In 4.0 high lag creates additional cache pressure on the primary that can slow it down.
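
For reference, the per-member lag can be spot-checked with replSetGetStatus; a minimal pymongo sketch (the connection string is a placeholder):

    # Minimal sketch: estimate each secondary's replication lag from replSetGetStatus.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder address
    status = client.admin.command("replSetGetStatus")

    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    for m in status["members"]:
        if m["stateStr"] == "SECONDARY":
            lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
            print("%s: %.0f s behind the primary" % (m["name"], lag))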

I would like to investigate why the lag is occurring in 4.0.5 but not 3.4.10. Can you please upload the diagnostic.data directories from both secondaries covering both of those tests?

Thanks,
Bruce

Comment by Dennis [ 19/Feb/19 ]

Hi Bruce,

I am just wondering whether you have received my previous attachment. I have uploaded it again in case you have not received it yet. Sorry for any inconvenience this may cause. Let me know if you need anything else. Thanks a lot! Looking forward to your favorable response.

Best,
Dennis

Comment by Dennis [ 15/Feb/19 ]

Hi Bruce,

 

Thanks for the updated information.

 

I'll try to answer the questions as best I can:

We ran the 3.4.10 case and the 4.0.5 case on the same host in the same network environment. Under the same conditions, there is a large difference in execution time and performance.

 

Test data information:

Data count: 600 million

Storage size: 320 GB

                MongoDB 3.4.10    MongoDB 4.0.5
First Time      5 hours           5 hours
Second Time     5 hours           7 hours

Remarks: On 3.4.10 the execution time stays at 5 hours whether or not the MongoDB service is restarted. On 4.0.5 the execution time grows with each run; after the MongoDB service is restarted it returns to 5 hours.

  

Besides, as you stated before, in the 4.0.5 case one secondary member was out of sync, so the information in the 4.0.5 case was incorrect. I have re-uploaded the diagnostic.data (diagnostic.data_3.4.10_20190215.zip, diagnostic.data_4.0.5_20190215.zip) from the retested runs at the private link you provided. Could you please help us re-diagnose the cause?

 

Thanks. Let me know if you need anything else.

 

Thanks,
Dennis

Comment by Bruce Lucas (Inactive) [ 31/Jan/19 ]

Hi Dennis,

In the 4.0.5 case we see one secondary member lagging severely until lag built to more than an hour and it fell off the oplog. A severely lagging secondary can result in higher cache pressure and lower performance on the node that the secondary is syncing from because old parts of the oplog must be kept in cache or re-read into cache, and indeed we see a high rate of data being read into cache for the oplog in the 4.0.5 case.

In the 3.4.10 case we also see a lagging secondary, but we don't see a high rate of oplog data being read into cache. This is likely because the lagging secondary was not syncing from the primary in that case but rather from the other secondary.

In the 4.0.5 data we also see a high rate of tcp retransmissions. (We don't record that information in 3.4 until 3.4.16). This makes me suspect the lag is likely related to a network issue. I'd recommend that you investigate the cause of the lag, and the possible network issue.
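
As a starting point, replSetGetStatus reports each member's sync source, and the primary's oplog window can be read from the local database; a minimal pymongo sketch (the connection string is a placeholder, and the sync-source field name differs by release):

    # Minimal sketch: report each member's sync source and the primary's oplog window.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder primary address

    status = client.admin.command("replSetGetStatus")
    for m in status["members"]:
        # "syncingTo" on 3.4/4.0; "syncSourceHost" on newer releases
        src = m.get("syncSourceHost") or m.get("syncingTo") or "-"
        print("%s (%s) syncing from: %s" % (m["name"], m["stateStr"], src))

    oplog = client.local["oplog.rs"]
    first = oplog.find().sort("$natural", 1).limit(1).next()["ts"]
    last = oplog.find().sort("$natural", -1).limit(1).next()["ts"]
    print("oplog window: %.1f hours" % ((last.time - first.time) / 3600.0))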

Bruce

Comment by Dennis [ 31/Jan/19 ]

Hey Bruce,

I have uploaded the diagnostic.data (diagnostic.data_3.4.10.zip, diagnostic.data_4.0.5.zip) and the mongostat log at the private link you provided. Let me know if you find anything.
Thanks a lot!

Best,
Dennis

Comment by Bruce Lucas (Inactive) [ 29/Jan/19 ]

Hi Dennis,

We don't at this point provide a viewer for the diagnostic data. The data collected in the diagnostic.data directory is described here. Also, you may view the code that collects that data here.

You can upload your files to this secure private portal. This will also allow you to upload the entire diagnostic.data directory, which may exceed the JIRA attachment limit.

Thanks,
Bruce

Comment by Dennis [ 29/Jan/19 ]

Hi Bruce,

Thanks for adding the comment. 

The files in the diagnostic.data directory look like binary data. Could you please tell me how to view this data? We don't want to unknowingly share any confidential data; the company has a strict policy regarding sharing such data. Alternatively, where can we upload the information with the assurance that it will be kept private?

Thanks,
Dennis

Comment by Bruce Lucas (Inactive) [ 28/Jan/19 ]

Hi Dennis,

Can you please archive and attach the contents of the $dbpath/diagnostic.data directory? It is important that the diagnostic data cover both the 3.4 and 4.0 insert jobs; you can check the time period covered by looking at the metrics.* filenames, which are timestamps indicating the start of the data covered by each file. Also, if you can attach mongod log files covering the two import jobs, that may be helpful.
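
For example, something along these lines would list the coverage and produce the archive; a minimal sketch (the dbpath is a placeholder):

    # Minimal sketch: list the metrics.* files (their names encode the start
    # timestamp of the data they contain) and archive the directory for upload.
    import tarfile
    from pathlib import Path

    diag = Path("/data/db") / "diagnostic.data"   # placeholder dbpath

    for f in sorted(diag.glob("metrics.*")):
        print(f.name)   # e.g. metrics.2019-01-28T03-00-00Z-00000

    with tarfile.open("diagnostic.data.tar.gz", "w:gz") as tar:
        tar.add(diag, arcname="diagnostic.data")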

Thanks,
Bruce
