[SERVER-39219] Insert Performance with MongoDB 3.4.10 and MongoDB 4.0.5 by mongostat Created: 28/Jan/19 Updated: 27/Oct/23 Resolved: 13/Mar/23
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Storage, WiredTiger |
| Affects Version/s: | 4.0.5 |
| Fix Version/s: | 4.3 Desired |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Dennis | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | customer-mgmt |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Assigned Teams: | Storage Engines |
| Sprint: | Storage Engines 2019-06-17, Storage Engines 2019-07-01, Storage Engines 2019-08-12 |
| Participants: | |
| Story Points: | 5 |
| Description |
When we insert 700 million documents with MongoDB 3.4.10, it takes 4.5 hours, but after upgrading to MongoDB 4.0.5 the same load takes 6.5 hours. As time goes on, inserting the same data takes more and more time. Where can I check the settings to improve performance? In MongoDB 3.4.10 the cache dirty percentage stays under 6%, but in MongoDB 4.0.5 it goes up to 20%. We installed MongoDB with default settings and disabled NUMA. Related content is attached, thanks a lot!!
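For reference, the cache dirty and cache used percentages that mongostat shows can also be read from serverStatus during the load. Below is a minimal PyMongo sketch; the connection string is a placeholder, not the reporter's actual deployment, and the ratios are only the approximate quantities mongostat reports.

```python
from pymongo import MongoClient

# Placeholder connection string; point it at the node receiving the inserts.
client = MongoClient("mongodb://localhost:27017")

cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
dirty = cache["tracked dirty bytes in the cache"]
in_cache = cache["bytes currently in the cache"]
configured = cache["maximum bytes configured"]

# Roughly what mongostat's "dirty" and "used" columns show, as percentages.
print(f"cache dirty: {100.0 * dirty / configured:.1f}%")
print(f"cache used:  {100.0 * in_cache / configured:.1f}%")
```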
| Comments |
| Comment by Sulabh Mahajan [ 13/Mar/23 ] |
I am closing this issue as it is no longer relevant for the currently supported releases. Please re-open a new ticket if this applies to a more recent release.
| Comment by Dennis [ 29/Aug/19 ] |
Hi Bruce, We are glad to inform you that we have updated the information for the issue. We observe that as memory usage grows, insert throughput slows down; however, after the MongoDB service restarts, insert performance returns to normal. Please refer to the attachment for the Ops Manager information. Thanks,
| Comment by Dennis [ 25/Apr/19 ] |
Hi Bruce, Thanks for your assistance. If there are any questions, please don't hesitate to let me know. Thanks,
| Comment by Bruce Lucas (Inactive) [ 24/Apr/19 ] |
Hi Dennis, Sorry for the delay. Unfortunately we don't have any recommendations yet. The cache.png attachment reflects some internal investigation we are doing with the storage engine team; we'll update you with the outcome of that discussion. Bruce
| Comment by Dennis [ 24/Apr/19 ] |
Hi Bruce, Thanks for the updated comment and providing that data. Thanks,
| Comment by Dennis [ 15/Apr/19 ] |
Hi Daniel, Sorry to trouble you again! Thanks,
| Comment by Dennis [ 22/Mar/19 ] |
Hi Danny, Thanks for the updated comment. Thanks,
| Comment by Danny Hatcher (Inactive) [ 19/Mar/19 ] |
Hello Dennis, I apologize for not responding earlier. It appears that we are still missing the diagnostic.data from the Secondary for the second 3.4.10 test. However, it may not be necessary if those files no longer exist. We are still investigating the issue and will reach out to you again soon. Thanks, Danny
| Comment by Dennis [ 19/Mar/19 ] |
Hi Bruce, I am just wondering if you have received my previous attachment. Sorry for any inconvenience this may cause. Let me know if you need anything else. Thanks a lot! Best,
| Comment by Dennis [ 08/Mar/19 ] |
Hi Daniel, Thanks for the updated comment. Best,
| Comment by Danny Hatcher (Inactive) [ 07/Mar/19 ] |
Hello Dennis, Thanks for your patience, we are still looking into this. I do see that the most recent uploads do not include Secondary diagnostic.data for the 3.4.10 second test run. Do you still have those files available? Danny
| Comment by Dennis [ 27/Feb/19 ] |
Hi Bruce, Sorry to trouble you again! Best,
| Comment by Dennis [ 20/Feb/19 ] |
Hi Bruce, Thanks for the updated comment. As requested in your comment, I have re-uploaded the diagnostic.data ( diagnostic.data_3.4.10_PSS_20190220.zip, diagnostic.data_4.0.5_PSS_20190220.zip ) at the private link you provided. Thanks,
| Comment by Bruce Lucas (Inactive) [ 19/Feb/19 ] |
Hi Dennis, Thanks for uploading the new data. In the 4.0.5 case we see the secondaries sometimes lagging, sometimes significantly (tens of minutes). In 4.0 high lag creates additional cache pressure on the primary that can slow it down. I would like to investigate why the lag is occurring in 4.0.5 but not 3.4.10. Can you please upload the diagnostic.data directories from both secondaries covering both of those tests? Thanks,
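For reference, per-member replication lag can be tracked during the load by comparing each secondary's optime with the primary's in replSetGetStatus. A minimal PyMongo sketch is below; the connection string is a placeholder and this is only one way to observe the lag Bruce describes.

```python
from pymongo import MongoClient

# Placeholder connection string; connect to any member of the replica set.
client = MongoClient("mongodb://localhost:27017")
status = client.admin.command("replSetGetStatus")

# Compare each secondary's last applied optime with the primary's.
primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
for m in status["members"]:
    if m["stateStr"] == "SECONDARY":
        lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        print(f"{m['name']}: {lag:.0f}s behind the primary")
```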
| Comment by Dennis [ 19/Feb/19 ] |
Hi Bruce, I am just wondering if you have received my previous attachment. I have uploaded it again in case you did not receive it. Sorry for any inconvenience this may cause. Let me know if you need anything else. Thanks a lot! Looking forward to your response. Best,
| Comment by Dennis [ 15/Feb/19 ] |
Hi Bruce,
Thanks for the updated information.
I'll try to answer the questions as fully as possible: we ran the 3.4.10 case and the 4.0.5 case on the same host with the same network environment. Under the same conditions, there is a great difference in execution time and performance.
Test data information: data count: 600 million; storage size: 320 GB.
Besides, as you stated before, in the 4.0.5 case one secondary member was out of sync, so the information from that run was incorrect. I have re-uploaded the diagnostic.data ( diagnostic.data_3.4.10_20190215.zip, diagnostic.data_4.0.5_20190215.zip ) from the retested runs to the private link you provided. Could you please help us re-diagnose the cause?
Thanks. Let me know if you need anything else.
Thanks,
| Comment by Bruce Lucas (Inactive) [ 31/Jan/19 ] |
Hi Dennis, In the 4.0.5 case we see one secondary member lagging severely until lag built to more than an hour and it fell off the oplog. A severely lagging secondary can result in higher cache pressure and lower performance on the node that the secondary is syncing from because old parts of the oplog must be kept in cache or re-read into cache, and indeed we see a high rate of data being read into cache for the oplog in the 4.0.5 case. In the 3.4.10 case we also see a lagging secondary, but we don't see a high rate of oplog data being read into cache. This is likely because the lagging secondary was not syncing from the primary in that case but rather from the other secondary. In the 4.0.5 data we also see a high rate of tcp retransmissions. (We don't record that information in 3.4 until 3.4.16.) This makes me suspect the lag is likely related to a network issue. I'd recommend that you investigate the cause of the lag, and the possible network issue. Bruce
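For reference, the oplog window (how far a secondary can fall behind before it "falls off the oplog") can be checked by comparing the first and last entries in local.oplog.rs on the sync source. The sketch below uses PyMongo; the connection string is a placeholder and this is only one way to make that check.

```python
from pymongo import MongoClient

# Placeholder connection string; connect to the member the lagging secondary syncs from.
client = MongoClient("mongodb://localhost:27017")
oplog = client.local["oplog.rs"]

# Oldest and newest oplog entries, in natural (insertion) order.
first = oplog.find().sort("$natural", 1).limit(1).next()["ts"]
last = oplog.find().sort("$natural", -1).limit(1).next()["ts"]

# bson.Timestamp.time is seconds since the epoch; the difference is the oplog window.
window_hours = (last.time - first.time) / 3600.0
print(f"oplog window: {window_hours:.1f} hours")
```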
| Comment by Dennis [ 31/Jan/19 ] |
Hey Bruce, I have uploaded the diagnostic.data ( diagnostic.data_3.4.10.zip, diagnostic.data_4.0.5.zip ) and the mongostat log at the private link you provided. Let me know if you find anything. Best,
| Comment by Bruce Lucas (Inactive) [ 29/Jan/19 ] |
Hi Dennis, We don't at this point provide a viewer for the diagnostic data. The data collected in the diagnostic.data directory is described here. Also, you may view the code that collects that data here. You can upload your files to this secure private portal. This will also allow you to upload the entire diagnostic.data directory, which may exceed the JIRA attachment limit. Thanks,
| Comment by Dennis [ 29/Jan/19 ] |
Hi Bruce, Thanks for adding the comment. The files in the diagnostic.data directory look like binary data. Could you please tell me how to view this data? We don't want to unknowingly share any confidential data, and the company has a strict policy regarding sharing such data. Alternatively, where can we upload the information with the assurance that it will be kept private? Thanks,
| Comment by Bruce Lucas (Inactive) [ 28/Jan/19 ] |
Hi Dennis, Can you please archive and attach the contents of the $dbpath/diagnostic.data directory? It is important that the diagnostic data cover both the 3.4 and 4.0 insert jobs; you can check the time period covered by looking at the metrics.* filenames, which are timestamps indicating the start of the data covered by each file. Also, if you can attach mongod log files covering the two import jobs, that may be helpful. Thanks,
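As a convenience, listing the metrics.* filenames and archiving the directory can be done with something like the sketch below; the dbPath is a placeholder and should be replaced with the storage.dbPath from the actual mongod configuration.

```python
import shutil
from pathlib import Path

# Placeholder dbPath; use the storage.dbPath from your mongod configuration.
diag = Path("/var/lib/mongodb") / "diagnostic.data"

# Each metrics.* filename is a timestamp marking the start of the data it contains,
# so listing them shows whether the window covers both insert jobs.
for f in sorted(diag.glob("metrics.*")):
    print(f.name)

# Archive the whole directory (produces diagnostic.data.zip) for upload.
shutil.make_archive("diagnostic.data", "zip", root_dir=str(diag))
```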