[SERVER-49223] Suggestions to speed up initial sync Created: 01/Jul/20 Updated: 28/Jul/20 Resolved: 28/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.8 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Oliver Yeh | Assignee: | Dmitry Agranat |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
I have a 5TB instance with 2 collections that I need to move from zlib to zstd. The initial sync is painfully slow right now on a very beefy secondary instance (m5.12xlarge, 192GB of RAM, 48 vCPUs). At the current rate, the initial sync is projected to complete in 8 days.
The primary instance is on an even beefier machine with no load. I tried diagnosing the slowdown, and this is what I checked or tried:
- Disk is not saturated (checked with iostat).
- CPU is not saturated (checked with htop).
- Changed the instance type to increase/decrease RAM.
- Played around with maxIndexBuildMemoryUsageMegabytes with setParameter.
- Played around with replWriterThreadCount with setParameter.
It seems like the instance can do a lot more with disk + CPU not saturated on both the primary and the secondary. Is there anything else I can try? |
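For a sense of scale, the figures in the description imply a fairly low average transfer rate. The sketch below is only back-of-the-envelope arithmetic: it assumes the full 5TB (taken as decimal terabytes) has to be copied within the 8-day estimate, and the function name is made up for illustration.

```python
def average_throughput_mb_s(data_bytes: float, days: float) -> float:
    """Average rate in MB/s needed to move data_bytes in the given number of days."""
    seconds = days * 24 * 60 * 60
    return data_bytes / seconds / 1e6


# Assumption: 5TB = 5e12 bytes, copied over the projected 8 days.
rate = average_throughput_mb_s(5e12, 8)
print(f"{rate:.1f} MB/s")  # prints "7.2 MB/s"
```

An average of roughly 7 MB/s is consistent with the observation that neither disk nor CPU is saturated.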
| Comments |
| Comment by Dmitry Agranat [ 12/Jul/20 ] |
|
Yes, if possible please upload the data covering this process from the start, both from the syncing secondary and from the primary. 200MB of diagnostic data should cover these 8 days. Just as an FYI, the "metric" file you've uploaded covers only 5 hours, so it's better to upload the whole archive of diagnostic.data. From the 5 hours it does cover, we can see that setting replWriterThreadCount to 32 is making things worse: checkpoints are not keeping up with the demand of replicating 23k write operations. Is it possible to gather all the requested information under the default configuration? Thanks, |
| Comment by Oliver Yeh [ 08/Jul/20 ] |
|
I uploaded what I could. Some of the log files have been overwritten (apparently diagnostic.data only keeps 200MB?). If that is not enough, we can close the issue and I can reopen it next time I do a full resync. Thank you! |
| Comment by Dmitry Agranat [ 07/Jul/20 ] |
|
Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from both the syncing secondary and the primary, covering the time of the initial sync, and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, |
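One way to produce the requested archive is sketched below with Python's standard tarfile module. This is a minimal sketch, not an official MongoDB tool: the function name, the `logs/` prefix inside the archive, and any paths passed in are assumptions for illustration, and it would need to be run once on the syncing secondary and once on the primary.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(dbpath, log_files, out_path):
    """Bundle mongod log files and <dbpath>/diagnostic.data into one .tar.gz."""
    diag_dir = Path(dbpath) / "diagnostic.data"
    with tarfile.open(out_path, "w:gz") as tar:
        # tar.add() recurses into directories by default.
        tar.add(diag_dir, arcname="diagnostic.data")
        for log in log_files:
            log = Path(log)
            # Store logs under a hypothetical "logs/" prefix in the archive.
            tar.add(log, arcname=f"logs/{log.name}")
    return Path(out_path)
```

A call might look like `archive_diagnostics("/data/db", ["/var/log/mongodb/mongod.log"], "secondary-diag.tar.gz")`, where all three paths are placeholders for the actual dbpath, log location, and output file on each host.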
| Comment by Oliver Yeh [ 02/Jul/20 ] |
|
4.2.8 |
| Comment by Dmitry Agranat [ 02/Jul/20 ] |
|
What MongoDB version do you use during the initial sync? Thanks, |