[SERVER-58873] Compaction procedure for small collection takes a significant amount of time

| Created: | 27/Jul/21 | Updated: | 06/Dec/22 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Basil Markov | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | refinement |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Engines |
| Participants: | |
| Story Points: | 8 |
| Description |
|
Hi there, I have some questions about compaction procedure timing. Our prerequisites were: 1. We started compaction first on our smallest collection, let's say collectionX, roughly 13 GB on disk.
Also, we pre-checked the potential amount of disk space we could free:
So, with that in mind we started compaction by running this command:
At almost the same time, we saw this message in the mongo.log file:
That's exactly the time when compaction started, and then
......
It took something like 38 hours to finish this collection and its indexes, and an additional 25 hours to catch up with the master. We didn't see any anomalies or spikes in CPU, memory, or disk usage during compaction, just some asserts in our monitoring, but nothing crucial, especially compared to average workday metrics. As for the result, it's pretty amazing: the final size of our collection on disk became 3204157440 bytes (about 4 times smaller than before compaction). But the main question is: is that a normal duration for the compaction procedure at this collection size (~13 GB)? How can we estimate how long compaction may take? Could we derive it from average load metrics, total collection size on disk, or some other measure? We were slightly frustrated that compaction took 38 hours for such a small amount of reclaimed disk space. |
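For context, the pre-check and the compact invocation described above generally look like the following in the mongo shell. This is a minimal sketch: the database and collection names ("mydb", "collectionX") are placeholders, since the reporter's exact commands are not preserved in this ticket.

    // Hedged sketch with placeholder names; not the reporter's actual commands.
    // Pre-check: how much space WiredTiger reports as reclaimable for the collection.
    var stats = db.getSiblingDB("mydb").getCollection("collectionX").stats();
    print("file bytes available for reuse: " +
          stats.wiredTiger["block-manager"]["file bytes available for reuse"]);

    // Start compaction of the collection (typically run against a node taken out of
    // rotation, i.e. in maintenance mode, as was done in this case).
    db.getSiblingDB("mydb").runCommand({ compact: "collectionX" });

The compact command only returns once it has finished, so the start and completion messages in the mongod log bracket the duration being discussed here.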
| Comments |
| Comment by Dmitry Agranat [ 19/Sep/21 ] | |||||||
|
Thank you for the latest data, haltandcatchfire91@gmail.com. We're assigning this ticket to the appropriate team for further investigation based on our findings so far. Updates will be posted on this ticket as they happen. | |||||||
| Comment by Basil Markov [ 15/Sep/21 ] | |||||||
|
Hi Dmitry Agranat. | |||||||
| Comment by Dmitry Agranat [ 13/Sep/21 ] | |||||||
|
Hi haltandcatchfire91@gmail.com, for completeness, could you upload the output from collStats for the irkkt.REG.checks_schedule collection? I'd like to review storage stats including all the indexes. | |||||||
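For completeness, the requested collStats output (including per-index storage details) can be gathered with something like the sketch below; it assumes the namespace irkkt.REG.checks_schedule splits into the database irkkt and the collection REG.checks_schedule, so adjust to the actual deployment.

    // Sketch only; the database/collection split of the namespace is an assumption.
    var out = db.getSiblingDB("irkkt")
                .getCollection("REG.checks_schedule")
                .stats({ indexDetails: true });  // include storage-engine stats for each index
    printjson(out);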
| Comment by Basil Markov [ 10/Sep/21 ] | |||||||
|
Is there any update on FTDC and the compaction procedure? | |||||||
| Comment by Dmitry Agranat [ 05/Sep/21 ] | |||||||
|
Thanks haltandcatchfire91@gmail.com, I am looking at the provided data and cannot see anything unusual that might indicate an issue. However, because the compact is run in "maintenance mode", we also do not collect any FTDC. Having visibility into some FTDC metrics might have helped us spot any potential inefficiency. I am going to investigate this a bit further to see if we can turn on diagnostic data gathering during the compact command. | |||||||
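As a side note, whether a given mongod is currently gathering FTDC can be checked, and if appropriate toggled at runtime, through the diagnosticDataCollectionEnabled server parameter; a minimal sketch, assuming default parameter names and no ticket-specific settings:

    // Check whether FTDC (diagnostic data collection) is enabled on this node.
    db.adminCommand({ getParameter: 1, diagnosticDataCollectionEnabled: 1 });

    // It can also be enabled at runtime if appropriate for the maintenance window.
    // db.adminCommand({ setParameter: 1, diagnosticDataCollectionEnabled: true });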
| Comment by Basil Markov [ 01/Sep/21 ] | |||||||
|
Hi Dmitry Agranat
| |||||||
| Comment by Dmitry Agranat [ 01/Sep/21 ] | |||||||
|
Hi haltandcatchfire91@gmail.com, here is the new uploader link. Please add all the relevant information (similar to what you did before, "file bytes available for reuse" before compact, name of the collection, timestamp of the compact command start and end) in the next comment once the data is uploaded. | |||||||
| Comment by Basil Markov [ 01/Sep/21 ] | |||||||
|
Dmitry Agranat can you please share a new upload link and reopen this issue just to prevent creating duplicates? | |||||||
| Comment by Basil Markov [ 30/Aug/21 ] | |||||||
|
Hi! | |||||||
| Comment by Dmitry Agranat [ 29/Aug/21 ] | |||||||
|
Hi haltandcatchfire91@gmail.com, We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. Regards, | |||||||
| Comment by Basil Markov [ 12/Aug/21 ] | |||||||
|
Hi! | |||||||
| Comment by Dmitry Agranat [ 12/Aug/21 ] | |||||||
|
Hi haltandcatchfire91@gmail.com, We still need additional information to diagnose the problem. If this is still an issue for you, would you please upload the requested data into this support uploader? Thanks, | |||||||
| Comment by Dmitry Agranat [ 28/Jul/21 ] | |||||||
|
Hi haltandcatchfire91@gmail.com, we would need some additional information in order to understand why it took 38 hours to compact this collection. Would you please archive (tar or zip) the mongod.log files covering the incident and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Regards, |