[SERVER-63095] Operations slow down during index build Created: 28/Jan/22 Updated: 27/Oct/23 Resolved: 17/Aug/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.17 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vladimir Beliakov | Assignee: | Edwin Zhou |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | index, slowdown, | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 16.04 |
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | 1. Use the configuration mentioned above.
3. Create an index with seven keys on a collection of about 750k documents (size ~430 MB, storage size ~170 MB). |
| Description |
|
The original problem: after we upgraded our MongoDB servers to 4.0.12, we started having issues with index builds. While an index was being built, queries and update operations hung until the build finished. As a workaround, we would restart one of the secondaries and make it the primary, which completely resolved the issue. We hoped that after upgrading to 4.2.17, the new optimized index build process would eliminate the problem. However, when we built an index with seven keys (non-unique, with a partial filter expression containing two equality conditions) on a non-sharded collection of about 750k documents (size ~430 MB, storage size ~170 MB), we observed the same behavior: the server became very slow for the duration of the build, which took about 66 seconds. Queries that normally take less than 100 milliseconds took several dozen seconds. After we restarted one of the secondaries and made it primary, the index was built within several seconds and no slowdowns were observed. Our cluster configuration:
Replica server configuration:
|
| Comments |
| Comment by Edwin Zhou [ 17/Aug/22 ] |
|
We're happy to hear that you haven't encountered similar behavior with operations slowing down during an index build. Since our previous investigation suggested that the behavior you saw is expected, I'll close this ticket as Works as Designed. Best, |
| Comment by Vladimir Beliakov [ 12/May/22 ] |
|
Sorry for the late response. Unfortunately (or thankfully |
| Comment by Edwin Zhou [ 04/May/22 ] |
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please attach diagnostic data and logs covering an index build that impacted other operations? |
| Comment by Edwin Zhou [ 09/Mar/22 ] |
|
Hi vladimirred456@gmail.com, However, during the index build, we see disk utilization jump to 100%, which is indicative of a disk bottleneck. Since an index build performs a collection scan, we can expect it to keep the disk busy while it reads data that is not in the cache. Based on this incident, we can hypothesize that the previous incident, where other operations were impacted, may have been caused by a similar disk bottleneck, but we aren't able to say for certain without data from an index build during which operations were impacted. If you continue to see index builds impact other operations, please attach diagnostic data that covers those occurrences. Best, |
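For context on what "disk utilization at 100%" means: on Linux, a utilization figure like the `%util` column of `iostat` is, roughly, the share of wall-clock time during which the device had at least one I/O in flight. It is derived from the "time spent doing I/Os" counter in `/proc/diskstats`. The sketch below assumes that interpretation and computes it from two samples of the file taken `interval_ms` apart. The helper names and the device name `sda` are illustrative only and are not part of MongoDB or its diagnostic tooling.

```python
def io_ticks(diskstats_text: str, device: str) -> int:
    """Return the cumulative 'milliseconds spent doing I/Os' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the per-device counters.
        # The 10th counter (index 12 overall) is the time in ms the device was busy.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(f"device {device!r} not found in diskstats")


def utilization_pct(before: str, after: str, device: str, interval_ms: float) -> float:
    """Percentage of the sampling interval during which the device was busy."""
    busy_ms = io_ticks(after, device) - io_ticks(before, device)
    # Clamp to 100%: counter rounding can push the ratio slightly past 1.
    return min(100.0, 100.0 * busy_ms / interval_ms)
```

In practice you would read `/proc/diskstats` twice, about one second apart, and pass both snapshots in. A value pinned at 100% during the collection scan matches the bottleneck described above, though it says nothing about whether the device could still absorb more parallel I/O.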
| Comment by Vladimir Beliakov [ 01/Mar/22 ] |
|
Hi Edwin Zhou, It was 12:50 to 12:56 (UTC) Feb 17 2022. |
| Comment by Edwin Zhou [ 28/Feb/22 ] |
|
Thank you for following up with the diagnostic data for another occurrence. Can you please provide timestamps for when you performed the index build to help correlate the diagnostic data with the date and time of the incident and increased iowait? Best, |
| Comment by Vladimir Beliakov [ 21/Feb/22 ] |
|
Hi, Edwin Zhou! Thanks for your reply. Unfortunately, we don't have `diagnostic.data` for the time when the incident happened. However, several days ago we encountered somewhat similar behavior when building indexes (though these had only three keys). Even though ongoing operations weren't inhibited, the primary and one of the secondaries saw an increase in iowait. I hope the attached files will shed some light on the culprit. I'll also make sure to collect `diagnostic.data` should we run into the same problem again. |
| Comment by Edwin Zhou [ 11/Feb/22 ] |
|
Thank you for your report. Would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) covering the incident you described and attach it to this ticket? Best, |
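The archiving step requested above can be scripted. What follows is a minimal sketch using Python's standard `shutil.make_archive`, assuming the `dbpath` is known and readable by the user running it. The function name `archive_diagnostic_data` and the output name `diagnostic-data.tar.gz` are hypothetical choices for illustration, not part of any MongoDB tool.

```python
import shutil
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_dir: str) -> str:
    """Create a gzip-compressed tarball of <dbpath>/diagnostic.data and return its path."""
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    # Hypothetical output name; make_archive appends the ".tar.gz" suffix itself.
    base = Path(out_dir) / "diagnostic-data"
    # root_dir/base_dir keep archive entries relative, e.g. "diagnostic.data/<file>",
    # so the absolute dbpath is not embedded in the archive.
    return shutil.make_archive(
        str(base), "gztar", root_dir=str(src.parent), base_dir=src.name
    )
```

Archiving the whole directory, rather than selected files, is the simpler choice here: it guarantees the files covering the incident window are included without having to work out which ones they are.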