[SERVER-26948] Update conflict detection should be optimized for big documents Created: 08/Nov/16 Updated: 06/Dec/22 Resolved: 19/Jan/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Write Ops |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Delbosc Benoit | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Query |
| Participants: | |
| Description |
When updating a document with hundreds of properties, the time spent in update conflict detection is high; FieldRefSet::insert is prominent in the flame graph. Attached is a JS script that reproduces the problem, which creates a bottleneck in production. For information, we also perform update conflict detection in our application before submitting the update commands to MongoDB. We hit the same performance problem using a similar algorithm, and fixed it using a different approach: |
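The attached script is not inlined here, but the shape of the problematic workload can be sketched as below (the `buildSet` helper and the `payload` prefix are hypothetical illustrations, not taken from the attachment): a $set whose hundreds of dotted field paths all share one prefix.

```javascript
// Hypothetical sketch of the workload: build a $set document with n dotted
// field paths that all share a single prefix, which is the shape that makes
// conflict detection expensive.
function buildSet(prefix, n) {
  const set = {};
  for (let i = 0; i < n; i++) {
    set[`${prefix}.prop${i}`] = i; // e.g. "payload.prop0", "payload.prop1", ...
  }
  return set;
}

const update = { $set: buildSet("payload", 200) };
// In the mongo shell this would then be applied repeatedly, e.g.:
// db.coll.update({ _id: 1 }, update);
```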
| Comments |
| Comment by Ian Whalen (Inactive) [ 19/Jan/18 ] |
| Fixed by the rewrite of the update sub-system to build a tree of update nodes. |
| Comment by Asya Kamsky [ 18/Jan/18 ] |
| Since much of the code involved was rewritten in 3.6, I did a very quick test using the attached JS file. I didn't do any profiling, only looked at the raw performance of the tested updates. |
| Comment by Ian Whalen (Inactive) [ 13/Dec/16 ] |
| Hi Delbosc, we realized that we could actually use a prefix tree to do better than O(n^2). Putting this in Planned But Not Scheduled for future consideration. |
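The prefix-tree idea mentioned above can be sketched as follows (a hedged illustration only, not the server's actual implementation; `hasConflict` is a hypothetical name). Inserting each dotted path into a trie, component by component, detects whether any path equals or is a path-prefix of another in time linear in the total length of the paths, rather than quadratic in their number.

```javascript
// Trie-based conflict detection sketch: two update paths conflict when one is
// equal to, or a path-prefix of, the other (e.g. "a.b" conflicts with "a.b.c").
function hasConflict(paths) {
  const root = { children: new Map(), terminal: false };
  for (const path of paths) {
    let node = root;
    for (const part of path.split(".")) {
      // An existing path ends here, so it is a prefix of the current path.
      if (node.terminal) return true;
      if (!node.children.has(part)) {
        node.children.set(part, { children: new Map(), terminal: false });
      }
      node = node.children.get(part);
    }
    // Current path duplicates, or is a prefix of, an existing path.
    if (node.terminal || node.children.size > 0) return true;
    node.terminal = true;
  }
  return false; // total work is linear in the combined length of all paths
}
```

With this structure, checking the 200 same-prefix fields from the repro costs one trie walk per field instead of a comparison against every other field.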
| Comment by David Storch [ 30/Nov/16 ] |
| Hi bdelbosc, Thanks for this clear issue report. I have confirmed the behavior: with your repro I see a profile that looks very similar to the attached flame graph. I am moving this ticket to the Query Team's backlog for triage; please continue to watch for updates. Since all 200 fields in the $set expression share a prefix, our conflict detection is O(n^2), where n is the number of fields in the update expression that share a prefix. I believe it has to be O(n^2), since you have to consider whether any pair of paths conflicts. However, the implementation does appear to leave some room for optimization. Keep in mind that this update is idempotent and that the repro repeatedly updates the same document, so all updates after the first are no-ops. The update subsystem's no-op detection prevents these updates from actually writing to storage, which makes conflict detection more prominent in the profile. However, I also ran a test in which each update was applied to a different document, and conflict detection was still the most expensive component of the hot path. Best, |
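The pairwise check described above can be illustrated with a small sketch (not the server's actual FieldRefSet code; `isPrefixConflict` and `findConflict` are hypothetical names). Two dotted paths conflict when one equals, or is a path-prefix of, the other; comparing every pair of n paths costs O(n^2) comparisons.

```javascript
// Two dotted paths conflict when one is equal to, or a path-prefix of, the
// other. Note "a.b" vs "a.bc" is NOT a conflict: the prefix must end at a dot.
function isPrefixConflict(a, b) {
  const [short, long] = a.length <= b.length ? [a, b] : [b, a];
  return long === short || long.startsWith(short + ".");
}

// Naive pairwise scan over all n paths: O(n^2) comparisons.
function findConflict(paths) {
  for (let i = 0; i < paths.length; i++) {
    for (let j = i + 1; j < paths.length; j++) {
      if (isPrefixConflict(paths[i], paths[j])) return [paths[i], paths[j]];
    }
  }
  return null;
}
```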
| Comment by Ramon Fernandez Marina [ 13/Nov/16 ] |
| Thanks for the detailed report bdelbosc, we've sent this to the Query team to investigate further. |
| Comment by Delbosc Benoit [ 08/Nov/16 ] |
| Attaching update-conflict-prod.svg, a MongoDB flame graph taken in production where update queries can take more than 500 ms on an m4.2xlarge host. Thanks for your help. |