[SERVER-26948] Update conflict detection should be optimized for big documents Created: 08/Nov/16  Updated: 06/Dec/22  Resolved: 19/Jan/18

Status: Closed
Project: Core Server
Component/s: Performance, Write Ops
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Delbosc Benoit Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File update-conflict-prod.svg     File update-conflict-test.js     PNG File update-conflict.png     File update-conflict.svg    
Issue Links:
Duplicate
is duplicated by SERVER-28189 Update with $set of many array elemen... Closed
Assigned Teams:
Query
Participants:

 Description   

When updating document with hundreds of properties, the time spent in update conflict detection is high.

The FieldRefSet::insert is visible in flame graph.

Attached is a js script that reproduce the problem that create a bottleneck in production.

Just for information, we also do update conflict detection on our application before submitting the update commands to MongoDB. We had the same performance problem using a similar algorithm, this has been fixed using a different approach:
https://github.com/nuxeo/nuxeo/commit/eeceab7024445b88082f7406290478d9f6702c28



 Comments   
Comment by Ian Whalen (Inactive) [ 19/Jan/18 ]

Fixed by the rewrite of the update sub-system to build a tree of update nodes.

Comment by Asya Kamsky [ 18/Jan/18 ]

Since 3.6 rewrote much of the code involved, I did a very quick test using the attached JS file. I didn't do any profiling, only looked at raw performance of the tested updates.

// 3.4.x
    961 5ms
      4 6ms
     26 7ms
      3 8ms
// 3.6.0
    979 0ms
     16 1ms

Comment by Ian Whalen (Inactive) [ 13/Dec/16 ]

Hi Delbosc, we realized that we could actually use a prefix tree to get better than O(n ^ 2). Putting this in Planned But Not Scheduled for future consideration.

Comment by David Storch [ 30/Nov/16 ]

Hi bdelbosc,

Thanks for this clear issue report. I have confirmed this behavior—under your repro I see a profile that looks very similar to the attached flame graph. I am moving this ticket to the Query Team's backlog for triage, please continue to watch for updates.

Since all 200 fields in the $set expression share a prefix, our conflict detection is O(n^2), where n is the number of fields in the update expression that share a prefix. I think it has to be O(n^2) since you have to consider whether any pair of two paths is conflicting. However, it does look like the implementation leaves some room for optimization. Do keep in mind that this update is idempotent and that the repro repeatedly updates the same document. All updates but the first are no-ops. The update subsystem's no-op detection will prevent any of these updates from actually doing writes to storage. This causes conflict detection to show up more in the profile. However, I did run a test in which each update was applied to a different document, and the conflict detection still shows up as the most expensive component of the hot path.

Best,
Dave

Comment by Ramon Fernandez Marina [ 13/Nov/16 ]

Thanks for the detailed report bdelbosc, we've sent this to the Query team to investigate further.

Comment by Delbosc Benoit [ 08/Nov/16 ]

Attaching update-conflict-prod.svg a MongoDB flamegraph taken in production where updates queries can take more than 500ms on a m4.2xlarge host.

2016-11-08T11:01:33.744+0000 I COMMAND  [conn4903] warning: log line attempted (120kB) over max size (10kB), printing beginning and end ... command nuxeo.$cmd command: update { update: "default", ordered: false, updates: [ { q: { ecm:id: "f17b6a7f-d001-46d7-acfe-b6ab04853d5f" }, u: { $set: { ... table of 1000 items ...  } } } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:55 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 528ms

Thanks for your help.

Generated at Thu Feb 08 04:13:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.