[SERVER-84972] Investigate if pipeline-style update performance got better for IncFewLargeDocLongFields Created: 06/Jun/19  Updated: 12/Jan/24  Resolved: 17/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: James Wahlin Assignee: Ruoxin Xu
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-06-17 at 2.20.56 PM.png    
Issue Links:
Related
Sprint: Query 2019-06-17, Query 2019-07-01, Query 2019-07-29, Query 2019-08-12, Query 2019-08-26, Query 2019-09-09, Query 2019-10-07, Query 2019-12-16, Query 2020-09-21
Participants:
Linked BF Score: 0

 Description   

See the comments and previous description for context; we're wondering whether the new strategy for logging these updates will improve performance on this workload.

 

The focus for this ticket will be profiling the IncFewLargeDocLongFields perf test, which showed a 2x-5x slowdown relative to the baseline across all variants.



 Comments   
Comment by Ian Boros [ 16/Sep/20 ]

ruoxin.xu I'd say you're good to close this.

Comment by Ruoxin Xu [ 16/Sep/20 ]

This microbenchmark was running an incorrect query. It has now been modified and renamed to 'IncrementFewKeysLargeDocLongFields' after PERF-2028. Generating $v:2 oplog entries didn't seem to cause an obvious regression on the new microbenchmark. ian.boros Do you think any further investigation is needed on this ticket?

Comment by Ian Boros [ 03/Sep/20 ]

This workload has a bug caused by a typo in the update pipeline. Instead of incrementing fields, the update is making the document one level deeper each time. Eventually, the updates start failing with a "depth exceeded" error and taking a slow uasserted()/exception code path. I don't think we should read too much into changes in performance until this problem is fixed. See PERF-2028.
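To make the failure mode concrete, here is a minimal Python sketch of that kind of bug. This is not the actual workload code; the field name `a` and the exact shape of the typo are hypothetical. It shows how an update that is supposed to increment a field can instead wrap the previous document one level deeper on every application, until BSON's 100-level nesting limit is exceeded and the write is rejected:

```python
MAX_DEPTH = 100  # BSON's nesting limit; deeper documents are rejected

def depth(value):
    """Nesting depth of a dict-based document (scalars have depth 0)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    return 0

def buggy_update(doc):
    # Intended effect: increment doc["a"], e.g.
    #   [{"$set": {"a": {"$add": ["$a", 1]}}}]
    # Typo'd effect (hypothetical shape): embed the previous document
    # one level deeper instead of incrementing a field.
    return {"a": doc}

doc = {"a": 0}
applied = 0
while depth(doc) <= MAX_DEPTH:
    doc = buggy_update(doc)
    applied += 1

# After roughly MAX_DEPTH applications the document is too deep; a real
# server would start failing the update with a "depth exceeded" error,
# taking the slow uasserted()/exception path on every subsequent write.
```

Because the benchmark loops the update many times, most iterations end up on the error path rather than measuring the intended increment, which is why the timings were not meaningful until PERF-2028 fixed the workload.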

Comment by Charlie Swanson [ 18/Dec/19 ]

Re-titled this ticket to better reflect current reality after discussing with ian.boros. Throwing this back on the backlog for now.

Comment by James Wahlin [ 23/Oct/19 ]

david.storch - it is definitely possible that SERVER-41114 would improve performance and is worth a try.

This ticket was filed to explore slowness in pipeline-based updates compared to non-pipeline-based updates. The IncFewLargeDocLongFields microbenchmark was implemented for both and is an example of a scenario in which pipeline-based updates will not be performant, since we piggyback on the replacement-update mechanism. This is not actually a regression, since we have not replaced the $inc update operator with pipeline updates. I will retitle this ticket to reflect that.
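For reference, the two ways of expressing the same increment look like this. The two update specs are standard MongoDB syntax; the tiny interpreter below them is only an illustrative sketch (the field name `f` and document shape are hypothetical), showing that both forms compute the same result even though, per the comment above, the server applies the pipeline form through the replacement-update path and so rewrites the whole document:

```python
# Operator-style update: the server applies a targeted $inc delta.
classic_update = {"$inc": {"f": 1}}

# Pipeline-style update: semantically equivalent, but applied via the
# replacement-update mechanism, so the whole document is rewritten even
# for a single-field change -- expensive when documents are large.
pipeline_update = [{"$set": {"f": {"$add": ["$f", 1]}}}]

def apply_classic(doc, spec):
    out = dict(doc)
    for field, delta in spec["$inc"].items():
        out[field] = out.get(field, 0) + delta
    return out

def eval_expr(doc, expr):
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:], 0)   # field-path reference like "$f"
    if isinstance(expr, dict) and "$add" in expr:
        return sum(eval_expr(doc, e) for e in expr["$add"])
    return expr                       # literal value

def apply_pipeline(doc, stages):
    out = dict(doc)
    for stage in stages:
        for field, expr in stage["$set"].items():
            out[field] = eval_expr(out, expr)
    return out

doc = {"f": 41, "payload": "x" * 32}  # stand-in for a large document
assert apply_classic(doc, classic_update) == apply_pipeline(doc, pipeline_update)
```

The equivalence of results is exactly why the slowdown is an implementation cost (replacement-style application) rather than a semantic difference between the two update forms.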

Comment by David Storch [ 22/Oct/19 ]

james.wahlin, is it possible that SERVER-41114 would improve performance? If that's at least plausible, then we could try running a perf patch build with justin.seyster's draft changes for SERVER-41114.

Can you also clarify whether there was a performance regression? Or were you making an observation about how pipeline-based updates are slow compared to the baseline set by regular old-fashioned non-pipeline-based updates?

Comment by James Wahlin [ 21/Oct/19 ]

I profiled this and found a significant amount of time going through a BSON -> Document/Value -> mutablebson -> BSON transformation for pipeline updates. I was curious whether the Document/Value work underway at the time would help make this faster.

Comment by David Storch [ 19/Oct/19 ]

james.wahlin, I'm not aware of Martin ever looking into this. Can you elaborate on your understanding of the source of the slowness?

Comment by James Wahlin [ 18/Oct/19 ]

david.storch - if martin.neupauer did not find anything Document/Value-related that would improve document transformation performance (outside of longer-term CQF work), then I think we can close this ticket. I investigated earlier and did not find any quick wins.

Comment by David Storch [ 18/Oct/19 ]

james.wahlin do you have time to investigate during this sprint?

Comment by James Wahlin [ 02/Jul/19 ]

We plan to address the slowness caused by document transformation between mutablebson, BSONObj and Document/Value as part of the Common Query Framework roadmap. The only step remaining here is to confirm whether the Document/Value project can provide any nearer-term improvements.

martin.neupauer - I am assigning this ticket to you to wrap up, as I will be OOO next week. If there are no quick wins to be had via Document/Value, then feel free to close this ticket.

Comment by James Wahlin [ 25/Jun/19 ]

Reviewing the perf data confirms that the majority of time spent is in creating and destroying document elements. Pipeline updates pay a substantial cost in that they go through a BSON -> Document/Value -> mutablebson -> BSON transformation.

I suspect that the way forward here will be to replace mutablebson with Document/Value across the update system. As part of this, we can consider replacing use of the ObjectReplaceExecutor with a mechanism specific to pipeline updates.

Comment by James Wahlin [ 24/Jun/19 ]

Initial profiling efforts show that CPU time is split across transforming Document to BSONObj, performing the pipeline transformation, and applying the replacement-style update to the provided mutablebson document. Applying the changes from the Document/Value project did not improve performance.

Next steps are:
1) Martin is going to take a look to see if the Document/Value project can improve performance for this use case.
2) I am going to dig deeper into where time is being spent to see how we could improve.

Generated at Thu Feb 08 06:56:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.