[SERVER-49084] Consider using BSONObj instead of mutablebson for storage validation in updates
Created: 25/Jun/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Ian Boros
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

All updates currently do document storage validation (depth, field name checks, type checks for _id) using mutablebson. We should consider changing this interface to instead accept BSONObj so that pipeline-based updates, replacement updates, and (soon) $v: 2 delta updates don't need to go through mutablebson.

The current modifier-style update system performs storage validation node by node rather than once at the end, so that behavior would have to change for this to be done efficiently.
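As a rough illustration of validating once over a finished, immutable document rather than node by node, the following Python sketch walks a plain dict in a single pass. It is a toy model, not the server's C++ code: the names `validate_for_storage` and `is_storable` are hypothetical, and the specific rules (a nesting limit of 100, rejecting field names that begin with `$`, and rejecting an array-valued `_id`) are simplified assumptions standing in for the depth, field name, and `_id` type checks mentioned above.

```python
MAX_DEPTH = 100  # assumed nesting limit for this sketch


class StorageValidationError(Exception):
    """Raised when a document fails one of the illustrative checks."""


def _check_value(value, depth):
    # Only containers add nesting; scalars need no checks in this model.
    if isinstance(value, dict):
        if depth > MAX_DEPTH:
            raise StorageValidationError("document is nested too deeply")
        for name, child in value.items():
            # Assumed field name rule: no leading '$'.
            if name.startswith("$"):
                raise StorageValidationError(f"illegal field name: {name!r}")
            _check_value(child, depth + 1)
    elif isinstance(value, list):
        if depth > MAX_DEPTH:
            raise StorageValidationError("document is nested too deeply")
        for child in value:
            _check_value(child, depth + 1)


def validate_for_storage(doc):
    """Validate a complete post-image in one read-only pass (hypothetical)."""
    # Assumed _id type rule: the top-level _id must not be an array.
    if isinstance(doc.get("_id"), list):
        raise StorageValidationError("_id must not be an array")
    _check_value(doc, depth=1)


def is_storable(doc):
    """Boolean wrapper around validate_for_storage for convenience."""
    try:
        validate_for_storage(doc)
    except StorageValidationError:
        return False
    return True
```

Because the check only reads the document, the same function could in principle serve replacement, pipeline, and delta updates without first building a mutable tree.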

EDIT: After more reading, there's a good chance this would cause performance regressions for modifier-style updates. Doing storage validation on BSONObj in the "update with damages" path would likely require us to serialize the mutablebson post-image object into a BSONObj so that it could be passed to the validation code. The serialized post-image would then be discarded, and WiredTiger (WT) would apply the damage vector. Serializing the post-image only to discard it could be a significant amount of wasted work and could lead to performance regressions, especially when a very small in-place update is made to a very large document.

We should keep this in mind when triaging this ticket.
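To make the cost concern concrete, here is a small Python model of the "update with damages" trade-off. It is only a sketch under stated assumptions: `Damage`, `apply_damages`, and the two byte-counting helpers are hypothetical names, the document is treated as an opaque byte string, and neither the real damage vector format nor the storage engine's behavior is modeled. It compares the bytes touched by an in-place update with the bytes that would have to be serialized just to validate the post-image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Damage:
    offset: int  # where in the stored document the change starts
    data: bytes  # replacement bytes (same length as the bytes they overwrite)


def apply_damages(pre_image, damages):
    """Apply same-length byte replacements to a copy of the pre-image."""
    buf = bytearray(pre_image)
    for d in damages:
        buf[d.offset:d.offset + len(d.data)] = d.data
    return bytes(buf)


def bytes_written_in_place(damages):
    """Work done by the in-place path in this model: only the damaged bytes."""
    return sum(len(d.data) for d in damages)


def bytes_serialized_for_validation(post_image):
    """Extra work if validation needs a fully serialized post-image."""
    return len(post_image)


# A 1 MiB document with a single 4-byte in-place change (made-up sizes).
pre_image = bytes(1024 * 1024)
damages = [Damage(offset=512, data=b"\x01\x02\x03\x04")]
post_image = apply_damages(pre_image, damages)

in_place = bytes_written_in_place(damages)
extra = bytes_serialized_for_validation(post_image)
```

In this model, a 4-byte change to a 1 MiB document would require serializing all 1,048,576 bytes purely for validation, which is the kind of wasted work the EDIT describes.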



 Comments   
Comment by Ian Boros [ 25/Jun/20 ]

The motivation there was mostly about simplifying the code, but I suspect that this would also improve performance on some paths. Today, documents modified using pipeline-based updates are converted from BSONObj to mutablebson, back to BSONObj, then to Document/Value, then back to BSONObj, through mutablebson again, and finally back to BSONObj. Some of these conversions are free, and some are not. Regardless, these transitions add needless complexity, especially considering that pipeline-based updates do not take advantage of mutablebson's capabilities, such as in-place modification. It would be good if we only used mutablebson in code paths that benefit from it, and the work described in this ticket would be a step towards achieving that.

I imagine removing all transitions to/from mutablebson for pipeline updates and full replacement updates would be a performance win for those code paths. The discussion at the end was intended to warn that naively making the kind of change suggested here could result in a performance loss for modifier-style updates in the "update with damages" path. I'm sorry for the confusion!

Comment by Eric Milkie [ 25/Jun/20 ]

I don't understand the motivation of "don't need to go through mutablebson".  Is that expected to be a performance gain?  That is in conflict with the later discussion that it could instead result in performance loss.

Comment by Ian Boros [ 25/Jun/20 ]

Sorry, that was wording mistakenly borrowed from a conversation in a code review. I meant regular BSONObj.

The idea was that we could avoid going through mutablebson altogether for 3 out of the 4 update code paths (replacement, pipeline and the new "delta" updates being introduced in 4.6). After doing a bit more reading I realize that there's a decent chance this would cause performance regressions for modifier-style updates (see new description) so I've changed the ticket title to "Consider ...".

Comment by Eric Milkie [ 25/Jun/20 ]

What is regular BSON?

Generated at Thu Feb 08 05:18:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.