[SERVER-70807] Avoid unnecessarily fetching documents while updating Created: 24/Oct/22 Updated: 05/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Chris Harris | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Optimization |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Update operations can currently FETCH the same document multiple times. This happens when the index being used includes a key that the update is modifying, and the modification places the new index entry ahead of the scan's current position, so the traversal encounters the document again. We already use record ids successfully to avoid modifying the document multiple times, but we still FETCH it more than once. This seems unnecessary: the record id stored in the index entry should be enough to confirm that no further work is needed for that document. |
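To make the scenario concrete, here is a minimal Python sketch (illustrative only, not server code; all names are made up) of how an update that modifies the indexed key can re-encounter the same record id later in the scan, and how consulting the record id before the FETCH would avoid the redundant fetch:

```python
# Illustrative sketch: simulates an index scan during a multi-update that
# increments the indexed key, moving each entry forward in key order.
# (key, record_id) pairs as the scan encounters them; record ids 1 and 2
# reappear because their updated entries land later in the index.
scan = [(1, 1), (2, 2), (11, 1), (12, 2)]

def update_fetch_then_dedup(entries):
    """Today: FETCH every entry; the UPDATE stage then skips seen record ids."""
    seen, fetches = set(), 0
    for _key, rid in entries:
        fetches += 1              # document fetched unconditionally
        if rid in seen:
            continue              # duplicate: the fetch above was wasted
        seen.add(rid)
    return fetches

def update_dedup_then_fetch(entries):
    """Proposed: use the record id from the index entry to skip the FETCH."""
    seen, fetches = set(), 0
    for _key, rid in entries:
        if rid in seen:
            continue              # no fetch needed at all
        fetches += 1
        seen.add(rid)
    return fetches

print(update_fetch_then_dedup(scan))   # 4 fetches for only 2 documents
print(update_dedup_then_fetch(scan))   # 2 fetches
```

Under this toy model, checking the record id first halves the fetch count for the two-document workload.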
| Comments |
| Comment by Chris Harris [ 11/Nov/22 ] |
|
brenda.rodriguez@mongodb.com please and thank you! My only comment is that the specific name nDeduplicated may need to be adjusted. Could that name be ambiguous given, for example, multikey indexes, where different "types" of de-duplication behave differently? Unclear, but something to think through. |
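A minimal sketch of the ambiguity (illustrative Python, not server code): a multikey index stores one entry per array element, so an ordinary scan deduplicates record ids even when no update is involved, and a raw nDeduplicated counter would conflate that with update-driven re-encounters:

```python
# Illustrative only: a multikey index on 'tags' holds one entry per element.
doc = {"record_id": 1, "tags": ["x", "y", "z"]}
entries = [(tag, doc["record_id"]) for tag in doc["tags"]]

seen, deduplicated = set(), 0
for _key, rid in entries:
    if rid in seen:
        deduplicated += 1   # skipped purely because the index is multikey
    else:
        seen.add(rid)

print(deduplicated)  # 2 de-duplications with no update involved at all
```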
| Comment by Brenda Rodriguez [ 10/Nov/22 ] |
|
christopher.harris@mongodb.com do you agree that we should file a new ticket for the metric Charlie proposed? |
| Comment by Charlie Swanson [ 27/Oct/22 ] |
|
christopher.harris@mongodb.com I think a new metric should be pretty straightforward, and a pretty good neweng ticket as well. Would something like nDeduplicated: X be clear enough?

Also, while I'm here, I wanted to build a bit on what arun.banala@mongodb.com said above. The UPDATE stage does track its own set of RecordIds for deduplication, and it assumes that the incoming documents are already fetched. Interestingly, the IX_SCAN stage can also track duplicate RecordIds, which we use extensively when scanning multikey indexes. An interesting half-measure to consider would be to have both stages deduplicate. Below I elaborate on why we might need to keep both, but one hesitation with any change like this is that the memory consumption profile will change: if we keep multiple sets of RecordIds, we could approximately double that memory footprint. Maybe we could move the de-duplication onto the WorkingSet, which is shared by all stages? Food for thought, though that may be risky in terms of scoping for OR plans, for example.

To be more aggressive, it may be OK to have only the IX_SCAN deduplicate in some cases, but I am not convinced, just from thinking about it, that (1) there is always a single IX_SCAN. If the plan contains an OR or AND_HASH or something similar, then we need to deduplicate at a higher level. (2) Deduplicating at a higher level has different semantic guarantees. The FETCH stage may have a filter attached to it, so it matters whether the de-duplication happens before or after that filter. Admittedly, this would only be noticeable in some edge cases, but for example: the system today deduplicates during the UPDATE stage, so it would still update a document if the first index entry we see for it fails the FETCH's filter but the second one passes. If we push the deduplication into the index scan, we would never update such a document. I think this is OK from a consistency perspective, but I'm not 100% sure.

(christopher.harris@mongodb.com do you think we could make these comments public? I just toggled mine to match yours since it doesn't make sense on its own) |
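The filter-ordering point above can be sketched as follows (illustrative Python, not server code): a document whose first index entry fails the FETCH filter but whose second entry passes gets updated when dedup happens in the UPDATE stage, but not when dedup is pushed into the index scan:

```python
# Illustrative only. Entries as (record_id, passes_fetch_filter).
# Record id 1's first entry fails the FETCH filter; its second one passes.
entries = [(1, False), (2, True), (1, True)]

def updated_with_update_stage_dedup(scan):
    """Dedup after the FETCH filter (today's behavior)."""
    seen, updated = set(), set()
    for rid, passes in scan:
        if not passes:
            continue              # filtered out before dedup is consulted
        if rid in seen:
            continue
        seen.add(rid)
        updated.add(rid)
    return updated

def updated_with_ixscan_dedup(scan):
    """Dedup before the FETCH filter (pushed into the index scan)."""
    seen, updated = set(), set()
    for rid, passes in scan:
        if rid in seen:
            continue              # second entry never reaches the filter
        seen.add(rid)
        if passes:
            updated.add(rid)
    return updated

print(updated_with_update_stage_dedup(entries))  # {1, 2}
print(updated_with_ixscan_dedup(entries))        # {2}
```

The two placements produce different sets of updated documents for the same scan, which is the semantic difference the comment is flagging.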
| Comment by Chris Harris [ 24/Oct/22 ] |
|
Pulling some additional details provided by arun.banala@mongodb.com:
Perhaps there is also a separate improvement request that could be opened for the log/profiler metrics. As far as I can tell, there isn't currently a metric that directly indicates that the object being examined was already modified:
I have a vague recollection that we had something like this previously for MMAPv1? Regardless, if that seems like a lightweight diagnosability improvement, then I'd be in favor of an additional SERVER ticket for it, which could be nominated as a quick win. |