[SERVER-83449] fixDocumentForInsert iterates through each document up to four times Created: 20/Nov/23  Updated: 28/Nov/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Vishnu Kaushik Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File perf-test_phase-0000_flamegraph-1.svg     File perf-test_phase-0000_flamegraph.svg    
Issue Links:
Related
is related to SERVER-83148 Investigate hand parsing bulkWrite co... Closed
Assigned Teams:
Replication
Participants:

 Description   

fixDocumentForInsert iterates through each document being inserted up to four times:
1. First to validate the depth here
2. Then we iterate through the document to validate it (check if there are Timestamps needing fixing, if _id is present more than once, etc.)
3. If the _id is not the first element in the BSON, we fetch the _id element, which under the hood iterates through the BSON doc again.
4. We iterate through the BSON doc again to copy elements into the new BSON doc here

We should need at most two passes to do this (maybe even fewer if we are clever), and some of these steps are easy to avoid. For example, step (3) above can be avoided by remembering where the _id field is during step (2).
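A rough sketch of the two-pass idea follows. It does not use the server's BSON types: `Field`, `Doc`, `scanOnce`, and `fixForInsert` are hypothetical stand-ins, and the generated `_id` value is a placeholder. The first pass validates depth, checks for a duplicate _id, and remembers the _id position; the second pass copies the fields with _id first.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a BSON document: an ordered list of fields.
struct Field {
    std::string name;
    std::string value;            // payload kept as a string for simplicity
    std::vector<Field> children;  // non-empty for nested documents
};
using Doc = std::vector<Field>;

// Visits each nested element once and rejects documents that nest too deeply.
void checkDepth(const Doc& doc, int depth, int maxDepth) {
    if (depth > maxDepth) throw std::invalid_argument("document nested too deeply");
    for (const Field& f : doc)
        if (!f.children.empty()) checkDepth(f.children, depth + 1, maxDepth);
}

// Pass 1: validate and remember where the top-level _id lives.
std::optional<std::size_t> scanOnce(const Doc& doc, int maxDepth) {
    std::optional<std::size_t> idIndex;
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const Field& f = doc[i];
        if (f.name == "_id") {
            if (idIndex) throw std::invalid_argument("_id present more than once");
            idIndex = i;
        }
        if (!f.children.empty()) checkDepth(f.children, 2, maxDepth);
    }
    return idIndex;
}

// Pass 2: copy fields into a new document with _id first. No separate
// lookup of _id is needed because pass 1 already recorded its position.
Doc fixForInsert(const Doc& doc, int maxDepth = 100) {
    const std::optional<std::size_t> idIndex = scanOnce(doc, maxDepth);
    if (idIndex && *idIndex == 0) return doc;  // already in the desired order

    Doc out;
    out.reserve(doc.size() + 1);
    if (idIndex) {
        out.push_back(doc[*idIndex]);
    } else {
        out.push_back(Field{"_id", "generated", {}});  // placeholder for a generated id
    }
    for (std::size_t i = 0; i < doc.size(); ++i)
        if (!idIndex || i != *idIndex) out.push_back(doc[i]);
    assert(out.size() >= doc.size());
    return out;
}
```

With this shape, a document whose _id is already first costs one pass, and any other document costs two.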

I also think we can generate ObjectIds (b.appendOID()) and reserve optimes to fill in timestamps in batches instead of one at a time. The attached flamegraphs show that doing this per document takes a considerable amount of time.
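To illustrate the batching idea in isolation, here is a minimal sketch that reserves a contiguous range of values from a shared counter with one atomic operation instead of one operation per document. `TickSource`, `assignOneAtATime`, and `assignBatched` are hypothetical names invented for this example; they are not the server's optime or ObjectId APIs, which may have different constraints.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shared counter standing in for a timestamp/id source.
class TickSource {
public:
    // Reserves n consecutive ticks and returns the first one.
    std::uint64_t reserve(std::uint64_t n) {
        calls_.fetch_add(1, std::memory_order_relaxed);
        return next_.fetch_add(n, std::memory_order_relaxed);
    }
    // Number of reserve() calls made so far (for comparison only).
    std::uint64_t calls() const { return calls_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> next_{1};
    std::atomic<std::uint64_t> calls_{0};
};

// One trip to the shared counter per document.
std::vector<std::uint64_t> assignOneAtATime(TickSource& src, std::size_t count) {
    std::vector<std::uint64_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(src.reserve(1));
    return out;
}

// One trip to the shared counter for the whole batch.
std::vector<std::uint64_t> assignBatched(TickSource& src, std::size_t count) {
    std::vector<std::uint64_t> out;
    out.reserve(count);
    if (count == 0) return out;
    const std::uint64_t first = src.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(first + i);
    return out;
}
```

Both functions hand out the same consecutive values in a single-threaded run; the batched version touches the shared counter once per batch rather than once per document.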

See comments for more info.


Generated at Thu Feb 08 06:52:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.