Type: Task
Resolution: Fixed
Priority: Critical - P2
Fix Version/s: 8.0.0-rc5
Affects Version/s: None
Component/s: None
Labels:

Assigned Teams:

Query Execution
Backwards Compatibility:
Fully Compatible
Sprint:
QE 2023-11-27, QE 2023-12-11, QE 2023-12-25, QE 2024-01-08, QE 2024-01-22, QE 2024-02-05, QE 2024-02-19, QE 2024-03-04, QE 2024-03-18, QE 2024-04-01, QE 2024-04-15, QE 2024-04-29, QE 2024-05-13, QE 2024-05-27
Linked BF Score:
35
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Right now we have IDHACK as a stage, but it still involves going through the planner and working with a PlanExecutor, WorkingSet, and similar machinery. There is at least a 10% win from making the optimization more aggressive: cutting over to dedicated C++ code that uses the SortedDataInterface and RecordStore APIs directly, and calling that code instead of going through the planner. You can see my POC patch for find in PERF-4696, but I don't think that is exactly the right path to take. I think it is worth making functions for find/update/remove by _id and using them throughout the codebase wherever those operations are needed.
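As a rough illustration of the "dedicated function" idea for find, here is a minimal sketch. `IdIndex`, `RecordStore`, `Collection`, `Document`, and `findById` below are simplified, hypothetical stand-ins invented for this example. They are not the real SortedDataInterface or RecordStore APIs, and the keys are plain integers rather than BSON values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; NOT the real MongoDB classes or signatures.
using RecordId = std::int64_t;

struct Document {
    std::int64_t id;   // plays the role of _id
    std::string body;  // everything else in the document
};

// Plays the role of the _id index (SortedDataInterface in the ticket).
struct IdIndex {
    std::map<std::int64_t, RecordId> entries;

    std::optional<RecordId> findLoc(std::int64_t id) const {
        auto it = entries.find(id);
        if (it == entries.end()) return std::nullopt;
        return it->second;
    }
};

// Plays the role of the RecordStore.
struct RecordStore {
    std::unordered_map<RecordId, Document> records;

    const Document* findRecord(RecordId rid) const {
        auto it = records.find(rid);
        return it == records.end() ? nullptr : &it->second;
    }
};

struct Collection {
    IdIndex idIndex;
    RecordStore recordStore;
    RecordId nextRid = 1;

    void insert(Document doc) {
        RecordId rid = nextRid++;
        idIndex.entries[doc.id] = rid;
        recordStore.records.emplace(rid, std::move(doc));
    }
};

// The dedicated path: one index probe, one record fetch, no planner,
// no plan executor, no working set.
std::optional<Document> findById(const Collection& coll, std::int64_t id) {
    std::optional<RecordId> rid = coll.idIndex.findLoc(id);
    if (!rid) return std::nullopt;
    const Document* doc = coll.recordStore.findRecord(*rid);
    if (!doc) return std::nullopt;
    return *doc;
}
```

The point of the sketch is only the shape of the call path: the caller goes straight from key to RecordId to document, which is what the ticket proposes exposing as a reusable function.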

I also prototyped a similar change for update in the update command logic. It didn't show a measurable impact on sys-perf because sys-perf runs with w:majority, j:1, which adds a ton of extra overhead and noise (some of which I'm filing other tickets about). However, local testing with w:1, j:0 shows that update also gets a 10% end-to-end improvement from using dedicated code for update by _id. The improvement should be even larger for internal update-by-id codepaths such as oplog application and StorageInterface::upsertById.
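A dedicated update-by-_id function could follow the same shape as the find path. The sketch below is an assumption-laden illustration: `Collection`, `Document`, and `updateById` are hypothetical names made up here, the "index" and "record store" are plain standard containers, and write concern, journaling, and oplog handling are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>

// Hypothetical stand-ins; NOT the real MongoDB classes or signatures.
using RecordId = std::int64_t;

struct Document {
    std::int64_t id;       // plays the role of _id
    std::int64_t counter;  // some field the update modifies
};

struct Collection {
    std::map<std::int64_t, RecordId> idIndex;        // stand-in for the _id index
    std::unordered_map<RecordId, Document> records;  // stand-in for the record store
};

// Looks the document up by _id and applies `mod` in place.
// Returns true if a document matched, false otherwise.
bool updateById(Collection& coll,
                std::int64_t id,
                const std::function<void(Document&)>& mod) {
    auto idx = coll.idIndex.find(id);
    if (idx == coll.idIndex.end()) return false;

    auto rec = coll.records.find(idx->second);
    if (rec == coll.records.end()) return false;

    mod(rec->second);
    // _id is immutable, so undo any change the modifier made to it.
    rec->second.id = id;
    return true;
}
```

Because the _id never changes, the index entry does not need to be touched, which is part of why this path can stay so short.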

I did not prototype remove, but I assume it would benefit from a similar treatment.

A productionized version would have a few enhancements:

- Improving the detection of "simple _id query" to include things like {_id: {$eq: 7}}.
- Splitting {_id: 1, a: 2, b: 3} into a match on _id (which will be handled by the index) and everything else, then applying the remaining filter (if any) to the resulting doc before deciding to return it.
  - I believe this is necessary in order to use the fast path for all oplog updates. We may be able to do most updates without this enhancement, but it will need to fall back to the slow path if there is a residual query.
- This should also work for clustered collections, where it can skip the _id index and go directly to the record store.
- In an ideal world, the RecordStore::Cursor type (or a subclass) would be modified to support update operations rather than having them as methods on RecordStore itself, since that better matches the WiredTiger APIs. We could then use a single WT/RecordStore cursor both to fetch the document and to apply the update, which will save some lookup cost.
  - This should probably also happen in the UpdateStage logic, but it may be more complicated because we would need to remove calls to WorkingSet::fetch that naturally won't exist in the dedicated IDHACK code since it will just work directly with RecordIds.
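The first two enhancements above (recognizing {_id: {$eq: 7}} and splitting off a residual filter) can be sketched as follows. Everything here is a toy model chosen for illustration: `Operand`, `Filter`, `Doc`, `splitOnId`, and `matchesResidual` are hypothetical names, values are integers only, and just `$eq` and `$gt` are understood. It is not how the server represents match expressions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Toy model, NOT the server's match expression types.
// op == ""   means a bare literal, e.g. {a: 2}
// op == "$eq" means an explicit equality, e.g. {a: {$eq: 2}}
struct Operand {
    std::string op;
    std::int64_t value;
};

using Filter = std::map<std::string, Operand>;
using Doc = std::map<std::string, std::int64_t>;

struct SplitFilter {
    std::optional<std::int64_t> idKey;  // set only for an equality on _id
    Filter residual;                    // everything the index cannot answer
};

bool isEquality(const Operand& o) {
    return o.op.empty() || o.op == "$eq";
}

// Separates the _id equality (if any) from the rest of the filter.
// If idKey is empty, the query is not eligible for the fast path.
SplitFilter splitOnId(const Filter& filter) {
    SplitFilter out;
    for (const auto& [field, operand] : filter) {
        if (field == "_id" && isEquality(operand)) {
            out.idKey = operand.value;
        } else {
            out.residual.emplace(field, operand);
        }
    }
    return out;
}

// Applies the residual filter to a document already fetched by _id.
bool matchesResidual(const Doc& doc, const Filter& residual) {
    for (const auto& [field, operand] : residual) {
        auto it = doc.find(field);
        if (it == doc.end()) return false;
        if (isEquality(operand)) {
            if (it->second != operand.value) return false;
        } else if (operand.op == "$gt") {
            if (!(it->second > operand.value)) return false;
        } else {
            // Operator not modeled in this sketch: treat as no match.
            return false;
        }
    }
    return true;
}
```

In this model, a caller would probe the index with `idKey`, fetch the document, and return it only if `matchesResidual` passes. A filter with no equality on _id leaves `idKey` empty, which corresponds to falling back to the slow path.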

is related to

SERVER-87082 Query fast path for indexed single equalities in update or delete

Backlog

related to

SERVER-82865 Lightweight collection acquisition for findOneById

Backlog

1.	Aggressive IDHACK for find	SERVER-83758	Closed	Colin Stolley	8.0.0-rc0
2.	Aggressive IDHACK for update	SERVER-83759	Closed	Colin Stolley	8.1.0-rc0, 8.0.0-rc5
3.	Aggressive IDHACK for delete	SERVER-83760	Closed	Hana Pearlman	8.1.0-rc0, 8.0.0-rc5
4.	Add ExpressPlan as express execution analog for PlanStages	SERVER-88844	Closed	Justin Seyster	8.1.0-rc0, 8.0.0-rc5

Assignee: Colin Stolley
Reporter: Mathias Stearn
Participants: Colin Stolley, Mathias Stearn, Xiaochen Wu
Votes: 0
Watchers: 23

Created: Oct 04 2023 01:32:01 PM UTC
Updated: May 20 2024 08:52:52 PM UTC
Resolved: May 20 2024 08:52:52 PM UTC
Confidence Status Last Update: 13/Nov/23 11:20 PM
