Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: 8.1.0-rc0
Affects Version/s: None
Component/s: Internal Client
Labels:
None

Assigned Teams:

Query Execution
Backwards Compatibility:
Fully Compatible
Sprint:
QE 2024-11-25
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The DBClientCursor class has several methods with an inefficient internal implementation:

The peekFirst and peekError methods always call into peek.
peek internally builds up an std::vector<BSONObj> and returns it. When called from peekFirst or peekError, there is at most a single element inside it the vector. Using a vector here seems a bit wasteful. We can save this heap allocation by making peekFirst and peekError not rely on `peek`, and simply look at the next element from `_batch.objs` if available, or an empty BSONObj otherwise.
fromAggregationRequest builds a cursor from a Command reply, and repeated accesses the "cursor" field of the reply via `ret["cursor"].Obj()`. This will lead to repeated lookups of the "cursor" field in the reply, Those could be avoided by storing the result of `ret["cursor"].Obj()` in a local variable.
fromAggregationRequest also builds up an std::vector of owned BSONObj from the first batch of the Command reply. This vector is built up incrementally from copying the rows out of `ret["cursor"].Obj()["firstBatch"].Array()`. The `Array()` method here already builds an std::vector, but with BSONElements, The code then iterates over that vector only to store the owned copies of the BSONElements in yet another vector. We could save one vector here.
The built-up vector `firstBatch` is then finally passed to the ctor of DBClientCursor by value, so the ctor of DBClientCursor needs to create a copy of it. We should turn `firstBatch` into a prvalue here, so that the ctor can use move construction. All that's missing is an `std::move(firstBatch)` here.
The `_putBack` member of DBClientCursor currently is an `std::stack<BSONObj>`. `std::stack<BSONObj>` will use an `std::deque<BSONObj>` as its backing store here, because the underlying container type for `std::stack` defaults to `std::deque`.
Deques can be quite wasteful with memory allocations, and their provided functionality of accessing both ends of the container is not needed in case of the DBClientCursor. Instead, we only push to the end and pop from the end of the container. We could thus use an `std::vector` as the backing store of `_putBack`. `std::vector` may have a different growth strategy then `std::deque` though, so we potentially want to make an initial bulk allocation once the first element gets pushed into `_putBack`.

It is unclear if any of the changes suggested above matter for the performance of real-world workloads or not.

Given that client cursors have to deal with network requests which can be orders of magnitude more expensive than local response processing, the suggestions may turn out to be premature performance optimizations.

Assignee:: Jan Steemann
Reporter:: Jan Steemann
Participants:: Githook User, Jan Steemann
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Sep 23 2024 04:41:07 PM UTC
Updated:: Nov 15 2024 06:56:18 PM UTC
Resolved:: Nov 15 2024 06:17:12 PM UTC
Confidence Status Last Update:: 12/Nov/24 1:16 PM

Details

Description

Attachments

Activity

People

Dates