Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Query Optimization
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Motivation

The FieldRef structure is used throughout the QO codebase, in places to modify field paths, but in places for simpler operations, like checking if a field path has multiple components.

Profiling shows this path as hot in code which uses it. Including visiting MatchExpressions (where the path is parsed to check various properties).

This was originally noticed in a benchmark for pipeline optimisation where a $match is pushed down past 200x $addFIelds stages.

It is possible to strategically defer FieldRef parsing in some cases, for example, by checking if path.find('.') -> then FieldRef parse in the code in the visitPathExpression.

However, such performance fixes are brittle. Instead, from my testing it looks like we can swap out the implementation for something simpler and faster

The current implementation uses a cached field length, a preallocated vector of string fields of field parts, a lazy computed final result and a subpath damage vector:

FieldIndex _cachedSize
boost::small_vector<optional<StringView>, 4> _parts
std::string _dotted
std::vector<std::string> _replacements

In my testing we can swap out the internals for a single flat char array while keeping the interface the same, and achieve 1x to 5x performance gains with no losses.

I've tested various scenarios of mutating parts of field paths of various lengths and number of segments and saw virtually no regression.

In the $match pushdown benchmark the fraction of time spent in visiting the match expression three fell by 21% (from 32% of all time to 25% of all time, shaving off 1ms).

Proposal

Replace the FieldRef implementation with a simper flat char array + index. This seems to perform better even on mixed workload benchmarks (add/remove/replace proportion of components then read) and the optimsiation benchmark above, likely due to better cache locality and auto-vectorisation by the compiler.

Additionally, it appears we can extract the index + read-only logic into a separate base class, again without breaking existing uses, while allowing new uses to avoid constructing a FiedlRef structure (copying the path) but still benefit from indexed access (get the first component etc.).

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

image-2026-01-12-10-46-27-532.png
1.30 MB
Jan 12 2026 10:46:32 AM UTC

related to

SERVER-115796 Measure the performance overhead of dependency analysis

Closed

Assignee:: Unassigned
Reporter:: Vesko Karaganev
Participants:: Vesko Karaganev
Votes:: 0 Vote for this issue
Watchers:: 5 Start watching this issue

Created:: Jan 12 2026 11:21:21 AM UTC
Updated:: Mar 05 2026 11:10:28 AM UTC

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates