Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Query Execution
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Background

Consider a server change like ~~SERVER-91281~~, which wants to add a new option, 'outputSortKeyMetadata: true/false', to the $sort stage. This is challenging to roll out in the face of a rolling upgrade/downgrade.

If you want to use this new option for optimizations (for example, $setWindowFields would like to take advantage of it), you need to be careful to do so only when you know that every node (mongod) participating in the query will understand your request. With sharded collections, a pipeline may need to be sent across the network, and a node that receives an unknown option typically rejects it, which would fail the query.
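To make the failure mode concrete, here is a minimal sketch of strict option parsing on an older binary. Everything in it is an assumption for illustration: `parse_stage_options`, `UnknownOptionError`, the `sortPattern` field name, and the two option sets are hypothetical and do not reflect the server's actual parser or wire format.

```python
class UnknownOptionError(ValueError):
    """Raised when a stage spec carries an option this binary does not know."""


def parse_stage_options(spec: dict, known_options: set) -> dict:
    """Strictly validate a stage spec against the options this binary knows."""
    unknown = sorted(set(spec) - known_options)
    if unknown:
        raise UnknownOptionError(f"unrecognized option(s): {unknown}")
    return dict(spec)


# Hypothetical option sets for an old and a new binary.
OLD_BINARY_OPTIONS = {"sortPattern"}
NEW_BINARY_OPTIONS = {"sortPattern", "outputSortKeyMetadata"}

# A request built by a node that already knows the new option.
request = {"sortPattern": {"a": 1}, "outputSortKeyMetadata": True}

# A new binary accepts the request.
accepted = parse_stage_options(request, NEW_BINARY_OPTIONS)

# An old binary rejects it, which would fail the whole query.
try:
    parse_stage_options(request, OLD_BINARY_OPTIONS)
    rejected = False
except UnknownOptionError:
    rejected = True
```

The sender therefore has to know what the receiver understands before it includes the new option, which is the problem the rest of this ticket is about.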

Our previous approaches to this problem were to either
(a) Mistakenly conclude that because the router is the last node upgraded in the upgrade cycle, the router can send new options with confidence that all other nodes will understand them. This is not correct, and it is hard to catch in tests. A query or sub-pipeline may be routed from one mongod to another, for example when a $lookup operation on a shard acts as a router to go find the base collection data. In this scenario, there is no guarantee about which mongod version is sending the request or which mongod version is receiving it.
(b) Use an FCV-gated feature flag to check. This mostly works, or at least it has been our answer historically. But it still leaves room for edge cases where one node checks the FCV and gets a different answer than another node. For this example, I don't think this poses a real problem, since a mixed-response FCV does imply the binary version is upgraded. But it is ... challenging to reason about.

Proposal

I was speaking with joan.bruguera-mico@mongodb.com about this, and we think we can/should copy the approach taken in SPM-4048 for a related problem. Namely, have the first router role participating in any operation resolve any/all feature flags, and pass the resolved values across the network to every participating node.

In this approach, there is no room for races where nodes get different answers about a flag. It also solves the problems highlighted above.
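A rough sketch of the "resolve once, then propagate" idea follows. It is only an illustration under stated assumptions: `Request`, `Node`, `resolved_flags`, and the flag name `featureFlagNewSortOption` are hypothetical and are not taken from SPM-4048 or from the server code.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Request:
    pipeline: tuple
    # None means no node has resolved the flags for this operation yet.
    resolved_flags: Optional[Dict[str, bool]] = None


class Node:
    def __init__(self, name: str, local_flags: Dict[str, bool]):
        self.name = name
        self.local_flags = local_flags

    def receive(self, request: Request) -> Request:
        """Resolve flags only if this node is the first to see the operation."""
        if request.resolved_flags is None:
            return replace(request, resolved_flags=dict(self.local_flags))
        return request

    def flag_enabled(self, request: Request, flag: str) -> bool:
        """Answer from the propagated snapshot, never from local state alone."""
        return self.receive(request).resolved_flags.get(flag, False)


# The first node sees the flag as off; a later node would see it as on locally.
first = Node("first-router", {"featureFlagNewSortOption": False})
later = Node("shard", {"featureFlagNewSortOption": True})

resolved = first.receive(Request(pipeline=("$sort",)))
answer_on_first = first.flag_enabled(resolved, "featureFlagNewSortOption")
answer_on_later = later.flag_enabled(resolved, "featureFlagNewSortOption")
```

Both nodes report the same answer because the later node reads the snapshot attached to the request instead of re-evaluating its own view.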

Sadly, there is one last edge case: an operation can originate from a shard, if it's an internal operation. If originating from the shard, we cannot apply this logic:
> the router is the last node upgraded in the upgrade cycle, the router can send new options with the confidence that all other nodes will understand it.

To solve this, our best answer is: use the FCV to resolve feature flags when it is available (you are a data-bearing node), and default to latest if it is not available (you are a mongos router, which does not track FCV).
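The resolution rule above can be sketched as a small function. This is a simplification under assumptions: versions are modelled as `(major, minor)` tuples, and `resolve_flag`, `enabled_from`, and the `LATEST` value are hypothetical names and numbers chosen for illustration, not the server's feature flag API.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]

# Hypothetical "latest" version known to the running binary.
LATEST: Version = (8, 1)


def resolve_flag(enabled_from: Version,
                 fcv: Optional[Version],
                 latest: Version = LATEST) -> bool:
    """Resolve a version-gated flag.

    fcv is the node's FCV when it tracks one (a data-bearing node),
    or None when it does not (a mongos router).
    """
    effective = fcv if fcv is not None else latest
    return effective >= enabled_from


# Data-bearing node still on the older FCV: the flag resolves to off.
on_shard_mid_upgrade = resolve_flag(enabled_from=(8, 1), fcv=(8, 0))

# Router without an FCV: default to latest, so the flag resolves to on.
on_router = resolve_flag(enabled_from=(8, 1), fcv=None)
```

In this sketch an internal operation that originates on a shard resolves against that shard's FCV, while an operation that originates on a router resolves against the latest version.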

(Final note: I have not discussed the other FCV motivation for such changes: catalog persistence of language features via views or collection validators. I won't discuss it here, but there are other ideas to improve, or at least better document, that edge case.)

is related to

SERVER-91281 Allow $rank and $denseRank window functions to operate without a SortKeyPattern (Closed)
SERVER-110020 Improve FCV README docs for query feature flags (Closed)
SERVER-101825 Introduce VersionContext in ExpressionContext class for storing FCV snapshot for the lifetime of the query (Needs Scheduling)

Assignee: Unassigned
Reporter: Charlie Swanson
Participants: Charlie Swanson, Denis Grebennicov
Votes: 0
Watchers: 12

Created: Aug 28 2025 07:56:32 PM UTC
Updated: Sep 17 2025 03:28:47 PM UTC
