Priority: Major - P3
Affects Version/s: None
Fix Version/s: Backlog
Component/s: Internal Code
Backport Requested:v4.4, v4.2
Sprint:Service Arch 2019-11-18, Service Arch 2019-12-02, Service Arch 2019-12-16, Service Arch 2019-12-30, Service Arch 2020-01-13, Service Arch 2020-02-24, Service Arch 2020-03-09, Service Arch 2020-03-23, Service Arch 2020-04-06, Service arch 2020-04-20, Service arch 2020-05-04
There have been a number of times where an invariant() in the query planner makes it easy for a client to crash a stable version of the server (see below). While detection of a bug in the query planner should definitely be operation-fatal, it does not need to be process-fatal. In fact, making the exposure of such bugs process-fatal creates security vulnerabilities, especially for systems like the atlas free tier, where any user can read and write data (which makes it quite easy to trigger these kinds of invariants).
None of the assertion macros available in the server today fit into the right box for the query planner on the two dimensions of "indicates a bug OR indicates user error/unexpected situation" and "is process fatal OR operation fatal." That is:
invariant() indicates a bug, but is process fatal.
fassert() indicates a situation the server is not prepared for (not a bug) and is process fatal.
uassert() indicates a user error and is operation fatal.
We should consider adding a nonFatalInvariant() to the server which, when triggered, indicates that there is a bug, but is only operation-fatal. Instead of calling abort(), it could throw an exception with a special code (something like ErrorCodes::ThereIsABug). This would require changing our various fuzzers to be aware of this new error code, and to fail when they encounter it.
It's worth mentioning that we're not suggesting all invariants() in the query system should be non-fatal. We're only arguing that a nonFatalInvariant() should be available to the query system and used for logic checks (e.g. were these index bounds built correctly?) and not for checks about the state of the system (e.g. is this lock held?).
Examples of invariants in the query system which are fairly trivial to reproduce: