Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Execution
SERVER-117104 covers validation failures on documents that were previously indexable but no longer are. These fall into two main buckets:
1. Server-side changes (e.g. tightening of key-generation logic across mongod versions)
2. Platform changes (OS, CPU architecture)
It can be difficult to tell which bucket a given validation failure falls into (AF-16732 is a recent example). For the first class of failures in particular, validate's diagnostic output is hard to interpret at scale: the most useful information is often truncated by LOGV2 size limits (as happened in the AF-16732 investigation) or scattered across multiple log events.
These were the biggest pain points for AF-16732:
- Rejection reason is invisible. LOGV2 8411400 builds its message as "Can't extract geo keys: " + recordBson + " " + status.reason(). For records over ~10KB, LOGV2 truncates the document and, with it, the trailing reason. We see {{Location16755: Can't extract geo keys: { ... <truncated> }}} with no visible reason.
- Failing multikey sub-element is unrecoverable. The customer's index is multikey on features.geometry. getKeys throws on a specific features[i].geometry, but only the full top-level record is logged. With many sub-elements per record and a ~10KB limit, the failing sub-element is often outside the visible portion.
- Extra / missing keys aren't correlated with recordIds. validate reports extraIndexEntries: 40 but the affected records aren't surfaced. Mapping orphan keys back to records requires grepping adjacent LOGV2 events.
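To illustrate the first pain point, here is a minimal Python sketch of why a reason appended after a large record disappears under truncation, and why a separate attribute survives. It is a simulation only: truncate(), MAX_ATTR_CHARS, the record shape and the reason text are all assumptions made for illustration, not the server's actual LOGV2 behavior or constants.

```python
import json

# Assumed limit, taken from the "~10KB" figure above; not the server's real constant.
MAX_ATTR_CHARS = 10 * 1024


def truncate(value, limit=MAX_ATTR_CHARS):
    """Stand-in for log truncation: keep the first `limit` characters."""
    if len(value) <= limit:
        return value
    return value[:limit] + "<truncated>"


def concatenated_message(record, reason):
    # Same shape as described above: prefix + record + " " + reason, in one string.
    return truncate("Can't extract geo keys: " + json.dumps(record) + " " + reason)


def separate_attributes(record, reason):
    # Hypothetical alternative: the reason is its own attribute, truncated independently.
    return {
        "record": truncate(json.dumps(record)),
        "failureReason": truncate(reason),
    }


# A record with many sub-elements, well over the assumed limit once serialized.
record = {
    "_id": 1,
    "features": [
        {"geometry": {"type": "Point", "coordinates": [i, i]}} for i in range(2000)
    ],
}
reason = "EXAMPLE_REASON: placeholder rejection reason"

message = concatenated_message(record, reason)
attrs = separate_attributes(record, reason)

print(reason in message)               # the trailing reason is cut off
print(attrs["failureReason"] == reason)  # the separate attribute is intact
```

In this simulation the record attribute is still truncated in both cases, but the reason is only lost when it shares a string with the record.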
The following quality-of-life improvements would help diagnosis:
1. Add failureReason (populated from ex.reason()) to LOGV2(8411400) and analogous validate per-record catch sites, separate from the record attribute.
2. In multikey paths, attach the failing leaf path and element to the throw (likely via an ErrorExtraInfo on Location16755) and surface them as failingPath / failingElement in the LOGV2.
3. Add recordId (and identifying keystring info for multikey) to entries in extraIndexEntries / missingIndexEntries.
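As a rough illustration of items 1 and 2, the Python sketch below shows the general idea of attaching the failing leaf path and element to the exception at the point where a multikey sub-element is rejected, then surfacing them as separate log attributes. Everything here is hypothetical: KeyGenError, check_geometry, extract_keys and log_attrs_for are illustrative names, the geometry check is a toy rule rather than the server's geo parser, and the real change would be made in C++ (for example via an ErrorExtraInfo as suggested above).

```python
class KeyGenError(Exception):
    """Toy exception carrying the extra context proposed above."""

    def __init__(self, reason, failing_path=None, failing_element=None):
        super().__init__(reason)
        self.reason = reason
        self.failing_path = failing_path
        self.failing_element = failing_element


def check_geometry(geometry):
    # Toy rule for illustration only: 'coordinates' must be a list.
    if not isinstance(geometry, dict) or not isinstance(geometry.get("coordinates"), list):
        raise KeyGenError("coordinates must be an array")


def extract_keys(record):
    """Walk features[i].geometry; re-raise with the failing path and element attached."""
    keys = []
    for i, feature in enumerate(record.get("features", [])):
        geometry = feature.get("geometry")
        try:
            check_geometry(geometry)
        except KeyGenError as ex:
            raise KeyGenError(ex.reason, f"features.{i}.geometry", geometry) from ex
        keys.append(tuple(geometry["coordinates"]))
    return keys


def log_attrs_for(record_id, record):
    """Return the attributes a per-record catch site could log, or None on success."""
    try:
        extract_keys(record)
        return None
    except KeyGenError as ex:
        return {
            "recordId": record_id,
            "failureReason": ex.reason,
            "failingPath": ex.failing_path,
            "failingElement": ex.failing_element,
        }


record = {
    "features": [
        {"geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"geometry": {"type": "Point", "coordinates": [1, 1]}},
        {"geometry": {"type": "Point", "coordinates": "oops"}},  # the bad sub-element
    ]
}
attrs = log_attrs_for(7, record)
print(attrs)
```

Because the failing element is logged on its own, it stays small enough to survive truncation even when the enclosing record does not.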
This would be especially helpful in cases where we can't retrieve production documents from the customer. On AF-16732, the visible portion of each truncated record was well-formed, so I had to hypothesize the failing trigger and test candidate shapes in a sandbox, and I still can't definitively confirm the customer's actual invalid shape. Separately, mapping the 40 reported extras back to specific records required grepping LOGV2 events. cc chris.kelly@mongodb.com
is related to: SERVER-107771 Catch all possible exceptions from index key generation in validate to prevent failing collection validation (Closed)