Type: Investigation
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Tools and Replicator
This ticket adds structured diagnostics for 2dsphere index key-extraction failures. Downstream-visible changes:
- Error codes changed. Geo key-extraction failures now throw named codes GeoKeyExtractionFailed (510) and GeoKeyExtractionFailedTimeseries (511), replacing the previous unnamed assertion codes 16755 (regular 2dsphere) and 183934 / 183493 (timeseries). A new error category IndexKeyExtractionError was added. Per policy, moving unnamed assertion codes is non-breaking, but anything keying on 16755/183934/183493 for geo failures will now see 510/511.
- New writeError fields (additive). Failed geo inserts/index builds now carry failingPath, underlyingCode, underlyingReason, and failingElement on the writeError. Existing fields are unchanged; consumers that ignore unknown fields are unaffected.
- Error message wording changed. "Can't extract geo keys: ..." is now "Could not extract geo keys at path '...': ...", and the underlying reason string is length-bounded.
- validate output changed. For geo key-extraction failures, validate's res.errors now emits a bounded, self-contained entry ("Could not build key for index at path : ; see log 12565600 for the failing document"), and full per-document detail is logged under new LOGV2 id 12565600. Tooling/automation that parses res.errors should be aware.
The new behavior is not FCV-gated.
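For consumers that key on these codes or parse res.errors, the changes above could be absorbed roughly as follows. This is a minimal sketch, not server or driver code: the helper names (summarize_geo_write_error, referenced_log_ids) are hypothetical, the writeError is assumed to arrive as a plain dict with the usual code and errmsg keys, and treating the legacy codes as geo failures is an assumption that only holds for callers that already did so.

```python
import re
from typing import Any, Optional

# Codes taken from the ticket text above.
NEW_GEO_CODES = {
    510: "GeoKeyExtractionFailed",
    511: "GeoKeyExtractionFailedTimeseries",
}
LEGACY_GEO_CODES = {16755, 183934, 183493}

# Additive writeError fields named in the ticket; older servers omit them.
DIAGNOSTIC_FIELDS = ("failingPath", "underlyingCode", "underlyingReason", "failingElement")

# Matches the log reference in the new validate entry without assuming
# anything about the elided path/reason portion of the message.
LOG_REF = re.compile(r"see log (\d+) for the failing document")


def summarize_geo_write_error(write_error: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a normalized summary for a geo key-extraction failure, else None."""
    code = write_error.get("code")
    if code not in NEW_GEO_CODES and code not in LEGACY_GEO_CODES:
        return None
    summary: dict[str, Any] = {
        "code": code,
        "name": NEW_GEO_CODES.get(code, "legacy unnamed assertion"),
        "message": write_error.get("errmsg", ""),
    }
    # Copy the new fields only when present, so old and new servers both work.
    for field in DIAGNOSTIC_FIELDS:
        if field in write_error:
            summary[field] = write_error[field]
    return summary


def referenced_log_ids(validate_errors: list[str]) -> list[int]:
    """Collect the LOGV2 ids that validate's res.errors entries point at."""
    ids = {int(m.group(1)) for entry in validate_errors for m in LOG_REF.finditer(entry)}
    return sorted(ids)
```

Reading the new fields with a presence check rather than direct indexing keeps one code path valid across servers with and without this change, which matters because the behavior is not FCV-gated.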
Description of Linked Ticket
SERVER-117104 covers validation failures on documents that were previously indexable but no longer are. These fall into two main buckets:
1. Server-side changes (e.g. tightening of key-generation logic across mongod versions)
2. Platform changes (OS, CPU architecture)
It can be difficult to tell whether a validation failure falls in the first or second bucket (AF-16732 is a recent example). For the first class of failures in particular, validate's diagnostic output is hard to interpret at scale: the most useful information is often truncated by LOGV2 size limits (as in the AF-16732 investigation) or scattered across events.
These were the biggest pain points for AF-16732:
- Rejection reason is invisible. LOGV2 8411400 builds its message as "Can't extract geo keys: " + recordBson + " " + status.reason(). For records over ~10KB, LOGV2 truncates the document and the trailing reason. We see {{Location16755: Can't extract geo keys: { ... <truncated> }}} with no visible reason.
- Failing multikey sub-element is unrecoverable. The customer's index is multikey on features.geometry. getKeys throws on a specific features[i].geometry, but only the full top-level record is logged. With many sub-elements per record and a ~10KB limit, the failing sub-element is often outside the visible portion.
- Extra / missing keys aren't correlated with recordIds. validate reports extraIndexEntries: 40 but the affected records aren't surfaced. Mapping orphan keys back to records requires grepping adjacent LOGV2 events.
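The first pain point follows directly from how the message is assembled: the reason is appended after the record, so head-preserving truncation drops it first. The sketch below reproduces that effect under stated assumptions. The 10 KiB limit stands in for the "~10KB" figure above, truncate is a simplified stand-in for LOGV2's behavior rather than its actual implementation, and the record and reason strings are made up.

```python
APPROX_LIMIT = 10 * 1024  # assumption: stands in for the "~10KB" limit in the ticket


def truncate(text: str, limit: int = APPROX_LIMIT) -> str:
    """Keep only the head of the text, as a stand-in for log-size truncation."""
    return text if len(text) <= limit else text[:limit] + " <truncated>"


def legacy_message(record: str, reason: str) -> str:
    # Mirrors the concatenation described above: prefix + record + " " + reason.
    return "Can't extract geo keys: " + record + " " + reason


big_record = "{ features: [" + "x" * 20_000 + "] }"  # made-up oversized record
reason = "hypothetical rejection reason"

# One concatenated string: the reason sits past the limit and is cut off.
combined = truncate(legacy_message(big_record, reason))

# Reason kept as its own value: each part is truncated independently.
split = {"record": truncate(big_record), "reason": truncate(reason)}

print(reason in combined)         # False
print(split["reason"] == reason)  # True
```

A small record is unaffected either way, which is why the problem only shows up on large multikey documents.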
The following quality-of-life improvements would help diagnosis:
1. Add failureReason (populated from ex.reason()) to LOGV2(8411400) and analogous validate per-record catch sites, separate from the record attribute.
2. In multikey paths, attach the failing leaf path and element to the throw (likely via an ErrorExtraInfo on Location16755) and surface them as failingPath / failingElement in the LOGV2.
3. Add recordId (and identifying keystring info for multikey) to entries in extraIndexEntries / missingIndexEntries.
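To make item 2 concrete, the sketch below walks a multikey path and reports the first sub-element that fails a check, together with its leaf path. Everything here is illustrative: point_problem is a stand-in validator and not the server's key-generation logic, the dotted path with array positions (features.1.geometry) is an assumed format, and the sample document is invented.

```python
from typing import Any, Callable, Iterator, Optional


def leaf_paths(doc: Any, parts: list[str], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for each leaf reached by `parts`, expanding arrays by position."""
    if not parts:
        yield prefix, doc
    elif isinstance(doc, list):
        for i, item in enumerate(doc):
            yield from leaf_paths(item, parts, f"{prefix}.{i}" if prefix else str(i))
    elif isinstance(doc, dict) and parts[0] in doc:
        head = parts[0]
        yield from leaf_paths(doc[head], parts[1:], f"{prefix}.{head}" if prefix else head)


def point_problem(geometry: Any) -> Optional[str]:
    """Stand-in check: accepts only GeoJSON Points with in-range lon/lat."""
    if not isinstance(geometry, dict) or geometry.get("type") != "Point":
        return "not a GeoJSON Point"
    coords = geometry.get("coordinates")
    if not (isinstance(coords, list) and len(coords) == 2
            and all(isinstance(c, (int, float)) for c in coords)):
        return "coordinates must be [longitude, latitude]"
    lon, lat = coords
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return "longitude/latitude out of bounds"
    return None


def first_failure(doc: dict, path: str,
                  check: Callable[[Any], Optional[str]]) -> Optional[dict[str, Any]]:
    """Return the failing leaf path, element, and reason, or None if every leaf passes."""
    for leaf_path, element in leaf_paths(doc, path.split(".")):
        problem = check(element)
        if problem is not None:
            return {"failingPath": leaf_path, "failingElement": element, "failureReason": problem}
    return None


record = {"features": [
    {"geometry": {"type": "Point", "coordinates": [10.0, 20.0]}},
    {"geometry": {"type": "Point", "coordinates": [10.0, 95.0]}},  # latitude out of range
]}
failure = first_failure(record, "features.geometry", point_problem)
```

Reporting only the failing leaf keeps the diagnostic small regardless of how many sub-elements the record has, so it would not depend on the failing element landing inside the visible portion of a truncated record.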
This would be especially helpful in cases where we can't retrieve production documents from the customer. On AF-16732, the visible portion of each truncated record was well-formed, so I had to hypothesize the failing trigger and test candidate shapes in a sandbox, and I still can't definitively confirm the customer's actual invalid shape. Separately, mapping the 40 reported extras back to specific records required grepping LOGV2 events. cc chris.kelly@mongodb.com
- depends on SERVER-125656 Improve validate diagnostic output for unindexable documents (Closed)