Core Server / SERVER-89845

Collection validation can interact poorly with multikey data containing equivalent values

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Storage Execution
    • Operating System: ALL
    • Sprint: Execution Team 2024-05-27

      Consider an index on {a: 1} and document {a: [NumberInt(5), NumberLong(5)]}. The two values for a are equivalent but have distinct types. This means that the two index keys generated for this document will be identical but with different type bits. But, since there can only be one index key (ignoring type bits) per record, one of these keys gets discarded. Thus we're left with just one index key with the type information from just one of the values – the type information from the other value gets lost. Query-wise this is okay since we don't allow multikey indexes to serve covered query plans. But it does have some implications when it comes to collection validation, in particular for data which was inserted on v4.4 or earlier and validated on 5.0+.
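
      For illustration, a minimal mongosh sketch of the data shape described above (the collection name is arbitrary):

          // Hypothetical reproduction of the data shape described above.
          // The two array elements compare as equal but have different BSON
          // types (int32 vs int64), so the index keys generated for them are
          // identical apart from their type bits, and only one key is stored.
          db.c.createIndex({a: 1})
          db.c.insertOne({a: [NumberInt(5), NumberLong(5)]})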

      One implication is that collection validation may need to run both phases when data of this form is present. (Usually, when there is no index corruption, only the first phase is required.) This is because when we hash the index keys, we do so with the type bits included. The implication is that collection validation ends up relying on the same type bits being preserved every time index keys are generated for a given document. (In particular, this changed in SERVER-47349 between v4.4 and v5.0, but more generally this is not a property which should be relied upon.) Connecting this back to the example above, say the document's index keys preserved the type for NumberLong(5) upon insertion, but collection validation preserves the type for NumberInt(5) when it regenerates the index keys. The first phase of collection validation will then report index inconsistencies since the keys hash differently, requiring the second phase to run. This is not reported to the user since it gets reconciled during the second phase, but it does require validation to do extra work.
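
      As a sketch of what this looks like from the outside, running plain validation against such a collection reports no problems, but both phases may have run (the exact output fields mentioned in the comment are illustrative):

          // Validation regenerates the index keys for each document and hashes
          // them, type bits included, into buckets. If the regenerated type bits
          // differ from those stored at insertion time, the first-phase hashes
          // disagree and the second phase has to run to reconcile them.
          db.c.validate()
          // Expected outcome: the result still reports valid: true with no
          // errors, since the mismatch is reconciled during the second phase.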

      The second implication is a caveat to the above in which we do end up falsely reporting an index inconsistency to the user. This requires us to fall into the case where we zero out buckets to remain under the memory limit. If that happens, the aforementioned reconciliation during the second phase cannot occur, since one of the incorrectly-non-zero buckets has been cleared. Thus, the remaining non-zero bucket will end up getting reported as either an extra or a missing index entry. Note that this was previously exacerbated by SERVER-86407, since we would incorrectly fall into this case much more easily than we should have.
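
      As an illustration only, the bucket-zeroing case can be made more likely by lowering validation's memory budget via the maxValidateMemoryUsageMB server parameter (the value shown is arbitrary, and the exact threshold behavior is an internal detail):

          // Hypothetically force validation to hit its memory limit sooner so
          // that first-phase buckets get zeroed out; the surviving non-zero
          // bucket can then be falsely reported as an extra or missing index
          // entry, as described above.
          db.adminCommand({setParameter: 1, maxValidateMemoryUsageMB: 1})
          db.c.validate()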

            Assignee: Unassigned
            Reporter: Gregory Noma (gregory.noma@mongodb.com)
            Votes: 0
            Watchers: 10
