-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Storage Execution
During the review of the SERVER-63251 changes, a few issues with how we check the filters for filtered indexes were noticed that should be cleaned up.
- The filter checking logic occurs in both IndexCatalogImpl and SortedDataIndexAccessMethod, but it should probably be the responsibility of only one of the two
- In particular ICI handles the insert path while SDIAM handles updates. (Removes are unfiltered, I think to ensure we clean up stale index keys if we modify the Matcher so that a document no longer matches)
- There is an extra filter check at the indexKeys/unindexKeys layer (renamed to (un)indexKeysOrWriteToSideTable in that patch) to handle a hybrid index building edge case
- Both of these have comments that say they are deferring to the IndexAccessMethod for filtering in the non-hybrid case, but there doesn't seem to be any check on that path.
- I think the check in indexKeys is fully redundant with the checks on the insert and update paths, so it can just be removed.
- The check in unindexKeys is redundant on updates, but probably needed for removes. I think the best option would be to remove the check from unindexKeys and conditionally check the filter on the remove path only if we are hybrid building (in which case we know there are no stale records to remove from a different version of the matcher.
- If we aren't checking the filter on steady-state removes, should we also not check the filter for the old document in updates?
- I think updates have the same issue that cause us not to filter on removes. You could even imagine an update that changes which keys get generated, but becomes a no-op because neither the old or new doc matches the filter, then when the document is removed, we remove the wrong keys from the index.
- Alternatively, with the Stable API work, maybe we don't need to bypass the filter on removes since we should be able to rely on the matcher semantics not changing. Also, now that MMAPv1 is gone, leaving stale index entries in the rare case where we changed the Matcher semantics is much less harmful. New storage engines don't reuse recordids (except for clustered collections), and they are required to be able to detect when a RecordId refers to a deleted record rather than just accessing bad memory (RecordIds in mmapv1 were basically just pointers).
- This comment is confusing and/or misleading
- "If the document applies to the filter (which means that it should have never been indexed), do not suppress the error."
- What does it mean for a document to "apply" to a filter? I my first guess would be that it matches, but the parenthetical implies that it means that it doesn't match because we would have index it if it did.
- The code suppresses the error if it does not match the filter or if there is no filter. I can't tell if that is correct or not. It seems suspicious that it doesn't follow the normal pattern of no filter being treated the same as a filter that matches every document.