-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
- Background & Impact
When users mistakenly query the `_id` field for a type it can never hold (array, regex, or undefined), such as:
db.collection.find({_id: {$type: 'array'}})
db.collection.find({_id: {$type: 4}})
*This can cause serious performance issues:*
- The query planner cannot use the `_id` index effectively because it cannot determine valid index bounds for types that can never exist
- This may result in a *collection scan (COLLSCAN)* instead of an index scan (IXSCAN)
- On large collections, this leads to *significant performance degradation* and increased resource consumption
Since the `_id` field can never contain Array, RegEx, or Undefined values (enforced at write time by `storage_validation.cpp`), such queries can never match a document: they always return empty results, yet may still trigger a full collection scan.
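To check whether a given query falls back to a collection scan, one option is to inspect its `explain()` output. The helper below is a minimal sketch, not part of the proposed change: `planHasCollScan` is a hypothetical name, it assumes the classic plan-tree shape (`stage`, `inputStage`, `inputStages`), and the two sample plans are hand-written for illustration rather than captured from a server.

```javascript
// Returns true if any stage in an explain() plan tree is a COLLSCAN.
// Assumes each node has `stage` and, optionally, `inputStage` (one child)
// or `inputStages` (several children).
function planHasCollScan(plan) {
  if (!plan || typeof plan !== "object") return false;
  if (plan.stage === "COLLSCAN") return true;

  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return children.some(planHasCollScan);
}

// Hand-written plan fragments for illustration only; not real server output.
const collScanPlan = { stage: "COLLSCAN", direction: "forward" };
const ixScanPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "_id_" },
};

console.log(planHasCollScan(collScanPlan));
console.log(planHasCollScan(ixScanPlan));
```

In the shell, the object to pass would typically be `db.collection.find({_id: {$type: 'array'}}).explain().queryPlanner.winningPlan`; the exact nesting can differ between server versions, so treat the traversal as an assumption to verify against your own output.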
-
- Solution
Add parse-time validation to reject `$type` queries on the `_id` field when the requested type is invalid. This:
- Fails fast with a clear error message: `"The '_id' field cannot be queried by type <typename>"`
- Prevents users from accidentally running expensive queries
- Maintains consistency with write-time validation in `storage_validation.cpp::storageValidIdField()`
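The rule described above can be modelled in a few lines. This is only a sketch in JavaScript: the actual change is in C++ inside `expression_parser.cpp::parseType()`, and the names `checkIdTypeQuery`, `isRejected`, and `InvalidIdFieldError` are hypothetical. It also assumes the `<typename>` in the error message is the type's string alias, which the ticket does not confirm.

```javascript
// BSON type codes for the three types that `_id` can never hold.
const INVALID_ID_TYPES = new Map([
  [4, "array"],
  [6, "undefined"],
  [11, "regex"],
]);

// Reverse lookup so string aliases and numeric codes are treated alike.
const CODE_BY_ALIAS = new Map(
  [...INVALID_ID_TYPES].map(([code, alias]) => [alias, code])
);

// Hypothetical stand-in for the server's InvalidIdField error code.
class InvalidIdFieldError extends Error {
  constructor(typeName) {
    super(`The '_id' field cannot be queried by type ${typeName}`);
    this.name = "InvalidIdField";
  }
}

// Throws if `path` is exactly the top-level `_id` and any requested type is
// one that `_id` cannot contain. `typeSpec` may be a numeric code, a string
// alias, or an array mixing both.
function checkIdTypeQuery(path, typeSpec) {
  if (path !== "_id") return; // `_id.a`, `a._id`, and other fields pass through

  const requested = Array.isArray(typeSpec) ? typeSpec : [typeSpec];
  for (const t of requested) {
    const code = typeof t === "number" ? t : CODE_BY_ALIAS.get(t);
    if (INVALID_ID_TYPES.has(code)) {
      throw new InvalidIdFieldError(INVALID_ID_TYPES.get(code));
    }
  }
}

// Convenience wrapper: true if the query would be rejected.
function isRejected(path, typeSpec) {
  try {
    checkIdTypeQuery(path, typeSpec);
    return false;
  } catch (e) {
    if (e instanceof InvalidIdFieldError) return true;
    throw e;
  }
}

console.log(isRejected("_id", ["array", "objectId"])); // mixed valid/invalid
console.log(isRejected("_id.a", "array"));             // nested path within _id
```

Matching on the exact path string is what keeps nested paths such as `_id.a` and `a._id` out of scope, and a mixed list is rejected as soon as one invalid type is found.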
-
- Changes
- Added validation in `expression_parser.cpp::parseType()` to reject invalid types (Array, RegEx, Undefined) for the `_id` field
- Returns the `InvalidIdField` error code
- Added comprehensive unit tests in `expression_parser_test.cpp`
-
- Test Coverage
*Error cases:*
- `{_id: {$type: 'array'}}` / `{_id: {$type: 4}}`
- `{_id: {$type: 'regex'}}` / `{_id: {$type: 11}}`
- `{_id: {$type: 'undefined'}}` / `{_id: {$type: 6}}`
- `{_id: {$type: ['array', 'objectId']}}` (mixed valid/invalid)
*Success cases:*
- `{_id: {$type: 'string'}}` / `{_id: {$type: 'objectId'}}` / `{_id: {$type: 'number'}}`
- `{a: {$type: 'array'}}` (non-_id field)
- `{'_id.a': {$type: 'array'}}` (nested path within _id)
- `{'a._id': {$type: 'array'}}` (not top-level _id)