[SERVER-44050] Arrays along 'hashed' index key path are not correctly rejected Created: 16/Oct/19  Updated: 29/Oct/23  Resolved: 28/Oct/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 4.2.1
Fix Version/s: 3.6.15, 4.3.1, 3.4.24, 4.2.2, 4.0.14

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: Arun Banala
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
related to SERVER-44571 Documents involved in SERVER-44050 co... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v4.2, v4.0, v3.6, v3.4
Sprint: Query 2019-11-04
Participants:
Linked BF Score: 20

 Description   
Issue Status as of Nov 25, 2019

ISSUE SUMMARY
Creating a hashed index on a field imposes the constraint that this field cannot contain an array. This constraint is not correctly enforced in some versions of the server if the hashed index is against a dotted field, and the array is present mid-path in the to-be-indexed document.

For example, a hashed index {"a.b": "hashed"} would incorrectly index documents that have an array at "a", instead of throwing an error and rejecting the write operation. Hashed indexes are typically only used to support a shard key, and validation on mongos prevents these invalid documents from being inserted or created via an update. But there are still plausible cases in which corruption of the hashed index may have occurred:

  • A pre-existing collection is sharded with a hashed shard key on a dotted path (e.g. {"a.b": "hashed"} in the example above). Sharding the collection requires creating a hashed index on it. If the collection already contains documents that violate the array constraint, those documents will be indexed incorrectly. mongos validation cannot detect this, although future attempts to insert invalid documents will be rejected.
  • The user created the hashed index on a single replica set, or bypassed mongos and wrote documents directly to a mongod in a sharded cluster. In either case, any documents of the invalid form outlined above may be silently mis-indexed.

Users running a sharded cluster who created their hashed index on an empty collection, and who have not bypassed mongos to write documents directly to a shard, are not affected by this issue.

USER IMPACT
Documents containing an array at any component of the index key path other than the terminal one are indexed incorrectly, which can lead to missing query results.

RECOVERY STEPS
Users can determine if their hashed indexes have been corrupted by this issue by running the validate command on the corresponding collection, after upgrading to a minor version that contains the fix.

To address the existing corruption, users will need to either delete all the illegal documents or update them such that the resulting documents no longer have an array at any point along the index path. Users can find documents which may have an illegal array using a {$type: 'array'} predicate. The documents identified by the {$type: 'array'} query should then be deleted or updated by _id.
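
As a rough sketch of the search described above, the following plain JavaScript builds a filter that checks every prefix of the dotted index path for an array. The helper name arrayAlongPathFilter is hypothetical and is not part of the server or the shell; the collection name c is taken from the example in the original description.

```javascript
// Hypothetical helper (an assumption for illustration, not a server or shell API):
// builds a filter matching documents that hold an array at any prefix of a
// dotted index path, e.g. "a" and "a.b" for the path "a.b".
function arrayAlongPathFilter(indexPath) {
  const parts = indexPath.split(".");
  const clauses = parts.map((_, i) => ({
    [parts.slice(0, i + 1).join(".")]: { $type: "array" },
  }));
  return { $or: clauses };
}

const filter = arrayAlongPathFilter("a.b");
// filter is { $or: [ { a: { $type: "array" } }, { "a.b": { $type: "array" } } ] }

// Possible use in the mongo shell, projecting only _id so that the matching
// documents can then be deleted or updated by _id:
//   db.c.find(filter, { _id: 1 })
```

The matching semantics of $type: 'array' changed in 3.6, so the filter may need adjusting on 3.4.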

Note that updating a shard key value is possible only on version 4.2. On 4.0 and older versions, users will have to delete the documents. Following deletion, the documents may be reformatted to eliminate the illegal array paths and then re-inserted.

AFFECTED VERSIONS
This issue affects all supported versions prior to 4.2.2, 4.0.14, 3.6.15, and 3.4.24.

FIX VERSION
The fix will be included in 4.2.2, 4.0.14, 3.6.15 and 3.4.24.

Original Description

Creating a hashed index on a field imposes the constraint that this field cannot contain an array:

 > db.c.drop()
 > db.c.createIndex({a: "hashed"})
 > db.c.insert({a: [1]})
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 16766,
		"errmsg" : "Error: hashed indexes do not currently support array values"
	}
})

This constraint is not correctly enforced if the hashed index is against a dotted field, and the array is present mid-path in the to-be-indexed document:

 > db.c.drop()
 > db.c.createIndex({"a.b": "hashed"})
 > db.c.insert({a: [{b: 1}]})
WriteResult({ "nInserted" : 1 }) // Instead of succeeding, this should result in an error!

The key generation implementation calls dotted_path_support::extractElementAtPath(), which returns an empty BSONElement if there is an array along the path. In downstream code, this empty BSONElement causes us to insert a null key into the index. The result is a corrupt index that can lead to missing query results:

 > db.c.find({"a.b": 1})
// This query should return the document, but it returns nothing!
 > db.c.dropIndexes()
{
	"nIndexesWas" : 2,
	"msg" : "non-_id indexes dropped for collection",
	"ok" : 1
}
 > db.c.find({"a.b": 1})
{ "_id" : ObjectId("5da76ea3365c4b34d3b15c76"), "a" : [ { "b" : 1 } ] }

Note that we get the correct query result only after dropping the corrupt index.
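
The mechanism can be illustrated with a simplified model in plain JavaScript. This is only a sketch under stated assumptions, not the server's C++ implementation: undefined stands in for the empty BSONElement, hashing is omitted, and the function names hashedKeyBeforeFix, hashedKeyAfterFix and hasArrayMidPath are hypothetical. The error text for the mid-path case is assumed to match the message shown earlier for terminal arrays.

```javascript
// Model of the path extraction described above: returns undefined (standing in
// for an empty BSONElement) if a component is missing or if an array is met
// before the last path component.
function extractElementAtPath(doc, path) {
  let cur = doc;
  for (const part of path.split(".")) {
    if (Array.isArray(cur) || cur === null || typeof cur !== "object") return undefined;
    cur = cur[part];
    if (cur === undefined) return undefined;
  }
  return cur;
}

// Pre-fix behaviour as described: terminal arrays are rejected, but an empty
// element silently becomes a null key (hashing omitted for clarity).
function hashedKeyBeforeFix(doc, path) {
  const elt = extractElementAtPath(doc, path);
  if (Array.isArray(elt)) throw new Error("hashed indexes do not currently support array values");
  return elt === undefined ? null : elt;
}

// True if any non-terminal component of the path resolves to an array.
function hasArrayMidPath(doc, path) {
  let cur = doc;
  for (const part of path.split(".")) {
    if (Array.isArray(cur)) return true;
    if (cur === null || typeof cur !== "object") return false;
    cur = cur[part];
  }
  return false;
}

// Assumed shape of the fixed behaviour: reject arrays anywhere along the path.
function hashedKeyAfterFix(doc, path) {
  if (hasArrayMidPath(doc, path)) throw new Error("hashed indexes do not currently support array values");
  return hashedKeyBeforeFix(doc, path);
}
```

In this model, hashedKeyBeforeFix({a: [{b: 1}]}, "a.b") yields a null key, so an index lookup for the value 1 finds no entry, while hashedKeyAfterFix throws for the same document.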

Although this is both an index corruption and a query correctness issue, the issue cannot be encountered when the hashed index is supporting the shard key – shard key fields cannot be arrays. The primary use case for hashed indexes is hashed sharding, so this may be an uncommon issue for hashed indexes that exist in the wild.

I have only tested 4.2.0 and a recent version of master, but I suspect that this bug affects all stable versions. The incorrect key generation code has not been substantially altered recently.



 Comments   
Comment by Githook User [ 30/Oct/19 ]

Author:

{'username': 'banarun', 'email': 'arun.banala@10gen.com', 'name': 'Arun Banala'}

Message: SERVER-44050 Arrays are not correctly rejected during key generation for 'hashed' indexes

(cherry picked from commit 888f7e6fc10ccb999be203b8cbad4dbe19d0a5d2)
Branch: v3.4
https://github.com/mongodb/mongo/commit/9034b668f90feb5a0f1ac9fd2a8714ecbe4cd057

Comment by Githook User [ 30/Oct/19 ]

Author:

{'name': 'Arun Banala', 'username': 'banarun', 'email': 'arun.banala@10gen.com'}

Message: SERVER-44050 Arrays are not correctly rejected during key generation for 'hashed' indexes

(cherry picked from commit 888f7e6fc10ccb999be203b8cbad4dbe19d0a5d2)
(cherry picked from commit ffda4b8dd699251f487596ff008133830a5ec392)
(cherry picked from commit ca240d5215ed88ad874c5355528710fb9d3eff37)
Branch: v3.6
https://github.com/mongodb/mongo/commit/d45258b1e76642f14c7a60b1a6c3bb9596cf5ae6

Comment by Githook User [ 29/Oct/19 ]

Author:

{'name': 'Arun Banala', 'username': 'banarun', 'email': 'arun.banala@10gen.com'}

Message: SERVER-44050 Arrays are not correctly rejected during key generation for 'hashed' indexes

(cherry picked from commit 888f7e6fc10ccb999be203b8cbad4dbe19d0a5d2)
(cherry picked from commit ffda4b8dd699251f487596ff008133830a5ec392)
Branch: v4.0
https://github.com/mongodb/mongo/commit/560a25757f56fdb93c018793e8731c2e50bda70b

Comment by Githook User [ 29/Oct/19 ]

Author:

{'name': 'Arun Banala', 'username': 'banarun', 'email': 'arun.banala@10gen.com'}

Message: SERVER-44050 Arrays are not correctly rejected during key generation for 'hashed' indexes

(cherry picked from commit 888f7e6fc10ccb999be203b8cbad4dbe19d0a5d2)
Branch: v4.2
https://github.com/mongodb/mongo/commit/c2f71b097c9d452dcec235725cf4b5e391ef4e56

Comment by Githook User [ 25/Oct/19 ]

Author:

{'name': 'Arun Banala', 'username': 'banarun', 'email': 'arun.banala@10gen.com'}

Message: SERVER-44050 Arrays are not correctly rejected during key generation for 'hashed' indexes
Branch: master
https://github.com/mongodb/mongo/commit/888f7e6fc10ccb999be203b8cbad4dbe19d0a5d2
