[SERVER-75922] Partial unique indexes created on MongoDB 4.0 can be missing index keys after upgrade to 4.2 and later, leading to uniqueness violations Created: 10/Apr/23  Updated: 22/Jan/24  Resolved: 27/Apr/23

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 6.3.0-rc3
Fix Version/s: 7.1.0-rc0, 6.0.7, 5.0.19, 4.4.23, 7.0.0-rc3

Type: Bug Priority: Critical - P2
Reporter: Max Hirschhorn Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
related to SERVER-85536 [4.4] removing unindexed unique parti... Closed
related to SERVER-51762 Delete code for old unique index format Closed
is related to SERVER-28546 Documents can erroneously be unindexe... Closed
is related to SERVER-76344 Support multiversion testing with the... Closed
is related to SERVER-32821 Support rolling upgrade to new unique... Closed
is related to SERVER-51762 Delete code for old unique index format Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.3, v6.0, v5.0, v4.4, v4.2
Sprint: Execution Team 2023-04-17, Execution Team 2023-05-01
Participants:

 Description   
Issue Status as of Dec 04, 2023

ISSUE DESCRIPTION AND IMPACT

Users who upgraded from MongoDB 4.0 to 4.2+ (featureCompatibility 4.2+) may experience index inconsistencies within unique partial indexes (unique indexes which specify a partialFilterExpression), in the form of missing index keys. Unique primary (_id) indexes are not affected.

Affected versions of MongoDB incorrectly remove index entries from the unique partial index when all of the following take place:

  • Prior to upgrading to FCV 4.2+, a document has a key in a unique partial index because it matches the index's partialFilterExpression.
  • Prior to upgrading to FCV 4.2+, another document with the same unique field value exists, but is not contained in the unique partial index (because it does not match the index's partialFilterExpression).
  • The un-indexed document is deleted after the upgrade to FCV 4.2+.

When the un-indexed document is deleted, affected versions of MongoDB incorrectly delete the key for the indexed document.

Missing index entries in a unique partial index on v4.2+ can have the following effects:

  • Queries using the affected index may return incomplete results.
  • MongoDB will incorrectly allow the insertion of a new document that matches the partialFilterExpression, even though the insert should fail with a duplicate key error. As a result, queries that do not use an affected unique index may return documents with duplicate values that should not have been allowed.
  • Attempts to drop and rebuild the unique partial index may fail due to duplicate keys existing in the collection.

DIAGNOSIS AND AFFECTED VERSIONS

Affected Versions: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 6.3.0-rc3
Fixed Versions: 4.4.23, 5.0.19, 6.0.7, 7.0.0-rc3, 7.1.0-rc0

Users who upgraded to v4.2 and are still running in FCV 4.0 are not impacted. Even after upgrading to a fixed version, users can still be affected by existing missing index entries and by documents with duplicate index keys.

In FCV 4.2+, users who have collections that rely on unique indexes with a partialFilterExpression may be impacted by this bug. Users can check whether a collection has a unique partial index by calling `getIndexes` on each collection, and can check for missing unique index entries by running `validate`.
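
As a rough illustration of the `getIndexes` check described above, the following sketch filters a list of index specifications down to those that are both unique and partial. The helper name `findUniquePartialIndexes` and the sample specifications are hypothetical; the field names (`unique`, `partialFilterExpression`) are the ones `getIndexes` reports for such indexes.

```javascript
// Hypothetical helper: returns the index specs that are both unique and partial.
function findUniquePartialIndexes(indexSpecs) {
  return indexSpecs.filter(
    (spec) => spec.unique === true && spec.partialFilterExpression !== undefined
  );
}

// Sample input shaped like getIndexes() output (made-up collection).
const sampleSpecs = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { user: 1 },
    name: "user_1",
    unique: true,
    partialFilterExpression: { active: true },
  },
  { v: 2, key: { email: 1 }, name: "email_1", unique: true },
];

// Names of the indexes that could be affected by this bug.
const candidateNames = findUniquePartialIndexes(sampleSpecs).map((s) => s.name);
console.log(candidateNames);
```

In the shell, the same helper could be applied to live data with `findUniquePartialIndexes(db.coll.getIndexes())`, repeated for each collection.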

REMEDIATION AND WORKAROUNDS

User action is required in order to remediate this issue. Impacted users should follow the below steps:

  1. Upgrade to a fixed version.
  2. Check for "missingIndexEntries" by running the validate() command.
    • If there are no missing index entries, no more remediation measures are needed.
    • If there are missing index entries, drop and rebuild the index.
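
The decision in step 2 can be sketched as a small helper that inspects a validate() result. The helper name `summarizeValidate` and the sample result are hypothetical, and the sketch assumes the result exposes `valid` and a `missingIndexEntries` array; the exact output shape can differ between server versions.

```javascript
// Hypothetical helper: decides whether an index rebuild is needed based on a
// validate() result. Assumes `missingIndexEntries` is an array when present.
function summarizeValidate(result) {
  const missing = Array.isArray(result.missingIndexEntries)
    ? result.missingIndexEntries
    : [];
  return {
    valid: result.valid === true,
    missingCount: missing.length,
    needsRebuild: missing.length > 0,
  };
}

// Made-up result for illustration only; real entries carry more detail.
const sampleValidateResult = {
  ns: "test.coll",
  valid: false,
  missingIndexEntries: [{ indexName: "user_1" }],
  extraIndexEntries: [],
};

const summary = summarizeValidate(sampleValidateResult);
console.log(summary);
```

In the shell this could be called as `summarizeValidate(db.coll.validate())`.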

Rebuilding the index may fail because the collection already contains documents with duplicate keys. In that case:

  1. Build a new, valid index.
  2. Query using that index to find the duplicate values in the collection.
    Example:

    // bad index
    db.coll.createIndex({ "user" : 1 }, { "partialFilterExpression" : { "active" : true }, "unique" : true })
     
    // new index
    db.coll.createIndex({ "user" : 1, "foo" : 1 }, { "partialFilterExpression" : { "active" : true } })
     
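    > db.coll.aggregate(
    ... [
    ...     // Only documents matching the partialFilterExpression are subject to the unique constraint.
    ...     { $match: { active: true } },
    ...     {
    ...         // Group by the key pattern of the bad (unique) index.
    ...         $group: {
    ...             _id: { user: "$user" },
    ...             count: { $sum: 1 }
    ...         }
    ...     },
    ...     {
    ...         // Keep only the key values that occur more than once.
    ...         $match: {
    ...             count: { $gt: 1 }
    ...         }
    ...     }
    ... ],
    ... { hint: "user_1_foo_1" }    // name of the new index
    ... )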
     
    Returns:
    { "_id" : { "user" : "x" }, "count" : 2 }
     
    > db.coll.find({ "user" : "x" })
    { "_id" : 2, "user" : "x", "active" : true }
    { "_id" : 3, "user" : "x", "active" : true }
    

  3. Remove the duplicate documents as appropriate for the application logic.
  4. Once the affected documents have been dealt with, drop and recreate the unique index on each affected collection.
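
To make the duplicate search in step 2 concrete, the sketch below performs the same grouping client-side on an in-memory array. It is an illustration only: the function name `findDuplicateUsers` and the sample documents are hypothetical, and the `user` and `active` fields are taken from the example index above.

```javascript
// Hypothetical helper: counts `user` values among documents that match the
// partial filter ({ active: true }) and reports the values seen more than once.
function findDuplicateUsers(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.active !== true) continue; // not covered by the partial index
    counts.set(doc.user, (counts.get(doc.user) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([user, count]) => ({ _id: { user }, count }));
}

// Made-up documents: _id 2 and 3 violate the intended uniqueness constraint.
const sampleDocs = [
  { _id: 1, user: "x", active: false },
  { _id: 2, user: "x", active: true },
  { _id: 3, user: "x", active: true },
  { _id: 4, user: "y", active: true },
];

const duplicates = findDuplicateUsers(sampleDocs);
console.log(JSON.stringify(duplicates));
```

Documents that do not match the filter are skipped because the unique constraint never applied to them, so they should not be reported as duplicates.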

Original description

The index format for unique indexes changed between MongoDB 4.0 and 4.2.

  • In MongoDB 4.0, the key portion of the index entry contains only the [KeyString blob of the indexed value]. The value portion of the index entry contains a list of RecordIds and the type bits of the indexed value. Outside of secondary oplog application when secondary unique index constraints are temporarily relaxed, the value portion of the index entry would be a list with exactly one RecordId.
  • In MongoDB 4.2 and later, the key portion of the index entry contains the combination of the [KeyString blob of the indexed value] + [the RecordId of the indexed document]. The value portion of the index entry contains the RecordId of the indexed document (redundantly) and the type bits of the indexed value.

In FCV 4.2 and greater, a mongod writes the new format for index entries while still supporting the ability to read both formats. The in-place conversion incorrectly assumes that when a document is being unindexed, the index entry in the 4.0 format can always be removed. This assumption results in documents having missing index keys in a very similar way to SERVER-28546. With partial indexes, SortedDataInterface::unindex() is called even for a document that may never have been indexed, and the storage glue layer is expected to tolerate that. The in-place conversion should instead check the value portion of the 4.0 format's index entry and proceed to remove the index entry only if the RecordId matches the document being unindexed (see also 052345f).
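
The difference between the two removal strategies can be illustrated with a toy model. This is a sketch under loud assumptions: a JavaScript Map stands in for the index table, plain strings stand in for KeyString blobs, and none of the function names come from the server code.

```javascript
// Toy model only: not the server implementation.

// 4.0 format: key = indexed value; value = list of RecordIds.
function oldFormatKey(value) {
  return `old:${value}`;
}

// 4.2+ format: key = indexed value + RecordId.
function newFormatKey(value, recordId) {
  return `new:${value}:${recordId}`;
}

// Behavior described as incorrect: if no new-format entry exists, any
// 4.0-format entry for the value is removed, whoever owns it.
function unindexAssumingOwnership(index, value, recordId) {
  if (index.delete(newFormatKey(value, recordId))) return true;
  return index.delete(oldFormatKey(value));
}

// Behavior described as correct: remove the 4.0-format entry only if it lists
// the RecordId of the document being unindexed.
function unindexCheckingRecordId(index, value, recordId) {
  if (index.delete(newFormatKey(value, recordId))) return true;
  const recordIds = index.get(oldFormatKey(value));
  if (!recordIds || !recordIds.includes(recordId)) return false;
  const remaining = recordIds.filter((id) => id !== recordId);
  if (remaining.length === 0) index.delete(oldFormatKey(value));
  else index.set(oldFormatKey(value), remaining);
  return true;
}

// Document 1 (user "x") was indexed under FCV 4.0. Document 2 (also user "x")
// never matched the partialFilterExpression and is deleted under FCV 4.2+.
function indexedKeySurvives(unindex) {
  const index = new Map([[oldFormatKey("x"), [1]]]);
  unindex(index, "x", 2);
  return index.has(oldFormatKey("x"));
}

console.log(indexedKeySurvives(unindexAssumingOwnership)); // false: key for document 1 is lost
console.log(indexedKeySurvives(unindexCheckingRecordId)); // true: key for document 1 is kept
```

In the first variant, deleting the never-indexed document removes the key that belongs to the other document, which is the missing-key symptom described in this ticket.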



 Comments   
Comment by Noopur Gupta [ 04/Dec/23 ]
Comment by Githook User [ 26/May/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-75922 Fall back to old format partial index entry removal if an entry in the new format is not found in the index
Branch: v4.4
https://github.com/mongodb/mongo/commit/c10c746e53c3e7e3e705886a0817c8599d50a10e

Comment by Githook User [ 24/May/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-75922 Fall back to old format partial index entry removal if an entry in the new format is not found in the index

(cherry picked from commit 13bcde88db4ec79bbf63053c72bc60bb4ab424b5)
Branch: v6.0
https://github.com/mongodb/mongo/commit/a83608756952e7e167f88ab87bd2ca08535ed0b8

Comment by Githook User [ 24/May/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-75922 Fall back to old format partial index entry removal if an entry in the new format is not found in the index

(cherry picked from commit 13bcde88db4ec79bbf63053c72bc60bb4ab424b5)
Branch: v5.0
https://github.com/mongodb/mongo/commit/cc7a745a3277bbe76f986da965ad6a4cda466e6c

Comment by Githook User [ 24/May/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-75922 Fall back to old format partial index entry removal if an entry in the new format is not found in the index

(cherry picked from commit 13bcde88db4ec79bbf63053c72bc60bb4ab424b5)
Branch: v7.0
https://github.com/mongodb/mongo/commit/f9d3a607b6898023bcd988006769520b35d86429

Comment by Githook User [ 26/Apr/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-75922 Fall back to old format partial index entry removal if an entry in the new format is not found in the index
Branch: master
https://github.com/mongodb/mongo/commit/13bcde88db4ec79bbf63053c72bc60bb4ab424b5

Generated at Thu Feb 08 06:31:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.