  1. Core Server
  2. SERVER-75922

Partial unique indexes created on MongoDB 4.0 can be missing index keys after upgrade to 4.2 and later, leading to uniqueness violations

    • Storage Execution
    • Fully Compatible
    • ALL
    • v7.0, v6.3, v6.0, v5.0, v4.4, v4.2
    • Execution Team 2023-04-17, Execution Team 2023-05-01

      Issue Status as of Dec 04, 2023

      ISSUE DESCRIPTION AND IMPACT

      Users who upgraded from MongoDB 4.0 to 4.2+ (featureCompatibility 4.2+) may experience index inconsistencies within unique partial indexes (unique indexes which specify a partialFilterExpression), in the form of missing index keys. Unique primary (_id) indexes are not affected.

      Affected versions of MongoDB incorrectly remove index entries from the unique partial index when all of the following take place:

      • Prior to upgrading to FCV 4.2+, a document has a key in a unique partial index because it matches the index's partialFilterExpression.
      • Prior to upgrading to FCV 4.2+, another document with the same value for the unique field exists, but is not contained in the unique partial index (because it does not match the index's partialFilterExpression).
      • The un-indexed document is deleted after the upgrade to FCV 4.2+.

      When the un-indexed document is deleted, affected versions of MongoDB incorrectly delete the key for the indexed document.

      Missing index entries in a unique partial index in v4.2+ can have the following effects:

      • Queries using the affected index may return incomplete results.
      • MongoDB will incorrectly allow the insertion of a new document that matches the partialFilterExpression, even though the insert should fail with a duplicate key error. As a result, queries that do not use an affected unique index may return documents with duplicate values that should not have been allowed.
      • Attempts to drop and rebuild the unique partial index may fail due to duplicate keys existing in the collection.

      DIAGNOSIS AND AFFECTED VERSIONS

      Affected Versions: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 6.3.0-rc3
      Fixed Versions: 4.4.23, 5.0.19, 6.0.7, 7.0.0-rc3, 7.1.0-rc0

      Users who upgraded to v4.2 and are still running in FCV 4.0 are not impacted. Upgrading to a fixed version does not repair existing damage: index entries that are already missing, and duplicate documents that were already inserted, remain until they are remediated.

      In FCV 4.2+, users who have collections that rely on unique indexes with a partialFilterExpression may be impacted by this bug. Users can check whether a collection has a unique partial index by calling `getIndexes` on each collection. Missing unique index entries can be detected by running `validate` on the collection.
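      As a minimal sketch of the `getIndexes` check described above, the filter below picks out index specifications that are both unique and partial. The function name `findUniquePartialIndexes` and the sample specifications are hypothetical; the sketch assumes each specification has the general shape returned by `getIndexes` (`key`, `name`, and optional `unique` and `partialFilterExpression` fields).

```javascript
// Hypothetical helper: keep only index specs that are unique AND partial.
// Assumes specs shaped like the documents returned by getIndexes().
function findUniquePartialIndexes(indexSpecs) {
  return indexSpecs.filter(
    (spec) => spec.unique === true && spec.partialFilterExpression !== undefined
  );
}

// Made-up sample specs for illustration only.
const sampleSpecs = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { user: 1 },
    name: "user_1",
    unique: true,
    partialFilterExpression: { active: true },
  },
  { v: 2, key: { email: 1 }, name: "email_1", unique: true },
];

const candidateNames = findUniquePartialIndexes(sampleSpecs).map((s) => s.name);
console.log(candidateNames);

// In the shell, the same filter could be applied to each collection, e.g.:
//   findUniquePartialIndexes(db.coll.getIndexes())
```

      Only the indexes this filter returns need to be checked further; plain unique indexes and the _id index are outside the scope of this issue.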

      REMEDIATION AND WORKAROUNDS

      User action is required in order to remediate this issue. Impacted users should follow the below steps:

      1. Upgrade to a fixed version.
      2. Check for "missingIndexEntries" by running the validate() command.
        • If there are no missing index entries, no more remediation measures are needed.
        • If there are missing index entries, drop and rebuild the index.
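      The decision in step 2 can be sketched as a small check on the validate() result. This is an illustration only: the helper name `summarizeValidate` and the sample result are hypothetical, and the sketch assumes the result object exposes `valid` as a boolean and `missingIndexEntries` as an array, which may differ between server versions.

```javascript
// Hypothetical helper: decide whether an index rebuild is needed, based on a
// validate() result. Assumes `missingIndexEntries` is an array when present.
function summarizeValidate(result) {
  const missing = Array.isArray(result.missingIndexEntries)
    ? result.missingIndexEntries
    : [];
  return {
    valid: result.valid === true,
    missingCount: missing.length,
    needsRebuild: missing.length > 0,
  };
}

// Made-up result for illustration; a real result contains many more fields.
const sampleResult = {
  valid: false,
  missingIndexEntries: [{ indexName: "user_1", recordId: 1 }],
};

const summary = summarizeValidate(sampleResult);
console.log(summary);

// In the shell this might be used as:
//   summarizeValidate(db.coll.validate())
```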

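      Rebuilding the index may fail due to existing documents with duplicate keys in the collection. If it does, find and resolve the duplicates first:

      1. Build a new, valid (non-unique) index that includes the fields of the affected index.
      2. Query using that index to find the duplicate values in the collection.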
        Example:
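        // Bad index: the unique partial index that may be missing entries
        db.coll.createIndex({ "user" : 1 }, { "partialFilterExpression" : { "active" : true }, "unique" : true })
        
        // New index: same leading field and filter, but not unique
        db.coll.createIndex({ "user" : 1, "foo" : 1 }, { "partialFilterExpression" : { "active" : true } })
        
        > db.coll.aggregate(
        ... [
        ...     // Restrict to documents covered by the partial index
        ...     { $match: { active: true } },
        ...     {
        ...         $group: {
        ...             _id: { user: "$user" },   // key fields of the old index
        ...             count: { $sum: 1 }
        ...         }
        ...     },
        ...     {
        ...         $match: {
        ...             count: { $gt: 1 }
        ...         }
        ...     }
        ... ],
        ... { hint: "user_1_foo_1" }              // name of the new index
        ... )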
        
        Returns:
        { "_id" : { "user" : "x" }, "count" : 2 }
        
        > db.coll.find({ "user" : "x" })
        { "_id" : 2, "user" : "x", "active" : true }
        { "_id" : 3, "user" : "x", "active" : true }
        
      3. Remove or modify the duplicate documents according to the application's logic.
      4. Once the duplicates have been resolved, drop and recreate the unique partial index on every affected collection.

      Original description

      The index format for unique indexes changed between MongoDB 4.0 and 4.2.

      • In MongoDB 4.0, the key portion of the index entry contains only the [KeyString blob of the indexed value]. The value portion of the index entry contains a list of RecordIds and the type bits of the indexed value. Outside of secondary oplog application when secondary unique index constraints are temporarily relaxed, the value portion of the index entry would be a list with exactly one RecordId.
      • In MongoDB 4.2 and later, the key portion of the index entry contains the combination of the [KeyString blob of the indexed value] + [the RecordId of the indexed document]. The value portion of the index entry contains the RecordId of the indexed document (redundantly) and the type bits of the indexed value.

      In FCV 4.2 and greater, a mongod writes the new format for index entries while still supporting the ability to read both formats. The in-place conversion incorrectly assumes that, when a document is being unindexed, the index entry in the 4.0 format can always be removed. This assumption results in documents having missing index keys in a very similar way to SERVER-28546. For partial indexes, SortedDataInterface::unindex() is called even for a document that may never have been indexed, and the storage glue layer is expected to tolerate that case. The in-place conversion should instead check the value portion of the 4.0 format's index entry and remove the index entry only if the RecordId matches the document being unindexed (see also 052345f).
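      The difference between the faulty assumption and the intended check can be illustrated with a toy model. This is a simplified sketch, not the server's implementation: a 4.0-format index is modelled as a Map from the indexed value to a list of RecordIds, and the names `makeLegacyIndex`, `buggyUnindex`, and `fixedUnindex` are hypothetical.

```javascript
// Toy model of a 4.0-format unique index: indexed value -> list of RecordIds.
// RecordId 1 is the indexed document { user: "x", active: true }.
// RecordId 2 is { user: "x", active: false }, which was never indexed.
function makeLegacyIndex() {
  const index = new Map();
  index.set("x", [1]);
  return index;
}

// Faulty behaviour: assume the old-format entry belongs to the document
// being unindexed and remove it unconditionally.
function buggyUnindex(index, key, recordId) {
  index.delete(key);
}

// Intended behaviour: remove the entry only if its value portion actually
// contains the RecordId of the document being unindexed.
function fixedUnindex(index, key, recordId) {
  const recordIds = index.get(key);
  if (recordIds === undefined || !recordIds.includes(recordId)) {
    return; // the document was never indexed; nothing to remove
  }
  const remaining = recordIds.filter((id) => id !== recordId);
  if (remaining.length === 0) {
    index.delete(key);
  } else {
    index.set(key, remaining);
  }
}

// Delete the un-indexed document (RecordId 2) under both behaviours.
const buggyIndex = makeLegacyIndex();
buggyUnindex(buggyIndex, "x", 2);

const fixedIndex = makeLegacyIndex();
fixedUnindex(fixedIndex, "x", 2);

console.log(buggyIndex.has("x")); // false: key for RecordId 1 was lost
console.log(fixedIndex.has("x")); // true: key for RecordId 1 is kept
```

      In the faulty case the indexed document no longer has a key, so a later insert of another matching document with the same value would not be rejected as a duplicate.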

            Assignee:
            dianna.hohensee@mongodb.com Dianna Hohensee (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn